Search | arXiv e-print repository

arXiv:2405.20838 [pdf, other]

einspace: Searching for Neural Architectures from Fundamental Operations

Authors: Linus Ericsson, Miguel Espinosa, Chenhongyi Yang, Antreas Antoniou, Amos Storkey, Shay B. Cohen, Steven McDonagh, Elliot J. Crowley

Abstract: Neural architecture search (NAS) finds high performing networks for a given task. Yet the results of NAS are fairly prosaic; they did not e.g. create a shift from convolutional structures to transformers. This is not least because the search spaces in NAS often aren't diverse enough to include such transformations a priori. Instead, for NAS to provide greater potential for fundamental design shift… ▽ More Neural architecture search (NAS) finds high performing networks for a given task. Yet the results of NAS are fairly prosaic; they did not e.g. create a shift from convolutional structures to transformers. This is not least because the search spaces in NAS often aren't diverse enough to include such transformations a priori. Instead, for NAS to provide greater potential for fundamental design shifts, we need a novel expressive search space design which is built from more fundamental operations. To this end, we introduce einspace, a search space based on a parameterised probabilistic context-free grammar. Our space is versatile, supporting architectures of various sizes and complexities, while also containing diverse network operations which allow it to model convolutions, attention components and more. It contains many existing competitive architectures, and provides flexibility for discovering new ones. Using this search space, we perform experiments to find novel architectures as well as improvements on existing ones on the diverse Unseen NAS datasets. We show that competitive architectures can be obtained by searching from scratch, and we consistently find large improvements when initialising the search with strong baselines. We believe that this work is an important advancement towards a transformative NAS paradigm where search space expressivity and strategic search initialisation play key roles. △ Less

Submitted 31 May, 2024; originally announced May 2024.

Comments: Project page at https://linusericsson.github.io/einspace/

arXiv:2402.07025 [pdf, other]

Generalization Error of Graph Neural Networks in the Mean-field Regime

Authors: Gholamali Aminian, Yixuan He, Gesine Reinert, Łukasz Szpruch, Samuel N. Cohen

Abstract: This work provides a theoretical framework for assessing the generalization error of graph neural networks in the over-parameterized regime, where the number of parameters surpasses the quantity of data points. We explore two widely utilized types of graph neural networks: graph convolutional neural networks and message passing graph neural networks. Prior to this study, existing bounds on the gen… ▽ More This work provides a theoretical framework for assessing the generalization error of graph neural networks in the over-parameterized regime, where the number of parameters surpasses the quantity of data points. We explore two widely utilized types of graph neural networks: graph convolutional neural networks and message passing graph neural networks. Prior to this study, existing bounds on the generalization error in the over-parametrized regime were uninformative, limiting our understanding of over-parameterized network performance. Our novel approach involves deriving upper bounds within the mean-field regime for evaluating the generalization error of these graph neural networks. We establish upper bounds with a convergence rate of $O(1/n)$, where $n$ is the number of graph samples. These upper bounds offer a theoretical assurance of the networks' performance on unseen data in the challenging over-parameterized regime and overall contribute to our understanding of their performance. △ Less

Submitted 1 July, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

Comments: Accepted in ICML 2024

arXiv:2306.11623 [pdf, ps, other]

Mean-field Analysis of Generalization Errors

Authors: Gholamali Aminian, Samuel N. Cohen, Łukasz Szpruch

Abstract: We propose a novel framework for exploring weak and $L_2$ generalization errors of algorithms through the lens of differential calculus on the space of probability measures. Specifically, we consider the KL-regularized empirical risk minimization problem and establish generic conditions under which the generalization error convergence rate, when training on a sample of size $n$, is… ▽ More We propose a novel framework for exploring weak and $L_2$ generalization errors of algorithms through the lens of differential calculus on the space of probability measures. Specifically, we consider the KL-regularized empirical risk minimization problem and establish generic conditions under which the generalization error convergence rate, when training on a sample of size $n$, is $\mathcal{O}(1/n)$. In the context of supervised learning with a one-hidden layer neural network in the mean-field regime, these conditions are reflected in suitable integrability and regularity assumptions on the loss and activation functions. △ Less

Submitted 20 June, 2023; originally announced June 2023.

Comments: 49 pages

MSC Class: 62B10; 60F99; 49N80; 46N30

arXiv:2305.10256 [pdf, other]

Nowcasting with signature methods

Authors: Samuel N. Cohen, Silvia Lui, Will Malpass, Giulia Mantoan, Lars Nesheim, Áureo de Paula, Andrew Reeves, Craig Scott, Emma Small, Lingyi Yang

Abstract: Key economic variables are often published with a significant delay of over a month. The nowcasting literature has arisen to provide fast, reliable estimates of delayed economic indicators and is closely related to filtering methods in signal processing. The path signature is a mathematical object which captures geometric properties of sequential data; it naturally handles missing data from mixed… ▽ More Key economic variables are often published with a significant delay of over a month. The nowcasting literature has arisen to provide fast, reliable estimates of delayed economic indicators and is closely related to filtering methods in signal processing. The path signature is a mathematical object which captures geometric properties of sequential data; it naturally handles missing data from mixed frequency and/or irregular sampling -- issues often encountered when merging multiple data sources -- by embedding the observed data in continuous time. Calculating path signatures and using them as features in models has achieved state-of-the-art results in fields such as finance, medicine, and cyber security. We look at the nowcasting problem by applying regression on signatures, a simple linear model on these nonlinear objects that we show subsumes the popular Kalman filter. We quantify the performance via a simulation exercise, and through application to nowcasting US GDP growth, where we see a lower error than a dynamic factor model based on the New York Fed staff nowcasting model. Finally we demonstrate the flexibility of this method by applying regression on signatures to nowcast weekly fuel prices using daily data. Regression on signatures is an easy-to-apply approach that allows great flexibility for data with complex sampling patterns. △ Less

Submitted 17 May, 2023; originally announced May 2023.

Comments: Our implementation of this algorithm in Python to reproduce the results is available at https://github.com/alan-turing-institute/Nowcasting_with_signatures To apply this method for your application, see also SigNow at https://github.com/datasciencecampus/SigNow_ONS_Turing

MSC Class: 60L10; 60L90; 60G35; 62M10; 62M20

arXiv:2207.04711 [pdf, other]

Matching Normalizing Flows and Probability Paths on Manifolds

Authors: Heli Ben-Hamu, Samuel Cohen, Joey Bose, Brandon Amos, Aditya Grover, Maximilian Nickel, Ricky T. Q. Chen, Yaron Lipman

Abstract: Continuous Normalizing Flows (CNFs) are a class of generative models that transform a prior distribution to a model distribution by solving an ordinary differential equation (ODE). We propose to train CNFs on manifolds by minimizing probability path divergence (PPD), a novel family of divergences between the probability density path generated by the CNF and a target probability density path. PPD i… ▽ More Continuous Normalizing Flows (CNFs) are a class of generative models that transform a prior distribution to a model distribution by solving an ordinary differential equation (ODE). We propose to train CNFs on manifolds by minimizing probability path divergence (PPD), a novel family of divergences between the probability density path generated by the CNF and a target probability density path. PPD is formulated using a logarithmic mass conservation formula which is a linear first order partial differential equation relating the log target probabilities and the CNF's defining vector field. PPD has several key benefits over existing methods: it sidesteps the need to solve an ODE per iteration, readily applies to manifold data, scales to high dimensions, and is compatible with a large family of target paths interpolating pure noise and data in finite time. Theoretically, PPD is shown to bound classical probability divergences. Empirically, we show that CNFs learned by minimizing PPD achieve state-of-the-art results in likelihoods and sample quality on existing low-dimensional manifold benchmarks, and is the first example of a generative model to scale to moderately high dimensional manifolds. △ Less

Submitted 11 July, 2022; originally announced July 2022.

Comments: ICML 2022

arXiv:2206.05262 [pdf, other]

Meta Optimal Transport

Authors: Brandon Amos, Samuel Cohen, Giulia Luise, Ievgen Redko

Abstract: We study the use of amortized optimization to predict optimal transport (OT) maps from the input measures, which we call Meta OT. This helps repeatedly solve similar OT problems between different measures by leveraging the knowledge and information present from past problems to rapidly predict and solve new problems. Otherwise, standard methods ignore the knowledge of the past solutions and subopt… ▽ More We study the use of amortized optimization to predict optimal transport (OT) maps from the input measures, which we call Meta OT. This helps repeatedly solve similar OT problems between different measures by leveraging the knowledge and information present from past problems to rapidly predict and solve new problems. Otherwise, standard methods ignore the knowledge of the past solutions and suboptimally re-solve each problem from scratch. We instantiate Meta OT models in discrete and continuous settings between grayscale images, spherical data, classification labels, and color palettes and use them to improve the computational time of standard OT solvers. Our source code is available at http://github.com/facebookresearch/meta-ot △ Less

Submitted 2 June, 2023; v1 submitted 10 June, 2022; originally announced June 2022.

Comments: ICML 2023

arXiv:2205.15991 [pdf, other]

Hedging option books using neural-SDE market models

Authors: Samuel N. Cohen, Christoph Reisinger, Sheng Wang

Abstract: We study the capability of arbitrage-free neural-SDE market models to yield effective strategies for hedging options. In particular, we derive sensitivity-based and minimum-variance-based hedging strategies using these models and examine their performance when applied to various option portfolios using real-world data. Through backtesting analysis over typical and stressed market periods, we show… ▽ More We study the capability of arbitrage-free neural-SDE market models to yield effective strategies for hedging options. In particular, we derive sensitivity-based and minimum-variance-based hedging strategies using these models and examine their performance when applied to various option portfolios using real-world data. Through backtesting analysis over typical and stressed market periods, we show that neural-SDE market models achieve lower hedging errors than Black--Scholes delta and delta-vega hedging consistently over time, and are less sensitive to the tenor choice of hedging instruments. In addition, hedging using market models leads to similar performance to hedging using Heston models, while the former tends to be more robust during stressed market periods. △ Less

Submitted 31 May, 2022; originally announced May 2022.

MSC Class: 91B28; 91B70; 62M45; 62P05

arXiv:2203.17128 [pdf, other]

Neural Q-learning for solving PDEs

Authors: Samuel N. Cohen, Deqing Jiang, Justin Sirignano

Abstract: Solving high-dimensional partial differential equations (PDEs) is a major challenge in scientific computing. We develop a new numerical method for solving elliptic-type PDEs by adapting the Q-learning algorithm in reinforcement learning. Our "Q-PDE" algorithm is mesh-free and therefore has the potential to overcome the curse of dimensionality. Using a neural tangent kernel (NTK) approach, we prove… ▽ More Solving high-dimensional partial differential equations (PDEs) is a major challenge in scientific computing. We develop a new numerical method for solving elliptic-type PDEs by adapting the Q-learning algorithm in reinforcement learning. Our "Q-PDE" algorithm is mesh-free and therefore has the potential to overcome the curse of dimensionality. Using a neural tangent kernel (NTK) approach, we prove that the neural network approximator for the PDE solution, trained with the Q-PDE algorithm, converges to the trajectory of an infinite-dimensional ordinary differential equation (ODE) as the number of hidden units $\rightarrow \infty$. For monotone PDE (i.e. those given by monotone operators, which may be nonlinear), despite the lack of a spectral gap in the NTK, we then prove that the limit neural network, which satisfies the infinite-dimensional ODE, converges in $L^2$ to the PDE solution as the training time $\rightarrow \infty$. More generally, we can prove that any fixed point of the wide-network limit for the Q-PDE algorithm is a solution of the PDE (not necessarily under the monotone condition). The numerical performance of the Q-PDE algorithm is studied for several elliptic PDEs. △ Less

Submitted 24 June, 2023; v1 submitted 31 March, 2022; originally announced March 2022.

MSC Class: 65N12; 62M45; 37N30; 60H30

arXiv:2202.07148 [pdf, other]

Estimating risks of option books using neural-SDE market models

Authors: Samuel N. Cohen, Christoph Reisinger, Sheng Wang

Abstract: In this paper, we examine the capacity of an arbitrage-free neural-SDE market model to produce realistic scenarios for the joint dynamics of multiple European options on a single underlying. We subsequently demonstrate its use as a risk simulation engine for option portfolios. Through backtesting analysis, we show that our models are more computationally efficient and accurate for evaluating the V… ▽ More In this paper, we examine the capacity of an arbitrage-free neural-SDE market model to produce realistic scenarios for the joint dynamics of multiple European options on a single underlying. We subsequently demonstrate its use as a risk simulation engine for option portfolios. Through backtesting analysis, we show that our models are more computationally efficient and accurate for evaluating the Value-at-Risk (VaR) of option portfolios, with better coverage performance and less procyclicality than standard filtered historical simulation approaches. △ Less

Submitted 14 February, 2022; originally announced February 2022.

MSC Class: 91B28; 91B70; 62M45; 62P05

arXiv:2111.10637 [pdf, other]

Gradient-based estimation of linear Hawkes processes with general kernels

Authors: Álvaro Cartea, Samuel N. Cohen, Saad Labyad

Abstract: Linear multivariate Hawkes processes (MHP) are a fundamental class of point processes with self-excitation. When estimating parameters for these processes, a difficulty is that the two main error functionals, the log-likelihood and the least squares error (LSE), as well as the evaluation of their gradients, have a quadratic complexity in the number of observed events. In practice, this prohibits t… ▽ More Linear multivariate Hawkes processes (MHP) are a fundamental class of point processes with self-excitation. When estimating parameters for these processes, a difficulty is that the two main error functionals, the log-likelihood and the least squares error (LSE), as well as the evaluation of their gradients, have a quadratic complexity in the number of observed events. In practice, this prohibits the use of exact gradient-based algorithms for parameter estimation. We construct an adaptive stratified sampling estimator of the gradient of the LSE. This results in a fast parametric estimation method for MHP with general kernels, applicable to large datasets, which compares favourably with existing methods. △ Less

Submitted 20 November, 2021; originally announced November 2021.

Comments: 51 pages, 17 figures, 1 table

MSC Class: 60G55; 62M09; 90C52; 93E10

arXiv:2110.03684 [pdf, other]

Cross-Domain Imitation Learning via Optimal Transport

Authors: Arnaud Fickinger, Samuel Cohen, Stuart Russell, Brandon Amos

Abstract: Cross-domain imitation learning studies how to leverage expert demonstrations of one agent to train an imitation agent with a different embodiment or morphology. Comparing trajectories and stationary distributions between the expert and imitation agents is challenging because they live on different systems that may not even have the same dimensionality. We propose Gromov-Wasserstein Imitation Lear… ▽ More Cross-domain imitation learning studies how to leverage expert demonstrations of one agent to train an imitation agent with a different embodiment or morphology. Comparing trajectories and stationary distributions between the expert and imitation agents is challenging because they live on different systems that may not even have the same dimensionality. We propose Gromov-Wasserstein Imitation Learning (GWIL), a method for cross-domain imitation that uses the Gromov-Wasserstein distance to align and compare states between the different spaces of the agents. Our theory formally characterizes the scenarios where GWIL preserves optimality, revealing its possibilities and limitations. We demonstrate the effectiveness of GWIL in non-trivial continuous control domains ranging from simple rigid transformation of the expert domain to arbitrary transformation of the state-action space. △ Less

Submitted 25 April, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

Comments: ICLR 2022

arXiv:2106.10272 [pdf, other]

Riemannian Convex Potential Maps

Authors: Samuel Cohen, Brandon Amos, Yaron Lipman

Abstract: Modeling distributions on Riemannian manifolds is a crucial component in understanding non-Euclidean data that arises, e.g., in physics and geology. The budding approaches in this space are limited by representational and computational tradeoffs. We propose and study a class of flows that uses convex potentials from Riemannian optimal transport. These are universal and can model distributions on a… ▽ More Modeling distributions on Riemannian manifolds is a crucial component in understanding non-Euclidean data that arises, e.g., in physics and geology. The budding approaches in this space are limited by representational and computational tradeoffs. We propose and study a class of flows that uses convex potentials from Riemannian optimal transport. These are universal and can model distributions on any compact Riemannian manifold without requiring domain knowledge of the manifold to be integrated into the architecture. We demonstrate that these flows can model standard distributions on spheres, and tori, on synthetic and geological data. Our source code is freely available online at http://github.com/facebookresearch/rcpm △ Less

Submitted 18 June, 2021; originally announced June 2021.

Comments: ICML 2021

arXiv:2105.11053 [pdf, other]

Arbitrage-free neural-SDE market models

Authors: Samuel N. Cohen, Christoph Reisinger, Sheng Wang

Abstract: Modelling joint dynamics of liquid vanilla options is crucial for arbitrage-free pricing of illiquid derivatives and managing risks of option trade books. This paper develops a nonparametric model for the European options book respecting underlying financial constraints and while being practically implementable. We derive a state space for prices which are free from static (or model-independent) a… ▽ More Modelling joint dynamics of liquid vanilla options is crucial for arbitrage-free pricing of illiquid derivatives and managing risks of option trade books. This paper develops a nonparametric model for the European options book respecting underlying financial constraints and while being practically implementable. We derive a state space for prices which are free from static (or model-independent) arbitrage and study the inference problem where a model is learnt from discrete time series data of stock and option prices. We use neural networks as function approximators for the drift and diffusion of the modelled SDE system, and impose constraints on the neural nets such that no-arbitrage conditions are preserved. In particular, we give methods to calibrate \textit{neural SDE} models which are guaranteed to satisfy a set of linear inequalities. We validate our approach with numerical experiments using data generated from a Heston stochastic local volatility model. △ Less

Submitted 23 August, 2021; v1 submitted 23 May, 2021; originally announced May 2021.

MSC Class: 91B28; 91B70; 62M45; 62P05

arXiv:2102.07115 [pdf, other]

Sliced Multi-Marginal Optimal Transport

Authors: Samuel Cohen, Alexander Terenin, Yannik Pitcan, Brandon Amos, Marc Peter Deisenroth, K S Sesh Kumar

Abstract: Multi-marginal optimal transport enables one to compare multiple probability measures, which increasingly finds application in multi-task learning problems. One practical limitation of multi-marginal transport is computational scalability in the number of measures, samples and dimensionality. In this work, we propose a multi-marginal optimal transport paradigm based on random one-dimensional proje… ▽ More Multi-marginal optimal transport enables one to compare multiple probability measures, which increasingly finds application in multi-task learning problems. One practical limitation of multi-marginal transport is computational scalability in the number of measures, samples and dimensionality. In this work, we propose a multi-marginal optimal transport paradigm based on random one-dimensional projections, whose (generalized) distance we term the sliced multi-marginal Wasserstein distance. To construct this distance, we introduce a characterization of the one-dimensional multi-marginal Kantorovich problem and use it to highlight a number of properties of the sliced multi-marginal Wasserstein distance. In particular, we show that (i) the sliced multi-marginal Wasserstein distance is a (generalized) metric that induces the same topology as the standard Wasserstein distance, (ii) it admits a dimension-free sample complexity, (iii) it is tightly connected with the problem of barycentric averaging under the sliced-Wasserstein metric. We conclude by illustrating the sliced multi-marginal Wasserstein on multi-task density estimation and multi-dynamics reinforcement learning problems. △ Less

Submitted 23 November, 2021; v1 submitted 14 February, 2021; originally announced February 2021.

Journal ref: NeurIPS Workshop on Optimal Transport and Machine Learning, 2021

arXiv:2102.07106 [pdf, other]

Healing Products of Gaussian Processes

Authors: Samuel Cohen, Rendani Mbuvha, Tshilidzi Marwala, Marc Peter Deisenroth

Abstract: Gaussian processes (GPs) are nonparametric Bayesian models that have been applied to regression and classification problems. One of the approaches to alleviate their cubic training cost is the use of local GP experts trained on subsets of the data. In particular, product-of-expert models combine the predictive distributions of local experts through a tractable product operation. While these expert… ▽ More Gaussian processes (GPs) are nonparametric Bayesian models that have been applied to regression and classification problems. One of the approaches to alleviate their cubic training cost is the use of local GP experts trained on subsets of the data. In particular, product-of-expert models combine the predictive distributions of local experts through a tractable product operation. While these expert models allow for massively distributed computation, their predictions typically suffer from erratic behaviour of the mean or uncalibrated uncertainty quantification. By calibrating predictions via a tempered softmax weighting, we provide a solution to these problems for multiple product-of-expert models, including the generalised product of experts and the robust Bayesian committee machine. Furthermore, we leverage the optimal transport literature and propose a new product-of-expert model that combines predictions of local experts by computing their Wasserstein barycenter, which can be applied to both regression and classification. △ Less

Submitted 14 February, 2021; originally announced February 2021.

Comments: ICML 2020

arXiv:2102.04263 [pdf, other]

Generalised correlated batched bandits via the ARC algorithm with application to dynamic pricing

Authors: Samuel Cohen, Tanut Treetanthiploet

Abstract: The Asymptotic Randomised Control (ARC) algorithm provides a rigorous approximation to the optimal strategy for a wide class of Bayesian bandits, while retaining low computational complexity. In particular, the ARC approach provides nearly optimal choices even when the payoffs are correlated or more than the reward is observed. The algorithm is guaranteed to asymptotically optimise the expected di… ▽ More The Asymptotic Randomised Control (ARC) algorithm provides a rigorous approximation to the optimal strategy for a wide class of Bayesian bandits, while retaining low computational complexity. In particular, the ARC approach provides nearly optimal choices even when the payoffs are correlated or more than the reward is observed. The algorithm is guaranteed to asymptotically optimise the expected discounted payoff, with error depending on the initial uncertainty of the bandit. In this paper, we extend the ARC framework to consider a batched bandit problem where observations arrive from a generalised linear model. In particular, we develop a large sample approximation to allow correlated and generally distributed observation. We apply this to a classic dynamic pricing problem based on a Bayesian hierarchical model and demonstrate that the ARC algorithm outperforms alternative approaches. △ Less

Submitted 12 October, 2022; v1 submitted 8 February, 2021; originally announced February 2021.

MSC Class: 62J12; 90B50; 91B38; 93C41

arXiv:2010.07252 [pdf, other]

Asymptotic Randomised Control with applications to bandits

Authors: Samuel N. Cohen, Tanut Treetanthiploet

Abstract: We consider a general multi-armed bandit problem with correlated (and simple contextual and restless) elements, as a relaxed control problem. By introducing an entropy regularisation, we obtain a smooth asymptotic approximation to the value function. This yields a novel semi-index approximation of the optimal decision process. This semi-index can be interpreted as explicitly balancing an explorati… ▽ More We consider a general multi-armed bandit problem with correlated (and simple contextual and restless) elements, as a relaxed control problem. By introducing an entropy regularisation, we obtain a smooth asymptotic approximation to the value function. This yields a novel semi-index approximation of the optimal decision process. This semi-index can be interpreted as explicitly balancing an exploration-exploitation trade-off as in the optimistic (UCB) principle where the learning premium explicitly describes asymmetry of information available in the environment and non-linearity in the reward function. Performance of the resulting Asymptotic Randomised Control (ARC) algorithm compares favourably well with other approaches to correlated multi-armed bandits. △ Less

Submitted 3 September, 2022; v1 submitted 14 October, 2020; originally announced October 2020.

MSC Class: 60J20; 93E35; 90C40; 41A58

arXiv:2008.07648 [pdf, other]

Nonparametric Learning of Two-Layer ReLU Residual Units

Authors: Zhunxuan Wang, Linyun He, Chunchuan Lyu, Shay B. Cohen

Abstract: We describe an algorithm that learns two-layer residual units using rectified linear unit (ReLU) activation: suppose the input $\mathbf{x}$ is from a distribution with support space $\mathbb{R}^d$ and the ground-truth generative model is a residual unit of this type, given by $\mathbf{y} = \boldsymbol{B}^\ast\left[\left(\boldsymbol{A}^\ast\mathbf{x}\right)^+ + \mathbf{x}\right]$, where ground-trut… ▽ More We describe an algorithm that learns two-layer residual units using rectified linear unit (ReLU) activation: suppose the input $\mathbf{x}$ is from a distribution with support space $\mathbb{R}^d$ and the ground-truth generative model is a residual unit of this type, given by $\mathbf{y} = \boldsymbol{B}^\ast\left[\left(\boldsymbol{A}^\ast\mathbf{x}\right)^+ + \mathbf{x}\right]$, where ground-truth network parameters $\boldsymbol{A}^\ast \in \mathbb{R}^{d\times d}$ represent a full-rank matrix with nonnegative entries and $\boldsymbol{B}^\ast \in \mathbb{R}^{m\times d}$ is full-rank with $m \geq d$ and for $\boldsymbol{c} \in \mathbb{R}^d$, $[\boldsymbol{c}^{+}]_i = \max\{0, c_i\}$. We design layer-wise objectives as functionals whose analytic minimizers express the exact ground-truth network in terms of its parameters and nonlinearities. Following this objective landscape, learning residual units from finite samples can be formulated using convex optimization of a nonparametric function: for each layer, we first formulate the corresponding empirical risk minimization (ERM) as a positive semi-definite quadratic program (QP), then we show the solution space of the QP can be equivalently determined by a set of linear inequalities, which can then be efficiently solved by linear programming (LP). We further prove the strong statistical consistency of our algorithm, and demonstrate its robustness and sample efficiency through experimental results on synthetic data and a set of benchmark regression datasets. △ Less

Submitted 10 December, 2022; v1 submitted 17 August, 2020; originally announced August 2020.

Comments: Published in Transactions on Machine Learning Research (11/2022), slightly typographically revised

arXiv:2007.07105 [pdf, other]

Estimating Barycenters of Measures in High Dimensions

Authors: Samuel Cohen, Michael Arbel, Marc Peter Deisenroth

Abstract: Barycentric averaging is a principled way of summarizing populations of measures. Existing algorithms for estimating barycenters typically parametrize them as weighted sums of Diracs and optimize their weights and/or locations. However, these approaches do not scale to high-dimensional settings due to the curse of dimensionality. In this paper, we propose a scalable and general algorithm for estim… ▽ More Barycentric averaging is a principled way of summarizing populations of measures. Existing algorithms for estimating barycenters typically parametrize them as weighted sums of Diracs and optimize their weights and/or locations. However, these approaches do not scale to high-dimensional settings due to the curse of dimensionality. In this paper, we propose a scalable and general algorithm for estimating barycenters of measures in high dimensions. The key idea is to turn the optimization over measures into an optimization over generative models, introducing inductive biases that allow the method to scale while still accurately estimating barycenters. We prove local convergence under mild assumptions on the discrepancy showing that the approach is well-posed. We demonstrate that our method is fast, achieves good performance on low-dimensional problems, and scales to high-dimensional settings. In particular, our approach is the first to be used to estimate barycenters in thousands of dimensions. △ Less

Submitted 14 February, 2021; v1 submitted 14 July, 2020; originally announced July 2020.

Comments: In submission

arXiv:2006.12648 [pdf, other]

Aligning Time Series on Incomparable Spaces

Authors: Samuel Cohen, Giulia Luise, Alexander Terenin, Brandon Amos, Marc Peter Deisenroth

Abstract: Dynamic time war** (DTW) is a useful method for aligning, comparing and combining time series, but it requires them to live in comparable spaces. In this work, we consider a setting in which time series live on different spaces without a sensible ground metric, causing DTW to become ill-defined. To alleviate this, we propose Gromov dynamic time war** (GDTW), a distance between time series on p… ▽ More Dynamic time war** (DTW) is a useful method for aligning, comparing and combining time series, but it requires them to live in comparable spaces. In this work, we consider a setting in which time series live on different spaces without a sensible ground metric, causing DTW to become ill-defined. To alleviate this, we propose Gromov dynamic time war** (GDTW), a distance between time series on potentially incomparable spaces that avoids the comparability requirement by instead considering intra-relational geometry. We demonstrate its effectiveness at aligning, combining and comparing time series living on incomparable spaces. We further propose a smoothed version of GDTW as a differentiable loss and assess its properties in a variety of settings, including barycentric averaging, generative modeling and imitation learning. △ Less

Submitted 22 February, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

Journal ref: Artificial Intelligence and Statistics, 2021

arXiv:2005.04064 [pdf, other]

Lossy Compression with Distortion Constrained Optimization

Authors: Ties van Rozendaal, Guillaume Sautière, Taco S. Cohen

Abstract: When training end-to-end learned models for lossy compression, one has to balance the rate and distortion losses. This is typically done by manually setting a tradeoff parameter $β$, an approach called $β$-VAE. Using this approach it is difficult to target a specific rate or distortion value, because the result can be very sensitive to $β$, and the appropriate value for $β$ depends on the model an… ▽ More When training end-to-end learned models for lossy compression, one has to balance the rate and distortion losses. This is typically done by manually setting a tradeoff parameter $β$, an approach called $β$-VAE. Using this approach it is difficult to target a specific rate or distortion value, because the result can be very sensitive to $β$, and the appropriate value for $β$ depends on the model and problem setup. As a result, model comparison requires extensive per-model $β$-tuning, and producing a whole rate-distortion curve (by varying $β$) for each model to be compared. We argue that the constrained optimization method of Rezende and Viola, 2018 is a lot more appropriate for training lossy compression models because it allows us to obtain the best possible rate subject to a distortion constraint. This enables pointwise model comparisons, by training two models with the same distortion target and comparing their rate. We show that the method does manage to satisfy the constraint on a realistic image compression task, outperforms a constrained optimization method based on a hinge-loss, and is more practical to use for model selection than a $β$-VAE. △ Less

Submitted 8 May, 2020; originally announced May 2020.

Comments: Accepted as a CVPR 2020 workshop paper: Workshop and Challenge on Learned Image Compression (CLIC)

arXiv:2004.09691 [pdf, other]

A Data and Compute Efficient Design for Limited-Resources Deep Learning

Authors: Mirgahney Mohamed, Gabriele Cesa, Taco S. Cohen, Max Welling

Abstract: Thanks to their improved data efficiency, equivariant neural networks have gained increased interest in the deep learning community. They have been successfully applied in the medical domain where symmetries in the data can be effectively exploited to build more accurate and robust models. To be able to reach a much larger body of patients, mobile, on-device implementations of deep learning soluti… ▽ More Thanks to their improved data efficiency, equivariant neural networks have gained increased interest in the deep learning community. They have been successfully applied in the medical domain where symmetries in the data can be effectively exploited to build more accurate and robust models. To be able to reach a much larger body of patients, mobile, on-device implementations of deep learning solutions have been developed for medical applications. However, equivariant models are commonly implemented using large and computationally expensive architectures, not suitable to run on mobile devices. In this work, we design and test an equivariant version of MobileNetV2 and further optimize it with model quantization to enable more efficient inference. We achieve close-to state of the art performance on the Patch Camelyon (PCam) medical dataset while being more computationally efficient. △ Less

Submitted 8 July, 2020; v1 submitted 20 April, 2020; originally announced April 2020.

Comments: Accepted for poster presentation at the Practical Machine Learning for Develo** Countries (PML4DC) workshop, ICLR 2020

arXiv:2004.04342 [pdf, other]

Feedback Recurrent Autoencoder for Video Compression

Authors: Adam Golinski, Reza Pourreza, Yang Yang, Guillaume Sautiere, Taco S Cohen

Abstract: Recent advances in deep generative modeling have enabled efficient modeling of high dimensional data distributions and opened up a new horizon for solving data compression problems. Specifically, autoencoder based learned image or video compression solutions are emerging as strong competitors to traditional approaches. In this work, We propose a new network architecture, based on common and well s… ▽ More Recent advances in deep generative modeling have enabled efficient modeling of high dimensional data distributions and opened up a new horizon for solving data compression problems. Specifically, autoencoder based learned image or video compression solutions are emerging as strong competitors to traditional approaches. In this work, We propose a new network architecture, based on common and well studied components, for learned video compression operating in low latency mode. Our method yields state of the art MS-SSIM/rate performance on the high-resolution UVG dataset, among both learned video compression approaches and classical video compression methods (H.265 and H.264) in the rate range of interest for streaming applications. Additionally, we provide an analysis of existing approaches through the lens of their underlying probabilistic graphical models. Finally, we point out issues with temporal consistency and color shift observed in empirical evaluation, and suggest directions forward to alleviate those. △ Less

Submitted 8 April, 2020; originally announced April 2020.

arXiv:2001.11235 [pdf, other]

Learning Discrete Distributions by Dequantization

Authors: Emiel Hoogeboom, Taco S. Cohen, Jakub M. Tomczak

Abstract: Media is generally stored digitally and is therefore discrete. Many successful deep distribution models in deep learning learn a density, i.e., the distribution of a continuous random variable. Naïve optimization on discrete data leads to arbitrarily high likelihoods, and instead, it has become standard practice to add noise to datapoints. In this paper, we present a general framework for dequanti… ▽ More Media is generally stored digitally and is therefore discrete. Many successful deep distribution models in deep learning learn a density, i.e., the distribution of a continuous random variable. Naïve optimization on discrete data leads to arbitrarily high likelihoods, and instead, it has become standard practice to add noise to datapoints. In this paper, we present a general framework for dequantization that captures existing methods as a special case. We derive two new dequantization objectives: importance-weighted (iw) dequantization and Rényi dequantization. In addition, we introduce autoregressive dequantization (ARD) for more flexible dequantization distributions. Empirically we find that iw and Rényi dequantization considerably improve performance for uniform dequantization distributions. ARD achieves a negative log-likelihood of 3.06 bits per dimension on CIFAR10, which to the best of our knowledge is state-of-the-art among distribution models that do not require autoregressive inverses for sampling. △ Less

Submitted 30 January, 2020; originally announced January 2020.

arXiv:1911.04018 [pdf, other]

Feedback Recurrent AutoEncoder

Authors: Yang Yang, Guillaume Sautière, J. Jon Ryu, Taco S Cohen

Abstract: In this work, we propose a new recurrent autoencoder architecture, termed Feedback Recurrent AutoEncoder (FRAE), for online compression of sequential data with temporal dependency. The recurrent structure of FRAE is designed to efficiently extract the redundancy along the time dimension and allows a compact discrete representation of the data to be learned. We demonstrate its effectiveness in spee… ▽ More In this work, we propose a new recurrent autoencoder architecture, termed Feedback Recurrent AutoEncoder (FRAE), for online compression of sequential data with temporal dependency. The recurrent structure of FRAE is designed to efficiently extract the redundancy along the time dimension and allows a compact discrete representation of the data to be learned. We demonstrate its effectiveness in speech spectrogram compression. Specifically, we show that the FRAE, paired with a powerful neural vocoder, can produce high-quality speech waveforms at a low, fixed bitrate. We further show that by adding a learned prior for the latent space and using an entropy coder, we can achieve an even lower variable bitrate. △ Less

Submitted 17 February, 2020; v1 submitted 10 November, 2019; originally announced November 2019.

Journal ref: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

arXiv:1908.05717 [pdf, other]

doi 10.1109/ICCV.2019.00713

Video Compression With Rate-Distortion Autoencoders

Authors: Amirhossein Habibian, Ties van Rozendaal, Jakub M. Tomczak, Taco S. Cohen

Abstract: In this paper we present a a deep generative model for lossy video compression. We employ a model that consists of a 3D autoencoder with a discrete latent space and an autoregressive prior used for entropy coding. Both autoencoder and prior are trained jointly to minimize a rate-distortion loss, which is closely related to the ELBO used in variational autoencoders. Despite its simplicity, we find… ▽ More In this paper we present a a deep generative model for lossy video compression. We employ a model that consists of a 3D autoencoder with a discrete latent space and an autoregressive prior used for entropy coding. Both autoencoder and prior are trained jointly to minimize a rate-distortion loss, which is closely related to the ELBO used in variational autoencoders. Despite its simplicity, we find that our method outperforms the state-of-the-art learned video compression networks based on motion compensation or interpolation. We systematically evaluate various design choices, such as the use of frame-based or spatio-temporal autoencoders, and the type of autoregressive prior. In addition, we present three extensions of the basic method that demonstrate the benefits over classical approaches to compression. First, we introduce semantic compression, where the model is trained to allocate more bits to objects of interest. Second, we study adaptive compression, where the model is adapted to a domain with limited variability, e.g., videos taken from an autonomous car, to achieve superior compression on that domain. Finally, we introduce multimodal compression, where we demonstrate the effectiveness of our model in joint compression of multiple modalities captured by non-standard imaging sensors, such as quad cameras. We believe that this opens up novel video compression applications, which have not been feasible with classical codecs. △ Less

Submitted 13 November, 2019; v1 submitted 14 August, 2019; originally announced August 2019.

Comments: Accepted to ICCV 2019

arXiv:1906.02481 [pdf, ps, other]

Covariance in Physics and Convolutional Neural Networks

Authors: Miranda C. N. Cheng, Vassilis Anagiannis, Maurice Weiler, Pim de Haan, Taco S. Cohen, Max Welling

Abstract: In this proceeding we give an overview of the idea of covariance (or equivariance) featured in the recent development of convolutional neural networks (CNNs). We study the similarities and differences between the use of covariance in theoretical physics and in the CNN context. Additionally, we demonstrate that the simple assumption of covariance, together with the required properties of locality,… ▽ More In this proceeding we give an overview of the idea of covariance (or equivariance) featured in the recent development of convolutional neural networks (CNNs). We study the similarities and differences between the use of covariance in theoretical physics and in the CNN context. Additionally, we demonstrate that the simple assumption of covariance, together with the required properties of locality, linearity and weight sharing, is sufficient to uniquely determine the form of the convolution. △ Less

Submitted 6 June, 2019; originally announced June 2019.

arXiv:1905.10144 [pdf, other]

Adaptive Symmetric Reward Noising for Reinforcement Learning

Authors: Refael Vivanti, Talya D. Sohlberg-Baris, Shlomo Cohen, Orna Cohen

Abstract: Recent reinforcement learning algorithms, though achieving impressive results in various fields, suffer from brittle training effects such as regression in results and high sensitivity to initialization and parameters. We claim that some of the brittleness stems from variance differences, i.e. when different environment areas - states and/or actions - have different rewards variance. This causes t… ▽ More Recent reinforcement learning algorithms, though achieving impressive results in various fields, suffer from brittle training effects such as regression in results and high sensitivity to initialization and parameters. We claim that some of the brittleness stems from variance differences, i.e. when different environment areas - states and/or actions - have different rewards variance. This causes two problems: First, the "Boring Areas Trap" in algorithms such as Q-learning, where moving between areas depends on the current area variance, and getting out of a boring area is hard due to its low variance. Second, the "Manipulative Consultant" problem, when value-estimation functions used in DQN and Actor-Critic algorithms influence the agent to prefer boring areas, regardless of the mean rewards return, as they maximize estimation precision rather than rewards. This sheds a new light on how exploration contribute to training, as it helps with both challenges. Cognitive experiments in humans showed that noised reward signals may paradoxically improve performance. We explain this using the two mentioned problems, claiming that both humans and algorithms may share similar challenges. Inspired by this result, we propose the Adaptive Symmetric Reward Noising (ASRN), by which we mean adding Gaussian noise to rewards according to their states' estimated variance, thus avoiding the two problems while not affecting the environment's mean rewards behavior. We conduct our experiments in a Multi Armed Bandit problem with variance differences. We demonstrate that a Q-learning algorithm shows the brittleness effect in this problem, and that the ASRN scheme can dramatically improve the results. We show that ASRN helps a DQN algorithm training process reach better results in an end to end autonomous driving task using the AirSim driving simulator. △ Less

Submitted 24 May, 2019; originally announced May 2019.

Comments: 9 pages, 7 figures, conference

arXiv:1904.09585 [pdf, other]

Obfuscation for Privacy-preserving Syntactic Parsing

Authors: Zhifeng Hu, Serhii Havrylov, Ivan Titov, Shay B. Cohen

Abstract: The goal of homomorphic encryption is to encrypt data such that another party can operate on it without being explicitly exposed to the content of the original data. We introduce an idea for a privacy-preserving transformation on natural language data, inspired by homomorphic encryption. Our primary tool is {\em obfuscation}, relying on the properties of natural language. Specifically, a given Eng… ▽ More The goal of homomorphic encryption is to encrypt data such that another party can operate on it without being explicitly exposed to the content of the original data. We introduce an idea for a privacy-preserving transformation on natural language data, inspired by homomorphic encryption. Our primary tool is {\em obfuscation}, relying on the properties of natural language. Specifically, a given English text is obfuscated using a neural model that aims to preserve the syntactic relationships of the original sentence so that the obfuscated sentence can be parsed instead of the original one. The model works at the word level, and learns to obfuscate each word separately by changing it into a new word that has a similar syntactic role. The text obfuscated by our model leads to better performance on three syntactic parsers (two dependency and one constituency parsers) in comparison to an upper-bound random substitution baseline. More specifically, the results demonstrate that as more terms are obfuscated (by their part of speech), the substitution upper bound significantly degrades, while the neural model maintains a relatively high performing parser. All of this is done without much sacrifice of privacy compared to the random substitution upper bound. We also further analyze the results, and discover that the substituted words have similar syntactic properties, but different semantic content, compared to the original words. △ Less

Submitted 27 May, 2020; v1 submitted 21 April, 2019; originally announced April 2019.

Comments: Accepted to IWPT 2020

arXiv:1902.04615 [pdf, other]

Gauge Equivariant Convolutional Networks and the Icosahedral CNN

Authors: Taco S. Cohen, Maurice Weiler, Berkay Kicanaoglu, Max Welling

Abstract: The principle of equivariance to symmetry transformations enables a theoretically grounded approach to neural network architecture design. Equivariant networks have shown excellent performance and data efficiency on vision and medical imaging problems that exhibit symmetries. Here we show how this principle can be extended beyond global symmetries to local gauge transformations. This enables the d… ▽ More The principle of equivariance to symmetry transformations enables a theoretically grounded approach to neural network architecture design. Equivariant networks have shown excellent performance and data efficiency on vision and medical imaging problems that exhibit symmetries. Here we show how this principle can be extended beyond global symmetries to local gauge transformations. This enables the development of a very general class of convolutional neural networks on manifolds that depend only on the intrinsic geometry, and which includes many popular methods from equivariant and geometric deep learning. We implement gauge equivariant CNNs for signals defined on the surface of the icosahedron, which provides a reasonable approximation of the sphere. By choosing to work with this very regular manifold, we are able to implement the gauge equivariant convolution using a single conv2d call, making it a highly scalable and practical alternative to Spherical CNNs. Using this method, we demonstrate substantial improvements over previous methods on the task of segmenting omnidirectional images and global climate patterns. △ Less

Submitted 13 May, 2019; v1 submitted 11 February, 2019; originally announced February 2019.

Comments: Proceedings of the International Conference on Machine Learning (ICML), 2019

arXiv:1807.04689 [pdf, other]

Explorations in Homeomorphic Variational Auto-Encoding

Authors: Luca Falorsi, Pim de Haan, Tim R. Davidson, Nicola De Cao, Maurice Weiler, Patrick Forré, Taco S. Cohen

Abstract: The manifold hypothesis states that many kinds of high-dimensional data are concentrated near a low-dimensional manifold. If the topology of this data manifold is non-trivial, a continuous encoder network cannot embed it in a one-to-one manner without creating holes of low density in the latent space. This is at odds with the Gaussian prior assumption typically made in Variational Auto-Encoders (V… ▽ More The manifold hypothesis states that many kinds of high-dimensional data are concentrated near a low-dimensional manifold. If the topology of this data manifold is non-trivial, a continuous encoder network cannot embed it in a one-to-one manner without creating holes of low density in the latent space. This is at odds with the Gaussian prior assumption typically made in Variational Auto-Encoders (VAEs), because the density of a Gaussian concentrates near a blob-like manifold. In this paper we investigate the use of manifold-valued latent variables. Specifically, we focus on the important case of continuously differentiable symmetry groups (Lie groups), such as the group of 3D rotations $\operatorname{SO}(3)$. We show how a VAE with $\operatorname{SO}(3)$-valued latent variables can be constructed, by extending the reparameterization trick to compact connected Lie groups. Our experiments show that choosing manifold-valued latent variables that match the topology of the latent data manifold, is crucial to preserve the topological structure and learn a well-behaved latent space. △ Less

Submitted 12 July, 2018; originally announced July 2018.

Comments: 16 pages, 8 figures, ICML workshop on Theoretical Foundations and Applications of Deep Generative Models

arXiv:1807.00583 [pdf, other]

Sample Efficient Semantic Segmentation using Rotation Equivariant Convolutional Networks

Authors: Jasper Linmans, Jim Winkens, Bastiaan S. Veeling, Taco S. Cohen, Max Welling

Abstract: We propose a semantic segmentation model that exploits rotation and reflection symmetries. We demonstrate significant gains in sample efficiency due to increased weight sharing, as well as improvements in robustness to symmetry transformations. The group equivariant CNN framework is extended for segmentation by introducing a new equivariant (G->Z2)-convolution that transforms feature maps on a gro… ▽ More We propose a semantic segmentation model that exploits rotation and reflection symmetries. We demonstrate significant gains in sample efficiency due to increased weight sharing, as well as improvements in robustness to symmetry transformations. The group equivariant CNN framework is extended for segmentation by introducing a new equivariant (G->Z2)-convolution that transforms feature maps on a group to planar feature maps. Also, equivariant transposed convolution is formulated for up-sampling in an encoder-decoder network. To demonstrate improvements in sample efficiency we evaluate on multiple data regimes of a rotation-equivariant segmentation task: cancer metastases detection in histopathology images. We further show the effectiveness of exploiting more symmetries by varying the size of the group. △ Less

Submitted 2 July, 2018; originally announced July 2018.

Comments: Presented at the ICML workshop: Towards learning with limited labels: Equivariance, Invariance, and Beyond, 2018

arXiv:1804.04656 [pdf, other]

3D G-CNNs for Pulmonary Nodule Detection

Authors: Marysia Winkels, Taco S. Cohen

Abstract: Convolutional Neural Networks (CNNs) require a large amount of annotated data to learn from, which is often difficult to obtain in the medical domain. In this paper we show that the sample complexity of CNNs can be significantly improved by using 3D roto-translation group convolutions (G-Convs) instead of the more conventional translational convolutions. These 3D G-CNNs were applied to the problem… ▽ More Convolutional Neural Networks (CNNs) require a large amount of annotated data to learn from, which is often difficult to obtain in the medical domain. In this paper we show that the sample complexity of CNNs can be significantly improved by using 3D roto-translation group convolutions (G-Convs) instead of the more conventional translational convolutions. These 3D G-CNNs were applied to the problem of false positive reduction for pulmonary nodule detection, and proved to be substantially more effective in terms of performance, sensitivity to malignant nodules, and speed of convergence compared to a strong and comparable baseline architecture with regular convolutions, data augmentation and a similar number of parameters. For every dataset size tested, the G-CNN achieved a FROC score close to the CNN trained on ten times more data. △ Less

Submitted 12 April, 2018; originally announced April 2018.

Journal ref: International conference on Medical Imaging with Deep Learning, 2018

arXiv:1803.10743 [pdf, other]

Intertwiners between Induced Representations (with Applications to the Theory of Equivariant Neural Networks)

Authors: Taco S. Cohen, Mario Geiger, Maurice Weiler

Abstract: Group equivariant and steerable convolutional neural networks (regular and steerable G-CNNs) have recently emerged as a very effective model class for learning from signal data such as 2D and 3D images, video, and other data where symmetries are present. In geometrical terms, regular G-CNNs represent data in terms of scalar fields ("feature channels"), whereas the steerable G-CNN can also use vect… ▽ More Group equivariant and steerable convolutional neural networks (regular and steerable G-CNNs) have recently emerged as a very effective model class for learning from signal data such as 2D and 3D images, video, and other data where symmetries are present. In geometrical terms, regular G-CNNs represent data in terms of scalar fields ("feature channels"), whereas the steerable G-CNN can also use vector or tensor fields ("capsules") to represent data. In algebraic terms, the feature spaces in regular G-CNNs transform according to a regular representation of the group G, whereas the feature spaces in Steerable G-CNNs transform according to the more general induced representations of G. In order to make the network equivariant, each layer in a G-CNN is required to intertwine between the induced representations associated with its input and output space. In this paper we present a general mathematical framework for G-CNNs on homogeneous spaces like Euclidean space or the sphere. We show, using elementary methods, that the layers of an equivariant network are convolutional if and only if the input and output feature spaces transform according to an induced representation. This result, which follows from G.W. Mackey's abstract theory on induced representations, establishes G-CNNs as a universal class of equivariant network architectures, and generalizes the important recent work of Kondor & Trivedi on the intertwiners between regular representations. △ Less

Submitted 30 March, 2018; v1 submitted 28 March, 2018; originally announced March 2018.

arXiv:1803.02108 [pdf, other]

HexaConv

Authors: Emiel Hoogeboom, Jorn W. T. Peters, Taco S. Cohen, Max Welling

Abstract: The effectiveness of Convolutional Neural Networks stems in large part from their ability to exploit the translation invariance that is inherent in many learning problems. Recently, it was shown that CNNs can exploit other invariances, such as rotation invariance, by using group convolutions instead of planar convolutions. However, for reasons of performance and ease of implementation, it has been… ▽ More The effectiveness of Convolutional Neural Networks stems in large part from their ability to exploit the translation invariance that is inherent in many learning problems. Recently, it was shown that CNNs can exploit other invariances, such as rotation invariance, by using group convolutions instead of planar convolutions. However, for reasons of performance and ease of implementation, it has been necessary to limit the group convolution to transformations that can be applied to the filters without interpolation. Thus, for images with square pixels, only integer translations, rotations by multiples of 90 degrees, and reflections are admissible. Whereas the square tiling provides a 4-fold rotational symmetry, a hexagonal tiling of the plane has a 6-fold rotational symmetry. In this paper we show how one can efficiently implement planar convolution and group convolution over hexagonal lattices, by re-using existing highly optimized convolution routines. We find that, due to the reduced anisotropy of hexagonal filters, planar HexaConv provides better accuracy than planar convolution with square filters, given a fixed parameter budget. Furthermore, we find that the increased degree of symmetry of the hexagonal grid increases the effectiveness of group convolutions, by allowing for more parameter sharing. We show that our method significantly outperforms conventional CNNs on the AID aerial scene classification dataset, even outperforming ImageNet pre-trained models. △ Less

Submitted 6 March, 2018; originally announced March 2018.

arXiv:1801.10130 [pdf, other]

Spherical CNNs

Authors: Taco S. Cohen, Mario Geiger, Jonas Koehler, Max Welling

Abstract: Convolutional Neural Networks (CNNs) have become the method of choice for learning problems involving 2D planar images. However, a number of problems of recent interest have created a demand for models that can analyze spherical images. Examples include omnidirectional vision for drones, robots, and autonomous cars, molecular regression problems, and global weather and climate modelling. A naive a… ▽ More Convolutional Neural Networks (CNNs) have become the method of choice for learning problems involving 2D planar images. However, a number of problems of recent interest have created a demand for models that can analyze spherical images. Examples include omnidirectional vision for drones, robots, and autonomous cars, molecular regression problems, and global weather and climate modelling. A naive application of convolutional networks to a planar projection of the spherical signal is destined to fail, because the space-varying distortions introduced by such a projection will make translational weight sharing ineffective. In this paper we introduce the building blocks for constructing spherical CNNs. We propose a definition for the spherical cross-correlation that is both expressive and rotation-equivariant. The spherical correlation satisfies a generalized Fourier theorem, which allows us to compute it efficiently using a generalized (non-commutative) Fast Fourier Transform (FFT) algorithm. We demonstrate the computational efficiency, numerical accuracy, and effectiveness of spherical CNNs applied to 3D model recognition and atomization energy regression. △ Less

Submitted 25 February, 2018; v1 submitted 30 January, 2018; originally announced January 2018.

Comments: Proceedings of the 6th International Conference on Learning Representations (ICLR), 2018

Journal ref: Proceedings of the International Conference on Learning Representations, 2018

arXiv:1612.08498 [pdf, other]

Steerable CNNs

Authors: Taco S. Cohen, Max Welling

Abstract: It has long been recognized that the invariance and equivariance properties of a representation are critically important for success in many vision tasks. In this paper we present Steerable Convolutional Neural Networks, an efficient and flexible class of equivariant convolutional networks. We show that steerable CNNs achieve state of the art results on the CIFAR image classification benchmark. Th… ▽ More It has long been recognized that the invariance and equivariance properties of a representation are critically important for success in many vision tasks. In this paper we present Steerable Convolutional Neural Networks, an efficient and flexible class of equivariant convolutional networks. We show that steerable CNNs achieve state of the art results on the CIFAR image classification benchmark. The mathematical theory of steerable representations reveals a type system in which any steerable representation is a composition of elementary feature types, each one associated with a particular kind of symmetry. We show how the parameter cost of a steerable filter bank depends on the types of the input and output features, and show how to use this knowledge to construct CNNs that utilize parameters effectively. △ Less

Submitted 26 December, 2016; originally announced December 2016.

Journal ref: Proceedings of the International Conference on Learning Representations, 2017

arXiv:1606.00229 [pdf, ps, other]

Uncertainty and filtering of hidden Markov models in discrete time

Authors: Samuel N. Cohen

Abstract: We consider the problem of filtering an unseen Markov chain from noisy observations, in the presence of uncertainty regarding the parameters of the processes involved. Using the theory of nonlinear expectations, we describe the uncertainty in terms of a penalty function, which can be propagated forward in time in the place of the filter. We consider the problem of filtering an unseen Markov chain from noisy observations, in the presence of uncertainty regarding the parameters of the processes involved. Using the theory of nonlinear expectations, we describe the uncertainty in terms of a penalty function, which can be propagated forward in time in the place of the filter. △ Less

Submitted 12 May, 2018; v1 submitted 1 June, 2016; originally announced June 2016.

MSC Class: 62M20; 60G35; 93E11

arXiv:1602.07576 [pdf, ps, other]

Group Equivariant Convolutional Networks

Authors: Taco S. Cohen, Max Welling

Abstract: We introduce Group equivariant Convolutional Neural Networks (G-CNNs), a natural generalization of convolutional neural networks that reduces sample complexity by exploiting symmetries. G-CNNs use G-convolutions, a new type of layer that enjoys a substantially higher degree of weight sharing than regular convolution layers. G-convolutions increase the expressive capacity of the network without inc… ▽ More We introduce Group equivariant Convolutional Neural Networks (G-CNNs), a natural generalization of convolutional neural networks that reduces sample complexity by exploiting symmetries. G-CNNs use G-convolutions, a new type of layer that enjoys a substantially higher degree of weight sharing than regular convolution layers. G-convolutions increase the expressive capacity of the network without increasing the number of parameters. Group convolution layers are easy to use and can be implemented with negligible computational overhead for discrete groups generated by translations, reflections and rotations. G-CNNs achieve state of the art results on CIFAR10 and rotated MNIST. △ Less

Submitted 3 June, 2016; v1 submitted 24 February, 2016; originally announced February 2016.

Journal ref: Proceedings of the International Conference on Machine Learning (ICML), 2016

arXiv:1505.04413 [pdf, other]

Harmonic Exponential Families on Manifolds

Authors: Taco S. Cohen, Max Welling

Abstract: In a range of fields including the geosciences, molecular biology, robotics and computer vision, one encounters problems that involve random variables on manifolds. Currently, there is a lack of flexible probabilistic models on manifolds that are fast and easy to train. We define an extremely flexible class of exponential family distributions on manifolds such as the torus, sphere, and rotation gr… ▽ More In a range of fields including the geosciences, molecular biology, robotics and computer vision, one encounters problems that involve random variables on manifolds. Currently, there is a lack of flexible probabilistic models on manifolds that are fast and easy to train. We define an extremely flexible class of exponential family distributions on manifolds such as the torus, sphere, and rotation groups, and show that for these distributions the gradient of the log-likelihood can be computed efficiently using a non-commutative generalization of the Fast Fourier Transform (FFT). We discuss applications to Bayesian camera motion estimation (where harmonic exponential families serve as conjugate priors), and modelling of the spatial distribution of earthquakes on the surface of the earth. Our experimental results show that harmonic densities yield a significantly higher likelihood than the best competing method, while being orders of magnitude faster to train. △ Less

Submitted 20 May, 2015; v1 submitted 17 May, 2015; originally announced May 2015.

Comments: fixed typo

Journal ref: Proceedings of the International Conference on Machine Learning, 2015

arXiv:1311.6257 [pdf, other]

Filters and smoothers for self-exciting Markov modulated counting processes

Authors: Samuel N. Cohen, Robert J. Elliott

Abstract: We consider a self-exciting counting process, the parameters of which depend on a hidden finite-state Markov chain. We derive the optimal filter and smoother for the hidden chain based on observation of the jump process. This filter is in closed form and is finite dimensional. We demonstrate the performance of this filter both with simulated data, and by analysing the `flash crash' of 6th May 2010… ▽ More We consider a self-exciting counting process, the parameters of which depend on a hidden finite-state Markov chain. We derive the optimal filter and smoother for the hidden chain based on observation of the jump process. This filter is in closed form and is finite dimensional. We demonstrate the performance of this filter both with simulated data, and by analysing the `flash crash' of 6th May 2010 in this framework. △ Less

Submitted 25 November, 2013; originally announced November 2013.

MSC Class: 62M05; 60G55; 60J28; 91G70

Showing 1–41 of 41 results for author: Cohen, S