Search | arXiv e-print repository

iQRL -- Implicitly Quantized Representations for Sample-efficient Reinforcement Learning

Authors: Aidan Scannell, Kalle Kujanpää, Yi Zhao, Mohammadreza Nakhaei, Arno Solin, Joni Pajarinen

Abstract: Learning representations for reinforcement learning (RL) has shown much promise for continuous control. We propose an efficient representation learning method using only a self-supervised latent-state consistency loss. Our approach employs an encoder and a dynamics model to map observations to latent states and predict future latent states, respectively. We achieve high performance and prevent rep… ▽ More Learning representations for reinforcement learning (RL) has shown much promise for continuous control. We propose an efficient representation learning method using only a self-supervised latent-state consistency loss. Our approach employs an encoder and a dynamics model to map observations to latent states and predict future latent states, respectively. We achieve high performance and prevent representation collapse by quantizing the latent representation such that the rank of the representation is empirically preserved. Our method, named iQRL: implicitly Quantized Reinforcement Learning, is straightforward, compatible with any model-free RL algorithm, and demonstrates excellent performance by outperforming other recently proposed representation learning methods in continuous control benchmarks from DeepMind Control Suite. △ Less

Submitted 4 June, 2024; originally announced June 2024.

Comments: 9 pages, 11 figures

arXiv:2406.00561 [pdf, other]

Learning to Approximate Particle Smoothing Trajectories via Diffusion Generative Models

Authors: Ella Tamir, Arno Solin

Abstract: Learning dynamical systems from sparse observations is critical in numerous fields, including biology, finance, and physics. Even if tackling such problems is standard in general information fusion, it remains challenging for contemporary machine learning models, such as diffusion models. We introduce a method that integrates conditional particle filtering with ancestral sampling and diffusion mod… ▽ More Learning dynamical systems from sparse observations is critical in numerous fields, including biology, finance, and physics. Even if tackling such problems is standard in general information fusion, it remains challenging for contemporary machine learning models, such as diffusion models. We introduce a method that integrates conditional particle filtering with ancestral sampling and diffusion models, enabling the generation of realistic trajectories that align with observed data. Our approach uses a smoother based on iterating a conditional particle filter with ancestral sampling to first generate plausible trajectories matching observed marginals, and learns the corresponding diffusion model. This approach provides both a generative method for high-quality, smoothed trajectories under complex constraints, and an efficient approximation of the particle smoothing distribution for classical tracking problems. We demonstrate the approach in time-series generation and interpolation tasks, including vehicle tracking and single-cell RNA sequencing data. △ Less

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2405.17889 [pdf, other]

Improving Discrete Diffusion Models via Structured Preferential Generation

Authors: Severi Rissanen, Markus Heinonen, Arno Solin

Abstract: In the domains of image and audio, diffusion models have shown impressive performance. However, their application to discrete data types, such as language, has often been suboptimal compared to autoregressive generative models. This paper tackles the challenge of improving discrete diffusion models by introducing a structured forward process that leverages the inherent information hierarchy in dis… ▽ More In the domains of image and audio, diffusion models have shown impressive performance. However, their application to discrete data types, such as language, has often been suboptimal compared to autoregressive generative models. This paper tackles the challenge of improving discrete diffusion models by introducing a structured forward process that leverages the inherent information hierarchy in discrete categories, such as words in text. Our approach biases the generative process to produce certain categories before others, resulting in a notable improvement in log-likelihood scores on the text8 dataset. This work paves the way for more advances in discrete diffusion models with potentially significant enhancements in performance. △ Less

Submitted 28 May, 2024; originally announced May 2024.

Comments: 10 pages, 7 figures

arXiv:2405.17656 [pdf, other]

Alignment is Key for Applying Diffusion Models to Retrosynthesis

Authors: Najwa Laabid, Severi Rissanen, Markus Heinonen, Arno Solin, Vikas Garg

Abstract: Retrosynthesis, the task of identifying precursors for a given molecule, can be naturally framed as a conditional graph generation task. Diffusion models are a particularly promising modelling approach, enabling post-hoc conditioning and trading off quality for speed during generation. We show mathematically that permutation equivariant denoisers severely limit the expressiveness of graph diffusio… ▽ More Retrosynthesis, the task of identifying precursors for a given molecule, can be naturally framed as a conditional graph generation task. Diffusion models are a particularly promising modelling approach, enabling post-hoc conditioning and trading off quality for speed during generation. We show mathematically that permutation equivariant denoisers severely limit the expressiveness of graph diffusion models and thus their adaptation to retrosynthesis. To address this limitation, we relax the equivariance requirement such that it only applies to aligned permutations of the conditioning and the generated graphs obtained through atom map**. Our new denoiser achieves the highest top-$1$ accuracy ($54.7$\%) across template-free and template-based methods on USPTO-50k. We also demonstrate the ability for flexible post-training conditioning and good sample quality with small diffusion step counts, highlighting the potential for interactive applications and additional controls for multi-step planning. △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: 28 pages, 9 figures

arXiv:2404.07696 [pdf, other]

Flatness Improves Backbone Generalisation in Few-shot Classification

Authors: Rui Li, Martin Trapp, Marcus Klasson, Arno Solin

Abstract: Deployment of deep neural networks in real-world settings typically requires adaptation to new tasks with few examples. Few-shot classification (FSC) provides a solution to this problem by leveraging pre-trained backbones for fast adaptation to new classes. Surprisingly, most efforts have only focused on develo** architectures for easing the adaptation to the target domain without considering th… ▽ More Deployment of deep neural networks in real-world settings typically requires adaptation to new tasks with few examples. Few-shot classification (FSC) provides a solution to this problem by leveraging pre-trained backbones for fast adaptation to new classes. Surprisingly, most efforts have only focused on develo** architectures for easing the adaptation to the target domain without considering the importance of backbone training for good generalisation. We show that flatness-aware backbone training with vanilla fine-tuning results in a simpler yet competitive baseline compared to the state-of-the-art. Our results indicate that for in- and cross-domain FSC, backbone training is crucial to achieving good generalisation across different adaptation methods. We advocate more care should be taken when training these models. △ Less

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2403.13327 [pdf, other]

Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion

Authors: Otto Seiskari, Jerry Ylilammi, Valtteri Kaatrasalo, Pekka Rantalankila, Matias Turkulainen, Juho Kannala, Arno Solin

Abstract: High-quality scene reconstruction and novel view synthesis based on Gaussian Splatting (3DGS) typically require steady, high-quality photographs, often impractical to capture with handheld cameras. We present a method that adapts to camera motion and allows high-quality scene reconstruction with handheld video data suffering from motion blur and rolling shutter distortion. Our approach is based on… ▽ More High-quality scene reconstruction and novel view synthesis based on Gaussian Splatting (3DGS) typically require steady, high-quality photographs, often impractical to capture with handheld cameras. We present a method that adapts to camera motion and allows high-quality scene reconstruction with handheld video data suffering from motion blur and rolling shutter distortion. Our approach is based on detailed modelling of the physical image formation process and utilizes velocities estimated using visual-inertial odometry (VIO). Camera poses are considered non-static during the exposure time of a single image frame and camera poses are further optimized in the reconstruction process. We formulate a differentiable rendering pipeline that leverages screen space approximation to efficiently incorporate rolling-shutter and motion blur effects into the 3DGS framework. Our results with both synthetic and real data demonstrate superior performance in mitigating camera motion over existing methods, thereby advancing 3DGS in naturalistic settings. △ Less

Submitted 24 May, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

Comments: Source code available at https://github.com/SpectacularAI/3dgs-deblur

arXiv:2403.10929 [pdf, other]

Function-space Parameterization of Neural Networks for Sequential Learning

Authors: Aidan Scannell, Riccardo Mereu, Paul Chang, Ella Tamir, Joni Pajarinen, Arno Solin

Abstract: Sequential learning paradigms pose challenges for gradient-based deep learning due to difficulties incorporating new data and retaining prior knowledge. While Gaussian processes elegantly tackle these problems, they struggle with scalability and handling rich inputs, such as images. To address these issues, we introduce a technique that converts neural networks from weight space to function space,… ▽ More Sequential learning paradigms pose challenges for gradient-based deep learning due to difficulties incorporating new data and retaining prior knowledge. While Gaussian processes elegantly tackle these problems, they struggle with scalability and handling rich inputs, such as images. To address these issues, we introduce a technique that converts neural networks from weight space to function space, through a dual parameterization. Our parameterization offers: (i) a way to scale function-space methods to large data sets via sparsification, (ii) retention of prior knowledge when access to past data is limited, and (iii) a mechanism to incorporate new data without retraining. Our experiments demonstrate that we can retain knowledge in continual learning and incorporate new data efficiently. We further show its strengths in uncertainty quantification and guiding exploration in model-based RL. Further information and code is available on the project website. △ Less

Submitted 16 March, 2024; originally announced March 2024.

Comments: 29 pages, 8 figures, Published in The Twelfth International Conference on Learning Representations

arXiv:2310.00724 [pdf, other]

Subtractive Mixture Models via Squaring: Representation and Learning

Authors: Lorenzo Loconte, Aleksanteri M. Sladek, Stefan Mengel, Martin Trapp, Arno Solin, Nicolas Gillis, Antonio Vergari

Abstract: Mixture models are traditionally represented and learned by adding several distributions as components. Allowing mixtures to subtract probability mass or density can drastically reduce the number of components needed to model complex distributions. However, learning such subtractive mixtures while ensuring they still encode a non-negative function is challenging. We investigate how to learn and pe… ▽ More Mixture models are traditionally represented and learned by adding several distributions as components. Allowing mixtures to subtract probability mass or density can drastically reduce the number of components needed to model complex distributions. However, learning such subtractive mixtures while ensuring they still encode a non-negative function is challenging. We investigate how to learn and perform inference on deep subtractive mixtures by squaring them. We do this in the framework of probabilistic circuits, which enable us to represent tensorized mixtures and generalize several other subtractive models. We theoretically prove that the class of squared circuits allowing subtractions can be exponentially more expressive than traditional additive mixtures; and, we empirically show this increased expressiveness on a series of real-world distribution estimation tasks. △ Less

Submitted 26 April, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

arXiv:2309.15478 [pdf, other]

The Robust Semantic Segmentation UNCV2023 Challenge Results

Authors: Xuanlong Yu, Yi Zuo, Zitao Wang, Xiaowen Zhang, Jiaxuan Zhao, Yuting Yang, Licheng Jiao, Rui Peng, Xinyi Wang, Junpei Zhang, Kexin Zhang, Fang Liu, Roberto Alcover-Couso, Juan C. SanMiguel, Marcos Escudero-Viñolo, Hanlin Tian, Kenta Matsui, Tianhao Wang, Fahmy Adan, Zhitong Gao, Xuming He, Quentin Bouniot, Hossein Moghaddam, Shyam Nandan Rai, Fabio Cermelli , et al. (12 additional authors not shown)

Abstract: This paper outlines the winning solutions employed in addressing the MUAD uncertainty quantification challenge held at ICCV 2023. The challenge was centered around semantic segmentation in urban environments, with a particular focus on natural adversarial scenarios. The report presents the results of 19 submitted entries, with numerous techniques drawing inspiration from cutting-edge uncertainty q… ▽ More This paper outlines the winning solutions employed in addressing the MUAD uncertainty quantification challenge held at ICCV 2023. The challenge was centered around semantic segmentation in urban environments, with a particular focus on natural adversarial scenarios. The report presents the results of 19 submitted entries, with numerous techniques drawing inspiration from cutting-edge uncertainty quantification methodologies presented at prominent conferences in the fields of computer vision and machine learning and journals over the past few years. Within this document, the challenge is introduced, shedding light on its purpose and objectives, which primarily revolved around enhancing the robustness of semantic segmentation in urban scenes under varying natural adversarial conditions. The report then delves into the top-performing solutions. Moreover, the document aims to provide a comprehensive overview of the diverse solutions deployed by all participants. By doing so, it seeks to offer readers a deeper insight into the array of strategies that can be leveraged to effectively handle the inherent uncertainties associated with autonomous driving and semantic segmentation, especially within urban environments. △ Less

Submitted 27 September, 2023; originally announced September 2023.

Comments: 11 pages, 4 figures, accepted at ICCV 2023 UNCV workshop

arXiv:2309.02195 [pdf, ps, other]

Sparse Function-space Representation of Neural Networks

Authors: Aidan Scannell, Riccardo Mereu, Paul Chang, Ella Tamir, Joni Pajarinen, Arno Solin

Abstract: Deep neural networks (NNs) are known to lack uncertainty estimates and struggle to incorporate new data. We present a method that mitigates these issues by converting NNs from weight space to function space, via a dual parameterization. Importantly, the dual parameterization enables us to formulate a sparse representation that captures information from the entire data set. This offers a compact an… ▽ More Deep neural networks (NNs) are known to lack uncertainty estimates and struggle to incorporate new data. We present a method that mitigates these issues by converting NNs from weight space to function space, via a dual parameterization. Importantly, the dual parameterization enables us to formulate a sparse representation that captures information from the entire data set. This offers a compact and principled way of capturing uncertainty and enables us to incorporate new data without retraining whilst retaining predictive performance. We provide proof-of-concept demonstrations with the proposed approach for quantifying uncertainty in supervised learning on UCI benchmark tasks. △ Less

Submitted 5 September, 2023; originally announced September 2023.

Comments: Accepted to ICML 2023 Workshop on Duality for Modern Machine Learning, Honolulu, Hawaii, USA. 4 pages, 2 figures, 1 table

arXiv:2306.04201 [pdf, other]

Improving Hyperparameter Learning under Approximate Inference in Gaussian Process Models

Authors: Rui Li, ST John, Arno Solin

Abstract: Approximate inference in Gaussian process (GP) models with non-conjugate likelihoods gets entangled with the learning of the model hyperparameters. We improve hyperparameter learning in GP models and focus on the interplay between variational inference (VI) and the learning target. While VI's lower bound to the marginal likelihood is a suitable objective for inferring the approximate posterior, we… ▽ More Approximate inference in Gaussian process (GP) models with non-conjugate likelihoods gets entangled with the learning of the model hyperparameters. We improve hyperparameter learning in GP models and focus on the interplay between variational inference (VI) and the learning target. While VI's lower bound to the marginal likelihood is a suitable objective for inferring the approximate posterior, we show that a direct approximation of the marginal likelihood as in Expectation Propagation (EP) is a better learning objective for hyperparameter optimization. We design a hybrid training procedure to bring the best of both worlds: it leverages conjugate-computation VI for inference and uses an EP-like marginal likelihood approximation for hyperparameter learning. We compare VI, EP, Laplace approximation, and our proposed training procedure and empirically demonstrate the effectiveness of our proposal across a wide range of data sets. △ Less

Submitted 7 June, 2023; originally announced June 2023.

Comments: International Conference on Machine Learning (ICML) 2023

arXiv:2306.03953 [pdf, other]

doi 10.1017/dce.2024.12

Rao-Blackwellized Particle Smoothing for Simultaneous Localization and Map**

Authors: Manon Kok, Arno Solin, Thomas B. Schön

Abstract: Simultaneous localization and map** (SLAM) is the task of building a map representation of an unknown environment while at the same time using it for positioning. A probabilistic interpretation of the SLAM task allows for incorporating prior knowledge and for operation under uncertainty. Contrary to the common practice of computing point estimates of the system states, we capture the full poster… ▽ More Simultaneous localization and map** (SLAM) is the task of building a map representation of an unknown environment while at the same time using it for positioning. A probabilistic interpretation of the SLAM task allows for incorporating prior knowledge and for operation under uncertainty. Contrary to the common practice of computing point estimates of the system states, we capture the full posterior density through approximate Bayesian inference. This dynamic learning task falls under state estimation, where the state-of-the-art is in sequential Monte Carlo methods that tackle the forward filtering problem. In this paper, we introduce a framework for probabilistic SLAM using particle smoothing that does not only incorporate observed data in current state estimates, but it also back-tracks the updated knowledge to correct for past drift and ambiguities in both the map and in the states. Our solution can efficiently handle both dense and sparse map representations by Rao-Blackwellization of conditionally linear and conditionally linearized models. We show through simulations and real-world experiments how the principles apply to radio (BLE/Wi-Fi), magnetic field, and visual SLAM. The proposed solution is general, efficient, and works well under confounding noise. △ Less

Submitted 5 June, 2024; v1 submitted 6 June, 2023; originally announced June 2023.

Comments: 23 pages, 7 figures

Journal ref: Data-Centric Engineering. 2024;5:e15

arXiv:2306.03566 [pdf, other]

Memory-Based Dual Gaussian Processes for Sequential Learning

Authors: Paul E. Chang, Prakhar Verma, S. T. John, Arno Solin, Mohammad Emtiyaz Khan

Abstract: Sequential learning with Gaussian processes (GPs) is challenging when access to past data is limited, for example, in continual and active learning. In such cases, errors can accumulate over time due to inaccuracies in the posterior, hyperparameters, and inducing points, making accurate learning challenging. Here, we present a method to keep all such errors in check using the recently proposed dua… ▽ More Sequential learning with Gaussian processes (GPs) is challenging when access to past data is limited, for example, in continual and active learning. In such cases, errors can accumulate over time due to inaccuracies in the posterior, hyperparameters, and inducing points, making accurate learning challenging. Here, we present a method to keep all such errors in check using the recently proposed dual sparse variational GP. Our method enables accurate inference for generic likelihoods and improves learning by actively building and updating a memory of past data. We demonstrate its effectiveness in several applications involving Bayesian optimization, active learning, and continual learning. △ Less

Submitted 6 June, 2023; originally announced June 2023.

Comments: International Conference on Machine Learning (ICML) 2023

arXiv:2306.02066 [pdf, other]

Variational Gaussian Process Diffusion Processes

Authors: Prakhar Verma, Vincent Adam, Arno Solin

Abstract: Diffusion processes are a class of stochastic differential equations (SDEs) providing a rich family of expressive models that arise naturally in dynamic modelling tasks. Probabilistic inference and learning under generative models with latent processes endowed with a non-linear diffusion process prior are intractable problems. We build upon work within variational inference, approximating the post… ▽ More Diffusion processes are a class of stochastic differential equations (SDEs) providing a rich family of expressive models that arise naturally in dynamic modelling tasks. Probabilistic inference and learning under generative models with latent processes endowed with a non-linear diffusion process prior are intractable problems. We build upon work within variational inference, approximating the posterior process as a linear diffusion process, and point out pathologies in the approach. We propose an alternative parameterization of the Gaussian variational process using a site-based exponential family description. This allows us to trade a slow inference algorithm with fixed-point iterations for a fast algorithm for convex optimization akin to natural gradient descent, which also provides a better objective for learning model parameters. △ Less

Submitted 27 February, 2024; v1 submitted 3 June, 2023; originally announced June 2023.

Comments: International Conference on Artificial Intelligence and Statistics (AISTATS) 2024

arXiv:2304.04307 [pdf, other]

PriorCVAE: scalable MCMC parameter inference with Bayesian deep generative modelling

Authors: Elizaveta Semenova, Prakhar Verma, Max Cairney-Leeming, Arno Solin, Samir Bhatt, Seth Flaxman

Abstract: Recent advances have shown that GP priors, or their finite realisations, can be encoded using deep generative models such as variational autoencoders (VAEs). These learned generators can serve as drop-in replacements for the original priors during MCMC inference. While this approach enables efficient inference, it loses information about the hyperparameters of the original models, and consequently… ▽ More Recent advances have shown that GP priors, or their finite realisations, can be encoded using deep generative models such as variational autoencoders (VAEs). These learned generators can serve as drop-in replacements for the original priors during MCMC inference. While this approach enables efficient inference, it loses information about the hyperparameters of the original models, and consequently makes inference over hyperparameters impossible and the learned priors indistinct. To overcome this limitation, we condition the VAE on stochastic process hyperparameters. This allows the joint encoding of hyperparameters with GP realizations and their subsequent estimation during inference. Further, we demonstrate that our proposed method, PriorCVAE, is agnostic to the nature of the models which it approximates, and can be used, for instance, to encode solutions of ODEs. It provides a practical tool for approximate inference and shows potential in real-life spatial and spatiotemporal applications. △ Less

Submitted 10 November, 2023; v1 submitted 9 April, 2023; originally announced April 2023.

arXiv:2302.06359 [pdf, other]

Fixing Overconfidence in Dynamic Neural Networks

Authors: Lassi Meronen, Martin Trapp, Andrea Pilzer, Le Yang, Arno Solin

Abstract: Dynamic neural networks are a recent technique that promises a remedy for the increasing size of modern deep learning models by dynamically adapting their computational cost to the difficulty of the inputs. In this way, the model can adjust to a limited computational budget. However, the poor quality of uncertainty estimates in deep learning models makes it difficult to distinguish between hard an… ▽ More Dynamic neural networks are a recent technique that promises a remedy for the increasing size of modern deep learning models by dynamically adapting their computational cost to the difficulty of the inputs. In this way, the model can adjust to a limited computational budget. However, the poor quality of uncertainty estimates in deep learning models makes it difficult to distinguish between hard and easy samples. To address this challenge, we present a computationally efficient approach for post-hoc uncertainty quantification in dynamic neural networks. We show that adequately quantifying and accounting for both aleatoric and epistemic uncertainty through a probabilistic treatment of the last layers improves the predictive performance and aids decision-making when determining the computational budget. In the experiments, we show improvements on CIFAR-100, ImageNet, and Caltech-256 in terms of accuracy, capturing uncertainty, and calibration error. △ Less

Submitted 8 December, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

Comments: In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024

arXiv:2301.13636 [pdf, other]

Transport with Support: Data-Conditional Diffusion Bridges

Authors: Ella Tamir, Martin Trapp, Arno Solin

Abstract: The dynamic Schrödinger bridge problem provides an appealing setting for solving constrained time-series data generation tasks posed as optimal transport problems. It consists of learning non-linear diffusion processes using efficient iterative solvers. Recent works have demonstrated state-of-the-art results (eg. in modelling single-cell embryo RNA sequences or sampling from complex posteriors) bu… ▽ More The dynamic Schrödinger bridge problem provides an appealing setting for solving constrained time-series data generation tasks posed as optimal transport problems. It consists of learning non-linear diffusion processes using efficient iterative solvers. Recent works have demonstrated state-of-the-art results (eg. in modelling single-cell embryo RNA sequences or sampling from complex posteriors) but are limited to learning bridges with only initial and terminal constraints. Our work extends this paradigm by proposing the Iterative Smoothing Bridge (ISB). We integrate Bayesian filtering and optimal control into learning the diffusion process, enabling the generation of constrained stochastic processes governed by sparse observations at intermediate stages and terminal constraints. We assess the effectiveness of our method on synthetic and real-world data generation tasks and we show that the ISB generalises well to high-dimensional data, is computationally efficient, and provides accurate estimates of the marginals at intermediate and terminal times. △ Less

Submitted 24 November, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

Comments: 27 pages, 11 figures

arXiv:2212.13381 [pdf, other]

MixupE: Understanding and Improving Mixup from Directional Derivative Perspective

Authors: Yingtian Zou, Vikas Verma, Sarthak Mittal, Wai Hoh Tang, Hieu Pham, Juho Kannala, Yoshua Bengio, Arno Solin, Kenji Kawaguchi

Abstract: Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels. This technique is known to improve the generalization performance in many learning paradigms and applications. In this work, we first analyze Mixup and show that it implicitly regularizes infinitely many directional deri… ▽ More Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels. This technique is known to improve the generalization performance in many learning paradigms and applications. In this work, we first analyze Mixup and show that it implicitly regularizes infinitely many directional derivatives of all orders. Based on this new insight, we propose an improved version of Mixup, theoretically justified to deliver better generalization performance than the vanilla Mixup. To demonstrate the effectiveness of the proposed method, we conduct experiments across various domains such as images, tabular data, speech, and graphs. Our results show that the proposed method improves Mixup across multiple datasets using a variety of architectures, for instance, exhibiting an improvement over Mixup by 0.8% in ImageNet top-1 accuracy. △ Less

Submitted 15 October, 2023; v1 submitted 27 December, 2022; originally announced December 2022.

Comments: 16 pages, Best Student Paper Award at UAI 2023

arXiv:2211.06260 [pdf, other]

Towards Improved Learning in Gaussian Processes: The Best of Two Worlds

Authors: Rui Li, ST John, Arno Solin

Abstract: Gaussian process training decomposes into inference of the (approximate) posterior and learning of the hyperparameters. For non-Gaussian (non-conjugate) likelihoods, two common choices for approximate inference are Expectation Propagation (EP) and Variational Inference (VI), which have complementary strengths and weaknesses. While VI's lower bound to the marginal likelihood is a suitable objective… ▽ More Gaussian process training decomposes into inference of the (approximate) posterior and learning of the hyperparameters. For non-Gaussian (non-conjugate) likelihoods, two common choices for approximate inference are Expectation Propagation (EP) and Variational Inference (VI), which have complementary strengths and weaknesses. While VI's lower bound to the marginal likelihood is a suitable objective for inferring the approximate posterior, it does not automatically imply it is a good learning objective for hyperparameter optimization. We design a hybrid training procedure where the inference leverages conjugate-computation VI and the learning uses an EP-like marginal likelihood approximation. We empirically demonstrate on binary classification that this provides a good learning objective and generalizes better. △ Less

Submitted 11 November, 2022; originally announced November 2022.

Comments: In the 2022 NeurIPS Workshop on Gaussian Processes, Spatiotemporal Modeling, and Decision-making Systems

arXiv:2211.01053 [pdf, other]

Fantasizing with Dual GPs in Bayesian Optimization and Active Learning

Authors: Paul E. Chang, Prakhar Verma, ST John, Victor Picheny, Henry Moss, Arno Solin

Abstract: Gaussian processes (GPs) are the main surrogate functions used for sequential modelling such as Bayesian Optimization and Active Learning. Their drawbacks are poor scaling with data and the need to run an optimization loop when using a non-Gaussian likelihood. In this paper, we focus on `fantasizing' batch acquisition functions that need the ability to condition on new fantasized data computationa… ▽ More Gaussian processes (GPs) are the main surrogate functions used for sequential modelling such as Bayesian Optimization and Active Learning. Their drawbacks are poor scaling with data and the need to run an optimization loop when using a non-Gaussian likelihood. In this paper, we focus on `fantasizing' batch acquisition functions that need the ability to condition on new fantasized data computationally efficiently. By using a sparse Dual GP parameterization, we gain linear scaling with batch size as well as one-step updates for non-Gaussian likelihoods, thus extending sparse models to greedy batch fantasizing acquisition functions. △ Less

Submitted 2 November, 2022; originally announced November 2022.

Comments: In the 2022 NeurIPS Workshop on Gaussian Processes, Spatiotemporal Modeling, and Decision-making Systems

arXiv:2211.00392 [pdf, other]

Expansion of Visual Hints for Improved Generalization in Stereo Matching

Authors: Andrea Pilzer, Yuxin Hou, Niki Loppi, Arno Solin, Juho Kannala

Abstract: We introduce visual hints expansion for guiding stereo matching to improve generalization. Our work is motivated by the robustness of Visual Inertial Odometry (VIO) in computer vision and robotics, where a sparse and unevenly distributed set of feature points characterizes a scene. To improve stereo matching, we propose to elevate 2D hints to 3D points. These sparse and unevenly distributed 3D vis… ▽ More We introduce visual hints expansion for guiding stereo matching to improve generalization. Our work is motivated by the robustness of Visual Inertial Odometry (VIO) in computer vision and robotics, where a sparse and unevenly distributed set of feature points characterizes a scene. To improve stereo matching, we propose to elevate 2D hints to 3D points. These sparse and unevenly distributed 3D visual hints are expanded using a 3D random geometric graph, which enhances the learning and inference process. We evaluate our proposal on multiple widely adopted benchmarks and show improved performance without access to additional sensors other than the image sequence. To highlight practical applicability and symbiosis with visual odometry, we demonstrate how our methods run on embedded hardware. △ Less

Submitted 1 November, 2022; originally announced November 2022.

Comments: 2023 IEEE Winter Conference on Applications of Computer Vision (WACV)

arXiv:2208.07591 [pdf, other]

Uncertainty-guided Source-free Domain Adaptation

Authors: Subhankar Roy, Martin Trapp, Andrea Pilzer, Juho Kannala, Nicu Sebe, Elisa Ricci, Arno Solin

Abstract: Source-free domain adaptation (SFDA) aims to adapt a classifier to an unlabelled target data set by only using a pre-trained source model. However, the absence of the source data and the domain shift makes the predictions on the target data unreliable. We propose quantifying the uncertainty in the source model predictions and utilizing it to guide the target adaptation. For this, we construct a pr… ▽ More Source-free domain adaptation (SFDA) aims to adapt a classifier to an unlabelled target data set by only using a pre-trained source model. However, the absence of the source data and the domain shift makes the predictions on the target data unreliable. We propose quantifying the uncertainty in the source model predictions and utilizing it to guide the target adaptation. For this, we construct a probabilistic source model by incorporating priors on the network parameters inducing a distribution over the model predictions. Uncertainties are estimated by employing a Laplace approximation and incorporated to identify target data points that do not lie in the source manifold and to down-weight them when maximizing the mutual information on the target data. Unlike recent works, our probabilistic treatment is computationally lightweight, decouples source training and target adaptation, and requires no specialized source training or changes of the model architecture. We show the advantages of uncertainty-guided SFDA over traditional SFDA in the closed-set and open-set settings and provide empirical evidence that our approach is more robust to strong domain shifts even without tuning. △ Less

Submitted 16 August, 2022; originally announced August 2022.

Comments: ECCV 2022

arXiv:2206.13397 [pdf, other]

Generative Modelling With Inverse Heat Dissipation

Authors: Severi Rissanen, Markus Heinonen, Arno Solin

Abstract: While diffusion models have shown great success in image generation, their noise-inverting generative process does not explicitly consider the structure of images, such as their inherent multi-scale nature. Inspired by diffusion models and the empirical success of coarse-to-fine modelling, we propose a new diffusion-like model that generates images through stochastically reversing the heat equatio… ▽ More While diffusion models have shown great success in image generation, their noise-inverting generative process does not explicitly consider the structure of images, such as their inherent multi-scale nature. Inspired by diffusion models and the empirical success of coarse-to-fine modelling, we propose a new diffusion-like model that generates images through stochastically reversing the heat equation, a PDE that locally erases fine-scale information when run over the 2D plane of the image. We interpret the solution of the forward heat equation with constant additive noise as a variational approximation in the diffusion latent variable model. Our new model shows emergent qualitative properties not seen in standard diffusion models, such as disentanglement of overall colour and shape in images. Spectral analysis on natural images highlights connections to diffusion models and reveals an implicit coarse-to-fine inductive bias in them. △ Less

Submitted 12 April, 2023; v1 submitted 21 June, 2022; originally announced June 2022.

arXiv:2206.08890 [pdf, other]

Disentangling Model Multiplicity in Deep Learning

Authors: Ari Heljakka, Martin Trapp, Juho Kannala, Arno Solin

Abstract: Model multiplicity is a well-known but poorly understood phenomenon that undermines the generalisation guarantees of machine learning models. It appears when two models with similar training-time performance differ in their predictions and real-world performance characteristics. This observed 'predictive' multiplicity (PM) also implies elusive differences in the internals of the models, their 'rep… ▽ More Model multiplicity is a well-known but poorly understood phenomenon that undermines the generalisation guarantees of machine learning models. It appears when two models with similar training-time performance differ in their predictions and real-world performance characteristics. This observed 'predictive' multiplicity (PM) also implies elusive differences in the internals of the models, their 'representational' multiplicity (RM). We introduce a conceptual and experimental setup for analysing RM by measuring activation similarity via singular vector canonical correlation analysis (SVCCA). We show that certain differences in training methods systematically result in larger RM than others and evaluate RM and PM over a finite sample as predictors for generalizability. We further correlate RM with PM measured by the variance in i.i.d. and out-of-distribution test predictions in four standard image data sets. Finally, instead of attempting to eliminate RM, we call for its systematic measurement and maximal exposure. △ Less

Submitted 31 January, 2023; v1 submitted 17 June, 2022; originally announced June 2022.

Comments: 13 pages, 6 figures

arXiv:2205.13821 [pdf, other]

A Look at Improving Robustness in Visual-inertial SLAM by Moment Matching

Authors: Arno Solin, Rui Li, Andrea Pilzer

Abstract: The fusion of camera sensor and inertial data is a leading method for ego-motion tracking in autonomous and smart devices. State estimation techniques that rely on non-linear filtering are a strong paradigm for solving the associated information fusion task. The de facto inference method in this space is the celebrated extended Kalman filter (EKF), which relies on first-order linearizations of bot… ▽ More The fusion of camera sensor and inertial data is a leading method for ego-motion tracking in autonomous and smart devices. State estimation techniques that rely on non-linear filtering are a strong paradigm for solving the associated information fusion task. The de facto inference method in this space is the celebrated extended Kalman filter (EKF), which relies on first-order linearizations of both the dynamical and measurement model. This paper takes a critical look at the practical implications and limitations posed by the EKF, especially under faulty visual feature associations and the presence of strong confounding noise. As an alternative, we revisit the assumed density formulation of Bayesian filtering and employ a moment matching (unscented Kalman filtering) approach to both visual-inertial odometry and visual SLAM. Our results highlight important aspects in robustness both in dynamics propagation and visual measurement updates, and we show state-of-the-art results on EuRoC MAV drone data benchmark. △ Less

Submitted 27 May, 2022; originally announced May 2022.

Comments: 8 pages, to appear in Proceedings of FUSION 2022

arXiv:2111.08524 [pdf, other]

Non-separable Spatio-temporal Graph Kernels via SPDEs

Authors: Alexander Nikitin, ST John, Arno Solin, Samuel Kaski

Abstract: Gaussian processes (GPs) provide a principled and direct approach for inference and learning on graphs. However, the lack of justified graph kernels for spatio-temporal modelling has held back their use in graph problems. We leverage an explicit link between stochastic partial differential equations (SPDEs) and GPs on graphs, introduce a framework for deriving graph kernels via SPDEs, and derive n… ▽ More Gaussian processes (GPs) provide a principled and direct approach for inference and learning on graphs. However, the lack of justified graph kernels for spatio-temporal modelling has held back their use in graph problems. We leverage an explicit link between stochastic partial differential equations (SPDEs) and GPs on graphs, introduce a framework for deriving graph kernels via SPDEs, and derive non-separable spatio-temporal graph kernels that capture interaction across space and time. We formulate the graph kernels for the stochastic heat equation and wave equation. We show that by providing novel tools for spatio-temporal GP modelling on graphs, we outperform pre-existing graph kernels in real-world applications that feature diffusion, oscillation, and other complicated interactions. △ Less

Submitted 22 March, 2022; v1 submitted 16 November, 2021; originally announced November 2021.

arXiv:2111.03412 [pdf, other]

Dual Parameterization of Sparse Variational Gaussian Processes

Authors: Vincent Adam, Paul E. Chang, Mohammad Emtiyaz Khan, Arno Solin

Abstract: Sparse variational Gaussian process (SVGP) methods are a common choice for non-conjugate Gaussian process inference because of their computational benefits. In this paper, we improve their computational efficiency by using a dual parameterization where each data example is assigned dual parameters, similarly to site parameters used in expectation propagation. Our dual parameterization speeds-up in… ▽ More Sparse variational Gaussian process (SVGP) methods are a common choice for non-conjugate Gaussian process inference because of their computational benefits. In this paper, we improve their computational efficiency by using a dual parameterization where each data example is assigned dual parameters, similarly to site parameters used in expectation propagation. Our dual parameterization speeds-up inference using natural gradient descent, and provides a tighter evidence lower bound for hyperparameter learning. The approach has the same memory cost as the current SVGP methods, but it is faster and more accurate. △ Less

Submitted 19 January, 2022; v1 submitted 5 November, 2021; originally announced November 2021.

Comments: Advances in Neural Information Processing Systems (NeurIPS 2021)

arXiv:2111.01732 [pdf, other]

Spatio-Temporal Variational Gaussian Processes

Authors: Oliver Hamelijnck, William J. Wilkinson, Niki A. Loppi, Arno Solin, Theodoros Damoulas

Abstract: We introduce a scalable approach to Gaussian process inference that combines spatio-temporal filtering with natural gradient variational inference, resulting in a non-conjugate GP method for multivariate data that scales linearly with respect to time. Our natural gradient approach enables application of parallel filtering and smoothing, further reducing the temporal span complexity to be logarithm… ▽ More We introduce a scalable approach to Gaussian process inference that combines spatio-temporal filtering with natural gradient variational inference, resulting in a non-conjugate GP method for multivariate data that scales linearly with respect to time. Our natural gradient approach enables application of parallel filtering and smoothing, further reducing the temporal span complexity to be logarithmic in the number of time steps. We derive a sparse approximation that constructs a state-space model over a reduced set of spatial inducing points, and show that for separable Markov kernels the full and sparse cases exactly recover the standard variational GP, whilst exhibiting favourable computational properties. To further improve the spatial scaling we propose a mean-field assumption of independence between spatial locations which, when coupled with sparsity and parallelisation, leads to an efficient and accurate method for large spatio-temporal problems. △ Less

Submitted 2 November, 2021; originally announced November 2021.

arXiv:2111.01721 [pdf, ps, other]

Bayes-Newton Methods for Approximate Bayesian Inference with PSD Guarantees

Authors: William J. Wilkinson, Simo Särkkä, Arno Solin

Abstract: We formulate natural gradient variational inference (VI), expectation propagation (EP), and posterior linearisation (PL) as extensions of Newton's method for optimising the parameters of a Bayesian posterior distribution. This viewpoint explicitly casts inference algorithms under the framework of numerical optimisation. We show that common approximations to Newton's method from the optimisation li… ▽ More We formulate natural gradient variational inference (VI), expectation propagation (EP), and posterior linearisation (PL) as extensions of Newton's method for optimising the parameters of a Bayesian posterior distribution. This viewpoint explicitly casts inference algorithms under the framework of numerical optimisation. We show that common approximations to Newton's method from the optimisation literature, namely Gauss-Newton and quasi-Newton methods (e.g., the BFGS algorithm), are still valid under this 'Bayes-Newton' framework. This leads to a suite of novel algorithms which are guaranteed to result in positive semi-definite (PSD) covariance matrices, unlike standard VI and EP. Our unifying viewpoint provides new insights into the connections between various inference schemes. All the presented methods apply to any model with a Gaussian prior and non-conjugate likelihood, which we demonstrate with (sparse) Gaussian processes and state space models. △ Less

Submitted 6 December, 2022; v1 submitted 2 November, 2021; originally announced November 2021.

Comments: Code for methods and experiments: https://github.com/AaltoML/BayesNewton

arXiv:2110.15739 [pdf, other]

Scalable Inference in SDEs by Direct Matching of the Fokker-Planck-Kolmogorov Equation

Authors: Arno Solin, Ella Tamir, Prakhar Verma

Abstract: Simulation-based techniques such as variants of stochastic Runge-Kutta are the de facto approach for inference with stochastic differential equations (SDEs) in machine learning. These methods are general-purpose and used with parametric and non-parametric models, and neural SDEs. Stochastic Runge-Kutta relies on the use of sampling schemes that can be inefficient in high dimensions. We address thi… ▽ More Simulation-based techniques such as variants of stochastic Runge-Kutta are the de facto approach for inference with stochastic differential equations (SDEs) in machine learning. These methods are general-purpose and used with parametric and non-parametric models, and neural SDEs. Stochastic Runge-Kutta relies on the use of sampling schemes that can be inefficient in high dimensions. We address this issue by revisiting the classical SDE literature and derive direct approximations to the (typically intractable) Fokker-Planck-Kolmogorov equation by matching moments. We show how this workflow is fast, scales to high-dimensional latent spaces, and is applicable to scarce-data applications, where a non-parametric SDE with a driving Gaussian process velocity field specifies the model. △ Less

Submitted 29 October, 2021; originally announced October 2021.

Comments: To appear in Advances in Neural Information Processing Systems (NeurIPS 2021)

arXiv:2110.13572 [pdf, other]

Periodic Activation Functions Induce Stationarity

Authors: Lassi Meronen, Martin Trapp, Arno Solin

Abstract: Neural network models are known to reinforce hidden data biases, making them unreliable and difficult to interpret. We seek to build models that `know what they do not know' by introducing inductive biases in the function space. We show that periodic activation functions in Bayesian neural networks establish a connection between the prior on the network weights and translation-invariant, stationar… ▽ More Neural network models are known to reinforce hidden data biases, making them unreliable and difficult to interpret. We seek to build models that `know what they do not know' by introducing inductive biases in the function space. We show that periodic activation functions in Bayesian neural networks establish a connection between the prior on the network weights and translation-invariant, stationary Gaussian process priors. Furthermore, we show that this link goes beyond sinusoidal (Fourier) activations by also covering triangular wave and periodic ReLU activation functions. In a series of experiments, we show that periodic activation functions obtain comparable performance for in-domain data and capture sensitivity to perturbed inputs in deep neural networks for out-of-domain detection. △ Less

Submitted 20 December, 2021; v1 submitted 26 October, 2021; originally announced October 2021.

Comments: Appeared in Advances in Neural Information Processing Systems (NeurIPS 2021)

arXiv:2106.11857 [pdf, other]

doi 10.1109/WACV51458.2022.00036

HybVIO: Pushing the Limits of Real-time Visual-inertial Odometry

Authors: Otto Seiskari, Pekka Rantalankila, Juho Kannala, Jerry Ylilammi, Esa Rahtu, Arno Solin

Abstract: We present HybVIO, a novel hybrid approach for combining filtering-based visual-inertial odometry (VIO) with optimization-based SLAM. The core of our method is highly robust, independent VIO with improved IMU bias modeling, outlier rejection, stationarity detection, and feature track selection, which is adjustable to run on embedded hardware. Long-term consistency is achieved with a loosely-couple… ▽ More We present HybVIO, a novel hybrid approach for combining filtering-based visual-inertial odometry (VIO) with optimization-based SLAM. The core of our method is highly robust, independent VIO with improved IMU bias modeling, outlier rejection, stationarity detection, and feature track selection, which is adjustable to run on embedded hardware. Long-term consistency is achieved with a loosely-coupled SLAM module. In academic benchmarks, our solution yields excellent performance in all categories, especially in the real-time use case, where we outperform the current state-of-the-art. We also demonstrate the feasibility of VIO for vehicular tracking on consumer-grade hardware using a custom dataset, and show good performance in comparison to current commercial VISLAM alternatives. An open-source implementation of the HybVIO method is available at https://github.com/SpectacularAI/HybVIO △ Less

Submitted 25 November, 2021; v1 submitted 22 June, 2021; originally announced June 2021.

Comments: 2022 IEEE Winter Conference on Applications of Computer Vision (WACV)

arXiv:2106.10210 [pdf, other]

Combining Pseudo-Point and State Space Approximations for Sum-Separable Gaussian Processes

Authors: Will Tebbutt, Arno Solin, Richard E. Turner

Abstract: Gaussian processes (GPs) are important probabilistic tools for inference and learning in spatio-temporal modelling problems such as those in climate science and epidemiology. However, existing GP approximations do not simultaneously support large numbers of off-the-grid spatial data-points and long time-series which is a hallmark of many applications. Pseudo-point approximations, one of the gold… ▽ More Gaussian processes (GPs) are important probabilistic tools for inference and learning in spatio-temporal modelling problems such as those in climate science and epidemiology. However, existing GP approximations do not simultaneously support large numbers of off-the-grid spatial data-points and long time-series which is a hallmark of many applications. Pseudo-point approximations, one of the gold-standard methods for scaling GPs to large data sets, are well suited for handling off-the-grid spatial data. However, they cannot handle long temporal observation horizons effectively reverting to cubic computational scaling in the time dimension. State space GP approximations are well suited to handling temporal data, if the temporal GP prior admits a Markov form, leading to linear complexity in the number of temporal observations, but have a cubic spatial cost and cannot handle off-the-grid spatial data. In this work we show that there is a simple and elegant way to combine pseudo-point methods with the state space GP approximation framework to get the best of both worlds. The approach hinges on a surprising conditional independence property which applies to space--time separable GPs. We demonstrate empirically that the combined approach is more scalable and applicable to a greater range of spatio-temporal problems than either method on its own. △ Less

Submitted 18 June, 2021; originally announced June 2021.

arXiv:2103.10710 [pdf, other]

Sparse Algorithms for Markovian Gaussian Processes

Authors: William J. Wilkinson, Arno Solin, Vincent Adam

Abstract: Approximate Bayesian inference methods that scale to very large datasets are crucial in leveraging probabilistic models for real-world time series. Sparse Markovian Gaussian processes combine the use of inducing variables with efficient Kalman filter-like recursions, resulting in algorithms whose computational and memory requirements scale linearly in the number of inducing points, whilst also ena… ▽ More Approximate Bayesian inference methods that scale to very large datasets are crucial in leveraging probabilistic models for real-world time series. Sparse Markovian Gaussian processes combine the use of inducing variables with efficient Kalman filter-like recursions, resulting in algorithms whose computational and memory requirements scale linearly in the number of inducing points, whilst also enabling parallel parameter updates and stochastic optimisation. Under this paradigm, we derive a general site-based approach to approximate inference, whereby we approximate the non-Gaussian likelihood with local Gaussian terms, called sites. Our approach results in a suite of novel sparse extensions to algorithms from both the machine learning and signal processing literature, including variational inference, expectation propagation, and the classical nonlinear Kalman smoothers. The derived methods are suited to large time series, and we also demonstrate their applicability to spatio-temporal data, where the model has separate inducing points in both time and space. △ Less

Submitted 9 June, 2021; v1 submitted 19 March, 2021; originally announced March 2021.

Comments: Appearing in the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021

arXiv:2101.01619 [pdf, other]

Novel View Synthesis via Depth-guided Skip Connections

Authors: Yuxin Hou, Arno Solin, Juho Kannala

Abstract: We introduce a principled approach for synthesizing new views of a scene given a single source image. Previous methods for novel view synthesis can be divided into image-based rendering methods (e.g. flow prediction) or pixel generation methods. Flow predictions enable the target view to re-use pixels directly, but can easily lead to distorted results. Directly regressing pixels can produce struct… ▽ More We introduce a principled approach for synthesizing new views of a scene given a single source image. Previous methods for novel view synthesis can be divided into image-based rendering methods (e.g. flow prediction) or pixel generation methods. Flow predictions enable the target view to re-use pixels directly, but can easily lead to distorted results. Directly regressing pixels can produce structurally consistent results but generally suffer from the lack of low-level details. In this paper, we utilize an encoder-decoder architecture to regress pixels of a target view. In order to maintain details, we couple the decoder aligned feature maps with skip connections, where the alignment is guided by predicted depth map of the target view. Our experimental results show that our method does not suffer from distortions and successfully preserves texture details with aligned skip connections. △ Less

Submitted 5 January, 2021; originally announced January 2021.

arXiv:2011.03085 [pdf, other]

RealAnt: An Open-Source Low-Cost Quadruped for Education and Research in Real-World Reinforcement Learning

Authors: Rinu Boney, Jussi Sainio, Mikko Kaivola, Arno Solin, Juho Kannala

Abstract: Current robot platforms available for research are either very expensive or unable to handle the abuse of exploratory controls in reinforcement learning. We develop RealAnt, a minimal low-cost physical version of the popular `Ant' benchmark used in reinforcement learning. RealAnt costs only $\sim$350 EUR (\$410) in materials and can be assembled in less than an hour. We validate the platform with… ▽ More Current robot platforms available for research are either very expensive or unable to handle the abuse of exploratory controls in reinforcement learning. We develop RealAnt, a minimal low-cost physical version of the popular `Ant' benchmark used in reinforcement learning. RealAnt costs only $\sim$350 EUR (\$410) in materials and can be assembled in less than an hour. We validate the platform with reinforcement learning experiments and provide baseline results on a set of benchmark tasks. We demonstrate that the RealAnt robot can learn to walk from scratch from less than 10 minutes of experience. We also provide simulator versions of the robot (with the same dimensions, state-action spaces, and delayed noisy observations) in the MuJoCo and PyBullet simulators. We open-source hardware designs, supporting software, and baseline results for educational use and reproducible research. △ Less

Submitted 4 June, 2022; v1 submitted 5 November, 2020; originally announced November 2020.

arXiv:2010.09494 [pdf, other]

Stationary Activations for Uncertainty Calibration in Deep Learning

Authors: Lassi Meronen, Christabella Irwanto, Arno Solin

Abstract: We introduce a new family of non-linear neural network activation functions that mimic the properties induced by the widely-used Matérn family of kernels in Gaussian process (GP) models. This class spans a range of locally stationary models of various degrees of mean-square differentiability. We show an explicit link to the corresponding GP models in the case that the network consists of one infin… ▽ More We introduce a new family of non-linear neural network activation functions that mimic the properties induced by the widely-used Matérn family of kernels in Gaussian process (GP) models. This class spans a range of locally stationary models of various degrees of mean-square differentiability. We show an explicit link to the corresponding GP models in the case that the network consists of one infinitely wide hidden layer. In the limit of infinite smoothness the Matérn family results in the RBF kernel, and in this case we recover RBF activations. Matérn activation functions result in similar appealing properties to their counterparts in GP models, and we demonstrate that the local stationarity property together with limited mean-square differentiability shows both good performance and uncertainty calibration in Bayesian deep learning tasks. In particular, local stationarity helps calibrate out-of-distribution (OOD) uncertainty. We demonstrate these properties on classification and regression benchmarks and a radar emitter classification task. △ Less

Submitted 19 October, 2020; originally announced October 2020.

Comments: To appear in Advances in Neural Information Processing Systems (NeurIPS 2020)

arXiv:2010.09105 [pdf, other]

Movement-induced Priors for Deep Stereo

Authors: Yuxin Hou, Muhammad Kamran Janjua, Juho Kannala, Arno Solin

Abstract: We propose a method for fusing stereo disparity estimation with movement-induced prior information. Instead of independent inference frame-by-frame, we formulate the problem as a non-parametric learning task in terms of a temporal Gaussian process prior with a movement-driven kernel for inter-frame reasoning. We present a hierarchy of three Gaussian process kernels depending on the availability of… ▽ More We propose a method for fusing stereo disparity estimation with movement-induced prior information. Instead of independent inference frame-by-frame, we formulate the problem as a non-parametric learning task in terms of a temporal Gaussian process prior with a movement-driven kernel for inter-frame reasoning. We present a hierarchy of three Gaussian process kernels depending on the availability of motion information, where our main focus is on a new gyroscope-driven kernel for handheld devices with low-quality MEMS sensors, thus also relaxing the requirement of having full 6D camera poses available. We show how our method can be combined with two state-of-the-art deep stereo methods. The method either work in a plug-and-play fashion with pre-trained deep stereo networks, or further improved by jointly training the kernels together with encoder-decoder architectures, leading to consistent improvement. △ Less

Submitted 18 October, 2020; originally announced October 2020.

arXiv:2007.05994 [pdf, other]

State Space Expectation Propagation: Efficient Inference Schemes for Temporal Gaussian Processes

Authors: William J. Wilkinson, Paul E. Chang, Michael Riis Andersen, Arno Solin

Abstract: We formulate approximate Bayesian inference in non-conjugate temporal and spatio-temporal Gaussian process models as a simple parameter update rule applied during Kalman smoothing. This viewpoint encompasses most inference schemes, including expectation propagation (EP), the classical (Extended, Unscented, etc.) Kalman smoothers, and variational inference. We provide a unifying perspective on thes… ▽ More We formulate approximate Bayesian inference in non-conjugate temporal and spatio-temporal Gaussian process models as a simple parameter update rule applied during Kalman smoothing. This viewpoint encompasses most inference schemes, including expectation propagation (EP), the classical (Extended, Unscented, etc.) Kalman smoothers, and variational inference. We provide a unifying perspective on these algorithms, showing how replacing the power EP moment matching step with linearisation recovers the classical smoothers. EP provides some benefits over the traditional methods via introduction of the so-called cavity distribution, and we combine these benefits with the computational efficiency of linearisation, providing extensive empirical analysis demonstrating the efficacy of various algorithms under this unifying framework. We provide a fast implementation of all methods in JAX. △ Less

Submitted 12 July, 2020; originally announced July 2020.

Comments: Accepted to International Conference on Machine Learning (ICML) 2020

arXiv:2007.04731 [pdf, other]

Fast Variational Learning in State-Space Gaussian Process Models

Authors: Paul E. Chang, William J. Wilkinson, Mohammad Emtiyaz Khan, Arno Solin

Abstract: Gaussian process (GP) regression with 1D inputs can often be performed in linear time via a stochastic differential equation formulation. However, for non-Gaussian likelihoods, this requires application of approximate inference methods which can make the implementation difficult, e.g., expectation propagation can be numerically unstable and variational inference can be computationally inefficient.… ▽ More Gaussian process (GP) regression with 1D inputs can often be performed in linear time via a stochastic differential equation formulation. However, for non-Gaussian likelihoods, this requires application of approximate inference methods which can make the implementation difficult, e.g., expectation propagation can be numerically unstable and variational inference can be computationally inefficient. In this paper, we propose a new method that removes such difficulties. Building upon an existing method called conjugate-computation variational inference, our approach enables linear-time inference via Kalman recursions while avoiding numerical instabilities and convergence issues. We provide an efficient JAX implementation which exploits just-in-time compilation and allows for fast automatic differentiation through large for-loops. Overall, our approach leads to fast and stable variational inference in state-space GP models that can be scaled to time series with millions of data points. △ Less

Submitted 17 July, 2020; v1 submitted 9 July, 2020; originally announced July 2020.

Comments: To appear in MLSP 2020

arXiv:2006.13856 [pdf, other]

Movement Tracking by Optical Flow Assisted Inertial Navigation

Authors: Lassi Meronen, William J. Wilkinson, Arno Solin

Abstract: Robust and accurate six degree-of-freedom tracking on portable devices remains a challenging problem, especially on small hand-held devices such as smartphones. For improved robustness and accuracy, complementary movement information from an IMU and a camera is often fused. Conventional visual-inertial methods fuse information from IMUs with a sparse cloud of feature points tracked by the device c… ▽ More Robust and accurate six degree-of-freedom tracking on portable devices remains a challenging problem, especially on small hand-held devices such as smartphones. For improved robustness and accuracy, complementary movement information from an IMU and a camera is often fused. Conventional visual-inertial methods fuse information from IMUs with a sparse cloud of feature points tracked by the device camera. We consider a visually dense approach, where the IMU data is fused with the dense optical flow field estimated from the camera data. Learning-based methods applied to the full image frames can leverage visual cues and global consistency of the flow field to improve the flow estimates. We show how a learning-based optical flow model can be combined with conventional inertial navigation, and how ideas from probabilistic deep learning can aid the robustness of the measurement updates. The practical applicability is demonstrated on real-world data acquired by an iPad in a challenging low-texture environment. △ Less

Submitted 24 June, 2020; originally announced June 2020.

arXiv:2006.12063 [pdf, other]

Deep Residual Mixture Models

Authors: Perttu Hämäläinen, Martin Trapp, Tuure Saloheimo, Arno Solin

Abstract: We propose Deep Residual Mixture Models (DRMMs), a novel deep generative model architecture. Compared to other deep models, DRMMs allow more flexible conditional sampling: The model can be trained once with all variables, and then used for sampling with arbitrary combinations of conditioning variables, Gaussian priors, and (in)equality constraints. This provides new opportunities for interactive a… ▽ More We propose Deep Residual Mixture Models (DRMMs), a novel deep generative model architecture. Compared to other deep models, DRMMs allow more flexible conditional sampling: The model can be trained once with all variables, and then used for sampling with arbitrary combinations of conditioning variables, Gaussian priors, and (in)equality constraints. This provides new opportunities for interactive and exploratory machine learning, where one should minimize the user waiting for retraining a model. We demonstrate DRMMs in constrained multi-limb inverse kinematics and controllable generation of animations. △ Less

Submitted 21 July, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

Comments: Code and examples can be found at https://github.com/PerttuHamalainen/DRMM

arXiv:1912.10321 [pdf, other]

Deep Automodulators

Authors: Ari Heljakka, Yuxin Hou, Juho Kannala, Arno Solin

Abstract: We introduce a new category of generative autoencoders called automodulators. These networks can faithfully reproduce individual real-world input images like regular autoencoders, but also generate a fused sample from an arbitrary combination of several such images, allowing instantaneous 'style-mixing' and other new applications. An automodulator decouples the data flow of decoder operations from… ▽ More We introduce a new category of generative autoencoders called automodulators. These networks can faithfully reproduce individual real-world input images like regular autoencoders, but also generate a fused sample from an arbitrary combination of several such images, allowing instantaneous 'style-mixing' and other new applications. An automodulator decouples the data flow of decoder operations from statistical properties thereof and uses the latent vector to modulate the former by the latter, with a principled approach for mutual disentanglement of decoder layers. Prior work has explored similar decoder architecture with GANs, but their focus has been on random sampling. A corresponding autoencoder could operate on real input images. For the first time, we show how to train such a general-purpose model with sharp outputs in high resolution, using novel training techniques, demonstrated on four image data sets. Besides style-mixing, we show state-of-the-art results in autoencoder comparison, and visual image quality nearly indistinguishable from state-of-the-art GANs. We expect the automodulator variants to become a useful building block for image applications and other data domains. △ Less

Submitted 29 October, 2020; v1 submitted 21 December, 2019; originally announced December 2019.

Comments: To appear in Advances in Neural Information Processing Systems (NeurIPS 2020)

arXiv:1912.03249 [pdf, other]

Gaussian Process Priors for View-Aware Inference

Authors: Yuxin Hou, Ari Heljakka, Arno Solin

Abstract: While frame-independent predictions with deep neural networks have become the prominent solutions to many computer vision tasks, the potential benefits of utilizing correlations between frames have received less attention. Even though probabilistic machine learning provides the ability to encode correlation as prior knowledge for inference, there is a tangible gap between the theory and practice o… ▽ More While frame-independent predictions with deep neural networks have become the prominent solutions to many computer vision tasks, the potential benefits of utilizing correlations between frames have received less attention. Even though probabilistic machine learning provides the ability to encode correlation as prior knowledge for inference, there is a tangible gap between the theory and practice of applying probabilistic methods to modern vision problems. For this, we derive a principled framework to combine information coupling between camera poses (translation and orientation) with deep models. We proposed a novel view kernel that generalizes the standard periodic kernel in $\mathrm{SO}(3)$. We show how this soft-prior knowledge can aid several pose-related vision tasks like novel view synthesis and predict arbitrary points in the latent space of generative models, pointing towards a range of new applications for inter-frame reasoning. △ Less

Submitted 3 March, 2021; v1 submitted 6 December, 2019; originally announced December 2019.

Comments: Appearing in AAAI 2021

arXiv:1911.06287 [pdf, other]

Scalable Exact Inference in Multi-Output Gaussian Processes

Authors: Wessel P. Bruinsma, Eric Perim, Will Tebbutt, J. Scott Hosking, Arno Solin, Richard E. Turner

Abstract: Multi-output Gaussian processes (MOGPs) leverage the flexibility and interpretability of GPs while capturing structure across outputs, which is desirable, for example, in spatio-temporal modelling. The key problem with MOGPs is their computational scaling $O(n^3 p^3)$, which is cubic in the number of both inputs $n$ (e.g., time points or locations) and outputs $p$. For this reason, a popular class… ▽ More Multi-output Gaussian processes (MOGPs) leverage the flexibility and interpretability of GPs while capturing structure across outputs, which is desirable, for example, in spatio-temporal modelling. The key problem with MOGPs is their computational scaling $O(n^3 p^3)$, which is cubic in the number of both inputs $n$ (e.g., time points or locations) and outputs $p$. For this reason, a popular class of MOGPs assumes that the data live around a low-dimensional linear subspace, reducing the complexity to $O(n^3 m^3)$. However, this cost is still cubic in the dimensionality of the subspace $m$, which is still prohibitively expensive for many applications. We propose the use of a sufficient statistic of the data to accelerate inference and learning in MOGPs with orthogonal bases. The method achieves linear scaling in $m$ in practice, allowing these models to scale to large $m$ without sacrificing significant expressivity or requiring approximation. This advance opens up a wide range of real-world tasks and can be combined with existing GP approximations in a plug-and-play way. We demonstrate the efficacy of the method on various synthetic and real-world data sets. △ Less

Submitted 17 July, 2020; v1 submitted 14 November, 2019; originally announced November 2019.

Comments: 31 pages, 12 figures, 5 tables, includes appendix; to appear in ICML 2020

arXiv:1906.00360 [pdf, other]

Iterative Path Reconstruction for Large-Scale Inertial Navigation on Smartphones

Authors: Santiago Cortés Reina, Yuxin Hou, Juho Kannala, Arno Solin

Abstract: Modern smartphones have all the sensing capabilities required for accurate and robust navigation and tracking. In specific environments some data streams may be absent, less reliable, or flat out wrong. In particular, the GNSS signal can become flawed or silent inside buildings or in streets with tall buildings. In this application paper, we aim to advance the current state-of-the-art in motion es… ▽ More Modern smartphones have all the sensing capabilities required for accurate and robust navigation and tracking. In specific environments some data streams may be absent, less reliable, or flat out wrong. In particular, the GNSS signal can become flawed or silent inside buildings or in streets with tall buildings. In this application paper, we aim to advance the current state-of-the-art in motion estimation using inertial measurements in combination with partial GNSS data on standard smartphones. We show how iterative estimation methods help refine the positioning path estimates in retrospective use cases that can cover both fixed-interval and fixed-lag scenarios. We compare estimation results provided by global iterated Kalman filtering methods to those of a visual-inertial tracking scheme (Apple ARKit). The practical applicability is demonstrated on real-world use cases on empirical data acquired from both smartphones and tablet devices. △ Less

Submitted 2 June, 2019; originally announced June 2019.

Comments: To appear in Proceedings FUSION 2019

arXiv:1904.06397 [pdf, other]

Multi-View Stereo by Temporal Nonparametric Fusion

Authors: Yuxin Hou, Juho Kannala, Arno Solin

Abstract: We propose a novel idea for depth estimation from multi-view image-pose pairs, where the model has capability to leverage information from previous latent-space encodings of the scene. This model uses pairs of images and poses, which are passed through an encoder--decoder model for disparity estimation. The novelty lies in soft-constraining the bottleneck layer by a nonparametric Gaussian process… ▽ More We propose a novel idea for depth estimation from multi-view image-pose pairs, where the model has capability to leverage information from previous latent-space encodings of the scene. This model uses pairs of images and poses, which are passed through an encoder--decoder model for disparity estimation. The novelty lies in soft-constraining the bottleneck layer by a nonparametric Gaussian process prior. We propose a pose-kernel structure that encourages similar poses to have resembling latent spaces. The flexibility of the Gaussian process (GP) prior provides adapting memory for fusing information from previous views. We train the encoder--decoder and the GP hyperparameters jointly end-to-end. In addition to a batch method, we derive a lightweight estimation scheme that circumvents standard pitfalls in scaling Gaussian process inference, and demonstrate how our scheme can run in real-time on smart devices. △ Less

Submitted 16 August, 2019; v1 submitted 12 April, 2019; originally announced April 2019.

Comments: ICCV 2019

arXiv:1904.06145 [pdf, other]

Towards Photographic Image Manipulation with Balanced Growing of Generative Autoencoders

Authors: Ari Heljakka, Arno Solin, Juho Kannala

Abstract: We present a generative autoencoder that provides fast encoding, faithful reconstructions (eg. retaining the identity of a face), sharp generated/reconstructed samples in high resolutions, and a well-structured latent space that supports semantic manipulation of the inputs. There are no current autoencoder or GAN models that satisfactorily achieve all of these. We build on the progressively growin… ▽ More We present a generative autoencoder that provides fast encoding, faithful reconstructions (eg. retaining the identity of a face), sharp generated/reconstructed samples in high resolutions, and a well-structured latent space that supports semantic manipulation of the inputs. There are no current autoencoder or GAN models that satisfactorily achieve all of these. We build on the progressively growing autoencoder model PIONEER, for which we completely alter the training dynamics based on a careful analysis of recently introduced normalization schemes. We show significantly improved visual and quantitative results for face identity conservation in CelebAHQ. Our model achieves state-of-the-art disentanglement of latent space, both quantitatively and via realistic image attribute manipulations. On the LSUN Bedrooms dataset, we improve the disentanglement performance of the vanilla PIONEER, despite having a simpler model. Overall, our results indicate that the PIONEER networks provide a way towards photorealistic face manipulation. △ Less

Submitted 20 February, 2020; v1 submitted 12 April, 2019; originally announced April 2019.

Comments: WACV 2020

arXiv:1904.05207 [pdf, other]

Know Your Boundaries: Constraining Gaussian Processes by Variational Harmonic Features

Authors: Arno Solin, Manon Kok

Abstract: Gaussian processes (GPs) provide a powerful framework for extrapolation, interpolation, and noise removal in regression and classification. This paper considers constraining GPs to arbitrarily-shaped domains with boundary conditions. We solve a Fourier-like generalised harmonic feature representation of the GP prior in the domain of interest, which both constrains the GP and attains a low-rank rep… ▽ More Gaussian processes (GPs) provide a powerful framework for extrapolation, interpolation, and noise removal in regression and classification. This paper considers constraining GPs to arbitrarily-shaped domains with boundary conditions. We solve a Fourier-like generalised harmonic feature representation of the GP prior in the domain of interest, which both constrains the GP and attains a low-rank representation that is used for speeding up inference. The method scales as $\mathcal{O}(nm^2)$ in prediction and $\mathcal{O}(m^3)$ in hyperparameter learning for regression, where $n$ is the number of data points and $m$ the number of features. Furthermore, we make use of the variational approach to allow the method to deal with non-Gaussian likelihoods. The experiments cover both simulated and empirical data in which the boundary conditions allow for inclusion of additional physical information. △ Less

Submitted 10 April, 2019; originally announced April 2019.

Comments: Appearing in Proceedings of AISTATS 2019

arXiv:1903.03825 [pdf]

doi 10.1016/j.neunet.2021.10.008

Interpolation Consistency Training for Semi-Supervised Learning

Authors: Vikas Verma, Kenji Kawaguchi, Alex Lamb, Juho Kannala, Arno Solin, Yoshua Bengio, David Lopez-Paz

Abstract: We introduce Interpolation Consistency Training (ICT), a simple and computation efficient algorithm for training Deep Neural Networks in the semi-supervised learning paradigm. ICT encourages the prediction at an interpolation of unlabeled points to be consistent with the interpolation of the predictions at those points. In classification problems, ICT moves the decision boundary to low-density reg… ▽ More We introduce Interpolation Consistency Training (ICT), a simple and computation efficient algorithm for training Deep Neural Networks in the semi-supervised learning paradigm. ICT encourages the prediction at an interpolation of unlabeled points to be consistent with the interpolation of the predictions at those points. In classification problems, ICT moves the decision boundary to low-density regions of the data distribution. Our experiments show that ICT achieves state-of-the-art performance when applied to standard neural network architectures on the CIFAR-10 and SVHN benchmark datasets. Our theoretical analysis shows that ICT corresponds to a certain type of data-adaptive regularization with unlabeled points which reduces overfitting to labeled points under high confidence values. △ Less

Submitted 19 October, 2022; v1 submitted 9 March, 2019; originally announced March 2019.

Comments: This is the latest version, which is published in the Journal, "Neural Networks", in 2022. All the previous results are unchanged. Keyword: Deep Learning, Semi-supervised Learning, Mixup

Journal ref: Neural Networks, volume 145, pages 90-106 (2022)

Showing 1–50 of 64 results for author: Solin, A