Search | arXiv e-print repository

From Alexnet to Transformers: Measuring the Non-linearity of Deep Neural Networks with Affine Optimal Transport

Authors: Quentin Bouniot, Ievgen Redko, Anton Mallasto, Charlotte Laclau, Karol Arndt, Oliver Struckmeier, Markus Heinonen, Ville Kyrki, Samuel Kaski

Abstract: In the last decade, we have witnessed the introduction of several novel deep neural network (DNN) architectures exhibiting ever-increasing performance across diverse tasks. Explaining the upward trend of their performance, however, remains difficult, as different DNN architectures of comparable depth and width (common factors associated with their expressive power) may exhibit drastically different performance even when trained on the same dataset. In this paper, we introduce the concept of the non-linearity signature of DNNs, the first theoretically sound solution for approximately measuring the non-linearity of deep neural networks. Built upon a score derived from closed-form optimal transport mappings, this signature provides a better understanding of the inner workings of a wide range of DNN architectures and learning paradigms, with a particular emphasis on computer vision tasks. We provide extensive experimental results that highlight the practical usefulness of the proposed non-linearity signature and its potential for far-reaching implications. The code for our work is available at https://github.com/qbouniot/AffScoreDeep

Submitted 1 July, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

Comments: Code available at https://github.com/qbouniot/AffScoreDeep
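The abstract above does not spell out the closed-form map. As a minimal sketch, assuming it is the standard closed-form optimal transport (Monge) map between Gaussian approximations of two sample sets, the affine map is T(x) = m_t + A(x - m_s) with A = Σ_s^{-1/2} (Σ_s^{1/2} Σ_t Σ_s^{1/2})^{1/2} Σ_s^{-1/2}. This is not the paper's non-linearity score or the AffScoreDeep code; the names sym_sqrt and gaussian_ot_map and the small ridge term eps are hypothetical choices for illustration.

```python
import numpy as np


def sym_sqrt(m: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(m)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues from round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def gaussian_ot_map(xs: np.ndarray, xt: np.ndarray, eps: float = 1e-6):
    """Fit Gaussians to source/target samples (rows) and return (A, b) with T(x) = A x + b."""
    d = xs.shape[1]
    ms, mt = xs.mean(axis=0), xt.mean(axis=0)
    # Hypothetical ridge term eps keeps the source covariance invertible.
    cs = np.cov(xs, rowvar=False) + eps * np.eye(d)
    ct = np.cov(xt, rowvar=False) + eps * np.eye(d)
    cs_half = sym_sqrt(cs)
    cs_half_inv = np.linalg.inv(cs_half)
    # Closed-form Monge map between N(ms, cs) and N(mt, ct); A is symmetric PSD.
    a = cs_half_inv @ sym_sqrt(cs_half @ ct @ cs_half) @ cs_half_inv
    b = mt - a @ ms
    return a, b


# Usage: map standard-normal samples onto a shifted, correlated target.
rng = np.random.default_rng(0)
xs = rng.normal(size=(2000, 3))
mix = np.array([[2.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 0.3]])
xt = rng.normal(size=(2000, 3)) @ mix + 1.0
a, b = gaussian_ot_map(xs, xt)
mapped = xs @ a.T + b
```

By construction the mapped samples share the target's empirical mean and (up to the ridge term) its covariance; how far a layer's actual behavior departs from such an affine map is the kind of quantity a non-linearity measure could build on.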

arXiv:2305.07500

Learning representations that are closed-form Monge mapping optimal with application to domain adaptation

Authors: Oliver Struckmeier, Ievgen Redko, Anton Mallasto, Karol Arndt, Markus Heinonen, Ville Kyrki

Abstract: Optimal transport (OT) is a powerful geometric tool used to compare and align probability measures following the least effort principle. Despite its widespread use in machine learning (ML), the OT problem still carries a heavy computational burden, while also suffering from the curse of dimensionality for measures supported on general high-dimensional spaces. In this paper, we propose to tackle these challenges using representation learning. In particular, we seek to learn an embedding space in which the samples of the two input measures become alignable with a simple affine mapping that can be calculated efficiently in closed form. We then show that this approach leads to results comparable to solving the original OT problem when applied to the transfer learning task on which many OT baselines were previously evaluated, in both homogeneous and heterogeneous domain adaptation (DA) settings. The code for our contribution is available at https://github.com/Oleffa/LaOT.

Submitted 11 August, 2023; v1 submitted 12 May, 2023; originally announced May 2023.
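If both measures are approximated by Gaussians, the 2-Wasserstein distance between them also has a closed form, W_2^2 = ||m_s - m_t||^2 + tr(Σ_s + Σ_t - 2 (Σ_s^{1/2} Σ_t Σ_s^{1/2})^{1/2}), which offers a cheap way to check how far apart two sets of embedded samples are. The sketch below is a hypothetical illustration with made-up names (sym_sqrt, gaussian_w2_squared); it is not taken from the LaOT repository, and the paper's training objective may differ.

```python
import numpy as np


def sym_sqrt(m: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(m)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T


def gaussian_w2_squared(xs: np.ndarray, xt: np.ndarray) -> float:
    """Squared 2-Wasserstein distance between Gaussian fits of two sample sets (rows = samples)."""
    ms, mt = xs.mean(axis=0), xt.mean(axis=0)
    cs = np.cov(xs, rowvar=False)
    ct = np.cov(xt, rowvar=False)
    cs_half = sym_sqrt(cs)
    cross = sym_sqrt(cs_half @ ct @ cs_half)
    # Mean term plus the Bures term on the covariances.
    return float(np.sum((ms - mt) ** 2) + np.trace(cs + ct - 2.0 * cross))


# Usage: a pure translation by (3, 4) leaves the covariance unchanged,
# so the distance reduces to the squared shift, 3^2 + 4^2 = 25.
rng = np.random.default_rng(1)
x = rng.normal(size=(500, 2))
print(gaussian_w2_squared(x, x + np.array([3.0, 4.0])))
```

The cost is dominated by a few d-by-d eigendecompositions, independent of the number of samples once the covariances are estimated, which is the efficiency argument behind working with affine (Gaussian) maps in a learned embedding.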

arXiv:2103.07223

Domain Curiosity: Learning Efficient Data Collection Strategies for Domain Adaptation

Authors: Karol Arndt, Oliver Struckmeier, Ville Kyrki

Abstract: Domain adaptation is a common problem in robotics, with applications such as transferring policies from simulation to the real world and lifelong learning. Performing such adaptation, however, requires informative data about the environment to be available during the adaptation. In this paper, we present domain curiosity, a method of training exploratory policies that are explicitly optimized to provide data that allows a model to learn about the unknown aspects of the environment. In contrast to most curiosity methods, our approach explicitly rewards learning, which makes it robust to environment noise without sacrificing its ability to learn. We evaluate the proposed method by comparing how much a model can learn about environment dynamics from data collected by the proposed approach versus data collected by standard curious and random policies. The evaluation is performed on a toy environment, two simulated robot setups, and a real-world haptic exploration task. The results show that the proposed method allows data-efficient and accurate estimation of dynamics.

Submitted 12 March, 2021; originally announced March 2021.

arXiv:2012.06279

Autoencoding Slow Representations for Semi-supervised Data Efficient Regression

Authors: Oliver Struckmeier, Kshitij Tiwari, Ville Kyrki

Abstract: The slowness principle is a concept inspired by the visual cortex of the brain. It postulates that the underlying generative factors of a quickly varying sensory signal change on a slower time scale. Unsupervised learning of intermediate representations from abundant unlabeled sensory data can be leveraged to perform data-efficient supervised downstream regression. In this paper, we propose a general formulation of slowness for unsupervised representation learning that adds a slowness regularization term to the evidence lower bound of the beta-VAE to encourage temporal similarity in observation and latent space. Within this framework, we compare existing slowness regularization terms, such as the L1 and L2 losses used in existing end-to-end methods and the SlowVAE, and propose a new term based on Brownian motion. We empirically evaluate these slowness regularization terms with respect to their downstream task performance and data efficiency. We find that slow representations lead to equal or better downstream task performance and data efficiency in different experimental domains when compared to representations without slowness regularization. Finally, we discuss how the Fréchet Inception Distance (FID), traditionally used to assess the generative capabilities of GANs, can serve as a measure to predict the performance of a pre-trained autoencoder model in a supervised downstream task and to accelerate hyperparameter search.

Submitted 4 July, 2021; v1 submitted 11 December, 2020; originally announced December 2020.

Comments: 11 pages, 6 figures
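The simplest slowness terms mentioned in the abstract, L1 and L2 penalties on consecutive latent codes, can be sketched as follows. This is a minimal NumPy illustration assuming latents of shape (T, d) for one sequence, the L2 term taken as a squared distance, and a scalar placeholder for the negative beta-VAE objective; the function names, the averaging over time steps, and the weight are assumptions, and neither the SlowVAE term nor the proposed Brownian-motion term is shown.

```python
import numpy as np


def slowness_l1(z: np.ndarray) -> float:
    """Mean L1 distance between consecutive latent codes; z has shape (T, d)."""
    return float(np.abs(np.diff(z, axis=0)).sum(axis=1).mean())


def slowness_l2(z: np.ndarray) -> float:
    """Mean squared L2 distance between consecutive latent codes; z has shape (T, d)."""
    return float((np.diff(z, axis=0) ** 2).sum(axis=1).mean())


def regularized_loss(neg_objective: float, z: np.ndarray,
                     weight: float = 1.0, kind: str = "l2") -> float:
    """Hypothetical combined loss: negative beta-VAE objective plus a weighted slowness penalty."""
    if kind == "l1":
        penalty = slowness_l1(z)
    elif kind == "l2":
        penalty = slowness_l2(z)
    else:
        raise ValueError(f"unknown slowness term: {kind}")
    return neg_objective + weight * penalty


# Three time steps of a 2-D latent: one jump of (1, 2), then no change.
z = np.array([[0.0, 0.0], [1.0, 2.0], [1.0, 2.0]])
print(slowness_l1(z), slowness_l2(z))  # 1.5 2.5
```

In an actual training loop these penalties would be computed on tensors inside an autodiff framework so that gradients flow back into the encoder; the plain NumPy version only shows the arithmetic.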

arXiv:1909.07201

MuPNet: Multi-modal Predictive Coding Network for Place Recognition by Unsupervised Learning of Joint Visuo-Tactile Latent Representations

Authors: Oliver Struckmeier, Kshitij Tiwari, Shirin Dora, Martin J. Pearson, Sander M. Bohte, Cyriel MA Pennartz, Ville Kyrki

Abstract: Extracting and binding salient information from different sensory modalities to determine common features in the environment is a significant challenge in robotics. Here we present MuPNet (Multi-modal Predictive Coding Network), a biologically plausible network architecture for extracting joint latent features from visuo-tactile sensory data gathered from a biomimetic mobile robot. In this study, we evaluate MuPNet applied to place recognition as a simulated biomimetic robot platform explores visually aliased environments. The F1 scores demonstrate that its performance is equivalent to that of prior hand-crafted sensory feature extraction techniques under controlled conditions, with significant improvement when operating in novel environments.

Submitted 16 September, 2019; originally announced September 2019.

Comments: Submitted to ICRA 2020. 6+1 Pages with 5 figures

arXiv:1906.06422

ViTa-SLAM: A Bio-inspired Visuo-Tactile SLAM for Navigation while Interacting with Aliased Environments

Authors: Oliver Struckmeier, Kshitij Tiwari, Mohammed Salman, Martin J. Pearson, Ville Kyrki

Abstract: RatSLAM is a rat hippocampus-inspired visual Simultaneous Localization and Mapping (SLAM) framework capable of generating semi-metric topological representations of indoor and outdoor environments. Whisker-RatSLAM is a 6D extension of RatSLAM that primarily focuses on object recognition by generating point clouds of objects from whisking information. This paper introduces a novel extension to both former works, referred to as ViTa-SLAM, that harnesses both vision and tactile information for performing SLAM. This not only allows the robot to interact naturally with the environment while navigating, as is normally seen in nature, but also provides a mechanism to fuse non-unique tactile and unique visual data. Compared to the former works, our approach can handle ambiguous scenes in which one sensor alone is not capable of identifying false-positive loop closures.

Submitted 15 August, 2019; v1 submitted 14 June, 2019; originally announced June 2019.

Comments: 7 pages, 10 figures. IEEE Cyborg and Bionic Systems (CBS) 2019

arXiv:1905.13546

LeagueAI: Improving object detector performance and flexibility through automatically generated training data and domain randomization

Authors: Oliver Struckmeier

Abstract: In this technical report, I present my method for automatic synthetic dataset generation for object detection and demonstrate it on the video game League of Legends. This report also serves as a handbook on how to automatically generate datasets and as an introduction to the dataset generation part of the LeagueAI framework. The LeagueAI framework is a software framework that provides detailed information about the game League of Legends based on the same input a human player would have, namely vision. The framework allows researchers and enthusiasts to develop their own intelligent agents or to extract detailed information about the state of the game. A major problem in machine vision applications is usually the laborious work of gathering large amounts of hand-labeled data. Thus, a crucial part of the vision pipeline of the LeagueAI framework, the dataset generation, is presented in this report. The method involves extracting raw image data from the game's 3D models and combining it with the game background to create game-like synthetic images and to generate the corresponding labels automatically. In an experiment, I compared a model trained on synthetic data to a model trained on hand-labeled data and a model trained on a combined dataset. The model trained on the synthetic data showed higher detection precision on more classes and more reliable tracking performance of the player character. The model trained on the combined dataset did not perform better because of the different formats of the older hand-labeled dataset and the synthetic data.

Submitted 28 May, 2019; originally announced May 2019.
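The compositing step described in the LeagueAI abstract (combining object images with a game background and generating labels automatically) might look roughly like the sketch below. It assumes the object is given as a color array plus an alpha mask, that it fits entirely inside the background, and that labels use a YOLO/Darknet-style line (class, normalized center x, center y, width, height); these are illustrative assumptions, not the report's actual code or label format, and the function names are hypothetical.

```python
import numpy as np


def composite(background, sprite, alpha, top, left):
    """Alpha-blend `sprite` (h, w, 3) onto `background` (H, W, 3) at (top, left).

    Returns the composited uint8 image and a tight pixel box (x0, y0, x1, y1)
    around the visible (alpha > 0) part of the sprite. Assumes the sprite fits
    inside the background; no clipping is done.
    """
    h, w = sprite.shape[:2]
    out = background.astype(np.float32)  # astype returns a copy
    a = alpha[..., None].astype(np.float32)  # (h, w, 1) so it broadcasts over RGB
    region = out[top:top + h, left:left + w]
    out[top:top + h, left:left + w] = a * sprite + (1.0 - a) * region
    ys, xs = np.nonzero(alpha > 0)
    box = (int(left + xs.min()), int(top + ys.min()),
           int(left + xs.max() + 1), int(top + ys.max() + 1))
    return np.clip(out, 0, 255).astype(np.uint8), box


def yolo_label(class_id, box, img_w, img_h):
    """Format a pixel box as 'class cx cy w h' with coordinates normalized to [0, 1]."""
    x0, y0, x1, y1 = box
    cx = (x0 + x1) / 2 / img_w
    cy = (y0 + y1) / 2 / img_h
    bw = (x1 - x0) / img_w
    bh = (y1 - y0) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}"


# Usage: paste a white 20x10 sprite onto a black 200x100 background.
background = np.zeros((100, 200, 3), dtype=np.uint8)
sprite = np.full((10, 20, 3), 255, dtype=np.uint8)
alpha = np.ones((10, 20), dtype=np.float32)
image, box = composite(background, sprite, alpha, top=30, left=50)
label = yolo_label(3, box, img_w=image.shape[1], img_h=image.shape[0])
print(box, label)
```

Deriving the box from the alpha mask rather than the sprite rectangle keeps labels tight when the extracted model image has transparent margins; random placement, scaling, and occlusion would be layered on top of this in a real generator.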

arXiv:1904.05667

ViTa-SLAM: Biologically-Inspired Visuo-Tactile SLAM

Authors: Oliver Struckmeier, Kshitij Tiwari, Martin J. Pearson, Ville Kyrki

Abstract: In this work, we propose a novel, bio-inspired multi-sensory SLAM approach called ViTa-SLAM. Compared to other multi-sensory SLAM variants, this approach allows for seamless multi-sensory information fusion while naturally interacting with the environment. The algorithm is empirically evaluated in a simulated setting using a biomimetic robot platform called the WhiskEye. Our results show promising performance improvements over existing bio-inspired SLAM approaches in terms of loop-closure detection.

Submitted 14 May, 2019; v1 submitted 11 April, 2019; originally announced April 2019.

Comments: 2 pages, 5 figures, ICRA 2019 workshop

Showing 1–8 of 8 results for author: Struckmeier, O