-
Social Media Use is Predictable from App Sequences: Using LSTM and Transformer Neural Networks to Model Habitual Behavior
Authors:
Heinrich Peters,
Joseph B. Bayer,
Sandra C. Matz,
Yikun Chi,
Sumer S. Vaid,
Gabriella M. Harari
Abstract:
The present paper introduces a novel approach to studying social media habits through predictive modeling of sequential smartphone user behaviors. While much of the literature on media and technology habits has relied on self-report questionnaires and simple behavioral frequency measures, we examine an important yet understudied aspect of media and technology habits: their embeddedness in repetitive behavioral sequences. Leveraging Long Short-Term Memory (LSTM) and transformer neural networks, we show that (i) social media use is predictable at the within-person and between-person levels and that (ii) there are robust individual differences in the predictability of social media use. We examine the performance of several modeling approaches, including (i) global models trained on the pooled data from all participants, (ii) idiographic person-specific models, and (iii) global models fine-tuned on person-specific data. Neither person-specific modeling nor fine-tuning on person-specific data substantially outperformed the global models, indicating that the global models were able to represent a variety of idiosyncratic behavioral patterns. Additionally, our analyses reveal that the person-level predictability of social media use is not substantially related to the frequency of smartphone use in general or the frequency of social media use, indicating that our approach captures an aspect of habits that is distinct from behavioral frequency. Implications for habit modeling and theoretical development are discussed.
Submitted 23 June, 2024; v1 submitted 20 April, 2024;
originally announced April 2024.
-
Modular Graph Extraction for Handwritten Circuit Diagram Images
Authors:
Johannes Bayer,
Leo van Waveren,
Andreas Dengel
Abstract:
As digitization in engineering has progressed, circuit diagrams (also referred to as schematics) are typically developed and maintained in computer-aided engineering (CAE) systems, thus allowing for automated verification, simulation and further processing in downstream engineering steps. However, apart from printed legacy schematics, hand-drawn circuit diagrams are still used today in the educational domain, where they serve as an easily accessible means for trainees and students to learn to draw this type of diagram. Furthermore, hand-drawn schematics are typically used in examinations due to legal constraints. In order to harness the capabilities of digital circuit representations, automated means for extracting the electrical graph from raster graphics are required.
While respective approaches have been proposed in literature, they are typically conducted on small or non-disclosed datasets. This paper describes a modular end-to-end solution on a larger, public dataset, in which approaches for the individual sub-tasks are evaluated to form a new baseline. These sub-tasks include object detection (for electrical symbols and texts), binary segmentation (drafter's stroke vs. background), handwritten character recognition and orientation regression for electrical symbols and texts. Furthermore, computer-vision graph assembly and rectification algorithms are presented. All methods are integrated in a publicly available prototype.
Submitted 16 February, 2024;
originally announced February 2024.
-
Utilizing dataset affinity prediction in object detection to assess training data
Authors:
Stefan Becker,
Jens Bayer,
Ronny Hug,
Wolfgang Hübner,
Michael Arens
Abstract:
Data pooling offers various advantages, such as increasing the sample size, improving generalization, reducing sampling bias, and addressing data sparsity and quality, but it is not straightforward and may even be counterproductive. Assessing the effectiveness of pooling datasets in a principled manner is challenging due to the difficulty in estimating the overall information content of individual datasets. Towards this end, we propose incorporating a data source prediction module into standard object detection pipelines. The module runs with minimal overhead during inference time, providing additional information about the data source assigned to individual detections. We show the benefits of the so-called dataset affinity score by automatically selecting samples from a heterogeneous pool of vehicle datasets. The results show that object detectors can be trained on a significantly sparser set of training samples without losing detection accuracy.
Submitted 8 May, 2024; v1 submitted 16 November, 2023;
originally announced November 2023.
-
Spot: A Natural Language Interface for Geospatial Searches in OSM
Authors:
Lynn Khellaf,
Ipek Baris Schlicht,
Julia Bayer,
Ruben Bouwmeester,
Tilman Miraß,
Tilman Wagner
Abstract:
Investigative journalists and fact-checkers have found OpenStreetMap (OSM) to be an invaluable resource for their work due to its extensive coverage and intricate details of various locations, which play a crucial role in investigating news scenes. Despite its value, OSM's complexity presents considerable accessibility and usability challenges, especially for those without a technical background. To address this, we introduce 'Spot', a user-friendly natural language interface for querying OSM data. Spot utilizes a semantic mapping from natural language to OSM tags, leveraging artificially generated sentence queries and a T5 transformer. This approach enables Spot to extract relevant information from user-input sentences and display candidate locations matching the descriptions on a map. To foster collaboration and future advancement, all code and generated data are available as an open-source repository.
Submitted 14 November, 2023;
originally announced November 2023.
-
Eigenpatches -- Adversarial Patches from Principal Components
Authors:
Jens Bayer,
Stefan Becker,
David Münch,
Michael Arens
Abstract:
Adversarial patches are still a simple yet powerful white-box attack that can be used to fool object detectors by suppressing possible detections. The patches of these so-called evasion attacks are computationally expensive to produce and require full access to the attacked detector. This paper addresses the problem of computational expense by analyzing 375 generated patches, calculating their principal components, and showing that linear combinations of the resulting "eigenpatches" can be used to fool object detectors successfully.
Submitted 19 June, 2023;
originally announced June 2023.
-
Category Theory in Isabelle/HOL as a Basis for Meta-logical Investigation
Authors:
Jonas Bayer,
Aleksey Gonus,
Christoph Benzmüller,
Dana S. Scott
Abstract:
This paper presents meta-logical investigations based on category theory using the proof assistant Isabelle/HOL. We demonstrate the potential of a free-logic-based shallow semantic embedding of category theory by providing a formalization of the notion of elementary topoi. Additionally, we formalize symmetrical monoidal closed categories expressing the denotational semantic model of intuitionistic multiplicative linear logic. Alongside these meta-logical investigations, we contribute to building an Isabelle category theory library, with a focus on ease of use in formalizations beyond category theory itself. This work paves the way for future formalizations based on category theory and demonstrates the power of automated reasoning in investigating meta-logical questions.
Submitted 16 June, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
-
Filter-Aware Model-Predictive Control
Authors:
Baris Kayalibay,
Atanas Mirchev,
Ahmed Agha,
Patrick van der Smagt,
Justin Bayer
Abstract:
Partially-observable problems pose a trade-off between reducing costs and gathering information. They can be solved optimally by planning in belief space, but that is often prohibitively expensive. Model-predictive control (MPC) takes the alternative approach of using a state estimator to form a belief over the state, and then planning in state space. This ignores potential future observations during planning and, as a result, cannot actively increase or preserve the certainty of its own state estimate. We find a middle ground between planning in belief space and completely ignoring its dynamics by only reasoning about its future accuracy. Our approach, filter-aware MPC, penalises the loss of information by what we call "trackability", the expected error of the state estimator. We show that model-based simulation allows condensing trackability into a neural network, which allows fast planning. In experiments involving visual navigation, realistic every-day environments and a two-link robot arm, we show that filter-aware MPC vastly improves on regular MPC.
Submitted 20 April, 2023;
originally announced April 2023.
-
Instance Segmentation Based Graph Extraction for Handwritten Circuit Diagram Images
Authors:
Johannes Bayer,
Amit Kumar Roy,
Andreas Dengel
Abstract:
Handwritten circuit diagrams from educational scenarios or historic sources usually exist on analogue media. For deriving their functional principles or flaws automatically, they need to be digitized, extracting their electrical graph. Recently, the base technologies for automated pipelines facilitating this process shifted from computer vision to machine learning. This paper describes an approach for extracting both the electrical components (including their terminals and describing texts) as well as their interconnections (including junctions and wire hops) by means of instance segmentation and keypoint extraction. Consequently, the resulting graph extraction process consists of a simple two-step process of model inference and trivial geometric keypoint matching. The dataset itself, its preparation, model training and post-processing are described and publicly available.
Submitted 18 January, 2023; v1 submitted 8 January, 2023;
originally announced January 2023.
-
Study on Domain Name System (DNS) Abuse: Technical Report
Authors:
Jan Bayer,
Yevheniya Nosyk,
Olivier Hureau,
Simon Fernandez,
Ivett Paulovics,
Andrzej Duda,
Maciej Korczyński
Abstract:
A safe and secure Domain Name System (DNS) is of paramount importance for the digital economy and society. Malicious activities on the DNS, generally referred to as "DNS abuse", are frequent and severe problems affecting online security and undermining users' trust in the Internet. The proposed definition of DNS abuse is as follows: Domain Name System (DNS) abuse is any activity that makes use of domain names or the DNS protocol to carry out harmful or illegal activity. DNS abuse exploits the domain name registration process, the domain name resolution process, or other services associated with the domain name (e.g., shared web hosting service). Notably, we distinguish between (i) maliciously registered domain names: domain names registered with the malicious intent to carry out harmful or illegal activity; and (ii) compromised domain names: domain names registered by a bona fide third party for legitimate purposes and compromised by malicious actors to carry out harmful and illegal activity. DNS abuse disrupts, damages, or otherwise adversely impacts the DNS and the Internet infrastructure, their users or other persons.
Submitted 17 December, 2022;
originally announced December 2022.
-
PRISM: Probabilistic Real-Time Inference in Spatial World Models
Authors:
Atanas Mirchev,
Baris Kayalibay,
Ahmed Agha,
Patrick van der Smagt,
Daniel Cremers,
Justin Bayer
Abstract:
We introduce PRISM, a method for real-time filtering in a probabilistic generative model of agent motion and visual perception. Previous approaches either lack uncertainty estimates for the map and agent state, do not run in real-time, do not have a dense scene representation or do not model agent dynamics. Our solution reconciles all of these aspects. We start from a predefined state-space model which combines differentiable rendering and 6-DoF dynamics. Probabilistic inference in this model amounts to simultaneous localisation and mapping (SLAM) and is intractable. We use a series of approximations to Bayesian inference to arrive at probabilistic map and state estimates. We take advantage of well-established methods and closed-form updates, preserving accuracy and enabling real-time capability. The proposed solution runs at 10 Hz real-time and is similarly accurate to state-of-the-art SLAM in small to medium-sized indoor environments, with high-speed UAV and handheld camera agents (Blackbird, EuRoC and TUM-RGBD).
Submitted 6 December, 2022;
originally announced December 2022.
-
Functional Component Descriptions for Electrical Circuits based on Semantic Technology Reasoning
Authors:
Johannes Bayer,
Mina Karami Zadeh,
Markus Schröder,
Andreas Dengel
Abstract:
Circuit diagrams have been used in electrical engineering for decades to describe the wiring of devices and facilities. They depict electrical components in a symbolic and graph-based manner. While circuit design is usually performed electronically, there are still legacy paper-based diagrams that require digitization in order to be used in CAE systems. Generally, knowledge of specific circuits may be lost between engineering projects, making it hard for domain novices to understand a given circuit design. The graph-based nature of these documents can be exploited by semantic technology-based reasoning in order to generate human-understandable descriptions of their functional principles. More precisely, each electrical component (e.g. a diode) of a circuit may be assigned a high-level function label which describes its purpose within the device (e.g. flyback diode for reverse voltage protection). In this paper, forward chaining rules are used for such a generation. The described approach is applicable to both CAE-based circuits and raw circuits yielded by an image understanding pipeline. The viability of the approach is demonstrated by application to an existing set of circuits.
Submitted 23 June, 2022;
originally announced September 2022.
-
Mathematical Proof Between Generations
Authors:
Jonas Bayer,
Christoph Benzmüller,
Kevin Buzzard,
Marco David,
Leslie Lamport,
Yuri Matiyasevich,
Lawrence Paulson,
Dierk Schleicher,
Benedikt Stock,
Efim Zelmanov
Abstract:
A proof is one of the most important concepts of mathematics. However, there is a striking difference between how a proof is defined in theory and how it is used in practice. This puts the unique status of mathematics as an exact science into peril. Now may be the time to reconcile theory and practice, i.e. precision and intuition, through the advent of computer proof assistants. For a long time this has been a topic for experts in specialized communities. However, mathematical proofs have become increasingly sophisticated, stretching the boundaries of what is humanly comprehensible, so that leading mathematicians have asked for formal verification of their proofs. At the same time, major theorems in mathematics have recently been computer-verified by people from outside of these communities, even by beginning students. This article investigates the gap between the different definitions of a proof and possibilities to build bridges. It is written as a polemic or a collage by different members of the communities in mathematics and computer science at different stages of their careers, challenging well-known preconceptions and exploring new perspectives.
Submitted 8 July, 2022;
originally announced July 2022.
-
Tracking and Planning with Spatial World Models
Authors:
Baris Kayalibay,
Atanas Mirchev,
Patrick van der Smagt,
Justin Bayer
Abstract:
We introduce a method for real-time navigation and tracking with differentiably rendered world models. Learning models for control has led to impressive results in robotics and computer games, but this success has yet to be extended to vision-based navigation. To address this, we transfer advances in the emergent field of differentiable rendering to model-based control. We do this by planning in a learned 3D spatial world model, combined with a pose estimation algorithm previously used in the context of TSDF fusion, but now tailored to our setting and improved to incorporate agent dynamics. We evaluate over six simulated environments based on complex human-designed floor plans and provide quantitative results. We achieve up to 92% navigation success rate at a frequency of 15 Hz using only image and depth observations under stochastic, continuous dynamics.
Submitted 25 January, 2022;
originally announced January 2022.
-
System for multi-robotic exploration of underground environments CTU-CRAS-NORLAB in the DARPA Subterranean Challenge
Authors:
Tomáš Rouček,
Martin Pecka,
Petr Čížek,
Tomáš Petříček,
Jan Bayer,
Vojtěch Šalanský,
Teymur Azayev,
Daniel Heřt,
Matěj Petrlík,
Tomáš Báča,
Vojtěch Spurný,
Vít Krátký,
Pavel Petráček,
Dominic Baril,
Maxime Vaidis,
Vladimír Kubelka,
François Pomerleau,
Jan Faigl,
Karel Zimmermann,
Martin Saska,
Tomáš Svoboda,
Tomáš Krajník
Abstract:
We present a field report of the CTU-CRAS-NORLAB team from the Subterranean Challenge (SubT) organised by the Defense Advanced Research Projects Agency (DARPA). The contest seeks to advance technologies that would improve the safety and efficiency of search-and-rescue operations in GPS-denied environments. During the contest rounds, teams of mobile robots have to find specific objects while operating in environments with limited radio communication, e.g. mining tunnels, underground stations or natural caverns. We present a heterogeneous exploration robotic system of the CTU-CRAS-NORLAB team, which achieved the third rank at the SubT Tunnel and Urban Circuit rounds and surpassed the performance of all other non-DARPA-funded teams. The field report describes the team's hardware, sensors, algorithms and strategies, and discusses the lessons learned by participating in the DARPA SubT contest.
Submitted 12 October, 2021;
originally announced October 2021.
-
A Comparison of Deep Saliency Map Generators on Multispectral Data in Object Detection
Authors:
Jens Bayer,
David Münch,
Michael Arens
Abstract:
Deep neural networks, especially convolutional deep neural networks, are state-of-the-art methods to classify, segment or even generate images, movies, or sounds. However, these methods lack a good semantic understanding of what happens internally. The question of why a COVID-19 detector has classified a stack of lung CT images as positive is sometimes more interesting than the overall specificity and sensitivity, especially when human domain expert knowledge disagrees with the given output. This way, human domain experts could also be advised to reconsider their choice, regarding the information pointed out by the system. In addition, the deep learning model can be controlled, and a present dataset bias can be found. Currently, most explainable AI methods in the computer vision domain are used purely on image classification, where the images are ordinary images in the visible spectrum. As a result, there is no comparison of how the methods behave with multimodal image data, and most methods have not been investigated with regard to how they behave when used for object detection. This work tries to close these gaps. First, we investigate how the maps of three saliency map generator methods differ across the different spectra. This is achieved via accurate and systematic training. Second, we examine how they behave when used for object detection. As a practical problem, we chose object detection in the infrared and visual spectrum for autonomous driving. The dataset used in this work is the Multispectral Object Detection Dataset, where each scene is available in the FIR, MIR and NIR as well as the visual spectrum. The results show that there are differences between the infrared and visual activation maps. Further, advanced training with both the infrared and visual data not only improves the network's output but also leads to more focused spots in the saliency maps.
Submitted 26 August, 2021;
originally announced August 2021.
-
A Public Ground-Truth Dataset for Handwritten Circuit Diagram Images
Authors:
Felix Thoma,
Johannes Bayer,
Yakun Li
Abstract:
The development of digitization methods for line drawings (especially in the area of electrical engineering) relies on the availability of public training and evaluation data. This paper presents such an image set along with annotations. The dataset consists of 1152 images of 144 circuits by 12 drafters and 48 563 annotations. Each of these images depicts an electrical circuit diagram, taken by consumer-grade cameras under varying lighting conditions and perspectives. A variety of different pencil types and surface materials has been used. For each image, all individual electrical components are annotated with bounding boxes and one out of 45 class labels. In order to simplify a graph extraction process, different helper symbols like junction points and crossovers are introduced, while texts are annotated as well. The geometric and taxonomic problems arising from this task as well as the classes themselves and statistics of their appearances are stated. The performance of a standard Faster R-CNN on the dataset is provided as an object detection baseline.
Submitted 21 July, 2021;
originally announced July 2021.
-
Beginners' Quest to Formalize Mathematics: A Feasibility Study in Isabelle
Authors:
Jonas Bayer,
Marco David,
Abhik Pal,
Benedikt Stock
Abstract:
How difficult are interactive theorem provers to use? We respond by reviewing the formalization of Hilbert's tenth problem in Isabelle/HOL carried out by an undergraduate research group at Jacobs University Bremen. We argue that, as demonstrated by our example, proof assistants are feasible for beginners to formalize mathematics. With the aim to make the field more accessible, we also survey hurdles that arise when learning an interactive theorem prover. Broadly, we advocate for an increased adoption of interactive theorem provers in mathematical research and curricula.
Submitted 23 June, 2021;
originally announced June 2021.
-
Mind the Gap when Conditioning Amortised Inference in Sequential Latent-Variable Models
Authors:
Justin Bayer,
Maximilian Soelch,
Atanas Mirchev,
Baris Kayalibay,
Patrick van der Smagt
Abstract:
Amortised inference enables scalable learning of sequential latent-variable models (LVMs) with the evidence lower bound (ELBO). In this setting, variational posteriors are often only partially conditioned. While the true posteriors depend, e.g., on the entire sequence of observations, approximate posteriors are only informed by past observations. This mimics the Bayesian filter -- a mixture of smoothing posteriors. Yet, we show that the ELBO objective forces partially-conditioned amortised posteriors to approximate products of smoothing posteriors instead. Consequently, the learned generative model is compromised. We demonstrate these theoretical findings in three scenarios: traffic flow, handwritten digits, and aerial vehicle dynamics. Using fully-conditioned approximate posteriors, performance improves in terms of generative modelling and multi-step prediction.
Submitted 17 March, 2021; v1 submitted 18 January, 2021;
originally announced January 2021.
-
Variational State-Space Models for Localisation and Dense 3D Mapping in 6 DoF
Authors:
Atanas Mirchev,
Baris Kayalibay,
Patrick van der Smagt,
Justin Bayer
Abstract:
We solve the problem of 6-DoF localisation and 3D dense reconstruction in spatial environments as approximate Bayesian inference in a deep state-space model. Our approach leverages both learning and domain knowledge from multiple-view geometry and rigid-body dynamics. This results in an expressive predictive model of the world, often missing in current state-of-the-art visual SLAM solutions. The combination of variational inference, neural networks and a differentiable raycaster ensures that our model is amenable to end-to-end gradient-based optimisation. We evaluate our approach on realistic unmanned aerial vehicle flight data, nearing the performance of state-of-the-art visual-inertial odometry systems. We demonstrate the applicability of the model to generative prediction and planning.
Submitted 15 March, 2021; v1 submitted 17 June, 2020;
originally announced June 2020.
-
Image-based OoD-Detector Principles on Graph-based Input Data in Human Action Recognition
Authors:
Jens Bayer,
David Münch,
Michael Arens
Abstract:
Living in a complex world like ours makes it unacceptable that a practical implementation of a machine learning system assumes a closed world. Therefore, it is necessary for such a learning-based system in a real-world environment to be aware of its own capabilities and limits and to be able to distinguish between confident and unconfident results of the inference, especially if the sample cannot be explained by the underlying distribution. This knowledge is particularly essential in safety-critical environments and tasks, e.g., self-driving cars or medical applications. To this end, we transfer image-based Out-of-Distribution (OoD) methods to graph-based data and show their applicability in action recognition. The contribution of this work is (i) the examination of the portability of recent image-based OoD-detectors for graph-based input data, (ii) a Metric Learning-based approach to detect OoD-samples, and (iii) the introduction of a novel semi-synthetic action recognition dataset. The evaluation shows that image-based OoD-methods can be applied to graph-based data. Additionally, there is a gap between the performance on intraclass and intradataset results. First methods such as the examined baseline or ODIN provide reasonable results. More sophisticated network architectures, in contrast to their image-based application, were surpassed in the intradataset comparison and even led to lower classification accuracy.
Submitted 3 March, 2020;
originally announced March 2020.
-
Learning Flat Latent Manifolds with VAEs
Authors:
Nutan Chen,
Alexej Klushyn,
Francesco Ferroni,
Justin Bayer,
Patrick van der Smagt
Abstract:
Measuring the similarity between data points often requires domain knowledge, which can in part be compensated for by relying on unsupervised methods such as latent-variable models, where similarity/distance is estimated in a more compact latent space. Prevalent is the use of the Euclidean metric, which has the drawback of ignoring information about similarity of data stored in the decoder, as captured by the framework of Riemannian geometry. We propose an extension to the framework of variational auto-encoders that allows learning flat latent manifolds, where the Euclidean metric is a proxy for the similarity between data points. This is achieved by defining the latent space as a Riemannian manifold and by regularising the metric tensor to be a scaled identity matrix. Additionally, we replace the compact prior typically used in variational auto-encoders with a recently presented, more expressive hierarchical one---and formulate the learning problem as a constrained optimisation problem. We evaluate our method on a range of data-sets, including a video-tracking benchmark, where the performance of our unsupervised approach nears that of state-of-the-art supervised approaches, while retaining the computational efficiency of straight-line-based approaches.
Submitted 12 August, 2020; v1 submitted 12 February, 2020;
originally announced February 2020.
-
Variational Tracking and Prediction with Generative Disentangled State-Space Models
Authors:
Adnan Akhundov,
Maximilian Soelch,
Justin Bayer,
Patrick van der Smagt
Abstract:
We address tracking and prediction of multiple moving objects in visual data streams as inference and sampling in a disentangled latent state-space model. By encoding objects separately and including explicit position information in the latent state space, we perform tracking via amortized variational Bayesian inference of the respective latent positions. Inference is implemented in a modular neural framework tailored towards our disentangled latent space. The generative and inference models are jointly learned from observations only. Compared to related prior work, we empirically show that our Markovian state-space assumption enables faithful and much improved long-term prediction well beyond the training horizon. Further, our inference model correctly decomposes frames into objects, even in the presence of occlusions. Tracking performance is increased significantly over prior art.
Submitted 14 October, 2019;
originally announced October 2019.
-
Increasing the Generalisation Capacity of Conditional VAEs
Authors:
Alexej Klushyn,
Nutan Chen,
Botond Cseke,
Justin Bayer,
Patrick van der Smagt
Abstract:
We address the problem of one-to-many mappings in supervised learning, where a single instance has many different solutions of possibly equal cost. The framework of conditional variational autoencoders describes a class of methods to tackle such structured-prediction tasks by means of latent variables. We propose to incentivise informative latent representations for increasing the generalisation capacity of conditional variational autoencoders. To this end, we modify the latent variable model by defining the likelihood as a function of the latent variable only and introduce an expressive multimodal prior to enable the model to capture semantically meaningful features of the data. To validate our approach, we train our model on the Cornell Robot Grasping dataset, and modified versions of MNIST and Fashion-MNIST, obtaining results that show a significantly higher generalisation capability.
Submitted 10 September, 2019; v1 submitted 23 August, 2019;
originally announced August 2019.
-
On Deep Set Learning and the Choice of Aggregations
Authors:
Maximilian Soelch,
Adnan Akhundov,
Patrick van der Smagt,
Justin Bayer
Abstract:
Recently, it has been shown that many functions on sets can be represented by sum decompositions. These decompositions easily lend themselves to neural approximations, extending the applicability of neural nets to set-valued inputs---Deep Set learning. This work investigates a core component of Deep Set architecture: aggregation functions. We suggest and examine alternatives to commonly used aggregation functions, including learnable recurrent aggregation functions. Empirically, we show that Deep Set networks are highly sensitive to the choice of aggregation functions: beyond improved performance, we find that learnable aggregations lower hyper-parameter sensitivity and generalize better to out-of-distribution input size.
Submitted 8 April, 2020; v1 submitted 18 March, 2019;
originally announced March 2019.
-
Bayesian Learning of Neural Network Architectures
Authors:
Georgi Dikov,
Patrick van der Smagt,
Justin Bayer
Abstract:
In this paper we propose a Bayesian method for estimating architectural parameters of neural networks, namely layer size and network depth. We do this by learning concrete distributions over these parameters. Our results show that regular networks with a learnt structure can generalise better on small datasets, while fully stochastic networks can be more robust to parameter initialisation. The proposed method relies on standard neural variational learning and, unlike randomised architecture search, does not require a retraining of the model, thus keeping the computational overhead at a minimum.
Submitted 27 January, 2019; v1 submitted 14 January, 2019;
originally announced January 2019.
-
Fast Approximate Geodesics for Deep Generative Models
Authors:
Nutan Chen,
Francesco Ferroni,
Alexej Klushyn,
Alexandros Paraschos,
Justin Bayer,
Patrick van der Smagt
Abstract:
The length of the geodesic between two data points along a Riemannian manifold, induced by a deep generative model, yields a principled measure of similarity. Current approaches are limited to low-dimensional latent spaces, due to the computational complexity of solving a non-convex optimisation problem. We propose finding shortest paths in a finite graph of samples from the aggregate approximate posterior, a problem that can be solved exactly, at greatly reduced runtime, and without a notable loss in quality. Our approach is therefore applicable to high-dimensional problems, e.g., in the visual domain. We validate our approach empirically on a series of experiments using variational autoencoders applied to image data, including the Chair, FashionMNIST, and human movement data sets.
Submitted 23 May, 2019; v1 submitted 19 December, 2018;
originally announced December 2018.
-
Approximate Bayesian inference in spatial environments
Authors:
Atanas Mirchev,
Baris Kayalibay,
Maximilian Soelch,
Patrick van der Smagt,
Justin Bayer
Abstract:
Model-based approaches bear great promise for decision making of agents interacting with the physical world. In the context of spatial environments, different types of problems such as localisation, mapping, navigation or autonomous exploration are typically addressed with specialised methods, often relying on detailed knowledge of the system at hand. We express these tasks as probabilistic inference and planning under the umbrella of deep sequential generative models. Using the frameworks of variational inference and neural networks, our method inherits favourable properties such as flexibility, scalability and the ability to learn from data. The method performs comparably to specialised state-of-the-art methodology in two distinct simulated environments.
Submitted 20 June, 2019; v1 submitted 18 May, 2018;
originally announced May 2018.
-
Metrics for Deep Generative Models
Authors:
Nutan Chen,
Alexej Klushyn,
Richard Kurle,
Xueyan Jiang,
Justin Bayer,
Patrick van der Smagt
Abstract:
Neural samplers such as variational autoencoders (VAEs) or generative adversarial networks (GANs) approximate distributions by transforming samples from a simple random source---the latent space---to samples from a more complex distribution represented by a dataset. While the manifold hypothesis implies that the density induced by a dataset contains large regions of low density, the training criteria of VAEs and GANs will make the latent space densely covered. Consequently, points that are separated by low-density regions in observation space will be pushed together in latent space, making stationary distances poor proxies for similarity. We transfer ideas from Riemannian geometry to this setting, letting the distance between two points be the shortest path on a Riemannian manifold induced by the transformation. The method yields a principled distance measure, and provides a tool for visual inspection of deep generative models and an alternative to linear interpolation in latent space. In addition, it can be applied for robot movement generalization using previously learned skills. The method is evaluated on a synthetic dataset with known ground truth; on a simulated robot arm dataset; on human motion capture data; and on a generative model of handwritten digits.
Submitted 8 February, 2018; v1 submitted 3 November, 2017;
originally announced November 2017.
-
Unsupervised preprocessing for Tactile Data
Authors:
Maximilian Karl,
Justin Bayer,
Patrick van der Smagt
Abstract:
Tactile information is important for gripping, stable grasp, and in-hand manipulation, yet the complexity of tactile data prevents widespread use of such sensors. We make use of an unsupervised learning algorithm that transforms the complex tactile data into a compact, latent representation without the need to record ground truth reference data. These compact representations can either be used directly in a reinforcement learning based controller or can be used to calibrate the tactile sensor to physical quantities with only a few datapoints. We show the quality of our latent representation by predicting important features and with a simple control task.
Submitted 23 June, 2016;
originally announced June 2016.
-
ML-based tactile sensor calibration: A universal approach
Authors:
Maximilian Karl,
Artur Lohrer,
Dhananjay Shah,
Frederik Diehl,
Max Fiedler,
Saahil Ognawala,
Justin Bayer,
Patrick van der Smagt
Abstract:
We study the responses of two tactile sensors, the fingertip sensor from the iCub and the BioTac under different external stimuli. The question of interest is to which degree both sensors i) allow the estimation of force exerted on the sensor and ii) enable the recognition of differing degrees of curvature. Making use of a force controlled linear motor affecting the tactile sensors we acquire several high-quality data sets allowing the study of both sensors under exactly the same conditions. We also examined the structure of the representation of tactile stimuli in the recorded tactile sensor data using t-SNE embeddings. The experiments show that both the iCub and the BioTac excel in different settings.
Submitted 21 June, 2016;
originally announced June 2016.
-
Deep Variational Bayes Filters: Unsupervised Learning of State Space Models from Raw Data
Authors:
Maximilian Karl,
Maximilian Soelch,
Justin Bayer,
Patrick van der Smagt
Abstract:
We introduce Deep Variational Bayes Filters (DVBF), a new method for unsupervised learning and identification of latent Markovian state space models. Leveraging recent advances in Stochastic Gradient Variational Bayes, DVBF can overcome intractable inference distributions via variational inference. Thus, it can handle highly nonlinear input data with temporal and spatial dependencies such as image sequences without domain knowledge. Our experiments show that enabling backpropagation through transitions enforces state space assumptions and significantly improves information content of the latent embedding. This also enables realistic long-term prediction.
Submitted 3 March, 2017; v1 submitted 20 May, 2016;
originally announced May 2016.
-
Theano: A Python framework for fast computation of mathematical expressions
Authors:
The Theano Development Team,
Rami Al-Rfou,
Guillaume Alain,
Amjad Almahairi,
Christof Angermueller,
Dzmitry Bahdanau,
Nicolas Ballas,
Frédéric Bastien,
Justin Bayer,
Anatoly Belikov,
Alexander Belopolsky,
Yoshua Bengio,
Arnaud Bergeron,
James Bergstra,
Valentin Bisson,
Josh Bleecher Snyder,
Nicolas Bouchard,
Nicolas Boulanger-Lewandowski,
Xavier Bouthillier,
Alexandre de Brébisson,
Olivier Breuleux,
Pierre-Luc Carrier,
Kyunghyun Cho,
Jan Chorowski,
Paul Christiano
, et al. (88 additional authors not shown)
Abstract:
Theano is a Python library that allows one to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements. Theano has been actively and continuously developed since 2008; multiple frameworks have been built on top of it, and it has been used to produce many state-of-the-art machine learning models.
The present article is structured as follows. Section I provides an overview of the Theano software and its community. Section II presents the principal features of Theano and how to use them, and compares them with other similar projects. Section III focuses on recently-introduced functionalities and improvements. Section IV compares the performance of Theano against Torch7 and TensorFlow on several machine learning models. Section V discusses current limitations of Theano and potential ways of improving it.
Submitted 9 May, 2016;
originally announced May 2016.
-
Variational Inference for On-line Anomaly Detection in High-Dimensional Time Series
Authors:
Maximilian Soelch,
Justin Bayer,
Marvin Ludersdorfer,
Patrick van der Smagt
Abstract:
Approximate variational inference has been shown to be a powerful tool for modeling unknown complex probability distributions. Recent advances in the field allow us to learn probabilistic models of sequences that actively exploit spatial and temporal structure. We apply a Stochastic Recurrent Network (STORN) to learn robot time series data. Our evaluation demonstrates that we can robustly detect anomalies both off- and on-line.
Submitted 14 June, 2016; v1 submitted 23 February, 2016;
originally announced February 2016.
-
Efficient Empowerment
Authors:
Maximilian Karl,
Justin Bayer,
Patrick van der Smagt
Abstract:
Empowerment quantifies the influence an agent has on its environment. It is formally defined as the maximum of the expected KL-divergence between the distribution of the successor state conditioned on a specific action and a distribution where the actions are marginalised out. This is a natural candidate for an intrinsic reward signal in the context of reinforcement learning: the agent will place itself in a situation where its actions have maximum stability and maximum influence on the future. The limiting factor so far has been the computational complexity of the method: the only way of calculating it has been a brute-force algorithm, restricting the applicability of the method to environments with a small set of discrete states. In this work, we propose to use an efficient approximation for marginalising out the actions in the case of continuous environments. This allows fast evaluation of empowerment, paving the way towards challenging environments such as real-world robotics. The method is demonstrated on a pendulum swing-up problem.
Submitted 28 September, 2015;
originally announced September 2015.
-
Fast Adaptive Weight Noise
Authors:
Justin Bayer,
Maximilian Karl,
Daniela Korhammer,
Patrick van der Smagt
Abstract:
Marginalising out uncertain quantities within the internal representations or parameters of neural networks is of central importance for a wide range of learning techniques, such as empirical, variational or full Bayesian methods. We set out to generalise fast dropout (Wang & Manning, 2013) to cover a wider variety of noise processes in neural networks. This leads to an efficient calculation of the marginal likelihood and predictive distribution which evades sampling and the consequential increase in training time due to highly variant gradient estimates. This allows us to approximate variational Bayes for the parameters of feed-forward neural networks. Inspired by the minimum description length principle, we also propose and experimentally verify the direct optimisation of the regularised predictive distribution. The methods yield results competitive with previous neural network based approaches and Gaussian processes on a wide range of regression tasks.
Submitted 19 July, 2015;
originally announced July 2015.
-
Learning Stochastic Recurrent Networks
Authors:
Justin Bayer,
Christian Osendorfer
Abstract:
Leveraging advances in variational inference, we propose to enhance recurrent neural networks with latent variables, resulting in Stochastic Recurrent Networks (STORNs). The model i) can be trained with stochastic gradient methods, ii) allows structured and multi-modal conditionals at each time step, iii) features a reliable estimator of the marginal likelihood and iv) is a generalisation of deterministic recurrent neural networks. We evaluate the method on four polyphonic musical data sets and motion capture data.
Submitted 5 March, 2015; v1 submitted 27 November, 2014;
originally announced November 2014.
-
Regularizing Recurrent Networks - On Injected Noise and Norm-based Methods
Authors:
Saahil Ognawala,
Justin Bayer
Abstract:
Advancements in parallel processing have led to a surge in applications of multilayer perceptrons (MLPs) and deep learning in the past decades. Recurrent Neural Networks (RNNs) give additional representational power to feedforward MLPs by providing a way to treat sequential data. However, RNNs are hard to train using conventional error backpropagation methods because of the difficulty in relating inputs over many time-steps. Regularization approaches from the MLP sphere, like dropout and noisy weight training, have been insufficiently applied and tested on simple RNNs. Moreover, solutions have been proposed to improve convergence in RNNs, but these are not enough to improve their ability to remember long-term dependencies.
In this study, we aim to empirically evaluate the remembering and generalization ability of RNNs on polyphonic musical datasets. The models are trained with injected noise, random dropout, norm-based regularizers and their respective performances compared to well-initialized plain RNNs and advanced regularization methods like fast-dropout. We conclude with evidence that training with noise does not improve performance as conjectured by a few works in RNN optimization before ours.
Submitted 21 October, 2014;
originally announced October 2014.
-
Variational inference of latent state sequences using Recurrent Networks
Authors:
Justin Bayer,
Christian Osendorfer
Abstract:
Recent advances in the estimation of deep directed graphical models and recurrent networks let us contribute to the removal of a blind spot in the area of probabilistic modelling of time series. The proposed methods i) can infer distributed latent state-space trajectories with nonlinear transitions, ii) scale to large data sets thanks to the use of a stochastic objective and fast, approximate inference, iii) enable the design of rich emission models which iv) will naturally lead to structured outputs. Two different paths of introducing latent state sequences are pursued, leading to the variational recurrent auto encoder (VRAE) and the variational one step predictor (VOSP). The use of independent Wiener processes as priors on the latent state sequence is a viable compromise between efficient computation of the Kullback-Leibler divergence from the variational approximation of the posterior and maintaining a reasonable belief in the dynamics. We verify our methods empirically, obtaining results close to or superior to the state of the art. We also show qualitative results for denoising and missing value imputation.
Submitted 30 September, 2014; v1 submitted 6 June, 2014;
originally announced June 2014.
-
On Fast Dropout and its Applicability to Recurrent Networks
Authors:
Justin Bayer,
Christian Osendorfer,
Daniela Korhammer,
Nutan Chen,
Sebastian Urban,
Patrick van der Smagt
Abstract:
Recurrent Neural Networks (RNNs) are rich models for the processing of sequential data. Recent work on advancing the state of the art has been focused on the optimization or modelling of RNNs, mostly motivated by addressing the problems of the vanishing and exploding gradients. The control of overfitting has seen considerably less attention. This paper contributes to that by analyzing fast dropout, a recent regularization method for generalized linear models and neural networks, from a back-propagation inspired perspective. We show that fast dropout implements a quadratic form of an adaptive, per-parameter regularizer, which rewards large weights in the light of underfitting, penalizes them for overconfident predictions and vanishes at minima of an unregularized training loss. The derivatives of that regularizer are exclusively based on the training error signal. One consequence of this is the absence of a global weight attractor, which is particularly appealing for RNNs, since the dynamics are not biased towards a certain regime. We positively test the hypothesis that this improves the performance of RNNs on four musical data sets.
Submitted 5 March, 2014; v1 submitted 4 November, 2013;
originally announced November 2013.
-
Convolutional Neural Networks learn compact local image descriptors
Authors:
Christian Osendorfer,
Justin Bayer,
Patrick van der Smagt
Abstract:
A standard deep convolutional neural network paired with a suitable loss function learns compact local image descriptors that perform comparably to state-of-the-art approaches.
△ Less
Submitted 2 June, 2013; v1 submitted 30 April, 2013;
originally announced April 2013.
-
Unsupervised Feature Learning for low-level Local Image Descriptors
Authors:
Christian Osendorfer,
Justin Bayer,
Sebastian Urban,
Patrick van der Smagt
Abstract:
Unsupervised feature learning has shown impressive results for a wide range of input modalities, in particular for object classification tasks in computer vision. Using a large amount of unlabeled data, unsupervised feature learning methods are utilized to construct high-level representations that are discriminative enough for subsequently trained supervised classification algorithms. However, it has not yet been quantitatively investigated how well unsupervised learning methods can find low-level representations for image patches without any additional supervision. In this paper we examine the performance of pure unsupervised methods on a low-level correspondence task, a problem that is central to many computer vision applications. We find that a special type of Restricted Boltzmann Machine (RBM) performs comparably to hand-crafted descriptors. Additionally, a simple binarization scheme produces compact representations that perform better than several state-of-the-art descriptors.
Submitted 25 April, 2013; v1 submitted 13 January, 2013;
originally announced January 2013.
-
Learning Sequence Neighbourhood Metrics
Authors:
Justin Bayer,
Christian Osendorfer,
Patrick van der Smagt
Abstract:
Recurrent neural networks (RNNs) in combination with a pooling operator and the neighbourhood components analysis (NCA) objective function are able to detect the characterizing dynamics of sequences and embed them into a fixed-length vector space of arbitrary dimensionality. Subsequently, the resulting features are meaningful and can be used for visualization or nearest neighbour classification in linear time. This kind of metric learning for sequential data enables the use of algorithms tailored towards fixed length vector spaces such as R^n.
Submitted 22 August, 2013; v1 submitted 9 September, 2011;
originally announced September 2011.