-
Theano: A Python framework for fast computation of mathematical expressions
Authors:
The Theano Development Team,
Rami Al-Rfou,
Guillaume Alain,
Amjad Almahairi,
Christof Angermueller,
Dzmitry Bahdanau,
Nicolas Ballas,
Frédéric Bastien,
Justin Bayer,
Anatoly Belikov,
Alexander Belopolsky,
Yoshua Bengio,
Arnaud Bergeron,
James Bergstra,
Valentin Bisson,
Josh Bleecher Snyder,
Nicolas Bouchard,
Nicolas Boulanger-Lewandowski,
Xavier Bouthillier,
Alexandre de Brébisson,
Olivier Breuleux,
Pierre-Luc Carrier,
Kyunghyun Cho,
Jan Chorowski,
Paul Christiano
, et al. (88 additional authors not shown)
Abstract:
Theano is a Python library that allows users to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements. Theano has been actively and continuously developed since 2008; multiple frameworks have been built on top of it, and it has been used to produce many state-of-the-art machine learning models.
The present article is structured as follows. Section I provides an overview of the Theano software and its community. Section II presents the principal features of Theano and how to use them, and compares them with other similar projects. Section III focuses on recently-introduced functionalities and improvements. Section IV compares the performance of Theano against Torch7 and TensorFlow on several machine learning models. Section V discusses current limitations of Theano and potential ways of improving it.
Submitted 9 May, 2016;
originally announced May 2016.
-
Variational Inference for On-line Anomaly Detection in High-Dimensional Time Series
Authors:
Maximilian Soelch,
Justin Bayer,
Marvin Ludersdorfer,
Patrick van der Smagt
Abstract:
Approximate variational inference has been shown to be a powerful tool for modeling unknown complex probability distributions. Recent advances in the field allow us to learn probabilistic models of sequences that actively exploit spatial and temporal structure. We apply a Stochastic Recurrent Network (STORN) to learn robot time series data. Our evaluation demonstrates that we can robustly detect anomalies both off- and on-line.
Submitted 14 June, 2016; v1 submitted 23 February, 2016;
originally announced February 2016.
-
Strong suppression of shot noise in a feedback-controlled single-electron transistor
Authors:
T. Wagner,
P. Strasberg,
J. C. Bayer,
E. P. Rugeramigabo,
T. Brandes,
R. J. Haug
Abstract:
Feedback control of quantum mechanical systems is rapidly attracting attention not only due to fundamental questions about quantum measurements but also because of its novel applications in many fields in physics. Quantum control has been studied intensively in quantum optics but recently progress has been made in the control of solid-state qubits as well. In quantum transport only a few active and passive feedback experiments have been realized on the level of single electrons, though theoretical proposals exist. Here we demonstrate the suppression of shot noise in a single-electron transistor, using an exclusively electronic closed-loop feedback to monitor and adjust the counting statistics. With increasing feedback response we observe a stronger suppression and faster freezing of charge current fluctuations. Our technique is analogous to the generation of squeezed light with in-loop photodetection as used in quantum optics. Sub-Poisson single-electron sources will pave the way for high precision measurements in quantum transport similar to its optical or opto-mechanical equivalent.
Submitted 13 November, 2017; v1 submitted 17 February, 2016;
originally announced February 2016.
-
Efficient Empowerment
Authors:
Maximilian Karl,
Justin Bayer,
Patrick van der Smagt
Abstract:
Empowerment quantifies the influence an agent has on its environment. This is formally achieved by the maximum of the expected KL-divergence between the distribution of the successor state conditioned on a specific action and a distribution where the actions are marginalised out. This is a natural candidate for an intrinsic reward signal in the context of reinforcement learning: the agent will place itself in a situation where its actions have maximum stability and maximum influence on the future. The limiting factor so far has been the computational complexity of the method: the only way of calculation has so far been a brute force algorithm, reducing the applicability of the method to environments with a small set of discrete states. In this work, we propose to use an efficient approximation for marginalising out the actions in the case of continuous environments. This allows fast evaluation of empowerment, paving the way towards challenging environments such as real world robotics. The method is presented on a pendulum swing up problem.
Submitted 28 September, 2015;
originally announced September 2015.
-
Fast Adaptive Weight Noise
Authors:
Justin Bayer,
Maximilian Karl,
Daniela Korhammer,
Patrick van der Smagt
Abstract:
Marginalising out uncertain quantities within the internal representations or parameters of neural networks is of central importance for a wide range of learning techniques, such as empirical, variational or full Bayesian methods. We set out to generalise fast dropout (Wang & Manning, 2013) to cover a wider variety of noise processes in neural networks. This leads to an efficient calculation of the marginal likelihood and predictive distribution which evades sampling and the consequential increase in training time due to highly variant gradient estimates. This allows us to approximate variational Bayes for the parameters of feed-forward neural networks. Inspired by the minimum description length principle, we also propose and experimentally verify the direct optimisation of the regularised predictive distribution. The methods yield results competitive with previous neural network based approaches and Gaussian processes on a wide range of regression tasks.
Submitted 19 July, 2015;
originally announced July 2015.
-
Learning Stochastic Recurrent Networks
Authors:
Justin Bayer,
Christian Osendorfer
Abstract:
Leveraging advances in variational inference, we propose to enhance recurrent neural networks with latent variables, resulting in Stochastic Recurrent Networks (STORNs). The model i) can be trained with stochastic gradient methods, ii) allows structured and multi-modal conditionals at each time step, iii) features a reliable estimator of the marginal likelihood and iv) is a generalisation of deterministic recurrent neural networks. We evaluate the method on four polyphonic musical data sets and motion capture data.
Submitted 5 March, 2015; v1 submitted 27 November, 2014;
originally announced November 2014.
-
Regularizing Recurrent Networks - On Injected Noise and Norm-based Methods
Authors:
Saahil Ognawala,
Justin Bayer
Abstract:
Advancements in parallel processing have led to a surge in multilayer perceptron (MLP) applications and deep learning in the past decades. Recurrent Neural Networks (RNNs) give additional representational power to feedforward MLPs by providing a way to treat sequential data. However, RNNs are hard to train using conventional error backpropagation methods because of the difficulty in relating inputs over many time-steps. Regularization approaches from the MLP sphere, like dropout and noisy weight training, have been insufficiently applied and tested on simple RNNs. Moreover, solutions have been proposed to improve convergence in RNNs but not enough to improve the long term dependency remembering capabilities thereof.
In this study, we aim to empirically evaluate the remembering and generalization ability of RNNs on polyphonic musical datasets. The models are trained with injected noise, random dropout, norm-based regularizers and their respective performances compared to well-initialized plain RNNs and advanced regularization methods like fast-dropout. We conclude with evidence that training with noise does not improve performance as conjectured by a few works in RNN optimization before ours.
Submitted 21 October, 2014;
originally announced October 2014.
-
Macro- and microscopic properties of strontium doped indium oxide
Authors:
Y. M. Nikolaenko,
Y. E. Kuzovlev,
Y. V. Medvedev,
N. I. Mezin,
C. Fasel,
A. Gurlo,
L. Schlicker,
T. J. M. Bayer,
Y. A. Genenko
Abstract:
Solid state synthesis and physical mechanisms of electrical conductivity variation in polycrystalline, strontium doped indium oxide In2O3:(SrO)x were investigated for materials with different doping levels at different temperatures (T=20-300 C) and ambient atmosphere content including humidity and low pressure. Gas sensing ability of these compounds as well as the sample resistance appeared to increase by 4 and 8 orders of magnitude, respectively, with the doping level increase from zero up to x=10%. The conductance variation due to doping is explained by two mechanisms: acceptor-like electrical activity of Sr as a point defect and appearance of an additional phase of SrIn2O4. An unusual property of high level (x=10%) doped samples is a possibility of extraordinarily large and fast oxygen exchange with ambient atmosphere at not very high temperatures (100-200 C). This peculiarity is explained by friable structure of crystallite surface. Friable structure provides relatively fast transition of samples from high to low resistive state at the expense of high conductance of the near surface layer of the grains. Microscopic study of the electro-diffusion process at the surface of oxygen deficient samples allowed estimation of the diffusion coefficient of oxygen vacancies in the friable surface layer at room temperature as 3x10^(-13) cm^2/s, which is one order of magnitude smaller than that known for amorphous indium oxide films.
Submitted 24 July, 2014;
originally announced July 2014.
-
Variational inference of latent state sequences using Recurrent Networks
Authors:
Justin Bayer,
Christian Osendorfer
Abstract:
Recent advances in the estimation of deep directed graphical models and recurrent networks let us contribute to the removal of a blind spot in the area of probabilistic modelling of time series. The proposed methods i) can infer distributed latent state-space trajectories with nonlinear transitions, ii) scale to large data sets thanks to the use of a stochastic objective and fast, approximate inference, iii) enable the design of rich emission models which iv) will naturally lead to structured outputs. Two different paths of introducing latent state sequences are pursued, leading to the variational recurrent auto encoder (VRAE) and the variational one step predictor (VOSP). The use of independent Wiener processes as priors on the latent state sequence is a viable compromise between efficient computation of the Kullback-Leibler divergence from the variational approximation of the posterior and maintaining a reasonable belief in the dynamics. We verify our methods empirically, obtaining results close or superior to the state of the art. We also show qualitative results for denoising and missing value imputation.
Submitted 30 September, 2014; v1 submitted 6 June, 2014;
originally announced June 2014.
-
On Fast Dropout and its Applicability to Recurrent Networks
Authors:
Justin Bayer,
Christian Osendorfer,
Daniela Korhammer,
Nutan Chen,
Sebastian Urban,
Patrick van der Smagt
Abstract:
Recurrent Neural Networks (RNNs) are rich models for the processing of sequential data. Recent work on advancing the state of the art has been focused on the optimization or modelling of RNNs, mostly motivated by addressing the problems of the vanishing and exploding gradients. The control of overfitting has seen considerably less attention. This paper contributes to that by analyzing fast dropout, a recent regularization method for generalized linear models and neural networks from a back-propagation inspired perspective. We show that fast dropout implements a quadratic form of an adaptive, per-parameter regularizer, which rewards large weights in the light of underfitting, penalizes them for overconfident predictions and vanishes at minima of an unregularized training loss. The derivatives of that regularizer are exclusively based on the training error signal. One consequence of this is the absence of a global weight attractor, which is particularly appealing for RNNs, since the dynamics are not biased towards a certain regime. We positively test the hypothesis that this improves the performance of RNNs on four musical data sets.
Submitted 5 March, 2014; v1 submitted 4 November, 2013;
originally announced November 2013.
-
An evaluation of the exposure in nadir observation of the JEM-EUSO mission
Authors:
J. H. Adams,
S. Ahmad,
J. -N. Albert,
D. Allard,
M. Ambrosio,
L. Anchordoqui,
A. Anzalone,
Y. Arai,
C. Aramo,
K. Asano,
M. Ave,
P. Barrillon,
T. Batsch,
J. Bayer,
T. Belenguer,
R. Bellotti,
A. A. Berlind,
M. Bertaina,
P. L. Biermann,
S. Biktemerova,
C. Blaksley,
J. Blecki,
S. Blin-Bondil,
J. Bluemer,
P. Bobik
, et al. (236 additional authors not shown)
Abstract:
We evaluate the exposure during nadir observations with JEM-EUSO, the Extreme Universe Space Observatory, on board the Japanese Experiment Module of the International Space Station. Designed as a mission to explore the extreme energy Universe from space, JEM-EUSO will monitor the Earth's nighttime atmosphere to record the ultraviolet light from tracks generated by extensive air showers initiated by ultra-high energy cosmic rays. In the present work, we discuss the particularities of space-based observation and we compute the annual exposure in nadir observation. The results are based on studies of the expected trigger aperture and observational duty cycle, as well as on investigations of the effects of clouds and different types of background light. We show that the annual exposure is about one order of magnitude higher than those of the presently operating ground-based observatories.
Submitted 11 May, 2013;
originally announced May 2013.
-
Convolutional Neural Networks learn compact local image descriptors
Authors:
Christian Osendorfer,
Justin Bayer,
Patrick van der Smagt
Abstract:
A standard deep convolutional neural network paired with a suitable loss function learns compact local image descriptors that perform comparably to state-of-the-art approaches.
Submitted 2 June, 2013; v1 submitted 30 April, 2013;
originally announced April 2013.
-
Unsupervised Feature Learning for low-level Local Image Descriptors
Authors:
Christian Osendorfer,
Justin Bayer,
Sebastian Urban,
Patrick van der Smagt
Abstract:
Unsupervised feature learning has shown impressive results for a wide range of input modalities, in particular for object classification tasks in computer vision. Using a large amount of unlabeled data, unsupervised feature learning methods are utilized to construct high-level representations that are discriminative enough for subsequently trained supervised classification algorithms. However, it has never been quantitatively investigated yet how well unsupervised learning methods can find low-level representations for image patches without any additional supervision. In this paper we examine the performance of pure unsupervised methods on a low-level correspondence task, a problem that is central to many Computer Vision applications. We find that a special type of Restricted Boltzmann Machines (RBMs) performs comparably to hand-crafted descriptors. Additionally, a simple binarization scheme produces compact representations that perform better than several state-of-the-art descriptors.
Submitted 25 April, 2013; v1 submitted 13 January, 2013;
originally announced January 2013.
-
The JEM-EUSO Mission: Status and Prospects in 2011
Authors:
The JEM-EUSO Collaboration,
J. H. Adams Jr,
S. Ahmad,
J. -N. Albert,
D. Allard,
M. Ambrosio,
L. Anchordoqui,
A. Anzalone,
Y. Arai,
C. Aramo,
K. Asano,
P. Barrillon,
T. Batsch,
J. Bayer,
T. Belenguer,
R. Bellotti,
A. A. Berlind,
M. Bertaina,
P. L. Biermann,
S. Biktemerova,
C. Blaksley,
J. Blecki,
S. Blin-Bondil,
J. Bluemer
, et al. (235 additional authors not shown)
Abstract:
Contributions of the JEM-EUSO Collaboration to the 32nd International Cosmic Ray Conference, Beijing, August, 2011.
Submitted 23 April, 2012;
originally announced April 2012.
-
Learning Sequence Neighbourhood Metrics
Authors:
Justin Bayer,
Christian Osendorfer,
Patrick van der Smagt
Abstract:
Recurrent neural networks (RNNs) in combination with a pooling operator and the neighbourhood components analysis (NCA) objective function are able to detect the characterizing dynamics of sequences and embed them into a fixed-length vector space of arbitrary dimensionality. Subsequently, the resulting features are meaningful and can be used for visualization or nearest neighbour classification in linear time. This kind of metric learning for sequential data enables the use of algorithms tailored towards fixed length vector spaces such as R^n.
Submitted 22 August, 2013; v1 submitted 9 September, 2011;
originally announced September 2011.