Search | arXiv e-print repository

Learned Thresholds Token Merging and Pruning for Vision Transformers

Abstract: Vision transformers have demonstrated remarkable success in a wide range of computer vision tasks over the last years. However, their high computational costs remain a significant barrier to their practical deployment. In particular, the complexity of transformer models is quadratic with respect to the number of input tokens. Therefore techniques that reduce the number of input tokens that need to… ▽ More Vision transformers have demonstrated remarkable success in a wide range of computer vision tasks over the last years. However, their high computational costs remain a significant barrier to their practical deployment. In particular, the complexity of transformer models is quadratic with respect to the number of input tokens. Therefore techniques that reduce the number of input tokens that need to be processed have been proposed. This paper introduces Learned Thresholds token Merging and Pruning (LTMP), a novel approach that leverages the strengths of both token merging and token pruning. LTMP uses learned threshold masking modules that dynamically determine which tokens to merge and which to prune. We demonstrate our approach with extensive experiments on vision transformers on the ImageNet classification task. Our results demonstrate that LTMP achieves state-of-the-art accuracy across reduction rates while requiring only a single fine-tuning epoch, which is an order of magnitude faster than previous methods. Code is available at https://github.com/Mxbonn/ltmp . △ Less

Submitted 17 August, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

Comments: Published in Transactions on Machine Learning Research (TMLR)

Journal ref: Transactions on Machine Learning Research (2023)

arXiv:2306.17558 [pdf, other]

Towards the extraction of robust sign embeddings for low resource sign language recognition

Authors: Mathieu De Coster, Ellen Rushe, Ruth Holmes, Anthony Ventresque, Joni Dambre

Abstract: Isolated Sign Language Recognition (SLR) has mostly been applied on datasets containing signs executed slowly and clearly by a limited group of signers. In real-world scenarios, however, we are met with challenging visual conditions, coarticulated signing, small datasets, and the need for signer independent models. To tackle this difficult problem, we require a robust feature extractor to process… ▽ More Isolated Sign Language Recognition (SLR) has mostly been applied on datasets containing signs executed slowly and clearly by a limited group of signers. In real-world scenarios, however, we are met with challenging visual conditions, coarticulated signing, small datasets, and the need for signer independent models. To tackle this difficult problem, we require a robust feature extractor to process the sign language videos. One could expect human pose estimators to be ideal candidates. However, due to a domain mismatch with their training sets and challenging poses in sign language, they lack robustness on sign language data and image-based models often still outperform keypoint-based models. Furthermore, whereas the common practice of transfer learning with image-based models yields even higher accuracy, keypoint-based models are typically trained from scratch on every SLR dataset. These factors limit their usefulness for SLR. From the existing literature, it is also not clear which, if any, pose estimator performs best for SLR. We compare the three most popular pose estimators for SLR: OpenPose, MMPose and MediaPipe. We show that through keypoint normalization, missing keypoint imputation, and learning a pose embedding, we can obtain significantly better results and enable transfer learning. We show that keypoint-based embeddings contain cross-lingual features: they can transfer between sign languages and achieve competitive performance even when fine-tuning only the classifier layer of an SLR model on a target sign language. We furthermore achieve better performance using fine-tuned transferred embeddings than models trained only on the target sign language. The embeddings can also be learned in a multilingual fashion. The application of these embeddings could prove particularly useful for low resource sign languages in the future. △ Less

Submitted 16 August, 2023; v1 submitted 30 June, 2023; originally announced June 2023.

arXiv:2208.12694 [pdf, other]

Hardware-aware mobile building block evaluation for computer vision

Authors: Maxim Bonnaerens, Matthias Freiberger, Marian Verhelst, Joni Dambre

Abstract: In this work we propose a methodology to accurately evaluate and compare the performance of efficient neural network building blocks for computer vision in a hardware-aware manner. Our comparison uses pareto fronts based on randomly sampled networks from a design space to capture the underlying accuracy/complexity trade-offs. We show that our approach allows to match the information obtained by pr… ▽ More In this work we propose a methodology to accurately evaluate and compare the performance of efficient neural network building blocks for computer vision in a hardware-aware manner. Our comparison uses pareto fronts based on randomly sampled networks from a design space to capture the underlying accuracy/complexity trade-offs. We show that our approach allows to match the information obtained by previous comparison paradigms, but provides more insights in the relationship between hardware cost and accuracy. We use our methodology to analyze different building blocks and evaluate their performance on a range of embedded hardware platforms. This highlights the importance of benchmarking building blocks as a preselection step in the design process of a neural network. We show that choosing the right building block can speed up inference by up to a factor of 2x on specific hardware ML accelerators. △ Less

Submitted 26 August, 2022; originally announced August 2022.

arXiv:2202.03086 [pdf, other]

doi 10.1007/s10209-023-00992-1

Machine Translation from Signed to Spoken Languages: State of the Art and Challenges

Authors: Mathieu De Coster, Dimitar Shterionov, Mieke Van Herreweghe, Joni Dambre

Abstract: Automatic translation from signed to spoken languages is an interdisciplinary research domain, lying on the intersection of computer vision, machine translation and linguistics. Nevertheless, research in this domain is performed mostly by computer scientists in isolation. As the domain is becoming increasingly popular - the majority of scientific papers on the topic of sign language translation ha… ▽ More Automatic translation from signed to spoken languages is an interdisciplinary research domain, lying on the intersection of computer vision, machine translation and linguistics. Nevertheless, research in this domain is performed mostly by computer scientists in isolation. As the domain is becoming increasingly popular - the majority of scientific papers on the topic of sign language translation have been published in the past three years - we provide an overview of the state of the art as well as some required background in the different related disciplines. We give a high-level introduction to sign language linguistics and machine translation to illustrate the requirements of automatic sign language translation. We present a systematic literature review to illustrate the state of the art in the domain and then, harking back to the requirements, lay out several challenges for future research. We find that significant advances have been made on the shoulders of spoken language machine translation research. However, current approaches are often not linguistically motivated or are not adapted to the different input modality of sign languages. We explore challenges related to the representation of sign language data, the collection of datasets, the need for interdisciplinary research and requirements for moving beyond research, towards applications. Based on our findings, we advocate for interdisciplinary research and to base future research on linguistic analysis of sign languages. Furthermore, the inclusion of deaf and hearing end users of sign language translation applications in use case identification, data collection and evaluation is of the utmost importance in the creation of useful sign language translation models. We recommend iterative, human-in-the-loop, design and development of sign language translation models. △ Less

Submitted 5 April, 2023; v1 submitted 7 February, 2022; originally announced February 2022.

Comments: This is the version of the article submitted to peer review to Universal Access in the Information Society. Please refer to "De Coster, M., Shterionov, D., Van Herreweghe, M. et al. Machine translation from signed to spoken languages: state of the art and challenges. Univ Access Inf Soc (2023)." for the published and updated version

arXiv:2104.00432 [pdf, other]

doi 10.1016/j.cviu.2022.103445

Anchor Pruning for Object Detection

Authors: Maxim Bonnaerens, Matthias Freiberger, Joni Dambre

Abstract: This paper proposes anchor pruning for object detection in one-stage anchor-based detectors. While pruning techniques are widely used to reduce the computational cost of convolutional neural networks, they tend to focus on optimizing the backbone networks where often most computations are. In this work we demonstrate an additional pruning technique, specifically for object detection: anchor prunin… ▽ More This paper proposes anchor pruning for object detection in one-stage anchor-based detectors. While pruning techniques are widely used to reduce the computational cost of convolutional neural networks, they tend to focus on optimizing the backbone networks where often most computations are. In this work we demonstrate an additional pruning technique, specifically for object detection: anchor pruning. With more efficient backbone networks and a growing trend of deploying object detectors on embedded systems where post-processing steps such as non-maximum suppression can be a bottleneck, the impact of the anchors used in the detection head is becoming increasingly more important. In this work, we show that many anchors in the object detection head can be removed without any loss in accuracy. With additional retraining, anchor pruning can even lead to improved accuracy. Extensive experiments on SSD and MS COCO show that the detection head can be made up to 44% more efficient while simultaneously increasing accuracy. Further experiments on RetinaNet and PASCAL VOC show the general effectiveness of our approach. We also introduce `overanchorized' models that can be used together with anchor pruning to eliminate hyperparameters related to the initial shape of anchors. Code and models are available at https://github.com/Mxbonn/anchor_pruning. △ Less

Submitted 1 June, 2022; v1 submitted 1 April, 2021; originally announced April 2021.

arXiv:2102.00428 [pdf, other]

PyTorch-Hebbian: facilitating local learning in a deep learning framework

Authors: Jules Talloen, Joni Dambre, Alexander Vandesompele

Abstract: Recently, unsupervised local learning, based on Hebb's idea that change in synaptic efficacy depends on the activity of the pre- and postsynaptic neuron only, has shown potential as an alternative training mechanism to backpropagation. Unfortunately, Hebbian learning remains experimental and rarely makes it way into standard deep learning frameworks. In this work, we investigate the potential of H… ▽ More Recently, unsupervised local learning, based on Hebb's idea that change in synaptic efficacy depends on the activity of the pre- and postsynaptic neuron only, has shown potential as an alternative training mechanism to backpropagation. Unfortunately, Hebbian learning remains experimental and rarely makes it way into standard deep learning frameworks. In this work, we investigate the potential of Hebbian learning in the context of standard deep learning workflows. To this end, a framework for thorough and systematic evaluation of local learning rules in existing deep learning pipelines is proposed. Using this framework, the potential of Hebbian learned feature extractors for image classification is illustrated. In particular, the framework is used to expand the Krotov-Hopfield learning rule to standard convolutional neural networks without sacrificing accuracy compared to end-to-end backpropagation. The source code is available at https://github.com/Joxis/pytorch-hebbian. △ Less

Submitted 31 January, 2021; originally announced February 2021.

Comments: Presented as a poster at the NeurIPS 2020 Beyond Backpropagation workshop

arXiv:2004.04560 [pdf, other]

doi 10.1016/j.cogsys.2019.08.002

Populations of Spiking Neurons for Reservoir Computing: Closed Loop Control of a Compliant Quadruped

Authors: Alexander Vandesompele, Gabriel Urbain, Francis wyffels, Joni Dambre

Abstract: Compliant robots can be more versatile than traditional robots, but their control is more complex. The dynamics of compliant bodies can however be turned into an advantage using the physical reservoir computing frame-work. By feeding sensor signals to the reservoir and extracting motor signals from the reservoir, closed loop robot control is possible. Here, we present a novel framework for impleme… ▽ More Compliant robots can be more versatile than traditional robots, but their control is more complex. The dynamics of compliant bodies can however be turned into an advantage using the physical reservoir computing frame-work. By feeding sensor signals to the reservoir and extracting motor signals from the reservoir, closed loop robot control is possible. Here, we present a novel framework for implementing central pattern generators with spiking neural networks to obtain closed loop robot control. Using the FORCE learning paradigm, we train a reservoir of spiking neuron populations to act as a central pattern generator. We demonstrate the learning of predefined gait patterns, speed control and gait transition on a simulated model of a compliant quadrupedal robot. △ Less

Submitted 14 April, 2020; v1 submitted 9 April, 2020; originally announced April 2020.

arXiv:2003.09327 [pdf, other]

Stance Control Inspired by Cerebellum Stabilizes Reflex-Based Locomotion on HyQ Robot

Authors: Gabriel Urbain, Victor Barasuol, Claudio Semini, Joni Dambre, Francis wyffels

Abstract: Advances in legged robotics are strongly rooted in animal observations. A clear illustration of this claim is the generalization of Central Pattern Generators (CPG), first identified in the cat spinal cord, to generate cyclic motion in robotic locomotion. Despite a global endorsement of this model, physiological and functional experiments in mammals have also indicated the presence of descending s… ▽ More Advances in legged robotics are strongly rooted in animal observations. A clear illustration of this claim is the generalization of Central Pattern Generators (CPG), first identified in the cat spinal cord, to generate cyclic motion in robotic locomotion. Despite a global endorsement of this model, physiological and functional experiments in mammals have also indicated the presence of descending signals from the cerebellum, and reflex feedback from the lower limb sensory cells, that closely interact with CPGs. To this day, these interactions are not fully understood. In some studies, it was demonstrated that pure reflex-based locomotion in the absence of oscillatory signals could be achieved in realistic musculoskeletal simulation models or small compliant quadruped robots. At the same time, biological evidence has attested the functional role of the cerebellum for predictive control of balance and stance within mammals. In this paper, we promote both approaches and successfully apply reflex-based dynamic locomotion, coupled with a balance and gravity compensation mechanism, on the state-of-art HyQ robot. We discuss the importance of this stability module to ensure a correct foot lift-off and maintain a reliable gait. The robotic platform is further used to test two different architectural hypotheses inspired by the cerebellum. An analysis of experimental results demonstrates that the most biologically plausible alternative also leads to better results for robust locomotion. △ Less

Submitted 20 March, 2020; originally announced March 2020.

arXiv:1910.13332 [pdf, other]

Towards Deep Physical Reservoir Computing Through Automatic Task Decomposition And Map**

Authors: Matthias Freiberger, Peter Bienstman, Joni Dambre

Abstract: Photonic reservoir computing is a promising candidate for low-energy computing at high bandwidths. Despite recent successes, there are bounds to what one can achieve simply by making photonic reservoirs larger. Therefore, a switch from single-reservoir computing to multi-reservoir and even deep physical reservoir computing is desirable. Given that backpropagation can not be used directly to train… ▽ More Photonic reservoir computing is a promising candidate for low-energy computing at high bandwidths. Despite recent successes, there are bounds to what one can achieve simply by making photonic reservoirs larger. Therefore, a switch from single-reservoir computing to multi-reservoir and even deep physical reservoir computing is desirable. Given that backpropagation can not be used directly to train multi-reservoir systems in our targeted setting, we propose an alternative approach that still uses its power to derive intermediate targets. In this work we report our findings on a conducted experiment to evaluate the general feasibility of our approach by training a network of 3 Echo State Networks to perform the well-known NARMA-10 task using targets derived through backpropagation. Our results indicate that our proposed method is well-suited to train multi-reservoir systems in a efficient way. △ Less

Submitted 25 October, 2019; originally announced October 2019.

Comments: Submitted to the IEEE International Conference on Rebooting Computing 2019; accepted as a poster, will not be presented though

arXiv:1908.02728 [pdf, other]

Addressing Limited Weight Resolution in a Fully Optical Neuromorphic Reservoir Computing Readout

Authors: Chonghuai Ma, Floris Laporte, Joni Dambre, Peter Bienstman

Abstract: Using optical hardware for neuromorphic computing has become more and more popular recently due to its efficient high-speed data processing capabilities and low power consumption. However, there are still some remaining obstacles to realizing the vision of a completely optical neuromorphic computer. One of them is that, depending on the technology used, optical weighting elements may not share the… ▽ More Using optical hardware for neuromorphic computing has become more and more popular recently due to its efficient high-speed data processing capabilities and low power consumption. However, there are still some remaining obstacles to realizing the vision of a completely optical neuromorphic computer. One of them is that, depending on the technology used, optical weighting elements may not share the same resolution as in the electrical domain. Moreover, noise and drift are important considerations as well. In this article, we investigate a new method for improving the performance of optical weighting, even in the presence of noise and in the case of very low resolution. Even with only 8 to 32 levels of resolution, the method can outperform the naive traditional low-resolution weighting by several orders of magnitude in terms of bit error rate and can deliver performance very close to full-resolution weighting elements, also in noisy environments. △ Less

Submitted 6 June, 2019; originally announced August 2019.

Comments: 10 pages, 8 figures

arXiv:1810.03377 [pdf, other]

doi 10.1109/TNNLS.2018.2874571

Training Passive Photonic Reservoirs with Integrated Optical Readout

Authors: Matthias Freiberger, Andrew Katumba, Peter Bienstman, Joni Dambre

Abstract: As Moore's law comes to an end, neuromorphic approaches to computing are on the rise. One of these, passive photonic reservoir computing, is a strong candidate for computing at high bitrates (> 10 Gbps) and with low energy consumption. Currently though, both benefits are limited by the necessity to perform training and readout operations in the electrical domain. Thus, efforts are currently underw… ▽ More As Moore's law comes to an end, neuromorphic approaches to computing are on the rise. One of these, passive photonic reservoir computing, is a strong candidate for computing at high bitrates (> 10 Gbps) and with low energy consumption. Currently though, both benefits are limited by the necessity to perform training and readout operations in the electrical domain. Thus, efforts are currently underway in the photonic community to design an integrated optical readout, which allows to perform all operations in the optical domain. In addition to the technological challenge of designing such a readout, new algorithms have to be designed in order to train it. Foremost, suitable algorithms need to be able to deal with the fact that the actual on-chip reservoir states are not directly observable. In this work, we investigate several options for such a training algorithm and propose a solution in which the complex states of the reservoir can be observed by appropriately setting the readout weights, while iterating over a predefined input sequence. We perform numerical simulations in order to compare our method with an ideal baseline requiring full observability as well as with an established black-box optimization approach (CMA-ES). △ Less

Submitted 8 October, 2018; originally announced October 2018.

Comments: Accepted for publication in IEEE Transactions on Neural Networks and Learning Systems (TNNLS-2017-P-8539.R1), copyright 2018 IEEE. This research was funded by the EU Horizon 2020 PHRESCO Grant (Grant No. 688579) and the BELSPO IAP P7-35 program Photonics@be. 11 pages, 9 figures

arXiv:1808.09551 [pdf, ps, other]

Explaining Character-Aware Neural Networks for Word-Level Prediction: Do They Discover Linguistic Rules?

Authors: Fréderic Godin, Kris Demuynck, Joni Dambre, Wesley De Neve, Thomas Demeester

Abstract: Character-level features are currently used in different neural network-based natural language processing algorithms. However, little is known about the character-level patterns those models learn. Moreover, models are often compared only quantitatively while a qualitative analysis is missing. In this paper, we investigate which character-level patterns neural networks learn and if those patterns… ▽ More Character-level features are currently used in different neural network-based natural language processing algorithms. However, little is known about the character-level patterns those models learn. Moreover, models are often compared only quantitatively while a qualitative analysis is missing. In this paper, we investigate which character-level patterns neural networks learn and if those patterns coincide with manually-defined word segmentations and annotations. To that end, we extend the contextual decomposition technique (Murdoch et al. 2018) to convolutional neural networks which allows us to compare convolutional neural networks and bidirectional long short-term memory networks. We evaluate and compare these models for the task of morphological tagging on three morphologically different languages and show that these models implicitly discover understandable linguistic rules. Our implementation can be found at https://github.com/FredericGodin/ContextualDecomposition-NLP . △ Less

Submitted 28 August, 2018; originally announced August 2018.

Comments: Accepted at EMNLP 2018

arXiv:1707.08214 [pdf, ps, other]

doi 10.1016/j.patrec.2018.09.006

Dual Rectified Linear Units (DReLUs): A Replacement for Tanh Activation Functions in Quasi-Recurrent Neural Networks

Authors: Fréderic Godin, Jonas Degrave, Joni Dambre, Wesley De Neve

Abstract: In this paper, we introduce a novel type of Rectified Linear Unit (ReLU), called a Dual Rectified Linear Unit (DReLU). A DReLU, which comes with an unbounded positive and negative image, can be used as a drop-in replacement for a tanh activation function in the recurrent step of Quasi-Recurrent Neural Networks (QRNNs) (Bradbury et al. (2017)). Similar to ReLUs, DReLUs are less prone to the vanishi… ▽ More In this paper, we introduce a novel type of Rectified Linear Unit (ReLU), called a Dual Rectified Linear Unit (DReLU). A DReLU, which comes with an unbounded positive and negative image, can be used as a drop-in replacement for a tanh activation function in the recurrent step of Quasi-Recurrent Neural Networks (QRNNs) (Bradbury et al. (2017)). Similar to ReLUs, DReLUs are less prone to the vanishing gradient problem, they are noise robust, and they induce sparse activations. We independently reproduce the QRNN experiments of Bradbury et al. (2017) and compare our DReLU-based QRNNs with the original tanh-based QRNNs and Long Short-Term Memory networks (LSTMs) on sentiment classification and word-level language modeling. Additionally, we evaluate on character-level language modeling, showing that we are able to stack up to eight QRNN layers with DReLUs, thus making it possible to improve the current state-of-the-art in character-level language modeling over shallow architectures based on LSTMs. △ Less

Submitted 31 October, 2017; v1 submitted 25 July, 2017; originally announced July 2017.

arXiv:1707.06130 [pdf, ps, other]

Improving Language Modeling using Densely Connected Recurrent Neural Networks

Authors: Fréderic Godin, Joni Dambre, Wesley De Neve

Abstract: In this paper, we introduce the novel concept of densely connected layers into recurrent neural networks. We evaluate our proposed architecture on the Penn Treebank language modeling task. We show that we can obtain similar perplexity scores with six times fewer parameters compared to a standard stacked 2-layer LSTM model trained with dropout (Zaremba et al. 2014). In contrast with the current usa… ▽ More In this paper, we introduce the novel concept of densely connected layers into recurrent neural networks. We evaluate our proposed architecture on the Penn Treebank language modeling task. We show that we can obtain similar perplexity scores with six times fewer parameters compared to a standard stacked 2-layer LSTM model trained with dropout (Zaremba et al. 2014). In contrast with the current usage of skip connections, we show that densely connecting only a few stacked layers with skip connections already yields significant perplexity reductions. △ Less

Submitted 19 July, 2017; originally announced July 2017.

Comments: Accepted at Workshop on Representation Learning, ACL2017

arXiv:1611.09577 [pdf, other]

Fast Face-swap Using Convolutional Neural Networks

Authors: Iryna Korshunova, Wenzhe Shi, Joni Dambre, Lucas Theis

Abstract: We consider the problem of face swap** in images, where an input identity is transformed into a target identity while preserving pose, facial expression, and lighting. To perform this map**, we use convolutional neural networks trained to capture the appearance of the target identity from an unstructured collection of his/her photographs.This approach is enabled by framing the face swap** pr… ▽ More We consider the problem of face swap** in images, where an input identity is transformed into a target identity while preserving pose, facial expression, and lighting. To perform this map**, we use convolutional neural networks trained to capture the appearance of the target identity from an unstructured collection of his/her photographs.This approach is enabled by framing the face swap** problem in terms of style transfer, where the goal is to render an image in the style of another one. Building on recent advances in this area, we devise a new loss function that enables the network to produce highly photorealistic results. By combining neural networks with simple pre- and post-processing steps, we aim at making face swap work in real-time with no input from the user. △ Less

Submitted 27 July, 2017; v1 submitted 29 November, 2016; originally announced November 2016.

arXiv:1611.01652 [pdf, other]

A Differentiable Physics Engine for Deep Learning in Robotics

Authors: Jonas Degrave, Michiel Hermans, Joni Dambre, Francis wyffels

Abstract: An important field in robotics is the optimization of controllers. Currently, robots are often treated as a black box in this optimization process, which is the reason why derivative-free optimization methods such as evolutionary algorithms or reinforcement learning are omnipresent. When gradient-based methods are used, models are kept small or rely on finite difference approximations for the Jaco… ▽ More An important field in robotics is the optimization of controllers. Currently, robots are often treated as a black box in this optimization process, which is the reason why derivative-free optimization methods such as evolutionary algorithms or reinforcement learning are omnipresent. When gradient-based methods are used, models are kept small or rely on finite difference approximations for the Jacobian. This method quickly grows expensive with increasing numbers of parameters, such as found in deep learning. We propose the implementation of a modern physics engine, which can differentiate control parameters. This engine is implemented for both CPU and GPU. Firstly, this paper shows how such an engine speeds up the optimization process, even for small problems. Furthermore, it explains why this is an alternative approach to deep Q-learning, for using deep learning in robotics. Finally, we argue that this is a big step for deep learning in robotics, as it opens up new possibilities to optimize robots, both in hardware and software. △ Less

Submitted 24 November, 2018; v1 submitted 5 November, 2016; originally announced November 2016.

Comments: Submitted for International Conference on Learning Representations 2017

arXiv:1506.01911 [pdf, other]

Beyond Temporal Pooling: Recurrence and Temporal Convolutions for Gesture Recognition in Video

Authors: Lionel Pigou, Aäron van den Oord, Sander Dieleman, Mieke Van Herreweghe, Joni Dambre

Abstract: Recent studies have demonstrated the power of recurrent neural networks for machine translation, image captioning and speech recognition. For the task of capturing temporal structure in video, however, there still remain numerous open research questions. Current research suggests using a simple temporal feature pooling strategy to take into account the temporal aspect of video. We demonstrate that… ▽ More Recent studies have demonstrated the power of recurrent neural networks for machine translation, image captioning and speech recognition. For the task of capturing temporal structure in video, however, there still remain numerous open research questions. Current research suggests using a simple temporal feature pooling strategy to take into account the temporal aspect of video. We demonstrate that this method is not sufficient for gesture recognition, where temporal information is more discriminative compared to general video classification tasks. We explore deep architectures for gesture recognition in video and propose a new end-to-end trainable neural network architecture incorporating temporal convolutions and bidirectional recurrence. Our main contributions are twofold; first, we show that recurrence is crucial for this task; second, we show that adding temporal convolutions leads to significant improvements. We evaluate the different approaches on the Montalbano gesture recognition dataset, where we achieve state-of-the-art results. △ Less

Submitted 10 February, 2016; v1 submitted 5 June, 2015; originally announced June 2015.

arXiv:1503.07077 [pdf, other]

doi 10.1093/mnras/stv632

Rotation-invariant convolutional neural networks for galaxy morphology prediction

Authors: Sander Dieleman, Kyle W. Willett, Joni Dambre

Abstract: Measuring the morphological parameters of galaxies is a key requirement for studying their formation and evolution. Surveys such as the Sloan Digital Sky Survey (SDSS) have resulted in the availability of very large collections of images, which have permitted population-wide analyses of galaxy morphology. Morphological analysis has traditionally been carried out mostly via visual inspection by tra… ▽ More Measuring the morphological parameters of galaxies is a key requirement for studying their formation and evolution. Surveys such as the Sloan Digital Sky Survey (SDSS) have resulted in the availability of very large collections of images, which have permitted population-wide analyses of galaxy morphology. Morphological analysis has traditionally been carried out mostly via visual inspection by trained experts, which is time-consuming and does not scale to large ($\gtrsim10^4$) numbers of images. Although attempts have been made to build automated classification systems, these have not been able to achieve the desired level of accuracy. The Galaxy Zoo project successfully applied a crowdsourcing strategy, inviting online users to classify images by answering a series of questions. Unfortunately, even this approach does not scale well enough to keep up with the increasing availability of galaxy images. We present a deep neural network model for galaxy morphology classification which exploits translational and rotational symmetry. It was developed in the context of the Galaxy Challenge, an international competition to build the best model for morphology classification based on annotated images from the Galaxy Zoo project. For images with high agreement among the Galaxy Zoo participants, our model is able to reproduce their consensus with near-perfect accuracy ($> 99\%$) for most questions. Confident model predictions are highly accurate, which makes the model suitable for filtering large collections of images and forwarding challenging images to experts for manual annotation. This approach greatly reduces the experts' workload without affecting accuracy. The application of these algorithms to larger sets of training data will be critical for analysing results from future surveys such as the LSST. △ Less

Submitted 24 March, 2015; originally announced March 2015.

Comments: Accepted for publication in MNRAS. 20 pages, 14 figures

arXiv:1501.02592 [pdf, other]

Photonic Delay Systems as Machine Learning Implementations

Authors: Michiel Hermans, Miguel Soriano, Joni Dambre, Peter Bienstman, Ingo Fischer

Abstract: Nonlinear photonic delay systems present interesting implementation platforms for machine learning models. They can be extremely fast, offer great degrees of parallelism and potentially consume far less power than digital processors. So far they have been successfully employed for signal processing using the Reservoir Computing paradigm. In this paper we show that their range of applicability can… ▽ More Nonlinear photonic delay systems present interesting implementation platforms for machine learning models. They can be extremely fast, offer great degrees of parallelism and potentially consume far less power than digital processors. So far they have been successfully employed for signal processing using the Reservoir Computing paradigm. In this paper we show that their range of applicability can be greatly extended if we use gradient descent with backpropagation through time on a model of the system to optimize the input encoding of such systems. We perform physical experiments that demonstrate that the obtained input encodings work well in reality, and we show that optimized systems perform significantly better than the common Reservoir Computing approach. The results presented here demonstrate that common gradient descent techniques from machine learning may well be applicable on physical neuro-inspired analog computers. △ Less

Submitted 12 January, 2015; originally announced January 2015.

Journal ref: Journal of Machine Learning Research, vol. 16, pp. 2081-2097 (2015)

arXiv:1407.6637 [pdf, other]

doi 10.1038/ncomms7729

Trainable and Dynamic Computing: Error Backpropagation through Physical Media

Authors: Michiel Hermans, Michaël Burm, Joni Dambre, Peter Bienstman

Abstract: Machine learning algorithms, and more in particular neural networks, arguably experience a revolution in terms of performance. Currently, the best systems we have for speech recognition, computer vision and similar problems are based on neural networks, trained using the half-century old backpropagation algorithm. Despite the fact that neural networks are a form of analog computers, they are still… ▽ More Machine learning algorithms, and more in particular neural networks, arguably experience a revolution in terms of performance. Currently, the best systems we have for speech recognition, computer vision and similar problems are based on neural networks, trained using the half-century old backpropagation algorithm. Despite the fact that neural networks are a form of analog computers, they are still implemented digitally for reasons of convenience and availability. In this paper we demonstrate how we can design physical linear dynamic systems with non-linear feedback as a generic platform for dynamic, neuro-inspired analog computing. We show that a crucial advantage of this setup is that the error backpropagation can be performed physically as well, which greatly speeds up the optimisation process. As we show in this paper, using one experimentally validated and one conceptual example, such systems may be the key to providing a relatively straightforward mechanism for constructing highly scalable, fully dynamic analog computers. △ Less

Submitted 24 July, 2014; originally announced July 2014.

arXiv:1406.2210 [pdf, other]

doi 10.1162/NECO_a_00694

Memristor models for machine learning

Authors: Juan Pablo Carbajal, Joni Dambre, Michiel Hermans, Benjamin Schrauwen

Abstract: In the quest for alternatives to traditional CMOS, it is being suggested that digital computing efficiency and power can be improved by matching the precision to the application. Many applications do not need the high precision that is being used today. In particular, large gains in area- and power efficiency could be achieved by dedicated analog realizations of approximate computing engines. In t… ▽ More In the quest for alternatives to traditional CMOS, it is being suggested that digital computing efficiency and power can be improved by matching the precision to the application. Many applications do not need the high precision that is being used today. In particular, large gains in area- and power efficiency could be achieved by dedicated analog realizations of approximate computing engines. In this work, we explore the use of memristor networks for analog approximate computation, based on a machine learning framework called reservoir computing. Most experimental investigations on the dynamics of memristors focus on their nonvolatile behavior. Hence, the volatility that is present in the developed technologies is usually unwanted and it is not included in simulation models. In contrast, in reservoir computing, volatility is not only desirable but necessary. Therefore, in this work, we propose two different ways to incorporate it into memristor simulation models. The first is an extension of Strukov's model and the second is an equivalent Wiener model approximation. We analyze and compare the dynamical properties of these models and discuss their implications for the memory and the nonlinear processing capacity of memristor networks. Our results indicate that device variability, increasingly causing problems in traditional computer design, is an asset in the context of reservoir computing. We conclude that, although both models could lead to useful memristor based reservoir computing systems, their computational performance will differ. Therefore, experimental modeling research is required for the development of accurate volatile memristor models. △ Less

Submitted 14 July, 2014; v1 submitted 9 June, 2014; originally announced June 2014.

Comments: 4 figures, no tables. Submitted to neural computation

arXiv:1111.7219 [pdf, other]

doi 10.1038/srep00287

Optoelectronic Reservoir Computing

Authors: Yvan Paquot, François Duport, Anteo Smerieri, Joni Dambre, Benjamin Schrauwen, Marc Haelterman, Serge Massar

Abstract: Reservoir computing is a recently introduced, highly efficient bio-inspired approach for processing time dependent data. The basic scheme of reservoir computing consists of a non linear recurrent dynamical system coupled to a single input layer and a single output layer. Within these constraints many implementations are possible. Here we report an opto-electronic implementation of reservoir comput… ▽ More Reservoir computing is a recently introduced, highly efficient bio-inspired approach for processing time dependent data. The basic scheme of reservoir computing consists of a non linear recurrent dynamical system coupled to a single input layer and a single output layer. Within these constraints many implementations are possible. Here we report an opto-electronic implementation of reservoir computing based on a recently proposed architecture consisting of a single non linear node and a delay line. Our implementation is sufficiently fast for real time information processing. We illustrate its performance on tasks of practical importance such as nonlinear channel equalization and speech recognition, and obtain results comparable to state of the art digital implementations. △ Less

Submitted 30 November, 2011; originally announced November 2011.

Comments: Contains main paper and two Supplementary Materials

Journal ref: Scientific Reports 2, Article number: 287, (2012)

Showing 1–22 of 22 results for author: Dambre, J