
Showing 1–27 of 27 results for author: Breuel, T

  1. arXiv:2304.09358  [pdf, other]

    cs.CV cs.AI cs.LG

    Investigating the Nature of 3D Generalization in Deep Neural Networks

    Authors: Shoaib Ahmed Siddiqui, David Krueger, Thomas Breuel

    Abstract: Visual object recognition systems need to generalize from a set of 2D training views to novel views. The question of how the human visual system can generalize to novel views has been studied and modeled in psychology, computer vision, and neuroscience. Modern deep learning architectures for object recognition generalize well to novel views, but the mechanisms are not well understood. In this pape…

    Submitted 18 April, 2023; originally announced April 2023.

    Comments: 15 pages, 15 figures, CVPR format

  2. arXiv:2202.11094  [pdf, other]

    cs.CV

    GroupViT: Semantic Segmentation Emerges from Text Supervision

    Authors: Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang

    Abstract: Grouping and recognition are important components of visual scene understanding, e.g., for object detection and semantic segmentation. With end-to-end deep learning systems, grouping of image regions usually happens implicitly via top-down supervision from pixel-level recognition labels. Instead, in this paper, we propose to bring back the grouping mechanism into deep networks, which allows semant…

    Submitted 18 July, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

    Comments: CVPR 2022. Project page and code: https://jerryxu.net/GroupViT

  3. arXiv:2107.04827  [pdf, other]

    cs.LG cs.AI cs.CV

    Identifying Layers Susceptible to Adversarial Attacks

    Authors: Shoaib Ahmed Siddiqui, Thomas Breuel

    Abstract: In this paper, we investigate the use of pretraining with adversarial networks, with the objective of discovering the relationship between network depth and robustness. For this purpose, we selectively retrain different portions of VGG and ResNet architectures on CIFAR-10, Imagenette, and ImageNet using non-adversarial and adversarial data. Experimental results show that susceptibility to adversar…

    Submitted 28 October, 2021; v1 submitted 10 July, 2021; originally announced July 2021.

  4. arXiv:2101.10803  [pdf, other]

    cs.CV

    ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning

    Authors: Sangho Lee, Jiwan Chung, Youngjae Yu, Gunhee Kim, Thomas Breuel, Gal Chechik, Yale Song

    Abstract: The natural association between visual observations and their corresponding sound provides powerful self-supervisory signals for learning video representations, which makes the ever-growing amount of online videos an attractive source of training data. However, large portions of online videos contain irrelevant audio-visual signals because of edited/overdubbed audio, and models trained on such unc…

    Submitted 16 August, 2021; v1 submitted 26 January, 2021; originally announced January 2021.

    Comments: Published to ICCV2021

  5. arXiv:2012.04124  [pdf, other]

    cs.CV

    Parameter Efficient Multimodal Transformers for Video Representation Learning

    Authors: Sangho Lee, Youngjae Yu, Gunhee Kim, Thomas Breuel, Jan Kautz, Yale Song

    Abstract: The recent success of Transformers in the language domain has motivated adapting them to a multimodal setting, where a new visual model is trained in tandem with an already pretrained language model. However, due to the excessive memory requirements of Transformers, existing work typically fixes the language model and trains only the vision module, which limits its ability to learn cross-modal info…

    Submitted 22 September, 2021; v1 submitted 7 December, 2020; originally announced December 2020.

    Comments: Accepted to ICLR 2021

  6. arXiv:2012.00899  [pdf, other]

    cs.CV

    Displacement-Invariant Cost Computation for Efficient Stereo Matching

    Authors: Yiran Zhong, Charles Loop, Wonmin Byeon, Stan Birchfield, Yuchao Dai, Kaihao Zhang, Alexey Kamenev, Thomas Breuel, Hongdong Li, Jan Kautz

    Abstract: Although deep learning-based methods have dominated stereo matching leaderboards by yielding unprecedented disparity accuracy, their inference time is typically slow, on the order of seconds for a pair of 540p images. The main reason is that the leading methods employ time-consuming 3D convolutions applied to a 4D feature volume. A common way to speed up the computation is to downsample the featur…

    Submitted 1 December, 2020; originally announced December 2020.

    Comments: 8 pages

  7. arXiv:2001.01885  [pdf, other]

    cs.LG stat.ML

    Discovering Nonlinear Relations with Minimum Predictive Information Regularization

    Authors: Tailin Wu, Thomas Breuel, Michael Skuhersky, Jan Kautz

    Abstract: Identifying the underlying directional relations from observational time series with nonlinear interactions and complex relational structures is key to a wide range of applications, yet remains a hard problem. In this work, we introduce a novel minimum predictive information regularization method to infer directional relations from time series, allowing deep learning models to discover nonlinear r…

    Submitted 6 January, 2020; originally announced January 2020.

    Comments: 26 pages, 11 figures; ICML'19 Time Series Workshop

  8. arXiv:2001.01858  [pdf]

    cs.DC

    High Performance I/O For Large Scale Deep Learning

    Authors: Alex Aizman, Gavin Maltby, Thomas Breuel

    Abstract: Training deep learning (DL) models on petascale datasets is essential for achieving competitive and state-of-the-art performance in applications such as speech, video analytics, and object recognition. However, existing distributed filesystems were not developed for the access patterns and usability requirements of DL jobs. In this paper, we describe AIStore, a highly scalable, easy-to-deploy stor…

    Submitted 6 January, 2020; originally announced January 2020.

    Comments: 6 pages, 8 figures

  9. arXiv:1804.10123  [pdf, other]

    cs.CV cs.NE

    IamNN: Iterative and Adaptive Mobile Neural Network for Efficient Image Classification

    Authors: Sam Leroux, Pavlo Molchanov, Pieter Simoens, Bart Dhoedt, Thomas Breuel, Jan Kautz

    Abstract: Deep residual networks (ResNets) made a recent breakthrough in deep learning. The core idea of ResNets is to have shortcut connections between layers that allow the network to be much deeper while still being easy to optimize, avoiding vanishing gradients. These shortcut connections have interesting side-effects that make ResNets behave differently from other typical network architectures. In this…

    Submitted 26 April, 2018; originally announced April 2018.

    Comments: ICLR 2018 Workshop track

  10. arXiv:1804.09534  [pdf, other]

    cs.CV cs.LG

    Hand Pose Estimation via Latent 2.5D Heatmap Regression

    Authors: Umar Iqbal, Pavlo Molchanov, Thomas Breuel, Juergen Gall, Jan Kautz

    Abstract: Estimating the 3D pose of a hand is an essential part of human-computer interaction. Estimating 3D pose using depth or multi-view sensors has become easier with recent advances in computer vision; however, regressing pose from a single RGB image is much less straightforward. The main difficulty arises from the fact that 3D pose requires some form of depth estimates, which are ambiguous given only…

    Submitted 25 April, 2018; originally announced April 2018.

  11. arXiv:1703.00848  [pdf, other]

    cs.CV cs.AI

    Unsupervised Image-to-Image Translation Networks

    Authors: Ming-Yu Liu, Thomas Breuel, Jan Kautz

    Abstract: Unsupervised image-to-image translation aims at learning a joint distribution of images in different domains by using images from the marginal distributions in individual domains. Since there exists an infinite set of joint distributions that can arrive at the given marginal distributions, one could infer nothing about the joint distribution from the marginal distributions without additional assumpti…

    Submitted 22 July, 2018; v1 submitted 2 March, 2017; originally announced March 2017.

    Comments: NIPS 2017, 11 pages, 6 figures

  12. arXiv:1610.09565  [pdf, other]

    cs.CL

    Sequence-to-sequence neural network models for transliteration

    Authors: Mihaela Rosca, Thomas Breuel

    Abstract: Transliteration is a key component of machine translation systems and software internationalization. This paper demonstrates that neural sequence-to-sequence models obtain state-of-the-art or close to state-of-the-art results on existing datasets. In an effort to make machine transliteration accessible, we open source a new Arabic-to-English transliteration dataset and our trained models.

    Submitted 29 October, 2016; originally announced October 2016.

  13. arXiv:1609.03415  [pdf, ps, other]

    cs.CV cs.MM

    Active Canny: Edge Detection and Recovery with Open Active Contour Models

    Authors: Muhammet Bastan, S. Saqib Bukhari, Thomas M. Breuel

    Abstract: We introduce an edge detection and recovery framework based on open active contour models (snakelets). This is motivated by the noisy or broken edges output by standard edge detection algorithms, like Canny. The idea is to utilize the local continuity and smoothness cues provided by strong edges and grow them to recover the missing edges. This way, the strong edges are used to recover weak or miss…

    Submitted 12 September, 2016; originally announced September 2016.

  14. Efficient Estimation of k for the Nearest Neighbors Class of Methods

    Authors: Aleksander Lodwich, Faisal Shafait, Thomas Breuel

    Abstract: The k Nearest Neighbors (kNN) method has received much attention in the past decades, where some theoretical bounds on its performance were identified and where practical optimizations were proposed for making it work fairly well in high dimensional spaces and on large datasets. From countless experiments of the past it became widely accepted that the value of k has a significant impact on the per…

    Submitted 13 June, 2016; v1 submitted 8 June, 2016; originally announced June 2016.

    Comments: Technical Report, 16p, alternative source: http://lodwich.net/Science.html

  15. arXiv:1603.04871  [pdf, other]

    cs.CV

    Combining the Best of Convolutional Layers and Recurrent Layers: A Hybrid Network for Semantic Segmentation

    Authors: Zhicheng Yan, Hao Zhang, Yangqing Jia, Thomas Breuel, Yizhou Yu

    Abstract: State-of-the-art results of semantic segmentation are established by Fully Convolutional neural Networks (FCNs). FCNs rely on cascaded convolutional and pooling layers to gradually enlarge the receptive fields of neurons, resulting in an indirect way of modeling the distant contextual dependence. In this work, we advocate the use of spatially recurrent layers (i.e. ReNet layers) which directly cap…

    Submitted 15 March, 2016; originally announced March 2016.

    Comments: 14 pages

  16. arXiv:1511.04401  [pdf, other]

    cs.CV cs.CL cs.LG cs.NE

    Symbol Grounding Association in Multimodal Sequences with Missing Elements

    Authors: Federico Raue, Andreas Dengel, Thomas M. Breuel, Marcus Liwicki

    Abstract: In this paper, we extend a symbolic association framework for being able to handle missing elements in multimodal sequences. The general scope of the work is the symbolic associations of object-word mappings as it happens in language development in infants. In other words, two different representations of the same abstract concepts can associate in both directions. This scenario has been long inte…

    Submitted 7 December, 2017; v1 submitted 13 November, 2015; originally announced November 2015.

    Comments: Under review on Journal of Artificial Intelligence Research (JAIR) -- Special Track on Deep Learning, Knowledge Representation, and Reasoning

  17. arXiv:1508.02792  [pdf, ps, other]

    cs.NE q-bio.NC

    Possible Mechanisms for Neural Reconfigurability and their Implications

    Authors: Thomas M. Breuel

    Abstract: The paper introduces a biologically and evolutionarily plausible neural architecture that allows a single group of neurons, or an entire cortical pathway, to be dynamically reconfigured to perform multiple, potentially very different computations. The paper shows that reconfigurability can account for the observed stochastic and distributed coding behavior of neurons and provides a parsimonious ex…

    Submitted 11 August, 2015; originally announced August 2015.

    ACM Class: K.3.2

  18. arXiv:1508.02790  [pdf, other]

    cs.NE cs.LG

    On the Convergence of SGD Training of Neural Networks

    Authors: Thomas M. Breuel

    Abstract: Neural networks are usually trained by some form of stochastic gradient descent (SGD). A number of strategies intended to improve SGD optimization are in common use, such as learning rate schedules, momentum, and batching. These are motivated by ideas about the occurrence of local minima at different scales, valleys, and other phenomena in the objective function. Empirical results presented here…

    Submitted 11 August, 2015; originally announced August 2015.

    ACM Class: K.3.2

  19. arXiv:1508.02788  [pdf, other]

    cs.NE cs.LG

    The Effects of Hyperparameters on SGD Training of Neural Networks

    Authors: Thomas M. Breuel

    Abstract: The performance of neural network classifiers is determined by a number of hyperparameters, including learning rate, batch size, and depth. A number of attempts have been made to explore these parameters in the literature, and at times, to develop methods for optimizing them. However, exploration of parameter spaces has often been limited. In this note, I report the results of large scale experime…

    Submitted 11 August, 2015; originally announced August 2015.

    ACM Class: K.3.2

  20. arXiv:1508.02774  [pdf, other]

    cs.NE

    Benchmarking of LSTM Networks

    Authors: Thomas M. Breuel

    Abstract: LSTM (Long Short-Term Memory) recurrent neural networks have been highly successful in a number of application areas. This technical report describes the use of the MNIST and UW3 databases for benchmarking LSTM networks and explores the effect of different architectural and hyperparameter choices on performance. Significant findings include: (1) LSTM performance depends smoothly on learning rates,…

    Submitted 11 August, 2015; originally announced August 2015.

    ACM Class: K.3.2

  21. arXiv:1009.3589  [pdf, other]

    cs.LG cs.CV cs.NE

    Deep Self-Taught Learning for Handwritten Character Recognition

    Authors: Frédéric Bastien, Yoshua Bengio, Arnaud Bergeron, Nicolas Boulanger-Lewandowski, Thomas Breuel, Youssouf Chherawala, Moustapha Cisse, Myriam Côté, Dumitru Erhan, Jeremy Eustache, Xavier Glorot, Xavier Muller, Sylvain Pannetier Lebeuf, Razvan Pascanu, Salah Rifai, Francois Savard, Guillaume Sicard

    Abstract: Recent theoretical and empirical work in statistical machine learning has demonstrated the importance of learning algorithms for deep architectures, i.e., function classes obtained by composing multiple non-linear transformations. Self-taught learning (exploiting unlabeled examples or examples from other distributions) has already been applied to deep learners, but mostly to show the advantage of…

    Submitted 18 September, 2010; originally announced September 2010.

    Report number: 1353, Dept. IRO, U. Montreal; MSC Class: 68T05; ACM Class: I.2.6

  22. arXiv:0712.0137  [pdf, ps, other]

    cs.CV

    View Based Methods can achieve Bayes-Optimal 3D Recognition

    Authors: Thomas M. Breuel

    Abstract: This paper proves that visual object recognition systems using only 2D Euclidean similarity measurements to compare object views against previously seen views can achieve the same recognition performance as observers having access to all coordinate information and able to use arbitrary 3D models internally. Furthermore, it demonstrates that such systems do not require more training views than…

    Submitted 2 December, 2007; originally announced December 2007.

    ACM Class: I.4.8; I.5

  23. arXiv:0712.0136  [pdf, other]

    cs.CV

    Learning View Generalization Functions

    Authors: Thomas M. Breuel

    Abstract: Learning object models from views in 3D visual object recognition is usually formulated either as a function approximation problem of a function describing the view-manifold of an object, or as that of learning a class-conditional density. This paper describes an alternative framework for learning in visual object recognition, that of learning the view-generalization function. Using the view-gen…

    Submitted 2 December, 2007; originally announced December 2007.

    ACM Class: I.4.8; I.2.10

  24. arXiv:0712.0131  [pdf, other]

    cs.CV

    Learning Similarity for Character Recognition and 3D Object Recognition

    Authors: Thomas M. Breuel

    Abstract: I describe an approach to similarity motivated by Bayesian methods. This yields a similarity function that is learnable using standard Bayesian methods. The relationship of the approach to variable kernel and variable metric methods is discussed. Experimental results on character recognition and 3D object recognition are presented.

    Submitted 2 December, 2007; originally announced December 2007.

  25. arXiv:0712.0130  [pdf, ps, other]

    cs.LG

    On the Relationship between the Posterior and Optimal Similarity

    Authors: Thomas M. Breuel

    Abstract: For a classification problem described by the joint density $P(ω,x)$, models of $P(ω=ω'|x,x')$ (the "Bayesian similarity measure") have been shown to be an optimal similarity measure for nearest neighbor classification. This paper demonstrates several additional properties of that conditional distribution. The paper first shows that we can reconstruct, up to class labels, the class…

    Submitted 2 December, 2007; originally announced December 2007.

    ACM Class: I.5.3; I.5.2

  26. arXiv:0712.0121  [pdf, other]

    cs.GR

    Efficient Binary and Run Length Morphology and its Application to Document Image Processing

    Authors: Thomas M. Breuel

    Abstract: This paper describes the implementation and evaluation of an open source library for mathematical morphology based on packed binary and run-length compressed images for document imaging applications. Abstractions and patterns useful in the implementation of the interval operations are described. A number of benchmarks and comparisons to bit-blit based implementations on standard document images…

    Submitted 2 December, 2007; originally announced December 2007.

    ACM Class: I.4; I.4.10; I.7.4; I.7.5

  27. arXiv:cs/0703101  [pdf, other]

    cs.IR cs.CC cs.CV

    A Note on Approximate Nearest Neighbor Methods

    Authors: Thomas M. Breuel

    Abstract: A number of authors have described randomized algorithms for solving the epsilon-approximate nearest neighbor problem. In this note I point out that the epsilon-approximate nearest neighbor property often fails to be a useful approximation property, since epsilon-approximate solutions fail to satisfy the necessary preconditions for using nearest neighbors for classification and related tasks.

    Submitted 21 March, 2007; originally announced March 2007.

    Comments: The report was originally written in 2005 and does not reference information after that date