Showing 1–9 of 9 results for author: Heigold, G

Searching in archive cs.
  1. arXiv:2308.11093  [pdf, other]

    cs.CV cs.AI cs.LG

    Video OWL-ViT: Temporally-consistent open-world localization in video

    Authors: Georg Heigold, Matthias Minderer, Alexey Gritsenko, Alex Bewley, Daniel Keysers, Mario Lučić, Fisher Yu, Thomas Kipf

    Abstract: We present an architecture and a training recipe that adapts pre-trained open-world image models to localization in videos. Understanding the open visual world (without being constrained by fixed label spaces) is crucial for many real-world vision tasks. Contrastive pre-training on large image-text datasets has recently led to significant improvements for image-level tasks. For more structured tas…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  2. arXiv:2111.12594  [pdf, other]

    cs.CV cs.LG stat.ML

    Conditional Object-Centric Learning from Video

    Authors: Thomas Kipf, Gamaleldin F. Elsayed, Aravindh Mahendran, Austin Stone, Sara Sabour, Georg Heigold, Rico Jonschkowski, Alexey Dosovitskiy, Klaus Greff

    Abstract: Object-centric representations are a promising path toward more systematic generalization by providing flexible abstractions upon which compositional world models can be built. Recent work on simple 2D and 3D datasets has shown that models with object-centric inductive biases can learn to segment and represent meaningful objects from the statistical structure of the data alone without the need for…

    Submitted 15 March, 2022; v1 submitted 24 November, 2021; originally announced November 2021.

    Comments: Published at ICLR 2022. Project page at https://slot-attention-video.github.io/

  3. arXiv:2103.15691  [pdf, other]

    cs.CV

    ViViT: A Video Vision Transformer

    Authors: Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, Cordelia Schmid

    Abstract: We present pure-transformer based models for video classification, drawing upon the recent success of such models in image classification. Our model extracts spatio-temporal tokens from the input video, which are then encoded by a series of transformer layers. In order to handle the long sequences of tokens encountered in video, we propose several, efficient variants of our model which factorise t…

    Submitted 1 November, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

    Comments: ICCV 2021. Code at https://github.com/google-research/scenic/tree/main/scenic/projects/vivit

  4. arXiv:2010.11929  [pdf, other]

    cs.CV cs.AI cs.LG

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    Authors: Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby

    Abstract: While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not nece…

    Submitted 3 June, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

    Comments: Fine-tuning code and pre-trained models are available at https://github.com/google-research/vision_transformer. ICLR camera-ready version with 2 small modifications: 1) Added a discussion of CLS vs GAP classifier in the appendix, 2) Fixed an error in exaFLOPs computation in Figure 5 and Table 6 (relative performance of models is basically not affected)

  5. arXiv:2006.15055  [pdf, other]

    cs.LG cs.CV stat.ML

    Object-Centric Learning with Slot Attention

    Authors: Francesco Locatello, Dirk Weissenborn, Thomas Unterthiner, Aravindh Mahendran, Georg Heigold, Jakob Uszkoreit, Alexey Dosovitskiy, Thomas Kipf

    Abstract: Learning object-centric representations of complex scenes is a promising step towards enabling efficient abstract reasoning from low-level perceptual features. Yet, most deep learning approaches learn distributed representations that do not capture the compositional properties of natural scenes. In this paper, we present the Slot Attention module, an architectural component that interfaces with pe…

    Submitted 14 October, 2020; v1 submitted 26 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020. Code available at https://github.com/google-research/google-research/tree/master/slot_attention

  6. arXiv:1708.09157  [pdf, other]

    cs.CL

    Cross-lingual, Character-Level Neural Morphological Tagging

    Authors: Ryan Cotterell, Georg Heigold

    Abstract: Even for common NLP tasks, sufficient supervision is not available in many languages -- morphological tagging is no exception. In the work presented here, we explore a transfer learning scheme, whereby we train character-level recurrent neural taggers to predict morphological taggings for high-resource languages and low-resource languages together. Learning joint character representations among mu…

    Submitted 6 June, 2024; v1 submitted 30 August, 2017; originally announced August 2017.

    Comments: Published as a conference paper at EMNLP 2017; Fixed minor typos and cleaned up formatting

  7. arXiv:1704.04441  [pdf, other]

    cs.CL

    How Robust Are Character-Based Word Embeddings in Tagging and MT Against Wrod Scramlbing or Randdm Nouse?

    Authors: Georg Heigold, Günter Neumann, Josef van Genabith

    Abstract: This paper investigates the robustness of NLP against perturbed word forms. While neural approaches can achieve (almost) human-like accuracy for certain tasks and conditions, they often are sensitive to small changes in the input such as non-canonical input (e.g., typos). Yet both stability and robustness are desired properties in applications involving user-generated content, and the more as huma…

    Submitted 14 April, 2017; originally announced April 2017.

    Comments: 9 pages

  8. arXiv:1606.06640  [pdf, other]

    cs.CL

    Neural Morphological Tagging from Characters for Morphologically Rich Languages

    Authors: Georg Heigold, Guenter Neumann, Josef van Genabith

    Abstract: This paper investigates neural character-based morphological tagging for languages with complex morphology and large tag sets. We systematically explore a variety of neural architectures (DNN, CNN, CNNHighway, LSTM, BLSTM) to obtain character-based word vectors combined with bidirectional LSTMs to model across-word context in an end-to-end setting. We explore supplementary use of word-based vector…

    Submitted 21 June, 2016; originally announced June 2016.

  9. arXiv:1509.08062  [pdf, ps, other]

    cs.LG cs.SD

    End-to-End Text-Dependent Speaker Verification

    Authors: Georg Heigold, Ignacio Moreno, Samy Bengio, Noam Shazeer

    Abstract: In this paper we present a data-driven, integrated approach to speaker verification, which maps a test utterance and a few reference utterances directly to a single score for verification and jointly optimizes the system's components using the same evaluation protocol and metric as at test time. Such an approach will result in simple and efficient systems, requiring little domain-specific knowledg…

    Submitted 27 September, 2015; originally announced September 2015.

    Comments: submitted to ICASSP 2016