Showing 1–36 of 36 results for author: Teney, D

  1. arXiv:2405.17139

    cs.CV cs.AI cs.LG

    Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling

    Authors: Cristian Rodriguez-Opazo, Ehsan Abbasnejad, Damien Teney, Edison Marrese-Taylor, Hamed Damirchi, Anton van den Hengel

    Abstract: Contrastive Language-Image Pretraining (CLIP) stands out as a prominent method for image representation learning. Various architectures, from vision transformers (ViTs) to convolutional networks (ResNets) have been trained with CLIP to serve as general solutions to diverse vision tasks. This paper explores the differences across various CLIP-trained vision backbones. Despite using the same data an…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2312.14400

  2. arXiv:2405.15145

    cs.AI cs.CL cs.MA

    CulturePark: Boosting Cross-cultural Understanding in Large Language Models

    Authors: Cheng Li, Damien Teney, Linyi Yang, Qingsong Wen, Xing Xie, Jindong Wang

    Abstract: Cultural bias is pervasive in many large language models (LLMs), largely due to the deficiency of data representative of different cultures. Typically, cultural datasets and benchmarks are constructed either by extracting subsets of existing datasets or by aggregating from platforms such as Wikipedia and social media. However, these approaches are highly dependent on real-world data and human anno…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Technical report; 28 pages

  3. arXiv:2403.02241

    cs.LG cs.AI cs.CV

    Neural Redshift: Random Networks are not Random Functions

    Authors: Damien Teney, Armand Nicolicioiu, Valentin Hartmann, Ehsan Abbasnejad

    Abstract: Our understanding of the generalization capabilities of neural networks (NNs) is still incomplete. Prevailing explanations are based on implicit biases of gradient descent (GD) but they cannot account for the capabilities of models from gradient-free methods nor the simplicity bias recently observed in untrained networks. This paper seeks other sources of generalization in NNs. Findings. To unde…

    Submitted 5 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Journal ref: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2024

  4. arXiv:2311.17949

    cs.CV

    Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines

    Authors: Hamed Damirchi, Cristian Rodríguez-Opazo, Ehsan Abbasnejad, Damien Teney, Javen Qinfeng Shi, Stephen Gould, Anton van den Hengel

    Abstract: Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box. The Web likely contains the information necessary to excel on any specific application, but identifying the right data a priori is challenging. This paper shows how to leverage recent advances in NLP and multi-modal le…

    Submitted 29 November, 2023; originally announced November 2023.

  5. arXiv:2311.16176

    cs.LG cs.AI cs.CV

    Mitigating Biases with Diverse Ensembles and Diffusion Models

    Authors: Luca Scimeca, Alexander Rubinstein, Damien Teney, Seong Joon Oh, Armand Mihai Nicolicioiu, Yoshua Bengio

    Abstract: Spurious correlations in the data, where multiple cues are predictive of the target labels, often lead to a phenomenon known as shortcut learning, where a model relies on erroneous, easy-to-learn cues while ignoring reliable ones. In this work, we propose an ensemble diversification framework exploiting Diffusion Probabilistic Models (DPMs) to mitigate this form of bias. We show that at particular…

    Submitted 6 March, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

    Comments: arXiv admin note: text overlap with arXiv:2310.02230

  6. arXiv:2310.05143

    cs.AI cs.LG

    ZooPFL: Exploring Black-box Foundation Models for Personalized Federated Learning

    Authors: Wang Lu, Hao Yu, Jindong Wang, Damien Teney, Haohan Wang, Yiqiang Chen, Qiang Yang, Xing Xie, Xiangyang Ji

    Abstract: When personalized federated learning (FL) meets large foundation models, new challenges arise from various limitations in resources. In addition to typical limitations such as data, computation, and communication costs, access to the models is also often limited. This paper endeavors to solve both the challenges of limited resources and personalization, i.e., distribution shifts between clients. T…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: Technical report; 26 pages; code will be available at https://github.com/microsoft/PersonalizedFL

  7. arXiv:2310.02230

    cs.CV cs.AI

    Leveraging Diffusion Disentangled Representations to Mitigate Shortcuts in Underspecified Visual Tasks

    Authors: Luca Scimeca, Alexander Rubinstein, Armand Mihai Nicolicioiu, Damien Teney, Yoshua Bengio

    Abstract: Spurious correlations in the data, where multiple cues are predictive of the target labels, often lead to shortcut learning phenomena, where a model may rely on erroneous, easy-to-learn, cues while ignoring reliable ones. In this work, we propose an ensemble diversification framework exploiting the generation of synthetic counterfactuals using Diffusion Probabilistic Models (DPMs). We discover tha…

    Submitted 18 November, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: Accepted at Neural Information Processing Systems (NeurIPS) 2023 - Workshop on Diffusion Models

  8. arXiv:2308.16274

    cs.CV cs.LG

    Learning Diverse Features in Vision Transformers for Improved Generalization

    Authors: Armand Mihai Nicolicioiu, Andrei Liviu Nicolicioiu, Bogdan Alexe, Damien Teney

    Abstract: Deep learning models often rely only on a small set of features even when there is a rich set of predictive signals in the training data. This makes models brittle and sensitive to distribution shifts. In this work, we first examine vision transformers (ViTs) and find that they tend to extract robust and spurious features with distinct attention heads. As a result of this modularity, their perform…

    Submitted 30 August, 2023; originally announced August 2023.

    Comments: 2023 ICML Workshop on Spurious Correlations, Invariance and Stability

    Report number: R01

  9. arXiv:2305.16817

    cs.LG

    Selective Mixup Helps with Distribution Shifts, But Not (Only) because of Mixup

    Authors: Damien Teney, Jindong Wang, Ehsan Abbasnejad

    Abstract: Mixup is a highly successful technique to improve generalization of neural networks by augmenting the training data with combinations of random pairs. Selective mixup is a family of methods that apply mixup to specific pairs, e.g. only combining examples across classes or domains. These methods have claimed remarkable improvements on benchmarks with distribution shifts, but their mechanisms and li…

    Submitted 2 June, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

  10. arXiv:2305.16304

    cs.CV cs.IR cs.LG

    Candidate Set Re-ranking for Composed Image Retrieval with Dual Multi-modal Encoder

    Authors: Zheyuan Liu, Weixuan Sun, Damien Teney, Stephen Gould

    Abstract: Composed image retrieval aims to find an image that best matches a given multi-modal user query consisting of a reference image and text pair. Existing methods commonly pre-compute image embeddings over the entire corpus and compare these to a reference image embedding modified by the query text at test time. Such a pipeline is very efficient at test time since fast vector distances can be used to…

    Submitted 29 January, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Accepted at TMLR, 19 pages, 8 figures

  11. arXiv:2305.12563

    cs.CL cs.LG

    A Symbolic Framework for Evaluating Mathematical Reasoning and Generalisation with Transformers

    Authors: Jordan Meadows, Marco Valentino, Damien Teney, Andre Freitas

    Abstract: This paper proposes a methodology for generating and perturbing detailed derivations of equations at scale, aided by a symbolic engine, to evaluate the generalisability of Transformers to out-of-distribution mathematical reasoning problems. Instantiating the framework in the context of sequence classification tasks, we compare the capabilities of GPT-4, GPT-3.5, and a canon of fine-tuned BERT mode…

    Submitted 8 April, 2024; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: NAACL 2024

  12. arXiv:2303.16604

    cs.CV cs.IR cs.LG

    Bi-directional Training for Composed Image Retrieval via Text Prompt Learning

    Authors: Zheyuan Liu, Weixuan Sun, Yicong Hong, Damien Teney, Stephen Gould

    Abstract: Composed image retrieval searches for a target image based on a multi-modal user query comprised of a reference image and modification text describing the desired changes. Existing approaches to solving this challenging task learn a mapping from the (reference image, modification text)-pair to an image embedding that is then matched against a large image corpus. One area that has not yet been expl…

    Submitted 5 November, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

    Comments: WACV 2024 accepted. 12 pages, 7 figures

  13. arXiv:2211.02291

    cs.CV cs.AI cs.LG

    SelecMix: Debiased Learning by Contradicting-pair Sampling

    Authors: Inwoo Hwang, Sangjun Lee, Yunhyeok Kwak, Seong Joon Oh, Damien Teney, Jin-Hwa Kim, Byoung-Tak Zhang

    Abstract: Neural networks trained with ERM (empirical risk minimization) sometimes learn unintended decision rules, in particular when their training data is biased, i.e., when training labels are strongly correlated with undesirable features. To prevent a network from learning such features, recent methods augment training data such that examples displaying spurious correlations (i.e., bias-aligned example…

    Submitted 4 November, 2022; originally announced November 2022.

    Comments: NeurIPS 2022

  14. arXiv:2209.00613

    cs.LG cs.CV

    ID and OOD Performance Are Sometimes Inversely Correlated on Real-world Datasets

    Authors: Damien Teney, Yong Lin, Seong Joon Oh, Ehsan Abbasnejad

    Abstract: Several studies have compared the in-distribution (ID) and out-of-distribution (OOD) performance of models in computer vision and NLP. They report a frequent positive correlation and some surprisingly never even observe an inverse correlation indicative of a necessary trade-off. The possibility of inverse patterns is important to determine whether ID performance can serve as a proxy for OOD genera…

    Submitted 19 May, 2023; v1 submitted 1 September, 2022; originally announced September 2022.

  15. arXiv:2207.02598

    cs.LG cs.CV

    Predicting is not Understanding: Recognizing and Addressing Underspecification in Machine Learning

    Authors: Damien Teney, Maxime Peyrard, Ehsan Abbasnejad

    Abstract: Machine learning (ML) models are typically optimized for their accuracy on a given dataset. However, this predictive criterion rarely captures all desirable properties of a model, in particular how well it matches a domain expert's understanding of a task. Underspecification refers to the existence of multiple models that are indistinguishable in their in-domain accuracy, even though they differ i…

    Submitted 6 July, 2022; originally announced July 2022.

    Comments: Long version of a paper accepted at the 2022 European Conference on Computer Vision (ECCV)

  16. arXiv:2206.14355

    cs.CV cs.CL cs.LG

    EBMs vs. CL: Exploring Self-Supervised Visual Pretraining for Visual Question Answering

    Authors: Violetta Shevchenko, Ehsan Abbasnejad, Anthony Dick, Anton van den Hengel, Damien Teney

    Abstract: The availability of clean and diverse labeled data is a major roadblock for training models on complex tasks such as visual question answering (VQA). The extensive work on large vision-and-language models has shown that self-supervised learning is effective for pretraining multimodal interactions. In this technical report, we focus on visual representations. We review and evaluate self-supervised…

    Submitted 28 June, 2022; originally announced June 2022.

  17. arXiv:2203.07034

    cs.CV

    Active Learning by Feature Mixing

    Authors: Amin Parvaneh, Ehsan Abbasnejad, Damien Teney, Reza Haffari, Anton van den Hengel, Javen Qinfeng Shi

    Abstract: The promise of active learning (AL) is to reduce labelling costs by selecting the most valuable examples to annotate from a pool of unlabelled data. Identifying these examples is especially challenging with high-dimensional data (e.g. images, videos) and in low-data regimes. In this paper, we propose a novel method for batch AL called ALFA-Mix. We identify unlabelled instances with sufficiently-di…

    Submitted 14 March, 2022; originally announced March 2022.

    Comments: CVPR 2022

  18. arXiv:2108.04024

    cs.CV cs.CL cs.IR

    Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models

    Authors: Zheyuan Liu, Cristian Rodriguez-Opazo, Damien Teney, Stephen Gould

    Abstract: We extend the task of composed image retrieval, where an input query consists of an image and short textual description of how to modify the image. Existing methods have only been applied to non-complex images within narrow domains, such as fashion products, thereby limiting the scope of study on in-depth visual reasoning in rich image and language contexts. To address this issue, we collect the C…

    Submitted 9 August, 2021; originally announced August 2021.

    Comments: ICCV 2021. Dataset, code, and pre-trained models are released at https://cuberick-orion.github.io/CIRR/

  19. arXiv:2105.05612

    cs.LG cs.CV

    Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization

    Authors: Damien Teney, Ehsan Abbasnejad, Simon Lucey, Anton van den Hengel

    Abstract: Neural networks trained with SGD were recently shown to rely preferentially on linearly-predictive features and can ignore complex, equally-predictive ones. This simplicity bias can explain their lack of robustness out of distribution (OOD). The more complex the task to learn, the more likely it is that statistical artifacts (i.e. selection biases, spurious correlations) are simpler than the mecha…

    Submitted 11 September, 2022; v1 submitted 12 May, 2021; originally announced May 2021.

    Comments: CVPR 2022

  20. arXiv:2104.03149

    cs.CV cs.AI

    Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering

    Authors: Corentin Dancette, Remi Cadene, Damien Teney, Matthieu Cord

    Abstract: We introduce an evaluation methodology for visual question answering (VQA) to better diagnose cases of shortcut learning. These cases happen when a model exploits spurious statistical regularities to produce correct answers but does not actually deploy the desired behavior. There is a need to identify possible shortcuts in a dataset and assess their use before deploying a model in the real world.…

    Submitted 1 September, 2021; v1 submitted 7 April, 2021; originally announced April 2021.

    Comments: Accepted at ICCV 2021. Code is available at https://github.com/cdancette/detect-shortcuts

  21. arXiv:2101.06013

    cs.CV cs.LG

    Reasoning over Vision and Language: Exploring the Benefits of Supplemental Knowledge

    Authors: Violetta Shevchenko, Damien Teney, Anthony Dick, Anton van den Hengel

    Abstract: The limits of applicability of vision-and-language models are defined by the coverage of their training data. Tasks like vision question answering (VQA) often require commonsense and factual information beyond what can be learned from task-specific datasets. This paper investigates the injection of knowledge from general-purpose knowledge bases (KBs) into vision-and-language transformers. We use a…

    Submitted 15 January, 2021; originally announced January 2021.

  22. arXiv:2005.09241

    cs.CV cs.LG

    On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law

    Authors: Damien Teney, Kushal Kafle, Robik Shrestha, Ehsan Abbasnejad, Christopher Kanan, Anton van den Hengel

    Abstract: Out-of-distribution (OOD) testing is increasingly popular for evaluating a machine learning system's ability to generalize beyond the biases of a training set. OOD benchmarks are designed to present a different joint distribution of data and labels between training and test time. VQA-CP has become the standard OOD benchmark for visual question answering, but we discovered three troubling practices…

    Submitted 19 May, 2020; originally announced May 2020.

  23. arXiv:2005.01239

    cs.CV cs.LG

    Visual Question Answering with Prior Class Semantics

    Authors: Violetta Shevchenko, Damien Teney, Anthony Dick, Anton van den Hengel

    Abstract: We present a novel mechanism to embed prior knowledge in a model for visual question answering. The open-set nature of the task is at odds with the ubiquitous approach of training of a fixed classifier. We show how to exploit additional information pertaining to the semantics of candidate answers. We extend the answer prediction process with a regression objective in a semantic space, in which we…

    Submitted 3 May, 2020; originally announced May 2020.

  24. arXiv:2004.09034

    cs.CV cs.LG

    Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision

    Authors: Damien Teney, Ehsan Abbasnejad, Anton van den Hengel

    Abstract: One of the primary challenges limiting the applicability of deep learning is its susceptibility to learning spurious correlations rather than the underlying mechanisms of the task of interest. The resulting failure to generalise cannot be addressed by simply using more data from the same distribution. We propose an auxiliary training objective that improves the generalization capabilities of neura…

    Submitted 19 April, 2020; originally announced April 2020.

  25. arXiv:2002.11894

    cs.CV

    Unshuffling Data for Improved Generalization

    Authors: Damien Teney, Ehsan Abbasnejad, Anton van den Hengel

    Abstract: Generalization beyond the training distribution is a core challenge in machine learning. The common practice of mixing and shuffling examples when training neural networks may not be optimal in this regard. We show that partitioning the data into well-chosen, non-i.i.d. subsets treated as multiple training environments can guide the learning of models with better out-of-distribution generalization…

    Submitted 20 November, 2020; v1 submitted 26 February, 2020; originally announced February 2020.

  26. arXiv:1909.13471

    cs.CV cs.LG

    On Incorporating Semantic Prior Knowledge in Deep Learning Through Embedding-Space Constraints

    Authors: Damien Teney, Ehsan Abbasnejad, Anton van den Hengel

    Abstract: The knowledge that humans hold about a problem often extends far beyond a set of training data and output labels. While the success of deep learning mostly relies on supervised training, important properties cannot be inferred efficiently from end-to-end annotations alone, for example causal relations or domain-specific invariances. We present a general technique to supplement supervised training…

    Submitted 16 November, 2019; v1 submitted 30 September, 2019; originally announced September 2019.

  27. arXiv:1907.12271

    cs.CV

    V-PROM: A Benchmark for Visual Reasoning Using Visual Progressive Matrices

    Authors: Damien Teney, Peng Wang, Jiewei Cao, Lingqiao Liu, Chunhua Shen, Anton van den Hengel

    Abstract: One of the primary challenges faced by deep learning is the degree to which current methods exploit superficial statistics and dataset bias, rather than learning to generalise over the specific representations they have experienced. This is a critical concern because generalisation enables robust reasoning over unseen data, whereas leveraging superficial statistics is fragile to even small changes…

    Submitted 29 July, 2019; originally announced July 2019.

  28. arXiv:1904.02865

    cs.CV

    Actively Seeking and Learning from Live Data

    Authors: Damien Teney, Anton van den Hengel

    Abstract: One of the key limitations of traditional machine learning methods is their requirement for training data that exemplifies all the information to be learned. This is a particular problem for visual question answering methods, which may be asked questions about virtually anything. The approach we propose is a step toward overcoming this limitation by searching for the information required at test t…

    Submitted 5 April, 2019; originally announced April 2019.

  29. arXiv:1711.08105

    cs.CV

    Visual Question Answering as a Meta Learning Task

    Authors: Damien Teney, Anton van den Hengel

    Abstract: The predominant approach to Visual Question Answering (VQA) demands that the model represents within its weights all of the information required to answer any question about any image. Learning this information from any real training set seems unlikely, and representing it in a reasonable number of weights doubly so. We propose instead to approach VQA as a meta learning task, thus separating the q…

    Submitted 21 November, 2017; originally announced November 2017.

  30. arXiv:1711.07280

    cs.CV cs.AI cs.CL cs.RO

    Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

    Authors: Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, Anton van den Hengel

    Abstract: A robot that can carry out a natural-language instruction has been a dream since before the Jetsons cartoon series imagined a life of leisure mediated by a fleet of attentive robot helpers. It is a dream that remains stubbornly distant. However, recent advances in vision and language methods have made incredible progress in closely related areas. This is significant because a robot interpreting a…

    Submitted 5 April, 2018; v1 submitted 20 November, 2017; originally announced November 2017.

    Comments: CVPR 2018 Spotlight presentation

  31. arXiv:1708.02711

    cs.CV cs.CL

    Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge

    Authors: Damien Teney, Peter Anderson, Xiaodong He, Anton van den Hengel

    Abstract: This paper presents a state-of-the-art model for visual question answering (VQA), which won the first place in the 2017 VQA Challenge. VQA is a task of significant importance for research in artificial intelligence, given its multimodal nature, clear evaluation protocol, and potential real-world applications. The performance of deep neural networks for VQA is very dependent on choices of architect…

    Submitted 9 August, 2017; originally announced August 2017.

    Comments: Winner of the 2017 Visual Question Answering (VQA) Challenge at CVPR

  32. arXiv:1707.07998

    cs.CV

    Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

    Authors: Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, Lei Zhang

    Abstract: Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning. In this work, we propose a combined bottom-up and top-down attention mechanism that enables attention to be calculated at the level of objects and other salient image regions.…

    Submitted 14 March, 2018; v1 submitted 25 July, 2017; originally announced July 2017.

    Comments: CVPR 2018 full oral, winner of the 2017 Visual Question Answering challenge

  33. arXiv:1611.05546

    cs.CV cs.AI cs.CL

    Zero-Shot Visual Question Answering

    Authors: Damien Teney, Anton van den Hengel

    Abstract: Part of the appeal of Visual Question Answering (VQA) is its promise to answer new questions about previously unseen images. Most current methods demand training questions that illustrate every possible concept, and will therefore never achieve this capability, since the volume of required training data would be prohibitive. Answering general questions about images requires methods capable of Zero…

    Submitted 20 November, 2016; v1 submitted 16 November, 2016; originally announced November 2016.

  34. arXiv:1609.05600

    cs.CV cs.AI cs.CL

    Graph-Structured Representations for Visual Question Answering

    Authors: Damien Teney, Lingqiao Liu, Anton van den Hengel

    Abstract: This paper proposes to improve visual question answering (VQA) with structured representations of both scene contents and questions. A key challenge in VQA is to require joint reasoning over the visual and text domains. The predominant CNN/LSTM-based approach to VQA is limited by monolithic vector representations that largely ignore structure in the scene and in the form of the question. CNN featu…

    Submitted 30 March, 2017; v1 submitted 19 September, 2016; originally announced September 2016.

  35. arXiv:1607.05910

    cs.CV

    Visual Question Answering: A Survey of Methods and Datasets

    Authors: Qi Wu, Damien Teney, Peng Wang, Chunhua Shen, Anthony Dick, Anton van den Hengel

    Abstract: Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities. Given an image and a question in natural language, it requires reasoning over visual elements of the image and general knowledge to infer the correct answer. In the first part of this survey, we examine the state of the art by c…

    Submitted 20 July, 2016; originally announced July 2016.

    Comments: 25 pages

  36. arXiv:1601.07532

    cs.CV

    Learning to Extract Motion from Videos in Convolutional Neural Networks

    Authors: Damien Teney, Martial Hebert

    Abstract: This paper shows how to extract dense optical flow from videos with a convolutional neural network (CNN). The proposed model constitutes a potential building block for deeper architectures to allow using motion without resorting to an external algorithm, e.g. for recognition in videos. We derive our network architecture from signal processing principles to provide desired invariances to image contr…

    Submitted 27 January, 2016; originally announced January 2016.