Showing 1–22 of 22 results for author: Arandjelovic, R

  1. arXiv:2306.01667

    cs.CV

    Towards In-context Scene Understanding

    Authors: Ivana Balažević, David Steiner, Nikhil Parthasarathy, Relja Arandjelović, Olivier J. Hénaff

    Abstract: In-context learning–the ability to configure a model's behavior with different prompts–has revolutionized the field of natural language processing, alleviating the need for task-specific models and paving the way for generalist models capable of assisting with any query. Computer vision, in contrast, has largely stayed in the former regime: specialized decoders and…

    Submitted 31 October, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

  2. arXiv:2303.13518

    cs.CV cs.AI cs.LG

    Three ways to improve feature alignment for open vocabulary detection

    Authors: Relja Arandjelović, Alex Andonian, Arthur Mensch, Olivier J. Hénaff, Jean-Baptiste Alayrac, Andrew Zisserman

    Abstract: The core problem in zero-shot open vocabulary detection is how to align visual and text features, so that the detector performs well on unseen classes. Previous approaches train the feature pyramid and detection head from scratch, which breaks the vision-text feature alignment established during pretraining, and struggles to prevent the language model from forgetting unseen classes. We propose t…

    Submitted 23 March, 2023; originally announced March 2023.

  3. arXiv:2209.15589

    cs.CV cs.LG

    Where Should I Spend My FLOPS? Efficiency Evaluations of Visual Pre-training Methods

    Authors: Skanda Koppula, Yazhe Li, Evan Shelhamer, Andrew Jaegle, Nikhil Parthasarathy, Relja Arandjelovic, João Carreira, Olivier Hénaff

    Abstract: Self-supervised methods have achieved remarkable success in transfer learning, often achieving the same or better accuracy than supervised pre-training. Most prior work has done so by increasing pre-training computation by adding complex data augmentation, multiple views, or lengthy training schedules. In this work, we investigate a related, but orthogonal question: given a fixed FLOP budget, what…

    Submitted 18 October, 2022; v1 submitted 30 September, 2022; originally announced September 2022.

    Comments: 11 pages. 36th Conference on Neural Information Processing Systems, Workshop on Self-Supervised Learning (2022)

  4. arXiv:2203.08777

    cs.CV cs.AI cs.LG

    Object discovery and representation networks

    Authors: Olivier J. Hénaff, Skanda Koppula, Evan Shelhamer, Daniel Zoran, Andrew Jaegle, Andrew Zisserman, João Carreira, Relja Arandjelović

    Abstract: The promise of self-supervised learning (SSL) is to leverage large amounts of unlabeled data to solve complex tasks. While there has been excellent progress with simple, image-level learning, recent methods have shown the advantage of including knowledge of image structure. However, by introducing hand-crafted image segmentations to define regions of interest, or specialized augmentation strategie…

    Submitted 27 July, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

    Comments: European Conference on Computer Vision (ECCV) 2022

  5. arXiv:2202.10890

    cs.CV

    HiP: Hierarchical Perceiver

    Authors: Joao Carreira, Skanda Koppula, Daniel Zoran, Adria Recasens, Catalin Ionescu, Olivier Henaff, Evan Shelhamer, Relja Arandjelovic, Matt Botvinick, Oriol Vinyals, Karen Simonyan, Andrew Zisserman, Andrew Jaegle

    Abstract: General perception systems such as Perceivers can process arbitrary modalities in any combination and are able to handle up to a few hundred thousand inputs. They achieve this generality by using exclusively global attention operations. This however hinders them from scaling up to the input sizes required to process raw high-resolution images or video. In this paper, we show that some degree of l…

    Submitted 3 November, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

  6. arXiv:2112.03243

    cs.CV

    Input-level Inductive Biases for 3D Reconstruction

    Authors: Wang Yifan, Carl Doersch, Relja Arandjelović, João Carreira, Andrew Zisserman

    Abstract: Much of the recent progress in 3D vision has been driven by the development of specialized architectures that incorporate geometrical inductive biases. In this paper we tackle 3D reconstruction using a domain agnostic architecture and study how instead to inject the same type of inductive biases directly as extra inputs to the model. This approach makes it possible to apply existing general models…

    Submitted 19 March, 2022; v1 submitted 6 December, 2021; originally announced December 2021.

    Comments: CVPR 2022, including supplemental material

  7. arXiv:2106.05264

    cs.CV cs.GR cs.LG

    NeRF in detail: Learning to sample for view synthesis

    Authors: Relja Arandjelović, Andrew Zisserman

    Abstract: Neural radiance fields (NeRF) methods have demonstrated impressive novel view synthesis performance. The core approach is to render individual rays by querying a neural network at points sampled along the ray to obtain the density and colour of the sampled points, and integrating this information using the rendering equation. Since dense sampling is computationally prohibitive, a common solution i…

    Submitted 9 June, 2021; originally announced June 2021.

  8. arXiv:2006.16228

    cs.CV

    Self-Supervised MultiModal Versatile Networks

    Authors: Jean-Baptiste Alayrac, Adrià Recasens, Rosalia Schneider, Relja Arandjelović, Jason Ramapuram, Jeffrey De Fauw, Lucas Smaira, Sander Dieleman, Andrew Zisserman

    Abstract: Videos are a rich source of multi-modal supervision. In this work, we learn representations using self-supervision by leveraging three modalities naturally present in videos: visual, audio and language streams. To this end, we introduce the notion of a multimodal versatile network -- a network that can ingest multiple modalities and whose representations enable downstream tasks in multiple modalit…

    Submitted 30 October, 2020; v1 submitted 29 June, 2020; originally announced June 2020.

    Comments: To appear in the Thirty-Fourth Annual Conference on Neural Information Processing Systems (NeurIPS 2020)

  9. arXiv:2004.10566

    cs.CV

    Efficient Neighbourhood Consensus Networks via Submanifold Sparse Convolutions

    Authors: Ignacio Rocco, Relja Arandjelović, Josef Sivic

    Abstract: In this work we target the problem of estimating accurately localised correspondences between a pair of images. We adopt the recent Neighbourhood Consensus Networks that have demonstrated promising performance for difficult correspondence problems and propose modifications to overcome their main limitations: large memory consumption, large inference time and poorly localised correspondences. Our p…

    Submitted 22 April, 2020; originally announced April 2020.

  10. arXiv:2003.11794

    cs.CV

    Compact Deep Aggregation for Set Retrieval

    Authors: Yujie Zhong, Relja Arandjelović, Andrew Zisserman

    Abstract: The objective of this work is to learn a compact embedding of a set of descriptors that is suitable for efficient retrieval and ranking, whilst maintaining discriminability of the individual descriptors. We focus on a specific example of this general problem -- that of retrieving images containing multiple faces from a large scale dataset of images. Here the set consists of the face descriptors in…

    Submitted 26 March, 2020; originally announced March 2020.

    Comments: 20 pages

  11. arXiv:1910.11306

    cs.CV cs.NE eess.IV

    Controllable Attention for Structured Layered Video Decomposition

    Authors: Jean-Baptiste Alayrac, João Carreira, Relja Arandjelović, Andrew Zisserman

    Abstract: The objective of this paper is to be able to separate a video into its natural layers, and to control which of the separated layers to attend to. For example, to be able to separate reflections, transparency or object motion. We make the following three contributions: (i) we introduce a new structured neural network architecture that explicitly incorporates layers (as spatial masks) into its desig…

    Submitted 24 October, 2019; originally announced October 2019.

    Comments: In ICCV 2019

  12. arXiv:1905.11369

    cs.CV cs.AI cs.LG

    Object Discovery with a Copy-Pasting GAN

    Authors: Relja Arandjelović, Andrew Zisserman

    Abstract: We tackle the problem of object discovery, where objects are segmented for a given input image, and the system is trained without using any direct supervision whatsoever. A novel copy-pasting GAN framework is proposed, where the generator learns to discover an object in one image by compositing it into another image such that the discriminator cannot tell that the resulting image is fake. After ca…

    Submitted 27 May, 2019; originally announced May 2019.

  13. arXiv:1810.12715

    cs.LG cs.CR stat.ML

    On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models

    Authors: Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, Pushmeet Kohli

    Abstract: Recent work has shown that it is possible to train deep neural networks that are provably robust to norm-bounded adversarial perturbations. Most of these methods are based on minimizing an upper bound on the worst-case loss over all possible adversarial perturbations. While these techniques show promise, they often result in difficult optimization procedures that remain hard to scale to larger net…

    Submitted 29 August, 2019; v1 submitted 30 October, 2018; originally announced October 2018.

    Comments: [v2] Best paper at NeurIPS SECML 2018 Workshop [v4] Accepted at ICCV 2019 under the title "Scalable Verified Training for Provably Robust Image Classification"

  14. arXiv:1810.10510

    cs.CV cs.LG

    Neighbourhood Consensus Networks

    Authors: Ignacio Rocco, Mircea Cimpoi, Relja Arandjelović, Akihiko Torii, Tomas Pajdla, Josef Sivic

    Abstract: We address the problem of finding reliable dense correspondences between a pair of images. This is a challenging task due to strong appearance differences between the corresponding scene elements and ambiguities generated by repetitive patterns. The contributions of this work are threefold. First, inspired by the classic idea of disambiguating feature matches using semi-local constraints, we devel…

    Submitted 29 November, 2018; v1 submitted 24 October, 2018; originally announced October 2018.

    Comments: In Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018)

  15. arXiv:1810.09951

    cs.CV

    GhostVLAD for set-based face recognition

    Authors: Yujie Zhong, Relja Arandjelović, Andrew Zisserman

    Abstract: The objective of this paper is to learn a compact representation of image sets for template-based face recognition. We make the following contributions: first, we propose a network architecture which aggregates and embeds the face descriptors produced by deep convolutional neural networks into a compact fixed-length representation. This compact representation requires minimal memory storage and en…

    Submitted 23 October, 2018; originally announced October 2018.

    Comments: Accepted by ACCV 2018

  16. arXiv:1805.10265

    cs.LG stat.ML

    Training verified learners with learned verifiers

    Authors: Krishnamurthy Dvijotham, Sven Gowal, Robert Stanforth, Relja Arandjelovic, Brendan O'Donoghue, Jonathan Uesato, Pushmeet Kohli

    Abstract: This paper proposes a new algorithmic framework, predictor-verifier training, to train neural networks that are verifiable, i.e., networks that provably satisfy some desired input-output properties. The key idea is to simultaneously train two networks: a predictor network that performs the task at hand, e.g., predicting labels given inputs, and a verifier network that computes a bound on how well t…

    Submitted 29 May, 2018; v1 submitted 25 May, 2018; originally announced May 2018.

  17. arXiv:1712.06861

    cs.CV cs.LG

    End-to-end weakly-supervised semantic alignment

    Authors: Ignacio Rocco, Relja Arandjelović, Josef Sivic

    Abstract: We tackle the task of semantic alignment where the goal is to compute dense semantic correspondence aligning two images depicting objects of the same category. This is a challenging task due to large intra-class variation, changes in viewpoint and background clutter. We present the following three principal contributions. First, we develop a convolutional neural network architecture for semantic a…

    Submitted 24 April, 2018; v1 submitted 19 December, 2017; originally announced December 2017.

    Comments: In 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018)

  18. arXiv:1712.06651

    cs.CV cs.LG cs.MM cs.SD eess.AS

    Objects that Sound

    Authors: Relja Arandjelović, Andrew Zisserman

    Abstract: In this paper our objectives are, first, networks that can embed audio and visual inputs into a common space that is suitable for cross-modal retrieval; and second, a network that can localize the object that sounds in an image, given the audio signal. We achieve both these objectives by training from unlabelled video using only audio-visual correspondence (AVC) as the objective function. This is…

    Submitted 25 July, 2018; v1 submitted 18 December, 2017; originally announced December 2017.

    Comments: Appears in: European Conference on Computer Vision (ECCV) 2018

  19. arXiv:1705.08168

    cs.CV cs.LG

    Look, Listen and Learn

    Authors: Relja Arandjelović, Andrew Zisserman

    Abstract: We consider the question: what can be learnt by looking at and listening to a large number of unlabelled videos? There is a valuable, but so far untapped, source of information contained in the video itself -- the correspondence between the visual and the audio streams, and we introduce a novel "Audio-Visual Correspondence" learning task that makes use of this. Training visual and audio networks f…

    Submitted 1 August, 2017; v1 submitted 23 May, 2017; originally announced May 2017.

    Comments: Appears in: IEEE International Conference on Computer Vision (ICCV) 2017

  20. arXiv:1703.05593

    cs.CV cs.LG

    Convolutional neural network architecture for geometric matching

    Authors: Ignacio Rocco, Relja Arandjelović, Josef Sivic

    Abstract: We address the problem of determining correspondences between two images in agreement with a geometric model such as an affine or thin-plate spline transformation, and estimating its parameters. The contributions of this work are three-fold. First, we propose a convolutional neural network architecture for geometric matching. The architecture is based on three main components that mimic the standa…

    Submitted 13 April, 2017; v1 submitted 16 March, 2017; originally announced March 2017.

    Comments: In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017)

  21. arXiv:1606.01550

    cs.CV cs.IR

    Pairwise Quantization

    Authors: Artem Babenko, Relja Arandjelović, Victor Lempitsky

    Abstract: We consider the task of lossy compression of high-dimensional vectors through quantization. We propose the approach that learns quantization parameters by minimizing the distortion of scalar products and squared distances between pairs of points. This is in contrast to previous works that obtain these parameters through the minimization of the reconstruction error of individual points. The propose…

    Submitted 5 June, 2016; originally announced June 2016.

  22. arXiv:1511.07247

    cs.CV cs.LG

    NetVLAD: CNN architecture for weakly supervised place recognition

    Authors: Relja Arandjelović, Petr Gronat, Akihiko Torii, Tomas Pajdla, Josef Sivic

    Abstract: We tackle the problem of large scale visual place recognition, where the task is to quickly and accurately recognize the location of a given query photograph. We present the following three principal contributions. First, we develop a convolutional neural network (CNN) architecture that is trainable in an end-to-end manner directly for the place recognition task. The main component of this archite…

    Submitted 2 May, 2016; v1 submitted 23 November, 2015; originally announced November 2015.

    Comments: Appears in: IEEE Computer Vision and Pattern Recognition (CVPR) 2016