Showing 1–44 of 44 results for author: Asano, Y M

  1. arXiv:2406.12658

    cs.CV cs.LG

    Federated Learning with a Single Shared Image

    Authors: Sunny Soni, Aaqib Saeed, Yuki M. Asano

    Abstract: Federated Learning (FL) enables multiple machines to collaboratively train a machine learning model without sharing of private training data. Yet, especially for heterogeneous models, a key bottleneck remains the transfer of knowledge gained from each client model with the server. One popular method, FedDF, uses distillation to tackle this task with the use of a common, shared dataset on which pre…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 8 Pages, 3 Figures, Appendix 4 Pages, CVPRW 2024

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 7782-7790

  2. arXiv:2405.17423

    cs.CV cs.CL

    Privacy-Aware Visual Language Models

    Authors: Laurens Samson, Nimrod Barazani, Sennay Ghebreab, Yuki M. Asano

    Abstract: This paper aims to advance our understanding of how Visual Language Models (VLMs) handle privacy-sensitive information, a crucial concern as these technologies become integral to everyday life. To this end, we introduce a new benchmark PrivBench, which contains images from 8 sensitive categories such as passports, or fingerprints. We evaluate 10 state-of-the-art VLMs on this benchmark and observe…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: preprint

  3. arXiv:2405.14862

    cs.CL

    Bitune: Bidirectional Instruction-Tuning

    Authors: Dawid J. Kopiczko, Tijmen Blankevoort, Yuki M. Asano

    Abstract: We introduce Bitune, a method that improves instruction-tuning of pretrained decoder-only large language models, leading to consistent gains on downstream tasks. Bitune applies both causal and bidirectional attention to the prompt, to obtain a better representation of the query or instruction. We realize this by introducing two sets of parameters, for which we apply parameter-efficient finetuning…

    Submitted 23 May, 2024; originally announced May 2024.

  4. arXiv:2404.17202

    cs.CV

    Self-supervised visual learning in the low-data regime: a comparative evaluation

    Authors: Sotirios Konstantakos, Despina Ioanna Chalkiadaki, Ioannis Mademlis, Yuki M. Asano, Efstratios Gavves, Georgios Th. Papadopoulos

    Abstract: Self-Supervised Learning (SSL) is a valuable and robust training methodology for contemporary Deep Neural Networks (DNNs), enabling unsupervised pretraining on a 'pretext task' that does not require ground-truth labels/annotation. This allows efficient representation learning from massive amounts of unlabeled training data, which in turn leads to increased accuracy in a 'downstream task' by exploi…

    Submitted 26 April, 2024; originally announced April 2024.

  5. arXiv:2404.13381

    cs.LG cs.CR cs.MA q-bio.PE

    DNA: Differentially private Neural Augmentation for contact tracing

    Authors: Rob Romijnders, Christos Louizos, Yuki M. Asano, Max Welling

    Abstract: The COVID-19 pandemic had enormous economic and societal consequences. Contact tracing is an effective way to reduce infection rates by detecting potential virus carriers early. However, this was not generally adopted in the recent pandemic, and privacy concerns are cited as the most important reason. We substantially improve the privacy guarantees of the current state of the art in decentralized c…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: Privacy Regulation and Protection in Machine Learning Workshop at ICLR 2024

  6. arXiv:2402.14957

    cs.CV cs.LG

    The Common Stability Mechanism behind most Self-Supervised Learning Approaches

    Authors: Abhishek Jha, Matthew B. Blaschko, Yuki M. Asano, Tinne Tuytelaars

    Abstract: The last couple of years have witnessed tremendous progress in self-supervised learning (SSL), the success of which can be attributed to the introduction of useful inductive biases in the learning process to learn meaningful visual representations while avoiding collapse. These inductive biases and constraints manifest themselves in the form of different optimization formulations in the SSL techniqu…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: Additional visualizations (.gif): https://github.com/abskjha/CenterVectorSSL

  7. arXiv:2402.08657

    cs.CV

    PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs

    Authors: Michael Dorkenwald, Nimrod Barazani, Cees G. M. Snoek, Yuki M. Asano

    Abstract: Vision-Language Models (VLMs), such as Flamingo and GPT-4V, have shown immense potential by integrating large language models with vision systems. Nevertheless, these models face challenges in the fundamental computer vision task of object localisation, due to their training on multimodal data containing mostly captions without explicit spatial grounding. While it is possible to construct custom,…

    Submitted 13 February, 2024; originally announced February 2024.

  8. arXiv:2401.05735

    cs.CV cs.LG

    Object-Centric Diffusion for Efficient Video Editing

    Authors: Kumara Kahatapitiya, Adil Karjauv, Davide Abati, Fatih Porikli, Yuki M. Asano, Amirhossein Habibian

    Abstract: Diffusion-based video editing methods have reached impressive quality and can transform the global style, local structure, and attributes of given video inputs, following textual edit prompts. However, such solutions typically incur heavy memory and computational costs to generate temporally-coherent frames, either in the form of diffusion inversion and/or cross-frame attention. In this paper, we c…

    Submitted 11 January, 2024; originally announced January 2024.

  9. arXiv:2312.17244

    cs.LG cs.CL

    The LLM Surgeon

    Authors: Tycho F. A. van der Ouderaa, Markus Nagel, Mart van Baalen, Yuki M. Asano, Tijmen Blankevoort

    Abstract: State-of-the-art language models are becoming increasingly large in an effort to achieve the highest performance on large corpora of available textual data. However, the sheer size of the Transformer architectures makes it difficult to deploy models within computational, environmental or device-specific constraints. We explore data-driven compression of existing pretrained models as an alternative…

    Submitted 20 March, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

  10. arXiv:2312.11581

    cs.CR cs.AI cs.LG

    Protect Your Score: Contact Tracing With Differential Privacy Guarantees

    Authors: Rob Romijnders, Christos Louizos, Yuki M. Asano, Max Welling

    Abstract: The pandemic in 2020 and 2021 had enormous economic and societal consequences, and studies show that contact tracing algorithms can be key in the early containment of the virus. While large strides have been made towards more effective contact tracing algorithms, we argue that privacy concerns currently hold deployment back. The essence of a contact tracing algorithm constitutes the communication…

    Submitted 15 February, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: Accepted to The 38th Annual AAAI Conference on Artificial Intelligence (AAAI 2024)

  11. arXiv:2312.08895

    cs.CV

    Motion Flow Matching for Human Motion Synthesis and Editing

    Authors: Vincent Tao Hu, Wenzhe Yin, Pingchuan Ma, Yunlu Chen, Basura Fernando, Yuki M Asano, Efstratios Gavves, Pascal Mettes, Bjorn Ommer, Cees G. M. Snoek

    Abstract: Human motion synthesis is a fundamental task in computer animation. Recent methods based on diffusion models or GPT structure demonstrate commendable performance but exhibit drawbacks in terms of slow sampling speeds and error accumulation. In this paper, we propose Motion Flow Matching, a novel generative model designed for human motion generation featuring efficient sampling and effective…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: WIP

  12. arXiv:2312.08892

    cs.CV

    VaLID: Variable-Length Input Diffusion for Novel View Synthesis

    Authors: Shijie Li, Farhad G. Zanjani, Haitam Ben Yahia, Yuki M. Asano, Juergen Gall, Amirhossein Habibian

    Abstract: Novel View Synthesis (NVS), which tries to produce a realistic image at the target view given source view images and their corresponding poses, is a fundamental problem in 3D Vision. As this task is heavily under-constrained, some recent work, like Zero123, tries to solve this problem with generative modeling, specifically using pre-trained diffusion models. Although this strategy generalizes well…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: paper and supplementary material

  13. arXiv:2312.08825

    cs.CV

    Guided Diffusion from Self-Supervised Diffusion Features

    Authors: Vincent Tao Hu, Yunlu Chen, Mathilde Caron, Yuki M. Asano, Cees G. M. Snoek, Bjorn Ommer

    Abstract: Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or classifier pretraining. That is why guidance was harnessed from self-supervised learning backbones, like DINO. However, recent studies have revealed that the feature representation derived from the diffusion model itself is discriminative for numerous downstream tasks a…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: Work In Progress

  14. arXiv:2310.11454

    cs.CL

    VeRA: Vector-based Random Matrix Adaptation

    Authors: Dawid J. Kopiczko, Tijmen Blankevoort, Yuki M. Asano

    Abstract: Low-rank adaptation (LoRA) is a popular method that reduces the number of trainable parameters when finetuning large language models, but still faces acute storage challenges when scaling to even larger models or deploying numerous per-user or per-task adapted models. In this work, we present Vector-based Random Matrix Adaptation (VeRA), which significantly reduces the number of trainable parameter…

    Submitted 16 January, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: Accepted at ICLR 2024, website: https://dkopi.github.io/vera

  15. arXiv:2310.08584

    cs.CV

    Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video

    Authors: Shashanka Venkataramanan, Mamshad Nayeem Rizve, João Carreira, Yuki M. Asano, Yannis Avrithis

    Abstract: Self-supervised learning has unlocked the potential of scaling up pretraining to billions of images, since annotation is unnecessary. But are we making the best use of data? How much more economical can we be? In this work, we attempt to answer this question by making two contributions. First, we investigate first-person videos and introduce a "Walking Tours" dataset. These videos are high-resolution,…

    Submitted 23 May, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: Accepted to ICLR 2024 (Best paper honorable mention). Project Page: https://shashankvkt.github.io/dora

  16. arXiv:2310.00500

    cs.CV

    Self-Supervised Open-Ended Classification with Small Visual Language Models

    Authors: Mohammad Mahdi Derakhshani, Ivona Najdenkoska, Cees G. M. Snoek, Marcel Worring, Yuki M. Asano

    Abstract: We present Self-Context Adaptation (SeCAt), a self-supervised approach that unlocks few-shot abilities for open-ended classification with small visual language models. Our approach imitates image captions in a self-supervised way based on clustering a large pool of images followed by assigning semantically-unrelated names to clusters. By doing so, we construct a training signal consisting of inter…

    Submitted 6 December, 2023; v1 submitted 30 September, 2023; originally announced October 2023.

  17. arXiv:2308.11796

    cs.CV

    Time Does Tell: Self-Supervised Time-Tuning of Dense Image Representations

    Authors: Mohammadreza Salehi, Efstratios Gavves, Cees G. M. Snoek, Yuki M. Asano

    Abstract: Spatially dense self-supervised learning is a rapidly growing problem domain with promising applications for unsupervised segmentation and pretraining for dense downstream tasks. Despite the abundance of temporal data in the form of videos, this information-rich source has been largely overlooked. Our paper aims to address this gap by proposing a novel approach that incorporates temporal consisten…

    Submitted 22 August, 2023; originally announced August 2023.

  18. arXiv:2308.07350

    cs.LG cs.AI

    Efficient Neural PDE-Solvers using Quantization Aware Training

    Authors: Winfried van den Dool, Tijmen Blankevoort, Max Welling, Yuki M. Asano

    Abstract: In the past years, the application of neural networks as an alternative to classical numerical methods to solve Partial Differential Equations has emerged as a potential paradigm shift in this century-old mathematical field. However, in terms of practical applicability, computational cost remains a substantial bottleneck. Classical approaches try to mitigate this challenge by limiting the spatial…

    Submitted 14 August, 2023; originally announced August 2023.

    Comments: Accepted at the ICCV 2023 Workshop on Resource Efficient Deep Learning for Computer Vision

  19. arXiv:2307.08727

    cs.CV

    Learning to Count without Annotations

    Authors: Lukas Knobel, Tengda Han, Yuki M. Asano

    Abstract: While recent supervised methods for reference-based object counting continue to improve the performance on benchmark datasets, they have to rely on small datasets due to the cost associated with manually annotating dozens of objects in images. We propose UnCounTR, a model that can learn this task without requiring any manual annotations. To this end, we construct "Self-Collages", images with vario…

    Submitted 29 March, 2024; v1 submitted 17 July, 2023; originally announced July 2023.

    Comments: Accepted at CVPR'24. Code available at https://github.com/lukasknobel/SelfCollages

  20. arXiv:2306.09643

    cs.LG cs.AI stat.ME

    BISCUIT: Causal Representation Learning from Binary Interactions

    Authors: Phillip Lippe, Sara Magliacane, Sindy Löwe, Yuki M. Asano, Taco Cohen, Efstratios Gavves

    Abstract: Identifying the causal variables of an environment and how to intervene on them is of core value in applications such as robotics and embodied AI. While an agent can commonly interact with the environment and may implicitly perturb the behavior of some of these causal variables, often the targets it affects remain unknown. In this paper, we show that causal variables can still be identified for ma…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: Published in: Uncertainty in Artificial Intelligence (UAI 2023). Project page: https://phlippe.github.io/BISCUIT/

  21. arXiv:2304.00961

    cs.CV

    Self-Ordering Point Clouds

    Authors: Pengwan Yang, Cees G. M. Snoek, Yuki M. Asano

    Abstract: In this paper we address the task of finding representative subsets of points in a 3D point cloud by means of a point-wise ordering. Only a few works have tried to address this challenging vision problem, all with the help of hard-to-obtain point and cloud labels. Different from these works, we introduce the task of point-wise ordering in 3D point clouds through self-supervision, which we call sel…

    Submitted 10 April, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

  22. arXiv:2302.00353

    cs.LG cs.CV

    Towards Label-Efficient Incremental Learning: A Survey

    Authors: Mert Kilickaya, Joost van de Weijer, Yuki M. Asano

    Abstract: The current dominant paradigm when building a machine learning model is to iterate over a dataset over and over until convergence. Such an approach is non-incremental, as it assumes access to all images of all categories at once. However, for many applications, non-incremental learning is unrealistic. To that end, researchers study incremental learning, where a learner is required to adapt to an i…

    Submitted 11 February, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

  23. arXiv:2301.02240

    cs.CV

    Skip-Attention: Improving Vision Transformers by Paying Less Attention

    Authors: Shashanka Venkataramanan, Amir Ghodrati, Yuki M. Asano, Fatih Porikli, Amirhossein Habibian

    Abstract: This work aims to improve the efficiency of vision transformers (ViT). While ViTs use computationally expensive self-attention operations in every layer, we identify that these operations are highly correlated across layers -- a key redundancy that causes unnecessary computations. Based on this observation, we propose SkipAt, a method to reuse self-attention computation from preceding layers to ap…

    Submitted 17 January, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

  24. arXiv:2210.10820

    cs.CV cs.CL cs.IR cs.LG

    VTC: Improving Video-Text Retrieval with User Comments

    Authors: Laura Hanu, James Thewlis, Yuki M. Asano, Christian Rupprecht

    Abstract: Multi-modal retrieval is an important problem for many applications, such as recommendation and search. Current benchmarks and even datasets are often manually constructed and consist of mostly clean samples where all modalities are well-correlated with the content. Thus, current video-text retrieval literature largely focuses on video titles or audio transcripts, while ignoring user comments, sin…

    Submitted 19 October, 2022; originally announced October 2022.

    Comments: Accepted paper at the European Conference on Computer Vision (ECCV) 2022

  25. arXiv:2210.06466

    cs.CV

    Prompt Generation Networks for Input-based Adaptation of Frozen Vision Transformers

    Authors: Jochem Loedeman, Maarten C. Stol, Tengda Han, Yuki M. Asano

    Abstract: With the introduction of the transformer architecture in computer vision, increasing model scale has been demonstrated as a clear path to achieving performance and robustness gains. However, with model parameter counts reaching the billions, classical finetuning approaches are becoming increasingly limiting and even unfeasible when models become hosted as inference APIs, as in NLP. To this end, vi…

    Submitted 19 April, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: Tech report, 12 pages. Code: https://github.com/jochemloedeman/PGN

  26. arXiv:2210.06462

    cs.CV cs.AI cs.LG

    Self-Guided Diffusion Models

    Authors: Vincent Tao Hu, David W Zhang, Yuki M. Asano, Gertjan J. Burghouts, Cees G. M. Snoek

    Abstract: Diffusion models have demonstrated remarkable progress in image generation quality, especially when guidance is used to control the generative process. However, guidance requires a large amount of image-annotation pairs for training and is thus dependent on their availability, correctness and unbiasedness. In this paper, we eliminate the need for such annotation by instead leveraging the flexibili…

    Submitted 27 November, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: CVPR 2023

  27. arXiv:2209.03268

    cs.CV

    Measuring the Interpretability of Unsupervised Representations via Quantized Reverse Probing

    Authors: Iro Laina, Yuki M. Asano, Andrea Vedaldi

    Abstract: Self-supervised visual representation learning has recently attracted significant research interest. While a common way to evaluate self-supervised representations is through transfer to various downstream tasks, we instead investigate the problem of measuring their interpretability, i.e. understanding the semantics encoded in raw representations. We formulate the latter as estimating the mutual i…

    Submitted 7 September, 2022; originally announced September 2022.

    Comments: Published at ICLR 2022. Appendix included, 26 pages

  28. arXiv:2206.06169

    cs.LG cs.AI stat.ML

    Causal Representation Learning for Instantaneous and Temporal Effects in Interactive Systems

    Authors: Phillip Lippe, Sara Magliacane, Sindy Löwe, Yuki M. Asano, Taco Cohen, Efstratios Gavves

    Abstract: Causal representation learning is the task of identifying the underlying causal variables and their relations from high-dimensional observations, such as images. Recent work has shown that one can reconstruct the causal variables from temporal sequences of observations under the assumption that there are no instantaneous causal relations between them. In practical applications, however, our measur…

    Submitted 7 March, 2023; v1 submitted 13 June, 2022; originally announced June 2022.

    Comments: Published at International Conference on Learning Representations (ICLR), 2023

  29. arXiv:2205.11374

    cs.CL cs.AI

    Looking for a Handsome Carpenter! Debiasing GPT-3 Job Advertisements

    Authors: Conrad Borchers, Dalia Sara Gala, Benjamin Gilburt, Eduard Oravkin, Wilfried Bounsi, Yuki M. Asano, Hannah Rose Kirk

    Abstract: The growing capability and availability of generative language models has enabled a wide range of new downstream tasks. Academic research has identified, quantified and mitigated biases present in language models but is rarely tailored to downstream tasks where wider impact on individuals and society can be felt. In this work, we leverage one popular generative language model, GPT-3, with the goal…

    Submitted 23 May, 2022; originally announced May 2022.

    Comments: Accepted for the 4th Workshop on Gender Bias in Natural Language Processing at NAACL 2022

  30. arXiv:2204.13101

    cs.CV

    Self-Supervised Learning of Object Parts for Semantic Segmentation

    Authors: Adrian Ziegler, Yuki M. Asano

    Abstract: Progress in self-supervised learning has brought strong general image representation learning methods. Yet so far, it has mostly focused on image-level learning. In turn, tasks such as unsupervised image segmentation have not benefited from this trend as they require spatially-diverse representations. However, learning dense representations is challenging, as in the unsupervised context it is not…

    Submitted 20 June, 2022; v1 submitted 27 April, 2022; originally announced April 2022.

    Comments: Accepted at CVPR 2022

  31. arXiv:2204.08874

    cs.CV

    Less than Few: Self-Shot Video Instance Segmentation

    Authors: Pengwan Yang, Yuki M. Asano, Pascal Mettes, Cees G. M. Snoek

    Abstract: The goal of this paper is to bypass the need for labelled examples in few-shot video understanding at run time. While proven effective, in many practical video settings even labelling a few examples appears unrealistic. This is especially true as the level of details in spatio-temporal video understanding and with it, the complexity of annotations continues to increase. Rather than performing few-…

    Submitted 19 April, 2022; originally announced April 2022.

    Comments: 25 pages, 5 figures, 13 tables

  32. arXiv:2202.03169

    cs.LG cs.AI stat.ME

    CITRIS: Causal Identifiability from Temporal Intervened Sequences

    Authors: Phillip Lippe, Sara Magliacane, Sindy Löwe, Yuki M. Asano, Taco Cohen, Efstratios Gavves

    Abstract: Understanding the latent causal factors of a dynamical system from visual observations is considered a crucial step towards agents reasoning in complex environments. In this paper, we propose CITRIS, a variational autoencoder framework that learns causal representations from temporal sequences of images in which underlying causal factors have possibly been intervened upon. In contrast to the recen…

    Submitted 15 June, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

    Comments: Accepted at the International Conference on Machine Learning (ICML), 2022

  33. arXiv:2112.00725

    cs.CV

    The Augmented Image Prior: Distilling 1000 Classes by Extrapolating from a Single Image

    Authors: Yuki M. Asano, Aaqib Saeed

    Abstract: What can neural networks learn about the visual world when provided with only a single image as input? While any image obviously cannot contain the multitudes of all existing objects, scenes and lighting conditions - within the space of all 256^(3x224x224) possible 224-sized square images, it might still provide a strong prior for natural images. To analyze this 'augmented image prior' hypothesis,…

    Submitted 24 January, 2023; v1 submitted 1 December, 2021; originally announced December 2021.

    Comments: Accepted at ICLR'23. Webpage: https://single-image-distill.github.io/, code: https://github.com/yukimasano/single-img-extrapolating

  34. arXiv:2109.13228

    cs.CV cs.CY

    PASS: An ImageNet replacement for self-supervised pretraining without humans

    Authors: Yuki M. Asano, Christian Rupprecht, Andrew Zisserman, Andrea Vedaldi

    Abstract: Computer vision has long relied on ImageNet and other large datasets of images sampled from the Internet for pretraining models. However, these datasets have ethical and technical shortcomings, such as containing personal information taken without consent, unclear license usage, biases, and, in some cases, even problematic image content. On the other hand, state-of-the-art pretraining is nowadays…

    Submitted 27 September, 2021; originally announced September 2021.

    Comments: Accepted to NeurIPS Track on Datasets and Benchmarks 2021. Webpage: https://www.robots.ox.ac.uk/~vgg/research/pass/

  35. arXiv:2107.04313

    cs.CV

    Memes in the Wild: Assessing the Generalizability of the Hateful Memes Challenge Dataset

    Authors: Hannah Rose Kirk, Yennie Jun, Paulius Rauba, Gal Wachtel, Ruining Li, Xingjian Bai, Noah Broestl, Martin Doff-Sotta, Aleksandar Shtedritski, Yuki M. Asano

    Abstract: Hateful memes pose a unique challenge for current machine learning systems because their message is derived from both text- and visual-modalities. To this effect, Facebook released the Hateful Memes Challenge, a dataset of memes with pre-extracted text captions, but it is unclear whether these synthetic examples generalize to 'memes in the wild'. In this paper, we collect hateful and non-hateful m…

    Submitted 9 July, 2021; originally announced July 2021.

    Comments: Accepted paper at ACL WOAH 2021

  36. arXiv:2106.05392

    cs.CV

    Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers

    Authors: Mandela Patrick, Dylan Campbell, Yuki M. Asano, Ishan Misra, Florian Metze, Christoph Feichtenhofer, Andrea Vedaldi, João F. Henriques

    Abstract: In video transformers, the time dimension is often treated in the same way as the two spatial dimensions. However, in a scene where objects or the camera may move, a physical point imaged at one location in frame $t$ may be entirely unrelated to what is found at that location in frame $t+k$. These temporal correspondences should be modeled to facilitate learning about dynamic scenes. To this end,…

    Submitted 23 October, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021 (Oral). Project page: https://facebookresearch.github.io/Motionformer

  37. arXiv:2104.06401

    cs.CV

    Self-supervised object detection from audio-visual correspondence

    Authors: Triantafyllos Afouras, Yuki M. Asano, Francois Fagan, Andrea Vedaldi, Florian Metze

    Abstract: We tackle the problem of learning object detectors without supervision. Differently from weakly-supervised object detection, we do not assume image-level class labels. Instead, we extract a supervisory signal from audio-visual data, using the audio component to "teach" the object detector. While this problem is related to sound source localisation, it is considerably harder because the detector mu…

    Submitted 9 July, 2022; v1 submitted 13 April, 2021; originally announced April 2021.

    Comments: Accepted to CVPR 2022

  38. arXiv:2103.10211

    cs.CV

    Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning

    Authors: Mandela Patrick, Yuki M. Asano, Bernie Huang, Ishan Misra, Florian Metze, Joao Henriques, Andrea Vedaldi

    Abstract: The quality of the image representations obtained from self-supervised learning depends strongly on the type of data augmentations used in the learning formulation. Recent papers have ported these methods from still images to videos and found that leveraging both audio and video signals yields strong gains; however, they did not find that spatial augmentations such as cropping, which are very impo…

    Submitted 27 October, 2021; v1 submitted 18 March, 2021; originally announced March 2021.

    Comments: Accepted to ICCV 2021. Code at https://github.com/facebookresearch/GDT

  39. arXiv:2103.06587

    cs.CV

    Privacy-preserving Object Detection

    Authors: Peiyang He, Charlie Griffin, Krzysztof Kacprzyk, Artjom Joosen, Michael Collyer, Aleksandar Shtedritski, Yuki M. Asano

    Abstract: Privacy considerations and bias in datasets are quickly becoming high-priority issues that the computer vision community needs to face. So far, little attention has been given to practical solutions that do not involve collection of new datasets. In this work, we show that for object detection on COCO, both anonymizing the dataset by blurring faces, as well as swapping faces in a balanced manner a…

    Submitted 11 March, 2021; originally announced March 2021.

  40. arXiv:2102.04130

    cs.CL cs.AI

    Bias Out-of-the-Box: An Empirical Analysis of Intersectional Occupational Biases in Popular Generative Language Models

    Authors: Hannah Kirk, Yennie Jun, Haider Iqbal, Elias Benussi, Filippo Volpin, Frederic A. Dreyer, Aleksandar Shtedritski, Yuki M. Asano

    Abstract: The capabilities of natural language models trained on large-scale data have increased immensely over the past few years. Open source libraries such as HuggingFace have made these models easily available and accessible. While prior research has identified biases in large language models, this paper considers biases contained in the most popular versions of these models when applied 'out-of-the-box…

    Submitted 27 October, 2021; v1 submitted 8 February, 2021; originally announced February 2021.

    Comments: Accepted to NeurIPS 2021. Code and data at https://github.com/oxai/intersectional_gpt2

  41. arXiv:2006.13662

    cs.CV cs.LG

    Labelling unlabelled videos from scratch with multi-modal self-supervision

    Authors: Yuki M. Asano, Mandela Patrick, Christian Rupprecht, Andrea Vedaldi

    Abstract: A large part of the current success of deep learning lies in the effectiveness of data -- more precisely: labelled data. Yet, labelling a dataset with human annotation continues to carry high costs, especially for videos. While in the image domain, recent methods have allowed to generate meaningful (pseudo-) labels for unlabelled datasets without supervision, this development is missing for the vi…

    Submitted 28 February, 2021; v1 submitted 24 June, 2020; originally announced June 2020.

    Comments: Accepted to NeurIPS 2020. Project page: https://www.robots.ox.ac.uk/~vgg/research/selavi, code: https://github.com/facebookresearch/selavi

  42. arXiv:2003.04298

    cs.CV

    On Compositions of Transformations in Contrastive Self-Supervised Learning

    Authors: Mandela Patrick, Yuki M. Asano, Polina Kuznetsova, Ruth Fong, João F. Henriques, Geoffrey Zweig, Andrea Vedaldi

    Abstract: In the image domain, excellent representations can be learned by inducing invariance to content-preserving transformations via noise contrastive learning. In this paper, we generalize contrastive learning to a wider set of transformations, and their compositions, for which either invariance or distinctiveness is sought. We show that it is not immediately obvious how existing methods such as SimCLR…

    Submitted 27 October, 2021; v1 submitted 9 March, 2020; originally announced March 2020.

    Comments: Accepted to ICCV 2021. Code and pretrained models are available at https://github.com/facebookresearch/GDT

  43. arXiv:1911.05371

    cs.CV cs.NE

    Self-labelling via simultaneous clustering and representation learning

    Authors: Yuki Markus Asano, Christian Rupprecht, Andrea Vedaldi

    Abstract: Combining clustering and representation learning is one of the most promising approaches for unsupervised learning of deep neural networks. However, doing so naively leads to ill-posed learning problems with degenerate solutions. In this paper, we propose a novel and principled learning formulation that addresses these issues. The method is obtained by maximizing the information between labels and…

    Submitted 19 February, 2020; v1 submitted 13 November, 2019; originally announced November 2019.

    Comments: Accepted paper at the International Conference on Learning Representations (ICLR) 2020

  44. arXiv:1904.13132

    cs.CV

    A critical analysis of self-supervision, or what we can learn from a single image

    Authors: Yuki M. Asano, Christian Rupprecht, Andrea Vedaldi

    Abstract: We look critically at popular self-supervision techniques for learning deep convolutional neural networks without manual labels. We show that three different and representative methods, BiGAN, RotNet and DeepCluster, can learn the first few layers of a convolutional network from a single image as well as using millions of images and manual labels, provided that strong data augmentation is used. Ho…

    Submitted 19 February, 2020; v1 submitted 30 April, 2019; originally announced April 2019.

    Comments: Accepted paper at the International Conference on Learning Representations (ICLR) 2020