Showing 1–50 of 55 results for author: Bilen, H

  1. arXiv:2406.20099  [pdf, other]

    cs.CV

    Odd-One-Out: Anomaly Detection by Comparing with Neighbors

    Authors: Ankan Bhunia, Changjian Li, Hakan Bilen

    Abstract: This paper introduces a novel anomaly detection (AD) problem that focuses on identifying `odd-looking' objects relative to the other instances within a scene. Unlike the traditional AD benchmarks, in our setting, anomalies in this context are scene-specific, defined by the regular instances that make up the majority. Since object instances are often partly visible from a single viewpoint, our sett…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Codes & Dataset at https://github.com/VICO-UoE/OddOneOutAD

  2. arXiv:2406.19393  [pdf, other]

    cs.CV

    Looking 3D: Anomaly Detection with 2D-3D Alignment

    Authors: Ankan Bhunia, Changjian Li, Hakan Bilen

    Abstract: Automatic anomaly detection based on visual cues holds practical significance in various domains, such as manufacturing and product quality assessment. This paper introduces a new conditional anomaly detection problem, which involves identifying anomalies in a query image by comparing it to a reference shape. To address this challenge, we have created a large dataset, BrokenChairs-180K, consisting…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Accepted at CVPR'24. Codes & dataset available at https://github.com/VICO-UoE/Looking3D

  3. arXiv:2406.16623  [pdf, other]

    cs.CV

    Articulate your NeRF: Unsupervised articulated object modeling via conditional view synthesis

    Authors: Jianning Deng, Kartic Subr, Hakan Bilen

    Abstract: We propose a novel unsupervised method to learn the pose and part-segmentation of articulated objects with rigid parts. Given two observations of an object in different articulation states, our method learns the geometry and appearance of object parts by using an implicit model from the first observation, distils the part segmentation and articulation from the second observation while rendering th…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 9 pages for the main content, excluding references and supplementaries

  4. arXiv:2405.07723  [pdf, other]

    cs.CV

    Coarse or Fine? Recognising Action End States without Labels

    Authors: Davide Moltisanti, Hakan Bilen, Laura Sevilla-Lara, Frank Keller

    Abstract: We focus on the problem of recognising the end state of an action in an image, which is critical for understanding what action is performed and in which manner. We study this focusing on the task of predicting the coarseness of a cut, i.e., deciding whether an object was cut "coarsely" or "finely". No dataset with these annotated end states is available, so we propose an augmentation method to syn…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: The Eleventh Workshop on Fine-Grained Visual Categorization (CVPR 24)

  5. arXiv:2312.13216  [pdf, other]

    cs.CV

    Improving Semantic Correspondence with Viewpoint-Guided Spherical Maps

    Authors: Octave Mariotti, Oisin Mac Aodha, Hakan Bilen

    Abstract: Recent progress in self-supervised representation learning has resulted in models that are capable of extracting image features that are not only effective at encoding image level, but also pixel-level, semantics. These features have been shown to be effective for dense visual semantic correspondence estimation, even outperforming fully-supervised methods. Nevertheless, current self-supervised app…

    Submitted 20 December, 2023; originally announced December 2023.

  6. arXiv:2310.13619  [pdf, other]

    cs.CL cs.CV

    Semi-supervised multimodal coreference resolution in image narrations

    Authors: Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen

    Abstract: In this paper, we study multimodal coreference resolution, specifically where a longer descriptive text, i.e., a narration is paired with an image. This poses significant challenges due to fine-grained image-text alignment, inherent ambiguity present in narrative language, and unavailability of large annotated training sets. To tackle these challenges, we present a data efficient semi-supervised a…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: Long paper at EMNLP'23-Main

  7. arXiv:2310.00986  [pdf, other]

    cs.CV

    Multi-task Learning with 3D-Aware Regularization

    Authors: Wei-Hong Li, Steven McDonagh, Ales Leonardis, Hakan Bilen

    Abstract: Deep neural networks have become a standard building block for designing models that can perform multiple dense computer vision tasks such as depth estimation and semantic segmentation thanks to their ability to capture complex correlations in high dimensional feature space across tasks. However, the cross-task correlations that are learned in the unstructured feature space can be extremely noisy…

    Submitted 2 October, 2023; originally announced October 2023.

    Comments: 3D-aware Multi-task Learning, Code will be available at https://github.com/VICO-UoE/MTPSL

  8. arXiv:2306.02956  [pdf, other]

    cs.CV cs.GR

    Explicit Neural Surfaces: Learning Continuous Geometry With Deformation Fields

    Authors: Thomas Walker, Octave Mariotti, Amir Vaxman, Hakan Bilen

    Abstract: We introduce Explicit Neural Surfaces (ENS), an efficient smooth surface representation that directly encodes topology with a deformation field from a known base domain. We apply this representation to reconstruct explicit surfaces from multiple views, where we use a series of neural deformation fields to progressively transform the base domain into a target shape. By using meshes as discrete surf…

    Submitted 11 December, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

    ACM Class: I.4.5; I.2.10; I.3.5

  9. arXiv:2303.15086  [pdf, other]

    cs.CV

    Learning Action Changes by Measuring Verb-Adverb Textual Relationships

    Authors: Davide Moltisanti, Frank Keller, Hakan Bilen, Laura Sevilla-Lara

    Abstract: The goal of this work is to understand the way actions are performed in videos. That is, given a video, we aim to predict an adverb indicating a modification applied to the action (e.g. cut "finely"). We cast this problem as a regression task. We measure textual relationships between verbs and adverbs to generate a regression target representing the action change we aim to learn. We test our appro…

    Submitted 23 May, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

    Comments: CVPR 23. Version 2 updates some results due to an errata (see code repository for more details). Code and dataset available at https://github.com/dmoltisanti/air-cvpr23

  10. arXiv:2212.05611  [pdf, other]

    cs.CV

    Accelerating Self-Supervised Learning via Efficient Training Strategies

    Authors: Mustafa Taha Koçyiğit, Timothy M. Hospedales, Hakan Bilen

    Abstract: Recently the focus of the computer vision community has shifted from expensive supervised learning towards self-supervised learning of visual representations. While the performance gap between supervised and self-supervised has been narrowing, the time for training self-supervised deep networks remains an order of magnitude larger than its supervised counterparts, which hinders progress, imposes c…

    Submitted 11 December, 2022; originally announced December 2022.

  11. arXiv:2212.00436  [pdf, other]

    cs.CV

    ViewNeRF: Unsupervised Viewpoint Estimation Using Category-Level Neural Radiance Fields

    Authors: Octave Mariotti, Oisin Mac Aodha, Hakan Bilen

    Abstract: We introduce ViewNeRF, a Neural Radiance Field-based viewpoint estimation method that learns to predict category-level viewpoints directly from images during training. While NeRF is usually trained with ground-truth camera poses, multiple extensions have been proposed to reduce the need for this expensive supervision. Nonetheless, most of these methods still struggle in complex settings with large…

    Submitted 1 December, 2022; originally announced December 2022.

    Journal ref: Proceedings of the 33rd British Machine Vision Conference, BMVC 2022

  12. arXiv:2212.00435  [pdf, other]

    cs.CV

    ViewNet: Unsupervised Viewpoint Estimation from Conditional Generation

    Authors: Octave Mariotti, Oisin Mac Aodha, Hakan Bilen

    Abstract: Understanding the 3D world without supervision is currently a major challenge in computer vision as the annotations required to supervise deep networks for tasks in this domain are expensive to obtain on a large scale. In this paper, we address the problem of unsupervised viewpoint estimation. We formulate this as a self-supervised learning task, where image reconstruction provides the supervision…

    Submitted 1 December, 2022; originally announced December 2022.

    Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 10418-10428

  13. arXiv:2211.14563  [pdf, other]

    cs.CV cs.CL

    Who are you referring to? Coreference resolution in image narrations

    Authors: Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen

    Abstract: Coreference resolution aims to identify words and phrases which refer to same entity in a text, a core task in natural language processing. In this paper, we extend this task to resolving coreferences in long-form narrations of visual scenes. First we introduce a new dataset with annotated coreference chains and their bounding boxes, as most existing image-text datasets only contain short sentence…

    Submitted 17 March, 2023; v1 submitted 26 November, 2022; originally announced November 2022.

    Comments: 15 pages

  14. arXiv:2211.09869  [pdf, other]

    cs.CV cs.LG

    RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation

    Authors: Titas Anciukevičius, Zexiang Xu, Matthew Fisher, Paul Henderson, Hakan Bilen, Niloy J. Mitra, Paul Guerrero

    Abstract: Diffusion models currently achieve state-of-the-art performance for both conditional and unconditional image generation. However, so far, image diffusion models do not support tasks required for 3D understanding, such as view-consistent 3D generation or single-view object reconstruction. In this paper, we present RenderDiffusion, the first diffusion model for 3D generation and inference, trained u…

    Submitted 20 February, 2024; v1 submitted 17 November, 2022; originally announced November 2022.

    Comments: Accepted at CVPR 2023. Project page: https://github.com/Anciukevicius/RenderDiffusion

  15. arXiv:2211.03003  [pdf, other]

    cs.CV

    Learning to Annotate Part Segmentation with Gradient Matching

    Authors: Yu Yang, Xiaotian Cheng, Hakan Bilen, Xiangyang Ji

    Abstract: The success of state-of-the-art deep neural networks heavily relies on the presence of large-scale labelled datasets, which are extremely expensive and time-consuming to annotate. This paper focuses on tackling semi-supervised part segmentation tasks by generating high-quality images with a pre-trained GAN and labelling the generated images with an automatic annotator. In particular, we formulate…

    Submitted 5 November, 2022; originally announced November 2022.

    Comments: ICLR 2022

  16. arXiv:2211.03000  [pdf, other]

    cs.CV

    Distilling Representations from GAN Generator via Squeeze and Span

    Authors: Yu Yang, Xiaotian Cheng, Chang Liu, Hakan Bilen, Xiangyang Ji

    Abstract: In recent years, generative adversarial networks (GANs) have been an actively studied topic and shown to successfully produce high-quality realistic images in various domains. The controllable synthesis ability of GAN generators suggests that they maintain informative, disentangled, and explainable image representations, but leveraging and transferring their representations to downstream tasks is…

    Submitted 5 November, 2022; originally announced November 2022.

    Comments: 16 pages, NeurIPS 2022

  17. arXiv:2205.11090  [pdf, other]

    cs.CV

    FaceMAE: Privacy-Preserving Face Recognition via Masked Autoencoders

    Authors: Kai Wang, Bo Zhao, Xiangyu Peng, Zheng Zhu, Jiankang Deng, Xinchao Wang, Hakan Bilen, Yang You

    Abstract: Face recognition, as one of the most successful applications in artificial intelligence, has been widely used in security, administration, advertising, and healthcare. However, the privacy issues of public face datasets have attracted increasing attention in recent years. Previous works simply mask most areas of faces or synthesize samples using generative models to construct privacy-preserving fa…

    Submitted 23 May, 2022; originally announced May 2022.

    Comments: A new paradigm for privacy-preserving face recognition via MAE

  18. arXiv:2204.07513  [pdf, other]

    cs.LG cs.CV

    Synthesizing Informative Training Samples with GAN

    Authors: Bo Zhao, Hakan Bilen

    Abstract: Remarkable progress has been achieved in synthesizing photo-realistic images with generative adversarial networks (GANs). Recently, GANs are utilized as the training sample generator when obtaining or storing real training data is expensive even infeasible. However, traditional GANs generated images are not as informative as the real training samples when being used to train deep neural networks.…

    Submitted 20 December, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

    Comments: NeurIPS 2022 Workshop on Synthetic Data for Empowering ML Research, https://openreview.net/forum?id=frAv0jtUMfS

  19. arXiv:2204.02744  [pdf, other]

    cs.CV

    Universal Representations: A Unified Look at Multiple Task and Domain Learning

    Authors: Wei-Hong Li, Xialei Liu, Hakan Bilen

    Abstract: We propose a unified look at jointly learning multiple vision tasks and visual domains through universal representations, a single deep neural network. Learning multiple problems simultaneously involves minimizing a weighted sum of multiple loss functions with different magnitudes and characteristics and thus results in unbalanced state of one loss dominating the optimization and poor results comp…

    Submitted 30 August, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

    Comments: Multi-task Learning, Multi-domain Learning, Cross-domain Few-shot Learning, Universal Representation Learning, Balanced Optimization, Dense Prediction, Code and models are available at https://github.com/VICO-UoE/UniversalRepresentations. arXiv admin note: text overlap with arXiv:2103.13841

  20. arXiv:2203.17178  [pdf, other]

    cs.CV cs.AI

    3D Equivariant Graph Implicit Functions

    Authors: Yunlu Chen, Basura Fernando, Hakan Bilen, Matthias Nießner, Efstratios Gavves

    Abstract: In recent years, neural implicit representations have made remarkable progress in modeling of 3D shapes with arbitrary topology. In this work, we address two key limitations of such representations, in failing to capture local 3D geometric fine details, and to learn from and generalize to shapes with unseen 3D transformations. To this end, we introduce a novel family of graph implicit functions wi…

    Submitted 31 March, 2022; originally announced March 2022.

    Comments: Video: https://youtu.be/W7goOzZP2Kc

  21. arXiv:2203.01531  [pdf, other]

    cs.CV

    CAFE: Learning to Condense Dataset by Aligning Features

    Authors: Kai Wang, Bo Zhao, Xiangyu Peng, Zheng Zhu, Shuo Yang, Shuo Wang, Guan Huang, Hakan Bilen, Xinchao Wang, Yang You

    Abstract: Dataset condensation aims at reducing the network training effort through condensing a cumbersome training set into a compact synthetic one. State-of-the-art approaches largely rely on learning the synthetic data by matching the gradients between the real and synthetic data batches. Despite the intuitive motivation and promising results, such gradient-based methods, by nature, easily overfit to a…

    Submitted 27 March, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

    Comments: The manuscript has been accepted by CVPR-2022!

  22. arXiv:2111.14893  [pdf, other]

    cs.CV

    Learning Multiple Dense Prediction Tasks from Partially Annotated Data

    Authors: Wei-Hong Li, Xialei Liu, Hakan Bilen

    Abstract: Despite the recent advances in multi-task learning of dense prediction problems, most methods rely on expensive labelled datasets. In this paper, we present a label efficient approach and look at jointly learning of multiple dense prediction tasks on partially annotated data (i.e. not all the task labels are available for each image), which we call multi-task partially-supervised learning. We prop…

    Submitted 4 May, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

    Comments: CVPR2022, Multi-task Partially-supervised Learning, Code will be available at https://github.com/VICO-UoE/MTPSL

  23. arXiv:2111.13517  [pdf, other]

    cs.CV

    Not All Relations are Equal: Mining Informative Labels for Scene Graph Generation

    Authors: Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen

    Abstract: Scene graph generation (SGG) aims to capture a wide variety of interactions between pairs of objects, which is essential for full scene understanding. Existing SGG methods trained on the entire set of relations fail to acquire complex reasoning about visual and textual correlations due to various biases in training data. Learning on trivial relations that indicate generic spatial configuration lik…

    Submitted 4 April, 2022; v1 submitted 26 November, 2021; originally announced November 2021.

    Comments: 16 pages

    Journal ref: CVPR 2022

  24. arXiv:2110.04181  [pdf, other]

    cs.LG cs.CV

    Dataset Condensation with Distribution Matching

    Authors: Bo Zhao, Hakan Bilen

    Abstract: Computational cost of training state-of-the-art deep models in many learning problems is rapidly increasing due to more sophisticated models and larger datasets. A recent promising direction for reducing training cost is dataset condensation that aims to replace the original large training set with a significantly smaller learned synthetic set while preserving the original information. While train…

    Submitted 21 December, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

    Journal ref: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision 2023 (WACV)

  25. arXiv:2107.00358  [pdf, other]

    cs.CV

    Cross-domain Few-shot Learning with Task-specific Adapters

    Authors: Wei-Hong Li, Xialei Liu, Hakan Bilen

    Abstract: In this paper, we look at the problem of cross-domain few-shot classification that aims to learn a classifier from previously unseen classes and domains with few labeled samples. Recent approaches broadly solve this problem by parameterizing their few-shot classifiers with task-agnostic and task-specific weights where the former is typically learned on a large training set and the latter is dynami…

    Submitted 4 May, 2022; v1 submitted 1 July, 2021; originally announced July 2021.

    Comments: CVPR2022, Code will be available at https://github.com/VICO-UoE/URL

  26. Semi-supervised Viewpoint Estimation with Geometry-aware Conditional Generation

    Authors: Octave Mariotti, Hakan Bilen

    Abstract: There is a growing interest in developing computer vision methods that can learn from limited supervision. In this paper, we consider the problem of learning to predict camera viewpoints, where obtaining ground-truth annotations are expensive and require special equipment, from a limited number of labeled images. We propose a semi-supervised viewpoint estimation method that can learn to infer view…

    Submitted 2 April, 2021; originally announced April 2021.

    Journal ref: ECCV 2020: Computer Vision - ECCV 2020 Workshops pp 631-647

  27. arXiv:2104.00483  [pdf, other]

    cs.CV

    Learning Foreground-Background Segmentation from Improved Layered GANs

    Authors: Yu Yang, Hakan Bilen, Qiran Zou, Wing Yin Cheung, Xiangyang Ji

    Abstract: Deep learning approaches heavily rely on high-quality human supervision which is nonetheless expensive, time-consuming, and error-prone, especially for image segmentation task. In this paper, we propose a method to automatically synthesize paired photo-realistic images and segmentation masks for the use of training a foreground-background segmentation network. In particular, we learn a generative…

    Submitted 3 December, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

  28. arXiv:2103.13841  [pdf, other]

    cs.CV

    Universal Representation Learning from Multiple Domains for Few-shot Classification

    Authors: Wei-Hong Li, Xialei Liu, Hakan Bilen

    Abstract: In this paper, we look at the problem of few-shot classification that aims to learn a classifier for previously unseen classes and domains from few labeled samples. Recent methods use adaptation networks for aligning their features to new domains or select the relevant features from multiple domain-specific feature extractors. In this work, we propose to learn a single set of universal deep repres…

    Submitted 25 March, 2021; originally announced March 2021.

    Comments: Code will be available at https://github.com/VICO-UoE/URL

  29. arXiv:2102.08259  [pdf, other]

    cs.LG cs.CV

    Dataset Condensation with Differentiable Siamese Augmentation

    Authors: Bo Zhao, Hakan Bilen

    Abstract: In many machine learning problems, large-scale datasets have become the de-facto standard to train state-of-the-art deep networks at the price of heavy computation load. In this paper, we focus on condensing large training sets into significantly smaller synthetic sets which can be used to train deep neural networks from scratch with minimum drop in performance. Inspired from the recent training s…

    Submitted 10 June, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

    Journal ref: International Conference on Machine Learning 2021

  30. arXiv:2010.02310  [pdf, other]

    cs.LG stat.ML

    Deep Anomaly Detection by Residual Adaptation

    Authors: Lucas Deecke, Lukas Ruff, Robert A. Vandermeulen, Hakan Bilen

    Abstract: Deep anomaly detection is a difficult task since, in high dimensions, it is hard to completely characterize a notion of "differentness" when given only examples of normality. In this paper we propose a novel approach to deep anomaly detection based on augmenting large pretrained networks with residual corrections that adjusts them to the task of anomaly detection. Our method gives rise to a highly…

    Submitted 5 October, 2020; originally announced October 2020.

  31. arXiv:2007.06889  [pdf, other]

    cs.CV

    Knowledge Distillation for Multi-task Learning

    Authors: Wei-Hong Li, Hakan Bilen

    Abstract: Multi-task learning (MTL) is to learn one single model that performs multiple tasks for achieving good performance on all tasks and lower cost on computation. Learning such a model requires to jointly optimize losses of a set of tasks with different difficulty levels, magnitudes, and characteristics (e.g. cross-entropy, Euclidean loss), leading to the imbalance problem in multi-task learning. To a…

    Submitted 24 September, 2020; v1 submitted 14 July, 2020; originally announced July 2020.

    Comments: We propose a knowledge distillation method for addressing the imbalance problem in multi-task learning

  32. arXiv:2006.05929  [pdf, other]

    cs.CV cs.LG

    Dataset Condensation with Gradient Matching

    Authors: Bo Zhao, Konda Reddy Mopuri, Hakan Bilen

    Abstract: As the state-of-the-art machine learning methods in many fields rely on larger datasets, storing datasets and training models on them become significantly more expensive. This paper proposes a training set synthesis technique for data-efficient learning, called Dataset Condensation, that learns to condense large dataset into a small set of informative synthetic samples for training deep neural net…

    Submitted 8 March, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

    Journal ref: International Conference on Learning Representations 2021

  33. arXiv:2006.04455  [pdf, other]

    cs.CV

    Continual Representation Learning for Biometric Identification

    Authors: Bo Zhao, Shixiang Tang, Dapeng Chen, Hakan Bilen, Rui Zhao

    Abstract: With the explosion of digital data in recent years, continuously learning new tasks from a stream of data without forgetting previously acquired knowledge has become increasingly important. In this paper, we propose a new continual learning (CL) setting, namely ``continual representation learning'', which focuses on learning better representation in a continuous way. We also provide two large-scal…

    Submitted 28 June, 2020; v1 submitted 8 June, 2020; originally announced June 2020.

    Journal ref: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2021, pp. 1198-1208

  34. arXiv:2006.00996  [pdf, other]

    cs.LG stat.ML

    Latent Domain Learning with Dynamic Residual Adapters

    Authors: Lucas Deecke, Timothy Hospedales, Hakan Bilen

    Abstract: A practical shortcoming of deep neural networks is their specialization to a single task and domain. While recent techniques in domain adaptation and multi-domain learning enable the learning of more domain-agnostic features, their success relies on the presence of domain labels, typically requiring manual annotation and careful curation of datasets. Here we focus on a less explored, but more real…

    Submitted 1 June, 2020; originally announced June 2020.

  35. arXiv:2001.02610  [pdf, other]

    cs.LG stat.ML

    iDLG: Improved Deep Leakage from Gradients

    Authors: Bo Zhao, Konda Reddy Mopuri, Hakan Bilen

    Abstract: It is widely believed that sharing gradients will not leak private training data in distributed learning systems such as Collaborative Learning and Federated Learning, etc. Recently, Zhu et al. presented an approach which shows the possibility to obtain private training data from the publicly shared gradients. In their Deep Leakage from Gradient (DLG) method, they synthesize the dummy data and cor…

    Submitted 8 January, 2020; originally announced January 2020.

  36. arXiv:1912.10364  [pdf, other]

    cs.LG cs.CV

    Learning to Impute: A General Framework for Semi-supervised Learning

    Authors: Wei-Hong Li, Chuan-Sheng Foo, Hakan Bilen

    Abstract: Recent semi-supervised learning methods have shown to achieve comparable results to their supervised counterparts while using only a small portion of labels in image classification tasks thanks to their regularization strategies. In this paper, we take a more direct approach for semi-supervised learning and propose learning to impute the labels of unlabeled samples such that a network achieves bet…

    Submitted 24 September, 2020; v1 submitted 21 December, 2019; originally announced December 2019.

    Comments: Semi-supervised Learning, Meta-Learning, Learning to impute

  37. arXiv:1911.10082  [pdf, other]

    cs.CL cs.CV

    Injecting Prior Knowledge into Image Caption Generation

    Authors: Arushi Goel, Basura Fernando, Thanh-Son Nguyen, Hakan Bilen

    Abstract: Automatically generating natural language descriptions from an image is a challenging problem in artificial intelligence that requires a good understanding of the visual and textual signals and the correlations between them. The state-of-the-art methods in image captioning struggles to approach human level performance, especially when data is limited. In this paper, we propose to improve the perfo…

    Submitted 6 August, 2020; v1 submitted 22 November, 2019; originally announced November 2019.

    Comments: ECCV20 VIPriors Workshop; 14 pages, 5 figures, 4 tables

  38. arXiv:1910.08823  [pdf, other]

    cs.CV cs.LG

    NormGrad: Finding the Pixels that Matter for Training

    Authors: Sylvestre-Alvise Rebuffi, Ruth Fong, Xu Ji, Hakan Bilen, Andrea Vedaldi

    Abstract: The different families of saliency methods, either based on contrastive signals, closed-form formulas mixing gradients with activations or on perturbation masks, all focus on which parts of an image are responsible for the model's inference. In this paper, we are rather interested by the locations of an image that contribute to the model's training. First, we propose a principled attribution metho…

    Submitted 19 October, 2019; originally announced October 2019.

  39. Image Deconvolution with Deep Image and Kernel Priors

    Authors: Zhunxuan Wang, Zipei Wang, Qiqi Li, Hakan Bilen

    Abstract: Image deconvolution is the process of recovering convolutional degraded images, which is always a hard inverse problem because of its mathematically ill-posed property. On the success of the recently proposed deep image prior (DIP), we build an image deconvolution model with deep image and kernel priors (DIKP). DIP is a learning-free representation which uses neural net structures to express image…

    Submitted 18 October, 2019; originally announced October 2019.

    Comments: In Proceedings of the 2019 IEEE International Conference on Computer Vision Workshops (ICCVW)

  40. arXiv:1908.06427  [pdf, other]

    cs.CV

    Unsupervised Learning of Landmarks by Descriptor Vector Exchange

    Authors: James Thewlis, Samuel Albanie, Hakan Bilen, Andrea Vedaldi

    Abstract: Equivariance to random image transformations is an effective method to learn landmarks of object categories, such as the eyes and the nose in faces, without manual supervision. However, this method does not explicitly guarantee that the learned landmarks are consistent with changes between different instances of the same object, such as different facial identities. In this paper, we develop a new…

    Submitted 18 August, 2019; originally announced August 2019.

    Comments: ICCV 2019

  41. Personalised aesthetics with residual adapters

    Authors: Carlos Rodríguez-Pardo, Hakan Bilen

    Abstract: The use of computational methods to evaluate aesthetics in photography has gained interest in recent years due to the popularization of convolutional neural networks and the availability of new annotated datasets. Most studies in this area have focused on designing models that do not take into account individual preferences for the prediction of the aesthetic value of pictures. We propose a model…

    Submitted 8 July, 2019; originally announced July 2019.

    Comments: 12 pages, 4 figures. In Iberian Conference on Pattern Recognition and Image Analysis proceedings

    MSC Class: 68T10 (Primary); 68T45 (Secondary) ACM Class: I.2.10; I.5.4; I.4.3

  42. Self-supervised Learning of Interpretable Keypoints from Unlabelled Videos

    Authors: Tomas Jakab, Ankush Gupta, Hakan Bilen, Andrea Vedaldi

    Abstract: We propose KeypointGAN, a new method for recognizing the pose of objects from a single image that for learning uses only unlabelled videos and a weak empirical prior on the object poses. Video frames differ primarily in the pose of the objects they contain, so our method distils the pose information by analyzing the differences between frames. The distillation uses a new dual representation of the…

    Submitted 23 December, 2020; v1 submitted 3 July, 2019; originally announced July 2019.

    Comments: CVPR 2020 (oral). Project page: http://www.robots.ox.ac.uk/~vgg/research/unsupervised_pose/

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 8787-8797

  43. arXiv:1904.07774  [pdf, other]

    cs.CV cs.LG stat.ML

    Weakly Supervised Gaussian Networks for Action Detection

    Authors: Basura Fernando, Cheston Tan Yin Chet, Hakan Bilen

    Abstract: Detecting temporal extents of human actions in videos is a challenging computer vision problem that requires detailed manual supervision including frame-level labels. This expensive annotation process limits deploying action detectors to a limited number of categories. We propose a novel method, called WSGN, that learns to detect actions from weak supervision, using only video-level labels.…

    Submitted 5 January, 2020; v1 submitted 16 April, 2019; originally announced April 2019.

    Comments: Accepted in WACV 2020

  44. arXiv:1810.05466  [pdf, other]

    cs.LG stat.ML

    Mode Normalization

    Authors: Lucas Deecke, Iain Murray, Hakan Bilen

    Abstract: Normalization methods are a central building block in the deep learning toolbox. They accelerate and stabilize training, while decreasing the dependence on manually tuned learning rate schedules. When learning from multi-modal distributions, the effectiveness of batch normalization (BN), arguably the most prominent normalization method, is reduced. As a remedy, we propose a more flexible approach:…

    Submitted 12 October, 2018; originally announced October 2018.

  45. arXiv:1806.07823  [pdf, other]

    cs.CV

    Unsupervised Learning of Object Landmarks through Conditional Image Generation

    Authors: Tomas Jakab, Ankush Gupta, Hakan Bilen, Andrea Vedaldi

    Abstract: We propose a method for learning landmark detectors for visual objects (such as the eyes and the nose in a face) without any manual supervision. We cast this as the problem of generating images that combine the appearance of the object as seen in a first example image with the geometry of the object as seen in a second example image, where the two examples differ by a viewpoint change and/or an ob…

    Submitted 13 December, 2018; v1 submitted 20 June, 2018; originally announced June 2018.

    Comments: In NeurIPS 2018. Project page: http://www.robots.ox.ac.uk/~vgg/research/unsupervised_landmarks/

  46. arXiv:1803.10082  [pdf, other]

    cs.CV stat.ML

    Efficient parametrization of multi-domain deep neural networks

    Authors: Sylvestre-Alvise Rebuffi, Hakan Bilen, Andrea Vedaldi

    Abstract: A practical limitation of deep neural networks is their high degree of specialization to a single task and visual domain. Recently, inspired by the successes of transfer learning, several authors have proposed to learn instead universal, fixed feature extractors that, used as the first stage of any deep network, work well for several tasks and domains simultaneously. Nevertheless, such universal f…

    Submitted 27 March, 2018; originally announced March 2018.

    Comments: CVPR 2018

  47. arXiv:1706.02932  [pdf, other]

    cs.CV stat.ML

    Unsupervised learning of object frames by dense equivariant image labelling

    Authors: James Thewlis, Hakan Bilen, Andrea Vedaldi

    Abstract: One of the key challenges of visual perception is to extract abstract models of 3D objects and object categories from visual measurements, which are affected by complex nuisance factors such as viewpoint, occlusion, motion, and deformations. Starting from the recent idea of viewpoint factorization, we propose a new approach that, given a large number of images of an object and no other supervision…

    Submitted 17 November, 2017; v1 submitted 9 June, 2017; originally announced June 2017.

    Comments: NIPS 2017

  48. arXiv:1705.08045  [pdf, other]

    cs.CV stat.ML

    Learning multiple visual domains with residual adapters

    Authors: Sylvestre-Alvise Rebuffi, Hakan Bilen, Andrea Vedaldi

    Abstract: There is a growing interest in learning data representations that work well for many different types of problems and data. In this paper, we look in particular at the task of learning a single visual representation that can be successfully utilized in the analysis of very different types of images, from dog breeds to stop signs and digits. Inspired by recent work on learning networks that predict…

    Submitted 27 November, 2017; v1 submitted 22 May, 2017; originally announced May 2017.

  49. arXiv:1705.02193  [pdf, other]

    cs.CV stat.ML

    Unsupervised learning of object landmarks by factorized spatial embeddings

    Authors: James Thewlis, Hakan Bilen, Andrea Vedaldi

    Abstract: Learning automatically the structure of object categories remains an important open problem in computer vision. In this paper, we propose a novel unsupervised approach that can discover and learn landmarks in object categories, thus characterizing their structure. Our approach is based on factorizing image deformations, as induced by a viewpoint change or an object deformation, by learning a deep…

    Submitted 6 August, 2017; v1 submitted 5 May, 2017; originally announced May 2017.

    Comments: To be published in ICCV 2017

  50. arXiv:1701.07275  [pdf, other]

    cs.CV stat.ML

    Universal representations: The missing link between faces, text, planktons, and cat breeds

    Authors: Hakan Bilen, Andrea Vedaldi

    Abstract: With the advent of large labelled datasets and high-capacity models, the performance of machine vision systems has been improving rapidly. However, the technology has still major limitations, starting from the fact that different vision problems are still solved by different models, trained from scratch or fine-tuned on the target data. The human visual system, in stark contrast, learns a universa…

    Submitted 25 January, 2017; originally announced January 2017.

    Comments: 10 pages, 4 figures, 5 tables