Showing 1–50 of 75 results for author: Tzimiropoulos, G

  1. arXiv:2406.07191  [pdf, other]

    cs.CV

    MeMSVD: Long-Range Temporal Structure Capturing Using Incremental SVD

    Authors: Ioanna Ntinou, Enrique Sanchez, Georgios Tzimiropoulos

    Abstract: This paper is on long-term video understanding where the goal is to recognise human actions over long temporal windows (up to minutes long). In prior work, long temporal context is captured by constructing a long-term memory bank consisting of past and future video features which are then integrated into standard (short-term) video recognition backbones through the use of attention mechanisms. Two…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted to ICIP 2024

  2. arXiv:2405.10286  [pdf, other]

    cs.CV cs.AI

    FFF: Fixing Flawed Foundations in contrastive pre-training results in very strong Vision-Language models

    Authors: Adrian Bulat, Yassine Ouali, Georgios Tzimiropoulos

    Abstract: Despite noise and caption quality having been acknowledged as important factors impacting vision-language contrastive pre-training, in this paper, we show that the full potential of improving the training process by addressing such issues is yet to be realized. Specifically, we firstly study and analyze two issues affecting training: incorrect assignment of negative pairs, and low caption quality…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Accepted at CVPR 2024

  3. arXiv:2404.07078  [pdf, other]

    cs.CV cs.HC

    VLLMs Provide Better Context for Emotion Understanding Through Common Sense Reasoning

    Authors: Alexandros Xenos, Niki Maria Foteinopoulou, Ioanna Ntinou, Ioannis Patras, Georgios Tzimiropoulos

    Abstract: Recognising emotions in context involves identifying the apparent emotions of an individual, taking into account contextual cues from the surrounding scene. Previous approaches to this task have involved the design of explicit scene-encoding architectures or the incorporation of external scene-related information, such as captions. However, these methods often utilise limited contextual informatio…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: A. Xenos, N. Foteinopoulou and I. Ntinou contributed equally to this work; 14 pages, 5 figures

  4. arXiv:2403.17217  [pdf, other]

    cs.CV cs.AI

    DiffusionAct: Controllable Diffusion Autoencoder for One-shot Face Reenactment

    Authors: Stella Bounareli, Christos Tzelepis, Vasileios Argyriou, Ioannis Patras, Georgios Tzimiropoulos

    Abstract: Video-driven neural face reenactment aims to synthesize realistic facial images that successfully preserve the identity and appearance of a source face, while transferring the target head pose and facial expressions. Existing GAN-based methods suffer from either distortions and visual artifacts or poor reconstruction quality, i.e., the background and several important appearance details, such as h…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Project page: https://stelabou.github.io/diffusionact/

  5. arXiv:2403.08161  [pdf, other]

    cs.CV cs.AI

    LAFS: Landmark-based Facial Self-supervised Learning for Face Recognition

    Authors: Zhonglin Sun, Chen Feng, Ioannis Patras, Georgios Tzimiropoulos

    Abstract: In this work we focus on learning facial representations that can be adapted to train effective face recognition models, particularly in the absence of labels. Firstly, compared with existing labelled face datasets, a vastly larger magnitude of unlabeled faces exists in the real world. We explore the learning strategy of these unlabeled facial images through self-supervised pretraining to transfer…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: accepted to CVPR 2024

  6. arXiv:2402.03553  [pdf, other]

    cs.CV

    One-shot Neural Face Reenactment via Finding Directions in GAN's Latent Space

    Authors: Stella Bounareli, Christos Tzelepis, Vasileios Argyriou, Ioannis Patras, Georgios Tzimiropoulos

    Abstract: In this paper, we present our framework for neural face/head reenactment whose goal is to transfer the 3D head orientation and expression of a target face to a source face. Previous methods focus on learning embedding networks for identity and head pose/expression disentanglement which proves to be a rather hard task, degrading the quality of the generated images. We take a different approach, byp…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: Preprint version, accepted for publication in International Journal of Computer Vision (IJCV)

  7. arXiv:2401.17258  [pdf, other]

    cs.CV

    You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation

    Authors: Mehdi Noroozi, Isma Hadji, Brais Martinez, Adrian Bulat, Georgios Tzimiropoulos

    Abstract: In this paper, we introduce YONOS-SR, a novel stable diffusion-based approach for image super-resolution that yields state-of-the-art results using only a single DDIM step. We propose a novel scale distillation approach to train our SR model. Instead of directly training our SR model on the scale factor of interest, we start by training a teacher model on a smaller magnification scale, thereby mak…

    Submitted 30 January, 2024; originally announced January 2024.

  8. arXiv:2401.13594  [pdf, other]

    cs.CL cs.AI

    Graph Guided Question Answer Generation for Procedural Question-Answering

    Authors: Hai X. Pham, Isma Hadji, Xinnuo Xu, Ziedune Degutyte, Jay Rainey, Evangelos Kazakos, Afsaneh Fazly, Georgios Tzimiropoulos, Brais Martinez

    Abstract: In this paper, we focus on task-specific question answering (QA). To this end, we introduce a method for generating exhaustive and high-quality training data, which allows us to train compact (e.g., run on a mobile device), task-specific QA models that are competitive against GPT variants. The key technological enabler is a novel mechanism for automatic question-answer generation from procedural t…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: Accepted to EACL 2024 as long paper. 25 pages including appendix

    MSC Class: I.2.7

  9. arXiv:2312.17686  [pdf, other]

    cs.CV

    Multiscale Vision Transformers meet Bipartite Matching for efficient single-stage Action Localization

    Authors: Ioanna Ntinou, Enrique Sanchez, Georgios Tzimiropoulos

    Abstract: Action Localization is a challenging problem that combines detection and recognition tasks, which are often addressed separately. State-of-the-art methods rely on off-the-shelf bounding box detections pre-computed at high resolution, and propose transformer models that focus on the classification task alone. Such two-stage solutions are prohibitive for real-time deployment. On the other hand, sing…

    Submitted 23 May, 2024; v1 submitted 29 December, 2023; originally announced December 2023.

    Comments: CVPR 2024

  10. arXiv:2310.13570  [pdf, other]

    cs.CV

    A Simple Baseline for Knowledge-Based Visual Question Answering

    Authors: Alexandros Xenos, Themos Stafylakis, Ioannis Patras, Georgios Tzimiropoulos

    Abstract: This paper is on the problem of Knowledge-Based Visual Question Answering (KB-VQA). Recent works have emphasized the significance of incorporating both explicit (through external databases) and implicit (through LLMs) knowledge to answer questions requiring external knowledge effectively. A common limitation of such approaches is that they consist of relatively complicated pipelines and often heav…

    Submitted 24 October, 2023; v1 submitted 20 October, 2023; originally announced October 2023.

    Comments: Accepted at EMNLP 2023 (camera-ready version)

  11. arXiv:2309.07760  [pdf]

    cs.CV cs.AI cs.LG

    PRE: Vision-Language Prompt Learning with Reparameterization Encoder

    Authors: Anh Pham Thi Minh, An Duc Nguyen, Georgios Tzimiropoulos

    Abstract: Large pre-trained vision-language models such as CLIP have demonstrated great potential in zero-shot transferability to downstream tasks. However, to attain optimal performance, the manual selection of prompts is necessary to improve alignment between the downstream image distribution and the textual class descriptions. This manual prompt engineering is the major challenge for deploying such model…

    Submitted 6 November, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: 8 pages excluding References and Appendix

    ACM Class: I.4.0

  12. arXiv:2307.15697  [pdf, other]

    cs.CV

    Aligned Unsupervised Pretraining of Object Detectors with Self-training

    Authors: Ioannis Maniadis Metaxas, Adrian Bulat, Ioannis Patras, Brais Martinez, Georgios Tzimiropoulos

    Abstract: The unsupervised pretraining of object detectors has recently become a key component of object detector training, as it leads to improved performance and faster convergence during the supervised fine-tuning stage. Existing unsupervised pretraining methods, however, typically rely on low-level information to define proposals that are used to train the detector. Furthermore, in the absence of class…

    Submitted 7 July, 2024; v1 submitted 28 July, 2023; originally announced July 2023.

  13. arXiv:2307.10797  [pdf, other]

    cs.CV

    HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces

    Authors: Stella Bounareli, Christos Tzelepis, Vasileios Argyriou, Ioannis Patras, Georgios Tzimiropoulos

    Abstract: In this paper, we present our method for neural face reenactment, called HyperReenact, that aims to generate realistic talking head images of a source identity, driven by a target facial pose. Existing state-of-the-art face reenactment methods train controllable generative models that learn to synthesize realistic facial images, yet producing reenacted faces that are prone to significant visual ar…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: Accepted for publication in ICCV 2023. Project page: https://stelabou.github.io/hyperreenact.github.io/ Code: https://github.com/StelaBou/HyperReenact

  14. arXiv:2304.01752  [pdf, other]

    cs.CV cs.CL cs.LG

    Black Box Few-Shot Adaptation for Vision-Language models

    Authors: Yassine Ouali, Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos

    Abstract: Vision-Language (V-L) models trained with contrastive learning to align the visual and language modalities have been shown to be strong few-shot learners. Soft prompt learning is the method of choice for few-shot downstream adaptation aiming to bridge the modality gap caused by the distribution shift induced by the new domain. While parameter-efficient, prompt learning still requires access to the…

    Submitted 17 August, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

    Comments: Published at ICCV 2023

  15. arXiv:2304.01042  [pdf, other]

    cs.CV

    DivClust: Controlling Diversity in Deep Clustering

    Authors: Ioannis Maniadis Metaxas, Georgios Tzimiropoulos, Ioannis Patras

    Abstract: Clustering has been a major research topic in the field of machine learning, one to which Deep Learning has recently been applied with significant success. However, an aspect of clustering that is not addressed by existing deep clustering methods, is that of efficiently producing multiple, diverse partitionings for a given dataset. This is particularly important, as a diverse set of base clusterin…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: Accepted for publication in CVPR 2023

  16. arXiv:2212.00057  [pdf, other]

    cs.CV cs.AI

    Part-based Face Recognition with Vision Transformers

    Authors: Zhonglin Sun, Georgios Tzimiropoulos

    Abstract: Holistic methods using CNNs and margin-based losses have dominated research on face recognition. In this work, we depart from this setting in two ways: (a) we employ the Vision Transformer as an architecture for training a very strong baseline for face recognition, simply called fViT, which already surpasses most state-of-the-art face recognition methods. (b) Secondly, we capitalize on the Transfo…

    Submitted 30 November, 2022; originally announced December 2022.

    Comments: Accepted to BMVC 2022

  17. arXiv:2210.04845  [pdf, other]

    cs.CV cs.AI

    FS-DETR: Few-Shot DEtection TRansformer with prompting and without re-training

    Authors: Adrian Bulat, Ricardo Guerrero, Brais Martinez, Georgios Tzimiropoulos

    Abstract: This paper is on Few-Shot Object Detection (FSOD), where given a few templates (examples) depicting a novel class (not seen during training), the goal is to detect all of its occurrences within a set of images. From a practical perspective, an FSOD system must fulfil the following desiderata: (a) it must be used as is, without requiring any fine-tuning at test time, (b) it must be able to process…

    Submitted 20 August, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

    Comments: Accepted at ICCV 2023

  18. arXiv:2210.02390  [pdf, other]

    cs.CV cs.AI cs.LG

    Bayesian Prompt Learning for Image-Language Model Generalization

    Authors: Mohammad Mahdi Derakhshani, Enrique Sanchez, Adrian Bulat, Victor Guilherme Turrisi da Costa, Cees G. M. Snoek, Georgios Tzimiropoulos, Brais Martinez

    Abstract: Foundational image-language models have generated considerable interest due to their efficient adaptation to downstream tasks by prompt learning. Prompt learning treats part of the language model input as trainable while freezing the rest, and optimizes an Empirical Risk Minimization objective. However, Empirical Risk Minimization is known to suffer from distributional shifts which hurt generaliza…

    Submitted 20 August, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: Accepted at ICCV 2023

  19. arXiv:2210.01115  [pdf, other]

    cs.CV cs.AI cs.LG

    LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models

    Authors: Adrian Bulat, Georgios Tzimiropoulos

    Abstract: Soft prompt learning has recently emerged as one of the methods of choice for adapting V&L models to a downstream task using a few training examples. However, current methods significantly overfit the training data, suffering from large accuracy degradation when tested on unseen classes from the same domain. To this end, in this paper, we make the following 4 contributions: (1) To alleviate base c…

    Submitted 2 April, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

    Comments: Accepted at CVPR 2023

  20. arXiv:2209.15000  [pdf, other]

    cs.CV cs.AI cs.LG

    REST: REtrieve & Self-Train for generative action recognition

    Authors: Adrian Bulat, Enrique Sanchez, Brais Martinez, Georgios Tzimiropoulos

    Abstract: This work is on training a generative action/video recognition model whose output is a free-form action-specific caption describing the video (rather than an action class label). A generative approach has practical advantages like producing more fine-grained and human-readable output, and being naturally open-world. To this end, we propose to adapt a pre-trained generative Vision & Language (V&L)…

    Submitted 29 September, 2022; originally announced September 2022.

  21. arXiv:2209.13375  [pdf, other]

    cs.CV

    StyleMask: Disentangling the Style Space of StyleGAN2 for Neural Face Reenactment

    Authors: Stella Bounareli, Christos Tzelepis, Vasileios Argyriou, Ioannis Patras, Georgios Tzimiropoulos

    Abstract: In this paper we address the problem of neural face reenactment, where, given a pair of a source and a target facial image, we need to transfer the target's pose (defined as the head pose and its facial expressions) to the source image, by preserving at the same time the source's identity characteristics (e.g., facial shape, hair style, etc), even in the challenging case where the source and the t…

    Submitted 27 September, 2022; originally announced September 2022.

    Comments: Accepted for publication in IEEE FG 2023. Code: https://github.com/StelaBou/StyleMask

  22. arXiv:2208.11108  [pdf, other]

    cs.CV cs.LG

    Efficient Attention-free Video Shift Transformers

    Authors: Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos

    Abstract: This paper tackles the problem of efficient video recognition. In this area, video transformers have recently dominated the efficiency (top-1 accuracy vs FLOPs) spectrum. At the same time, there have been some attempts in the image domain which challenge the necessity of the self-attention operation within the transformer architecture, advocating the use of simpler approaches for token mixing. How…

    Submitted 23 August, 2022; originally announced August 2022.

  23. arXiv:2206.08339  [pdf, other]

    cs.CV cs.LG

    iBoot: Image-bootstrapped Self-Supervised Video Representation Learning

    Authors: Fatemeh Saleh, Fuwen Tan, Adrian Bulat, Georgios Tzimiropoulos, Brais Martinez

    Abstract: Learning visual representations through self-supervision is an extremely challenging task as the network needs to sieve relevant patterns from spurious distractors without the active guidance provided by supervision. This is achieved through heavy data augmentation, large-scale datasets and prohibitive amounts of compute. Video self-supervised learning (SSL) suffers from added challenges: video da…

    Submitted 16 June, 2022; originally announced June 2022.

  24. arXiv:2206.02104  [pdf, other]

    cs.CV

    ContraCLIP: Interpretable GAN generation driven by pairs of contrasting sentences

    Authors: Christos Tzelepis, James Oldfield, Georgios Tzimiropoulos, Ioannis Patras

    Abstract: This work addresses the problem of discovering non-linear interpretable paths in the latent space of pre-trained GANs in a model-agnostic manner. In the proposed method, the discovery is driven by a set of pairs of natural language sentences with contrasting semantics, named semantic dipoles, that serve as the limits of the interpretation that we require by the trainable latent paths to encode. By…

    Submitted 5 June, 2022; originally announced June 2022.

  25. arXiv:2205.15895  [pdf, other]

    cs.CV

    From Keypoints to Object Landmarks via Self-Training Correspondence: A novel approach to Unsupervised Landmark Discovery

    Authors: Dimitrios Mallis, Enrique Sanchez, Matt Bell, Georgios Tzimiropoulos

    Abstract: This paper proposes a novel paradigm for the unsupervised learning of object landmark detectors. Contrary to existing methods that build on auxiliary tasks such as image generation or equivariance, we propose a self-training approach where, departing from generic keypoints, a landmark detector and descriptor is trained to improve itself, tuning the keypoints into distinctive landmarks. To this end…

    Submitted 25 February, 2023; v1 submitted 31 May, 2022; originally announced May 2022.

  26. arXiv:2205.06701  [pdf, other]

    cs.CV

    Knowledge Distillation Meets Open-Set Semi-Supervised Learning

    Authors: Jing Yang, Xiatian Zhu, Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos

    Abstract: Existing knowledge distillation methods mostly focus on distillation of teacher's prediction and intermediate activation. However, the structured representation, which arguably is one of the most critical ingredients of deep models, is largely overlooked. In this work, we propose a novel method dedicated for distilling representational knowledge semantica…

    Submitted 13 May, 2022; originally announced May 2022.

    Comments: 13 pages

  27. arXiv:2205.03436  [pdf, other]

    cs.CV

    EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers

    Authors: Junting Pan, Adrian Bulat, Fuwen Tan, Xiatian Zhu, Lukasz Dudziak, Hongsheng Li, Georgios Tzimiropoulos, Brais Martinez

    Abstract: Self-attention based models such as vision transformers (ViTs) have emerged as a very competitive architecture alternative to convolutional neural networks (CNNs) in computer vision. Despite increasingly stronger variants with ever-higher recognition accuracies, due to the quadratic complexity of self-attention, existing ViTs are typically demanding in computation and model size. Although several…

    Submitted 21 July, 2022; v1 submitted 6 May, 2022; originally announced May 2022.

    Comments: Accepted in ECCV 2022

  28. arXiv:2202.00046  [pdf, other]

    cs.CV cs.AI

    Finding Directions in GAN's Latent Space for Neural Face Reenactment

    Authors: Stella Bounareli, Vasileios Argyriou, Georgios Tzimiropoulos

    Abstract: This paper is on face/head reenactment where the goal is to transfer the facial pose (3D head orientation and expression) of a target face to a source face. Previous methods focus on learning embedding networks for identity and pose disentanglement which proves to be a rather hard task, degrading the quality of the generated images. We take a different approach, bypassing the training of such netw…

    Submitted 6 October, 2022; v1 submitted 31 January, 2022; originally announced February 2022.

    Comments: Accepted for publication in BMVC 2022. Project page: https://stelabou.github.io/stylegan-directions-reenactment/ Code: https://github.com/StelaBou/stylegan_directions_face_reenactment

  29. arXiv:2111.11288  [pdf, other]

    cs.CV cs.LG

    SSR: An Efficient and Robust Framework for Learning with Unknown Label Noise

    Authors: Chen Feng, Georgios Tzimiropoulos, Ioannis Patras

    Abstract: Despite the large progress in supervised learning with neural networks, there are significant challenges in obtaining high-quality, large-scale and accurately labelled datasets. In such a context, how to learn in the presence of noisy labels has received more and more attention. As a relatively complex problem, in order to achieve good results, current approaches often integrate components from se…

    Submitted 7 October, 2022; v1 submitted 22 November, 2021; originally announced November 2021.

    Comments: Accepted to BMVC2022

    Journal ref: https://bmvc2022.mpi-inf.mpg.de/372/

  30. arXiv:2111.02360  [pdf, other]

    cs.CV

    Subpixel Heatmap Regression for Facial Landmark Localization

    Authors: Adrian Bulat, Enrique Sanchez, Georgios Tzimiropoulos

    Abstract: Deep Learning models based on heatmap regression have revolutionized the task of facial landmark localization with existing models working robustly under large poses, non-uniform illumination and shadows, occlusions and self-occlusions, low resolution and blur. However, despite their wide adoption, heatmap regression approaches suffer from discretization-induced errors related to both the heatmap…

    Submitted 3 November, 2021; originally announced November 2021.

    Comments: Accepted at BMVC 2021

  31. arXiv:2110.13859  [pdf, other]

    cs.LG cs.AI cs.CV

    Defensive Tensorization

    Authors: Adrian Bulat, Jean Kossaifi, Sourav Bhattacharya, Yannis Panagakis, Timothy Hospedales, Georgios Tzimiropoulos, Nicholas D Lane, Maja Pantic

    Abstract: We propose defensive tensorization, an adversarial defence technique that leverages a latent high-order factorization of the network. The layers of a network are first expressed as factorized tensor layers. Tensor dropout is then applied in the latent subspace, therefore resulting in dense reconstructed weights, without the sparsity or perturbations typically induced by the randomization. Our appro…

    Submitted 26 October, 2021; originally announced October 2021.

    Comments: To be presented at BMVC 2021

  32. arXiv:2110.02902  [pdf, ps, other]

    cs.CV

    SAIC_Cambridge-HuPBA-FBK Submission to the EPIC-Kitchens-100 Action Recognition Challenge 2021

    Authors: Swathikiran Sudhakaran, Adrian Bulat, Juan-Manuel Perez-Rua, Alex Falcon, Sergio Escalera, Oswald Lanz, Brais Martinez, Georgios Tzimiropoulos

    Abstract: This report presents the technical details of our submission to the EPIC-Kitchens-100 Action Recognition Challenge 2021. To participate in the challenge we deployed spatio-temporal feature extraction and aggregation models we have developed recently: GSF and XViT. GSF is an efficient spatio-temporal feature extracting module that can be plugged into 2D CNNs for video action recognition. XViT is a…

    Submitted 6 October, 2021; originally announced October 2021.

    Comments: Ranked third in the EPIC-Kitchens-100 Action Recognition Challenge @ CVPR 2021

  33. arXiv:2109.13357  [pdf, other]

    cs.CV

    WarpedGANSpace: Finding non-linear RBF paths in GAN latent space

    Authors: Christos Tzelepis, Georgios Tzimiropoulos, Ioannis Patras

    Abstract: This work addresses the problem of discovering, in an unsupervised manner, interpretable paths in the latent space of pretrained GANs, so as to provide an intuitive and easy way of controlling the underlying generative factors. In doing so, it addresses some of the limitations of the state-of-the-art works, namely, a) that they discover directions that are independent of the latent code, i.e., pat…

    Submitted 27 September, 2021; originally announced September 2021.

    Comments: Accepted for publication in ICCV 2021

  34. arXiv:2106.05968  [pdf, other]

    cs.CV cs.AI cs.LG

    Space-time Mixing Attention for Video Transformer

    Authors: Adrian Bulat, Juan-Manuel Perez-Rua, Swathikiran Sudhakaran, Brais Martinez, Georgios Tzimiropoulos

    Abstract: This paper is on video recognition using Transformers. Very recent attempts in this area have demonstrated promising results in terms of recognition accuracy, yet they have been also shown to induce, in many cases, significant computational overheads due to the additional modelling of the temporal information. In this work, we propose a Video Transformer model the complexity of which scales linear…

    Submitted 11 June, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: Updated results on SSv2

  35. arXiv:2103.17267  [pdf, other]

    cs.LG cs.AI cs.CV

    Bit-Mixer: Mixed-precision networks with runtime bit-width selection

    Authors: Adrian Bulat, Georgios Tzimiropoulos

    Abstract: Mixed-precision networks allow for a variable bit-width quantization for every layer in the network. A major limitation of existing work is that the bit-width for each layer must be predefined during training time. This allows little flexibility if the characteristics of the device on which the network is deployed change during runtime. In this work, we propose Bit-Mixer, the very first method to…

    Submitted 31 March, 2021; originally announced March 2021.

  36. arXiv:2103.16554  [pdf, other]

    cs.CV cs.LG

    Pre-training strategies and datasets for facial representation learning

    Authors: Adrian Bulat, Shiyang Cheng, Jing Yang, Andrew Garbett, Enrique Sanchez, Georgios Tzimiropoulos

    Abstract: What is the best way to learn a universal face representation? Recent work on Deep Learning in the area of face analysis has focused on supervised learning for specific tasks of interest (e.g. face recognition, facial landmark localization etc.) but has overlooked the overarching question of how to find a facial representation that can be readily adapted to several facial analysis tasks and datase…

    Submitted 20 July, 2022; v1 submitted 30 March, 2021; originally announced March 2021.

    Comments: Accepted at ECCV 2022

  37. arXiv:2103.13372  [pdf, other]

    cs.CV cs.LG

    Affective Processes: stochastic modelling of temporal context for emotion and facial expression recognition

    Authors: Enrique Sanchez, Mani Kumar Tellamekala, Michel Valstar, Georgios Tzimiropoulos

    Abstract: Temporal context is key to the recognition of expressions of emotion. Existing methods, that rely on recurrent or self-attention models to enforce temporal consistency, work on the feature level, ignoring the task-specific temporal dependencies, and fail to model context uncertainty. To alleviate these issues, we build upon the framework of Neural Processes to propose a method for apparent emotion…

    Submitted 24 March, 2021; originally announced March 2021.

    Comments: Accepted at CVPR 2021

  38. arXiv:2102.04442  [pdf, other]

    cs.CV cs.LG

    Improving memory banks for unsupervised learning with large mini-batch, consistency and hard negative mining

    Authors: Adrian Bulat, Enrique Sánchez-Lozano, Georgios Tzimiropoulos

    Abstract: An important component of unsupervised learning by instance-based discrimination is a memory bank for storing a feature representation for each training sample in the dataset. In this paper, we introduce 3 improvements to the vanilla memory bank-based formulation which bring massive accuracy gains: (a) Large mini-batch: we pull multiple augmentations for each sample within the same batch and show…

    Submitted 8 February, 2021; originally announced February 2021.

    Comments: Accepted at ICASSP 2021

  39. arXiv:2011.01864  [pdf, other]

    cs.CV

    Semi-supervised Facial Action Unit Intensity Estimation with Contrastive Learning

    Authors: Enrique Sanchez, Adrian Bulat, Anestis Zaganidis, Georgios Tzimiropoulos

    Abstract: This paper tackles the challenging problem of estimating the intensity of Facial Action Units with few labeled images. Contrary to previous works, our method does not require to manually select key frames, and produces state-of-the-art results with as little as $2\%$ of annotated frames, which are randomly chosen. To this end, we propose a semi-supervised learning approach where a spatio-…

    Submitted 4 November, 2020; v1 submitted 3 November, 2020; originally announced November 2020.

    Comments: ACCV 2020

  40. arXiv:2010.03558  [pdf, other]

    cs.CV cs.AI cs.LG

    High-Capacity Expert Binary Networks

    Authors: Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos

    Abstract: Network binarization is a promising hardware-aware direction for creating efficient deep models. Despite its memory and computational advantages, reducing the accuracy gap between binary models and their real-valued counterparts remains an unsolved challenging research problem. To this end, we make the following 3 contributions: (a) To increase model capacity, we propose Expert Binary Convolution,…

    Submitted 30 March, 2021; v1 submitted 7 October, 2020; originally announced October 2020.

    Comments: Accepted at ICLR 2021

  41. arXiv:2004.06657  [pdf, other]

    cs.CV

    A Transfer Learning approach to Heatmap Regression for Action Unit intensity estimation

    Authors: Ioanna Ntinou, Enrique Sanchez, Adrian Bulat, Michel Valstar, Georgios Tzimiropoulos

    Abstract: Action Units (AUs) are geometrically-based atomic facial muscle movements known to produce appearance changes at specific facial locations. Motivated by this observation we propose a novel AU modelling problem that consists of jointly estimating their localisation and intensity. To this end, we propose a simple yet efficient approach based on Heatmap Regression that merges both problems into a sin…

    Submitted 14 April, 2020; originally announced April 2020.

    Comments: Submitted for review to IEEE Trans. on Affective Computing

  42. arXiv:2003.11535  [pdf, other]

    cs.CV

    Training Binary Neural Networks with Real-to-Binary Convolutions

    Authors: Brais Martinez, Jing Yang, Adrian Bulat, Georgios Tzimiropoulos

    Abstract: This paper shows how to train binary networks to within a few percentage points ($\sim 3-5 \%$) of the full precision counterpart. We first show how to build a strong baseline, which already achieves state-of-the-art accuracy, by combining recently proposed advances and carefully adjusting the optimization procedure. Secondly, we show that by attempting to minimize the discrepancy between the output…

    Submitted 25 March, 2020; originally announced March 2020.

    Comments: ICLR 2020

  43. arXiv:2003.04289  [pdf, other]

    cs.CV cs.LG

    Knowledge distillation via adaptive instance normalization

    Authors: Jing Yang, Brais Martinez, Adrian Bulat, Georgios Tzimiropoulos

    Abstract: This paper addresses the problem of model compression via knowledge distillation. To this end, we propose a new knowledge distillation method based on transferring feature statistics, specifically the channel-wise mean and variance, from the teacher to the student. Our method goes beyond the standard way of enforcing the mean and variance of the student to be similar to those of the teacher throug…

    Submitted 9 March, 2020; originally announced March 2020.

  44. arXiv:2003.01711  [pdf, other]

    cs.CV cs.LG

    BATS: Binary ArchitecTure Search

    Authors: Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos

    Abstract: This paper proposes Binary ArchitecTure Search (BATS), a framework that drastically reduces the accuracy gap between binary neural networks and their real-valued counterparts by means of Neural Architecture Search (NAS). We show that directly applying NAS to the binary domain provides very poor results. To alleviate this, we describe, to our knowledge, for the first time, the 3 key ingredients for…

    Submitted 23 July, 2020; v1 submitted 3 March, 2020; originally announced March 2020.

    Comments: Accepted to ECCV 2020

  45. arXiv:2002.11098  [pdf, other]

    cs.CV

    Toward fast and accurate human pose estimation via soft-gated skip connections

    Authors: Adrian Bulat, Jean Kossaifi, Georgios Tzimiropoulos, Maja Pantic

    Abstract: This paper is on highly accurate and highly efficient human pose estimation. Recent works based on Fully Convolutional Networks (FCNs) have demonstrated excellent results for this difficult problem. While residual connections within FCNs have proved to be quintessential for achieving high accuracy, we re-analyze this design choice in the context of improving both the accuracy and the efficiency ov…

    Submitted 25 February, 2020; originally announced February 2020.

    Comments: Accepted to FG 2020 (oral)

  46. arXiv:1911.06095  [pdf, other]

    cs.CV

    Towards Pose-invariant Lip-Reading

    Authors: Shiyang Cheng, Pingchuan Ma, Georgios Tzimiropoulos, Stavros Petridis, Adrian Bulat, Jie Shen, Maja Pantic

    Abstract: Lip-reading models have been significantly improved recently thanks to powerful deep learning architectures. However, most works focused on frontal or near frontal views of the mouth. As a consequence, lip-reading performance seriously deteriorates in non-frontal mouth views. In this work, we present a framework for training pose-invariant lip-reading models on synthetic data instead of collecting…

    Submitted 14 November, 2019; originally announced November 2019.

    Comments: 6 pages, 2 figures

  47. arXiv:1910.09469  [pdf, other]

    cs.CV cs.LG eess.IV

    Object landmark discovery through unsupervised adaptation

    Authors: Enrique Sanchez, Georgios Tzimiropoulos

    Abstract: This paper proposes a method to ease the unsupervised learning of object landmark detectors. Similarly to previous methods, our approach is fully unsupervised in the sense that it does not require or make any use of annotated landmarks for the target object category. Contrary to previous works, we do however assume that a landmark detector, which has already learned a structured representation for a…

    Submitted 21 October, 2019; originally announced October 2019.

    Comments: NeurIPS 2019. Code is available at https://github.com/ESanchezLozano/SAIC-Unsupervised-landmark-detection-NeurIPS2019

  48. arXiv:1909.13863  [pdf, other]

    cs.CV cs.LG eess.IV

    XNOR-Net++: Improved Binary Neural Networks

    Authors: Adrian Bulat, Georgios Tzimiropoulos

    Abstract: This paper proposes an improved training algorithm for binary neural networks in which both weights and activations are binary numbers. A key but fairly overlooked feature of the current state-of-the-art method of XNOR-Net is the use of analytically calculated real-valued scaling factors for re-weighting the output of binary convolutions. We argue that analytic calculation of these factors is sub-…

    Submitted 30 September, 2019; originally announced September 2019.

    Comments: Accepted to BMVC 2019

  49. arXiv:1909.04951  [pdf, other]

    cs.CV

    AnimalWeb: A Large-Scale Hierarchical Dataset of Annotated Animal Faces

    Authors: Muhammad Haris Khan, John McDonagh, Salman Khan, Muhammad Shahabuddin, Aditya Arora, Fahad Shahbaz Khan, Ling Shao, Georgios Tzimiropoulos

    Abstract: Being heavily reliant on animals, it is our ethical obligation to improve their well-being by understanding their needs. Several studies show that animal needs are often expressed through their faces. Though remarkable progress has been made towards the automatic understanding of human faces, this has regrettably not been the case with animal faces. There exists significant room and appropriate ne…

    Submitted 11 September, 2019; originally announced September 2019.

    Comments: 15 pages, 14 figures

  50. arXiv:1904.07852  [pdf, other]

    cs.CV cs.AI cs.LG

    Matrix and tensor decompositions for training binary neural networks

    Authors: Adrian Bulat, Jean Kossaifi, Georgios Tzimiropoulos, Maja Pantic

    Abstract: This paper is on improving the training of binary neural networks in which both activations and weights are binary. While prior methods for neural network binarization binarize each filter independently, we propose to instead parametrize the weight tensor of each layer using matrix or tensor decomposition. The binarization process is then performed using this latent parametrization, via a quantiza…

    Submitted 16 April, 2019; originally announced April 2019.