
Showing 1–50 of 55 results for author: Patras, I

  1. arXiv:2406.09070

    cs.LG cs.AI cs.CV

    EquiPrompt: Debiasing Diffusion Models via Iterative Bootstrapping in Chain of Thoughts

    Authors: Zahraa Al Sahili, Ioannis Patras, Matthew Purver

    Abstract: In the domain of text-to-image generative models, the inadvertent propagation of biases inherent in training datasets poses significant ethical challenges, particularly in the generation of socially sensitive content. This paper introduces EquiPrompt, a novel method employing Chain of Thought (CoT) reasoning to reduce biases in text-to-image generative models. EquiPrompt uses iterative bootstrappi…

    Submitted 13 June, 2024; originally announced June 2024.

  2. arXiv:2405.19100

    cs.CV

    Enhancing Zero-Shot Facial Expression Recognition by LLM Knowledge Transfer

    Authors: Zengqun Zhao, Yu Cao, Shaogang Gong, Ioannis Patras

    Abstract: Current facial expression recognition (FER) models are often designed in a supervised learning manner and thus are constrained by the lack of large-scale facial expression images with high-quality annotations. Consequently, these models often fail to generalize well, performing poorly on unseen images in inference. Vision-language-based zero-shot models demonstrate a promising potential for addres…

    Submitted 18 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: The code and pre-trained models are available at https://github.com/zengqunzhao/Exp-CLIP

  3. arXiv:2404.18591

    cs.CV cs.AI

    FashionSD-X: Multimodal Fashion Garment Synthesis using Latent Diffusion

    Authors: Abhishek Kumar Singh, Ioannis Patras

    Abstract: The rapid evolution of the fashion industry increasingly intersects with technological advancements, particularly through the integration of generative AI. This study introduces a novel generative pipeline designed to transform the fashion design process by employing latent diffusion models. Utilizing ControlNet and LoRA fine-tuning, our approach generates high-quality images from multimodal input…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 9 pages, 8 figures

  4. arXiv:2404.07078

    cs.CV cs.HC

    VLLMs Provide Better Context for Emotion Understanding Through Common Sense Reasoning

    Authors: Alexandros Xenos, Niki Maria Foteinopoulou, Ioanna Ntinou, Ioannis Patras, Georgios Tzimiropoulos

    Abstract: Recognising emotions in context involves identifying the apparent emotions of an individual, taking into account contextual cues from the surrounding scene. Previous approaches to this task have involved the design of explicit scene-encoding architectures or the incorporation of external scene-related information, such as captions. However, these methods often utilise limited contextual informatio…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: A. Xenos, N. Foteinopoulou and I. Ntinou contributed equally to this work; 14 pages, 5 figures

  5. arXiv:2403.17217

    cs.CV cs.AI

    DiffusionAct: Controllable Diffusion Autoencoder for One-shot Face Reenactment

    Authors: Stella Bounareli, Christos Tzelepis, Vasileios Argyriou, Ioannis Patras, Georgios Tzimiropoulos

    Abstract: Video-driven neural face reenactment aims to synthesize realistic facial images that successfully preserve the identity and appearance of a source face, while transferring the target head pose and facial expressions. Existing GAN-based methods suffer from either distortions and visual artifacts or poor reconstruction quality, i.e., the background and several important appearance details, such as h…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Project page: https://stelabou.github.io/diffusionact/

  6. arXiv:2403.08161

    cs.CV cs.AI

    LAFS: Landmark-based Facial Self-supervised Learning for Face Recognition

    Authors: Zhonglin Sun, Chen Feng, Ioannis Patras, Georgios Tzimiropoulos

    Abstract: In this work we focus on learning facial representations that can be adapted to train effective face recognition models, particularly in the absence of labels. Firstly, compared with existing labelled face datasets, a vastly larger magnitude of unlabeled faces exists in the real world. We explore the learning strategy of these unlabeled facial images through self-supervised pretraining to transfer…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: accepted to CVPR 2024

  7. MOAB: Multi-Modal Outer Arithmetic Block For Fusion Of Histopathological Images And Genetic Data For Brain Tumor Grading

    Authors: Omnia Alwazzan, Abbas Khan, Ioannis Patras, Gregory Slabaugh

    Abstract: Brain tumors are an abnormal growth of cells in the brain. They can be classified into distinct grades based on their growth. Often grading is performed based on a histological image and is one of the most significant predictors of a patient's prognosis: the higher the grade, the more aggressive the tumor. Correct diagnosis of a tumor grade remains challenging. Though histopathological grading has…

    Submitted 10 March, 2024; originally announced March 2024.

    Journal ref: IEEE, 2023, pp. 1–5

  8. arXiv:2403.06339

    cs.CV

    FOAA: Flattened Outer Arithmetic Attention For Multimodal Tumor Classification

    Authors: Omnia Alwazzan, Ioannis Patras, Gregory Slabaugh

    Abstract: Fusion of multimodal healthcare data holds great promise to provide a holistic view of a patient's health, taking advantage of the complementarity of different modalities while leveraging their correlation. This paper proposes a simple and effective approach, inspired by attention, to fuse discriminative features from different modalities. We propose a novel attention mechanism, called Flattened O…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: This paper has been accepted for ISBI-2024

  9. arXiv:2403.02138

    cs.CV

    Self-Supervised Facial Representation Learning with Facial Region Awareness

    Authors: Zheng Gao, Ioannis Patras

    Abstract: Self-supervised pre-training has been proven to be effective in learning transferable representations that benefit various visual tasks. This paper asks the question: can self-supervised pre-training learn general facial representations for various facial analysis tasks? Recent efforts toward this goal are limited to treating each face image as a whole, i.e., learning consistent facial representa…

    Submitted 4 March, 2024; originally announced March 2024.

  10. arXiv:2402.12550

    cs.CV cs.LG

    Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization

    Authors: James Oldfield, Markos Georgopoulos, Grigorios G. Chrysos, Christos Tzelepis, Yannis Panagakis, Mihalis A. Nicolaou, Jiankang Deng, Ioannis Patras

    Abstract: The Mixture of Experts (MoE) paradigm provides a powerful way to decompose dense layers into smaller, modular computations often more amenable to human interpretation, debugging, and editability. However, a major challenge lies in the computational cost of scaling the number of experts high enough to achieve fine-grained specialization. In this paper, we propose the Multilinear Mixture of Experts…

    Submitted 31 May, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Github: https://github.com/james-oldfield/muMoE. Project page: https://james-oldfield.github.io/muMoE/

  11. arXiv:2402.03553

    cs.CV

    One-shot Neural Face Reenactment via Finding Directions in GAN's Latent Space

    Authors: Stella Bounareli, Christos Tzelepis, Vasileios Argyriou, Ioannis Patras, Georgios Tzimiropoulos

    Abstract: In this paper, we present our framework for neural face/head reenactment whose goal is to transfer the 3D head orientation and expression of a target face to a source face. Previous methods focus on learning embedding networks for identity and head pose/expression disentanglement which proves to be a rather hard task, degrading the quality of the generated images. We take a different approach, byp…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: Preprint version, accepted for publication in International Journal of Computer Vision (IJCV)

  12. arXiv:2311.01573

    cs.CV cs.AI cs.CY cs.LG

    Improving Fairness using Vision-Language Driven Image Augmentation

    Authors: Moreno D'Incà, Christos Tzelepis, Ioannis Patras, Nicu Sebe

    Abstract: Fairness is crucial when training a deep-learning discriminative model, especially in the facial domain. Models tend to correlate specific characteristics (such as age and skin color) with unrelated attributes (downstream tasks), resulting in biases which do not correspond to reality. It is common knowledge that these correlations are present in the data and are then transferred to the models duri…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: Accepted for publication in WACV 2024

  13. arXiv:2310.16677

    cs.HC

    Machine Learning Approaches for Fine-Grained Symptom Estimation in Schizophrenia: A Comprehensive Review

    Authors: Niki Maria Foteinopoulou, Ioannis Patras

    Abstract: Schizophrenia is a severe yet treatable mental disorder; it is diagnosed using a multitude of primary and secondary symptoms. Diagnosis and treatment for each individual depend on the severity of the symptoms; therefore, there is a need for accurate, personalised assessments. However, the process can be both time-consuming and subjective; hence, there is a motivation to explore automated methods t…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: 19 pages, 5 figures

  14. arXiv:2310.16640

    cs.CV cs.HC

    EmoCLIP: A Vision-Language Method for Zero-Shot Video Facial Expression Recognition

    Authors: Niki Maria Foteinopoulou, Ioannis Patras

    Abstract: Facial Expression Recognition (FER) is a crucial task in affective computing, but its conventional focus on the seven basic emotions limits its applicability to the complex and expanding emotional spectrum. To address the issue of new and unseen emotions present in dynamic in-the-wild FER, we propose a novel vision-language model that utilises sample-level text descriptions (i.e. captions of the c…

    Submitted 18 March, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

    Comments: Accepted at FG'2024

  15. arXiv:2310.13570

    cs.CV

    A Simple Baseline for Knowledge-Based Visual Question Answering

    Authors: Alexandros Xenos, Themos Stafylakis, Ioannis Patras, Georgios Tzimiropoulos

    Abstract: This paper is on the problem of Knowledge-Based Visual Question Answering (KB-VQA). Recent works have emphasized the significance of incorporating both explicit (through external databases) and implicit (through LLMs) knowledge to answer questions requiring external knowledge effectively. A common limitation of such approaches is that they consist of relatively complicated pipelines and often heav…

    Submitted 24 October, 2023; v1 submitted 20 October, 2023; originally announced October 2023.

    Comments: Accepted at EMNLP 2023 (camera-ready version)

  16. arXiv:2308.13392

    cs.CV

    Self-Supervised Representation Learning with Cross-Context Learning between Global and Hypercolumn Features

    Authors: Zheng Gao, Chen Feng, Ioannis Patras

    Abstract: Whilst contrastive learning yields powerful representations by matching different augmented views of the same instance, it lacks the ability to capture the similarities between different instances. One popular way to address this limitation is by learning global features (after the global pooling) to capture inter-instance relationships based on knowledge distillation, where the global features of…

    Submitted 1 September, 2023; v1 submitted 25 August, 2023; originally announced August 2023.

  17. arXiv:2308.13382

    cs.CV

    Prompting Visual-Language Models for Dynamic Facial Expression Recognition

    Authors: Zengqun Zhao, Ioannis Patras

    Abstract: This paper presents a novel visual-language model called DFER-CLIP, which is based on the CLIP model and designed for in-the-wild Dynamic Facial Expression Recognition (DFER). Specifically, the proposed DFER-CLIP consists of a visual part and a textual part. For the visual part, based on the CLIP image encoder, a temporal model consisting of several Transformer encoders is introduced for extractin…

    Submitted 14 October, 2023; v1 submitted 25 August, 2023; originally announced August 2023.

    Comments: Accepted at BMVC 2023 (Camera ready)

  18. arXiv:2307.15697

    cs.CV

    Aligned Unsupervised Pretraining of Object Detectors with Self-training

    Authors: Ioannis Maniadis Metaxas, Adrian Bulat, Ioannis Patras, Brais Martinez, Georgios Tzimiropoulos

    Abstract: The unsupervised pretraining of object detectors has recently become a key component of object detector training, as it leads to improved performance and faster convergence during the supervised fine-tuning stage. Existing unsupervised pretraining methods, however, typically rely on low-level information to define proposals that are used to train the detector. Furthermore, in the absence of class…

    Submitted 7 July, 2024; v1 submitted 28 July, 2023; originally announced July 2023.

  19. arXiv:2307.10797

    cs.CV

    HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces

    Authors: Stella Bounareli, Christos Tzelepis, Vasileios Argyriou, Ioannis Patras, Georgios Tzimiropoulos

    Abstract: In this paper, we present our method for neural face reenactment, called HyperReenact, that aims to generate realistic talking head images of a source identity, driven by a target facial pose. Existing state-of-the-art face reenactment methods train controllable generative models that learn to synthesize realistic facial images, yet producing reenacted faces that are prone to significant visual ar…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: Accepted for publication in ICCV 2023. Project page: https://stelabou.github.io/hyperreenact.github.io/ Code: https://github.com/StelaBou/HyperReenact

  20. arXiv:2305.14053

    cs.CV cs.LG

    Parts of Speech-Grounded Subspaces in Vision-Language Models

    Authors: James Oldfield, Christos Tzelepis, Yannis Panagakis, Mihalis A. Nicolaou, Ioannis Patras

    Abstract: Latent image representations arising from vision-language models have proved immensely useful for a variety of downstream tasks. However, their utility is limited by their entanglement with respect to different visual attributes. For instance, recent work has shown that CLIP image representations are often biased toward specific visual properties (such as objects or actions) in an unpredictable ma…

    Submitted 12 November, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted at NeurIPS 2023

  21. arXiv:2304.03378

    cs.CV cs.LG

    Self-Supervised Video Similarity Learning

    Authors: Giorgos Kordopatis-Zilos, Giorgos Tolias, Christos Tzelepis, Ioannis Kompatsiaris, Ioannis Patras, Symeon Papadopoulos

    Abstract: We introduce S²VS, a video similarity learning approach with self-supervision. Self-Supervised Learning (SSL) is typically used to train deep models on a proxy task so as to have strong transferability on target tasks after fine-tuning. Here, in contrast to prior work, SSL is used to perform video similarity learning and address multiple retrieval and detection tasks at once with no use of labe…

    Submitted 16 June, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

  22. arXiv:2304.01042

    cs.CV

    DivClust: Controlling Diversity in Deep Clustering

    Authors: Ioannis Maniadis Metaxas, Georgios Tzimiropoulos, Ioannis Patras

    Abstract: Clustering has been a major research topic in the field of machine learning, one to which Deep Learning has recently been applied with significant success. However, an aspect of clustering that is not addressed by existing deep clustering methods is that of efficiently producing multiple, diverse partitionings for a given dataset. This is particularly important, as a diverse set of base clusterin…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: Accepted for publication in CVPR 2023

  23. arXiv:2303.12756

    cs.CV

    MaskCon: Masked Contrastive Learning for Coarse-Labelled Dataset

    Authors: Chen Feng, Ioannis Patras

    Abstract: Deep learning has achieved great success in recent years with the aid of advanced neural network structures and large-scale human-annotated datasets. However, it is often costly and difficult to accurately and efficiently annotate large-scale datasets, especially for some specialized domains where fine-grained labels are required. In this setting, coarse labels are much easier to acquire as they d…

    Submitted 22 March, 2023; originally announced March 2023.

    Comments: CVPR 2023 camera-ready version. Codes are available at https://github.com/MrChenFeng/MaskCon_CVPR2023

  24. arXiv:2303.11296

    cs.CV

    Attribute-preserving Face Dataset Anonymization via Latent Code Optimization

    Authors: Simone Barattin, Christos Tzelepis, Ioannis Patras, Nicu Sebe

    Abstract: This work addresses the problem of anonymizing the identity of faces in a dataset of images, such that the privacy of those depicted is not violated, while at the same time the dataset remains useful for downstream tasks, such as training machine learning models. To the best of our knowledge, we are the first to explicitly address this issue and deal with two major drawbacks of the existing state-of-…

    Submitted 20 March, 2023; originally announced March 2023.

    Comments: Accepted for publication in CVPR 2023

  25. arXiv:2211.11460

    eess.SP cs.AI

    Motor Imagery Decoding Using Ensemble Curriculum Learning and Collaborative Training

    Authors: Georgios Zoumpourlis, Ioannis Patras

    Abstract: In this work, we study the problem of cross-subject motor imagery (MI) decoding from electroencephalography (EEG) data. Multi-subject EEG datasets present several kinds of domain shifts due to various inter-individual differences (e.g. brain anatomy, personality and cognitive profile). These domain shifts render multi-subject training a challenging task and also impede robust cross-subject general…

    Submitted 21 February, 2024; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: Accepted for publication in 12th IEEE International Winter Conference on Brain-Computer Interface (BCI), 2024. Code: https://github.com/gzoumpourlis/Ensemble-MI

  26. arXiv:2209.13375

    cs.CV

    StyleMask: Disentangling the Style Space of StyleGAN2 for Neural Face Reenactment

    Authors: Stella Bounareli, Christos Tzelepis, Vasileios Argyriou, Ioannis Patras, Georgios Tzimiropoulos

    Abstract: In this paper we address the problem of neural face reenactment, where, given a pair of a source and a target facial image, we need to transfer the target's pose (defined as the head pose and its facial expressions) to the source image, while preserving the source's identity characteristics (e.g., facial shape, hair style, etc.), even in the challenging case where the source and the t…

    Submitted 27 September, 2022; originally announced September 2022.

    Comments: Accepted for publication in IEEE FG 2023. Code: https://github.com/StelaBou/StyleMask

  27. arXiv:2209.11276

    cs.CV cs.AI

    Capsule Network based Contrastive Learning of Unsupervised Visual Representations

    Authors: Harsh Panwar, Ioannis Patras

    Abstract: Capsule Networks have shown tremendous advancement in the past decade, outperforming traditional CNNs in various tasks due to their equivariant properties. With the use of vector I/O, which provides information on both the magnitude and direction of an object or its parts, there lies an enormous possibility of using Capsule Networks in unsupervised learning environments for visual representation tasks…

    Submitted 22 September, 2022; originally announced September 2022.

  28. Adaptive Soft Contrastive Learning

    Authors: Chen Feng, Ioannis Patras

    Abstract: Self-supervised learning has recently achieved great success in representation learning without human annotations. The dominant method, contrastive learning, is generally based on instance discrimination tasks, i.e., individual samples are treated as independent categories. However, presuming all the samples are different contradicts the natural grouping of similar samples in common visu…

    Submitted 22 July, 2022; originally announced July 2022.

    Comments: Accepted to ICPR2022

  29. arXiv:2207.05577

    cs.CV cs.HC cs.MM

    Learning from Label Relationships in Human Affect

    Authors: Niki Maria Foteinopoulou, Ioannis Patras

    Abstract: Human affect and mental state estimation in an automated manner faces a number of difficulties, including learning from labels with poor or no temporal resolution, learning from few datasets with little data (often due to confidentiality constraints), and (very) long, in-the-wild videos. For these reasons, deep learning methodologies tend to overfit, that is, arrive at latent representations with…

    Submitted 15 August, 2022; v1 submitted 12 July, 2022; originally announced July 2022.

    Comments: Accepted at ACM Multimedia (ACMMM) 2022, 10 pages, 4 figures

  30. arXiv:2206.02104

    cs.CV

    ContraCLIP: Interpretable GAN generation driven by pairs of contrasting sentences

    Authors: Christos Tzelepis, James Oldfield, Georgios Tzimiropoulos, Ioannis Patras

    Abstract: This work addresses the problem of discovering non-linear interpretable paths in the latent space of pre-trained GANs in a model-agnostic manner. In the proposed method, the discovery is driven by a set of pairs of natural language sentences with contrasting semantics, named semantic dipoles, that serve as the limits of the interpretation that we require by the trainable latent paths to encode. By…

    Submitted 5 June, 2022; originally announced June 2022.

  31. arXiv:2206.00048

    cs.CV cs.LG

    PandA: Unsupervised Learning of Parts and Appearances in the Feature Maps of GANs

    Authors: James Oldfield, Christos Tzelepis, Yannis Panagakis, Mihalis A. Nicolaou, Ioannis Patras

    Abstract: Recent advances in the understanding of Generative Adversarial Networks (GANs) have led to remarkable progress in visual editing and synthesis tasks, capitalizing on the rich semantics that are embedded in the latent spaces of pre-trained GANs. However, existing methods are often tailored to specific GAN architectures and are limited to either discovering global semantic directions that do not fac…

    Submitted 6 February, 2023; v1 submitted 31 May, 2022; originally announced June 2022.

    Comments: Accepted at ICLR 2023. Code available at: https://github.com/james-oldfield/PandA

  32. arXiv:2111.11736

    cs.CV

    Tensor Component Analysis for Interpreting the Latent Space of GANs

    Authors: James Oldfield, Markos Georgopoulos, Yannis Panagakis, Mihalis A. Nicolaou, Ioannis Patras

    Abstract: This paper addresses the problem of finding interpretable directions in the latent space of pre-trained Generative Adversarial Networks (GANs) to facilitate controllable image synthesis. Such interpretable directions correspond to transformations that can affect both the style and geometry of the synthetic images. However, existing approaches that utilise linear techniques to find these transforma…

    Submitted 23 November, 2021; originally announced November 2021.

    Comments: BMVC 2021

  33. arXiv:2111.11288

    cs.CV cs.LG

    SSR: An Efficient and Robust Framework for Learning with Unknown Label Noise

    Authors: Chen Feng, Georgios Tzimiropoulos, Ioannis Patras

    Abstract: Despite the large progress in supervised learning with neural networks, there are significant challenges in obtaining high-quality, large-scale and accurately labelled datasets. In such a context, how to learn in the presence of noisy labels has received more and more attention. As a relatively complex problem, in order to achieve good results, current approaches often integrate components from se…

    Submitted 7 October, 2022; v1 submitted 22 November, 2021; originally announced November 2021.

    Comments: Accepted to BMVC2022

    Journal ref: https://bmvc2022.mpi-inf.mpg.de/372/

  34. arXiv:2109.13357

    cs.CV

    WarpedGANSpace: Finding non-linear RBF paths in GAN latent space

    Authors: Christos Tzelepis, Georgios Tzimiropoulos, Ioannis Patras

    Abstract: This work addresses the problem of discovering, in an unsupervised manner, interpretable paths in the latent space of pretrained GANs, so as to provide an intuitive and easy way of controlling the underlying generative factors. In doing so, it addresses some of the limitations of the state-of-the-art works, namely, a) that they discover directions that are independent of the latent code, i.e., pat…

    Submitted 27 September, 2021; originally announced September 2021.

    Comments: Accepted for publication in ICCV 2021

  35. DnS: Distill-and-Select for Efficient and Accurate Video Indexing and Retrieval

    Authors: Giorgos Kordopatis-Zilos, Christos Tzelepis, Symeon Papadopoulos, Ioannis Kompatsiaris, Ioannis Patras

    Abstract: In this paper, we address the problem of high performance and computationally efficient content-based video retrieval in large-scale datasets. Current methods typically propose either: (i) fine-grained approaches employing spatio-temporal representations and similarity calculations, achieving high performance at a high computational cost or (ii) coarse-grained approaches representing/indexing vide…

    Submitted 5 August, 2022; v1 submitted 24 June, 2021; originally announced June 2021.

    Journal ref: International Journal of Computer Vision (2022)

  36. Few-Shot Action Localization without Knowing Boundaries

    Authors: Ting-Ting Xie, Christos Tzelepis, Fan Fu, Ioannis Patras

    Abstract: Learning to localize actions in long, cluttered, and untrimmed videos is a hard task that, in the literature, has typically been addressed assuming the availability of large amounts of annotated training samples for each class -- either in a fully-supervised setting, where action boundaries are known, or in a weakly-supervised setting, where only class labels are known for each video. In this paper…

    Submitted 23 September, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: ICMR21 Camera ready; link to code: https://github.com/June01/WFSAL-icmr21

  37. arXiv:2103.04846

    cs.CV cs.AI

    Relationship-based Neural Baby Talk

    Authors: Fan Fu, Tingting Xie, Ioannis Patras, Sepehr Jalali

    Abstract: Understanding interactions between objects in an image is an important element for generating captions. In this paper, we propose a relationship-based neural baby talk (R-NBT) model to comprehensively investigate several types of pairwise object interactions by encoding each image via three different relationship-based graph attention networks (GATs). We study three main relationships: spa…

    Submitted 8 March, 2021; originally announced March 2021.

  38. arXiv:2102.06064

    cs.LG

    Uncertainty Propagation in Convolutional Neural Networks: Technical Report

    Authors: Christos Tzelepis, Ioannis Patras

    Abstract: In this technical report we study the problem of propagation of uncertainty (in terms of variances of given uni-variate normal random variables) through typical building blocks of a Convolutional Neural Network (CNN). These include layers that perform linear operations, such as 2D convolutions, fully-connected, and average pooling layers, as well as layers that act non-linearly on their input, suc…

    Submitted 11 February, 2021; originally announced February 2021.

    Comments: A PyTorch implementation is available under the MIT license here: https://github.com/chi0tzp/uacnn

  39. arXiv:2101.06072

    cs.CV cs.LG cs.MM

    Video Summarization Using Deep Neural Networks: A Survey

    Authors: Evlampios Apostolidis, Eleni Adamantidou, Alexandros I. Metsai, Vasileios Mezaris, Ioannis Patras

    Abstract: Video summarization technologies aim to create a concise and complete synopsis by selecting the most informative parts of the video content. Several approaches have been developed over the last couple of decades and the current state of the art is represented by methods that rely on modern deep neural network architectures. This work focuses on the recent advances in the area and provides a compre…

    Submitted 27 September, 2021; v1 submitted 15 January, 2021; originally announced January 2021.

    Comments: Accepted for publication at the Proceedings of the IEEE

  40. arXiv:2008.11254

    cs.CV

    Temporal Action Localization with Variance-Aware Networks

    Authors: Ting-Ting Xie, Christos Tzelepis, Ioannis Patras

    Abstract: This work addresses the problem of temporal action localization with Variance-Aware Networks (VAN), i.e., DNNs that use second-order statistics in the input and/or the output of regression tasks. We first propose a network (VANp) that, when presented with the second-order statistics of the input, i.e., each sample has a mean and a variance, propagates the mean and the variance throughout the net…

    Submitted 25 August, 2020; originally announced August 2020.

    Comments: Journal paper; Under review

  41. arXiv:2008.11170

    cs.CV

    Boundary Uncertainty in a Single-Stage Temporal Action Localization Network

    Authors: Ting-Ting Xie, Christos Tzelepis, Ioannis Patras

    Abstract: In this paper, we address the problem of temporal action localization with a single stage neural network. In the proposed architecture we model the boundary predictions as uni-variate Gaussian distributions in order to model their uncertainties, which is the first in this area to the best of our knowledge. We use two uncertainty-aware boundary regression losses: first, the Kullback-Leibler diverge…

    Submitted 25 August, 2020; originally announced August 2020.

    Comments: Tech report

  42. arXiv:1908.07410

    cs.CV cs.IR

    ViSiL: Fine-grained Spatio-Temporal Video Similarity Learning

    Authors: Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Ioannis Patras, Ioannis Kompatsiaris

    Abstract: In this paper we introduce ViSiL, a Video Similarity Learning architecture that considers fine-grained Spatio-Temporal relations between pairs of videos -- such relations are typically lost in previous video retrieval approaches that embed the whole frame or even the whole video into a vector descriptor before the similarity estimation. By contrast, our Convolutional Neural Network (CNN)-based app…

    Submitted 20 August, 2019; originally announced August 2019.

  43. arXiv:1907.09021  [pdf, other]

    cs.CV

    TARN: Temporal Attentive Relation Network for Few-Shot and Zero-Shot Action Recognition

    Authors: Mina Bishay, Georgios Zoumpourlis, Ioannis Patras

    Abstract: In this paper we propose a novel Temporal Attentive Relation Network (TARN) for the problems of few-shot and zero-shot action recognition. At the heart of our network is a meta-learning approach that learns to compare representations of variable temporal length, that is, either two videos of different length (in the case of few-shot action recognition) or a video and a semantic representation such…

    Submitted 21 July, 2019; originally announced July 2019.

    Comments: 14 pages, IEEE Transactions on Affective Computing

    Report number: British Machine Vision Conference (BMVC) 2019

  44. arXiv:1905.10608  [pdf, other]

    cs.CV

    Exploring Feature Representation and Training strategies in Temporal Action Localization

    Authors: Tingting Xie, Xiaoshan Yang, Tianzhu Zhang, Changsheng Xu, Ioannis Patras

    Abstract: Temporal action localization has recently attracted significant interest in the computer vision community. However, despite the great progress, it is hard to identify which aspects of the proposed methods contribute most to the increase in localization performance. To address this issue, we conduct ablation experiments on feature extraction methods, fixed-size feature representation methods and tr…

    Submitted 29 May, 2019; v1 submitted 25 May, 2019; originally announced May 2019.

    Comments: ICIP19 Camera Ready

  45. Registration-free Face-SSD: Single shot analysis of smiles, facial attributes, and affect in the wild

    Authors: Youngkyoon Jang, Hatice Gunes, Ioannis Patras

    Abstract: In this paper, we present a novel single-shot face-related task analysis method, called Face-SSD, for detecting faces and for performing various face-related (classification/regression) tasks, including smile recognition, face attribute prediction and valence-arousal estimation in the wild. Face-SSD uses a Fully Convolutional Neural Network (FCNN) to detect multiple faces of different sizes and rec…

    Submitted 11 February, 2019; originally announced February 2019.

    Comments: 14 pages, 9 figures, 8 tables, accepted for Elsevier CVIU 2019

  46. arXiv:1809.04094  [pdf, other]

    cs.MM cs.CV cs.IR

    FIVR: Fine-grained Incident Video Retrieval

    Authors: Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Ioannis Patras, Ioannis Kompatsiaris

    Abstract: This paper introduces the problem of Fine-grained Incident Video Retrieval (FIVR). Given a query video, the objective is to retrieve all associated videos, considering several types of associations that range from duplicate videos to videos from the same incident. FIVR offers a single framework that contains several retrieval tasks as special cases. To address the benchmarking needs of all such ta…

    Submitted 24 March, 2019; v1 submitted 11 September, 2018; originally announced September 2018.

    Journal ref: IEEE Transactions on Multimedia 2019

  47. arXiv:1808.02531  [pdf, other]

    cs.CV

    SchiNet: Automatic Estimation of Symptoms of Schizophrenia from Facial Behaviour Analysis

    Authors: Mina Bishay, Petar Palasek, Stefan Priebe, Ioannis Patras

    Abstract: Patients with schizophrenia often display impairments in the expression of emotion and speech, which are observable in their facial behaviour. Automatic analysis of patients' facial expressions aimed at estimating symptoms of schizophrenia has recently received attention. However, the datasets that are typically used for training and evaluating the developed methods contain only a small…

    Submitted 7 August, 2018; originally announced August 2018.

    Comments: 13 pages, IEEE Transactions on Affective Computing

  48. arXiv:1801.04438  [pdf, other]

    cs.CV

    Semi-supervised Fisher vector network

    Authors: Petar Palasek, Ioannis Patras

    Abstract: In this work, we explore how the architecture proposed in [8], which expresses the processing steps of classical Fisher vector pipeline approaches, i.e., dimensionality reduction by principal component analysis (PCA) projection, Gaussian mixture model (GMM) and Fisher vector descriptor extraction, as network layers, can be modified into a hybrid network that combines the benefits of both unsuperv…

    Submitted 13 January, 2018; originally announced January 2018.

  49. arXiv:1707.06119  [pdf, other]

    cs.CV

    Discriminative convolutional Fisher vector network for action recognition

    Authors: Petar Palasek, Ioannis Patras

    Abstract: In this work, we propose a novel neural network architecture for the problem of human action recognition in videos. The proposed architecture expresses the processing steps of classical Fisher vector approaches, that is, dimensionality reduction by principal component analysis (PCA) projection, Gaussian mixture model (GMM) and Fisher vector descriptor extraction, as network layers. In contrast to ot…

    Submitted 19 July, 2017; originally announced July 2017.

  50. arXiv:1702.02510  [pdf, other]

    q-bio.NC cs.HC

    AMIGOS: A Dataset for Affect, Personality and Mood Research on Individuals and Groups

    Authors: Juan Abdon Miranda-Correa, Mojtaba Khomami Abadi, Nicu Sebe, Ioannis Patras

    Abstract: We present AMIGOS, a dataset for Multimodal research of affect, personality traits and mood on Individuals and GrOupS. Unlike other databases, we elicited affect using both short and long videos in two social contexts, one with individual viewers and one with groups of viewers. The database allows the multimodal study of affective responses by means of neurophysiological signals of in…

    Submitted 13 April, 2017; v1 submitted 2 February, 2017; originally announced February 2017.

    Comments: 14 pages, Transactions on Affective Computing