Showing 1–6 of 6 results for author: Pitsikalis, V

  1. arXiv:2311.16261

    cs.CV cs.AI

    RelVAE: Generative Pretraining for few-shot Visual Relationship Detection

    Authors: Sotiris Karapiperis, Markos Diomataris, Vassilis Pitsikalis

    Abstract: Visual relations are complex, multimodal concepts that play an important role in the way humans perceive the world. As a result of their complexity, high-quality, diverse and large scale datasets for visual relations are still absent. In an attempt to overcome this data barrier, we choose to focus on the problem of few-shot Visual Relationship Detection (VRD), a setting that has been so far neglec…

    Submitted 27 November, 2023; originally announced November 2023.

  2. arXiv:2311.04834

    cs.CV

    Self-Supervised Learning for Visual Relationship Detection through Masked Bounding Box Reconstruction

    Authors: Zacharias Anastasakis, Dimitrios Mallis, Markos Diomataris, George Alexandridis, Stefanos Kollias, Vassilis Pitsikalis

    Abstract: We present a novel self-supervised approach for representation learning, particularly for the task of Visual Relationship Detection (VRD). Motivated by the effectiveness of Masked Image Modeling (MIM), we propose Masked Bounding Box Reconstruction (MBBR), a variation of MIM where a percentage of the entities/objects within a scene are masked and subsequently reconstructed based on the unmasked obj…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: Camera-ready version of the WACV 2024 paper

  3. arXiv:2309.03726

    cs.CV

    Interpretable Visual Question Answering via Reasoning Supervision

    Authors: Maria Parelli, Dimitrios Mallis, Markos Diomataris, Vassilis Pitsikalis

    Abstract: Transformer-based architectures have recently demonstrated remarkable performance in the Visual Question Answering (VQA) task. However, such models are likely to disregard crucial visual cues and often rely on multimodal shortcuts and inherent biases of the language modality to predict the correct answer, a phenomenon commonly referred to as lack of visual grounding. In this work, we alleviate thi…

    Submitted 7 September, 2023; originally announced September 2023.

  4. An Incremental Learning framework for Large-scale CTR Prediction

    Authors: Petros Katsileros, Nikiforos Mandilaras, Dimitrios Mallis, Vassilis Pitsikalis, Stavros Theodorakis, Gil Chamiel

    Abstract: In this work we introduce an incremental learning framework for Click-Through-Rate (CTR) prediction and demonstrate its effectiveness for Taboola's massive-scale recommendation service. Our approach enables rapid capture of emerging trends through warm-starting from previously deployed models and fine tuning on "fresh" data only. Past knowledge is maintained via a teacher-student paradigm, where t…

    Submitted 1 September, 2022; originally announced September 2022.

    Comments: To be published in the Sixteenth ACM Conference on Recommender Systems (RecSys 22), Seattle, WA, USA

  5. arXiv:1902.05829

    cs.CV

    Deeply Supervised Multimodal Attentional Translation Embeddings for Visual Relationship Detection

    Authors: Nikolaos Gkanatsios, Vassilis Pitsikalis, Petros Koutras, Athanasia Zlatintsi, Petros Maragos

    Abstract: Detecting visual relationships, i.e. <Subject, Predicate, Object> triplets, is a challenging Scene Understanding task approached in the past via linguistic priors or spatial information in a single feature branch. We introduce a new deeply supervised two-branch architecture, the Multimodal Attentional Translation Embeddings, where the visual features of each branch are driven by a multimodal atten…

    Submitted 15 February, 2019; originally announced February 2019.

  6. arXiv:1711.01775

    cs.MM cs.HC cs.RO

    Multimodal Signal Processing and Learning Aspects of Human-Robot Interaction for an Assistive Bathing Robot

    Authors: A. Zlatintsi, I. Rodomagoulakis, P. Koutras, A. C. Dometios, V. Pitsikalis, C. S. Tzafestas, P. Maragos

    Abstract: We explore new aspects of assistive living on smart human-robot interaction (HRI) that involve automatic recognition and online validation of speech and gestures in a natural interface, providing social features for HRI. We introduce a whole framework and resources of a real-life scenario for elderly subjects supported by an assistive bathing robot, addressing health and hygiene care issues. We co…

    Submitted 6 November, 2017; originally announced November 2017.