Showing 1–6 of 6 results for author: Nikandrou, M

Searching in archive cs.
  1. arXiv:2406.19297  [pdf, other]

    cs.CV

    Enhancing Continual Learning in Visual Question Answering with Modality-Aware Feature Distillation

    Authors: Malvina Nikandrou, Georgios Pantazopoulos, Ioannis Konstas, Alessandro Suglia

    Abstract: Continual learning focuses on incrementally training a model on a sequence of tasks with the aim of learning new tasks while minimizing performance drop on previous tasks. Existing approaches at the intersection of Continual Learning and Visual Question Answering (VQA) do not study how the multimodal nature of the input affects the learning dynamics of a model. In this paper, we demonstrate that e…

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2405.04403  [pdf, other]

    cs.CV cs.CL

    Learning To See But Forgetting To Follow: Visual Instruction Tuning Makes LLMs More Prone To Jailbreak Attacks

    Authors: Georgios Pantazopoulos, Amit Parekh, Malvina Nikandrou, Alessandro Suglia

    Abstract: Augmenting Large Language Models (LLMs) with image-understanding capabilities has resulted in a boom of high-performing Vision-Language models (VLMs). While studying the alignment of LLMs to human values has received widespread attention, the safety of VLMs has not received the same attention. In this paper, we explore the impact of jailbreaking on three state-of-the-art VLMs, each using a distinc…

    Submitted 7 May, 2024; originally announced May 2024.

  3. arXiv:2311.04067  [pdf, other]

    cs.LG cs.AI cs.CV

    Multitask Multimodal Prompted Training for Interactive Embodied Task Completion

    Authors: Georgios Pantazopoulos, Malvina Nikandrou, Amit Parekh, Bhathiya Hemanthage, Arash Eshghi, Ioannis Konstas, Verena Rieser, Oliver Lemon, Alessandro Suglia

    Abstract: Interactive and embodied tasks pose at least two fundamental challenges to existing Vision & Language (VL) models, including 1) grounding language in trajectories of actions and observations, and 2) referential disambiguation. To tackle these challenges, we propose an Embodied MultiModal Agent (EMMA): a unified encoder-decoder model that reasons over images and trajectories, and casts action predi…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023

  4. arXiv:2304.14623  [pdf, other]

    cs.CV

    Quality-agnostic Image Captioning to Safely Assist People with Vision Impairment

    Authors: Lu Yu, Malvina Nikandrou, Jiali **, Verena Rieser

    Abstract: Automated image captioning has the potential to be a useful tool for people with vision impairments. Images taken by this user group are often noisy, which leads to incorrect and even unsafe model predictions. In this paper, we propose a quality-agnostic framework to improve the performance and robustness of image captioning models for visually impaired people. We address this problem from three a…

    Submitted 1 May, 2023; v1 submitted 28 April, 2023; originally announced April 2023.

    Comments: To appear in IJCAI 2023

  5. arXiv:2211.04534  [pdf, other]

    cs.CV cs.CL

    Going for GOAL: A Resource for Grounded Football Commentaries

    Authors: Alessandro Suglia, José Lopes, Emanuele Bastianelli, Andrea Vanzo, Shubham Agarwal, Malvina Nikandrou, Lu Yu, Ioannis Konstas, Verena Rieser

    Abstract: Recent video+language datasets cover domains where the interaction is highly structured, such as instructional videos, or where the interaction is scripted, such as TV shows. Both of these properties can lead to spurious cues to be exploited by models rather than learning to ground language. In this paper, we present GrOunded footbAlL commentaries (GOAL), a novel dataset of football (or 'soccer')…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: Preprint formatted using the ACM Multimedia template (8 pages + appendix)

  6. arXiv:2210.00044  [pdf, other]

    cs.LG

    Task Formulation Matters When Learning Continually: A Case Study in Visual Question Answering

    Authors: Malvina Nikandrou, Lu Yu, Alessandro Suglia, Ioannis Konstas, Verena Rieser

    Abstract: Continual learning aims to train a model incrementally on a sequence of tasks without forgetting previous knowledge. Although continual learning has been widely studied in computer vision, its application to Vision+Language tasks is not that straightforward, as settings can be parameterized in multiple ways according to their input modalities. In this paper, we present a detailed study of how diff…

    Submitted 20 January, 2024; v1 submitted 30 September, 2022; originally announced October 2022.