
Showing 1–9 of 9 results for author: Panagopoulou, A

  1. arXiv:2405.19423  [pdf, other]

    cs.CV cs.AI

    Evaluating Vision-Language Models on Bistable Images

    Authors: Artemis Panagopoulou, Coby Melkin, Chris Callison-Burch

    Abstract: Bistable images, also known as ambiguous or reversible images, present visual stimuli that can be seen in two distinct interpretations, though not simultaneously by the observer. In this study, we conduct the most extensive examination of vision-language models using bistable images to date. We manually gathered a dataset of 29 bistable images, along with their associated labels, and subjected the…

    Submitted 29 May, 2024; originally announced May 2024.

  2. arXiv:2311.18799  [pdf, other]

    cs.CV cs.CL

    X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning

    Authors: Artemis Panagopoulou, Le Xue, Ning Yu, Junnan Li, Dongxu Li, Shafiq Joty, Ran Xu, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles

    Abstract: Vision-language pre-training and instruction tuning have demonstrated general-purpose capabilities in 2D visual reasoning tasks by aligning visual encoders with state-of-the-art large language models (LLMs). In this paper, we introduce a simple, yet effective, cross-modality framework built atop frozen LLMs that allows the integration of various modalities without extensive modality-specific custo…

    Submitted 30 November, 2023; originally announced November 2023.

  3. arXiv:2305.14724  [pdf, other]

    cs.CL cs.AI cs.CV cs.HC

    I Spy a Metaphor: Large Language Models and Diffusion Models Co-Create Visual Metaphors

    Authors: Tuhin Chakrabarty, Arkadiy Saakyan, Olivia Winn, Artemis Panagopoulou, Yue Yang, Marianna Apidianaki, Smaranda Muresan

    Abstract: Visual metaphors are powerful rhetorical devices used to persuade or communicate creative ideas through images. Similar to linguistic metaphors, they convey meaning implicitly through symbolism and juxtaposition of the symbols. We propose a new task of generating visual metaphors from linguistic metaphors. This is a challenging task for diffusion-based text-to-image models, such as DALL·E 2,…

    Submitted 14 July, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: ACL 2023 (Findings)

  4. arXiv:2305.08275  [pdf, other]

    cs.CV

    ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding

    Authors: Le Xue, Ning Yu, Shu Zhang, Artemis Panagopoulou, Junnan Li, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, Silvio Savarese

    Abstract: Recent advancements in multimodal pre-training have shown promising efficacy in 3D representation learning by aligning multimodal features across 3D shapes, their 2D counterparts, and language descriptions. However, the methods used by existing frameworks to curate such multimodal data, in particular language descriptions for 3D shapes, are not scalable, and the collected language descriptions are…

    Submitted 25 April, 2024; v1 submitted 14 May, 2023; originally announced May 2023.

    Comments: CVPR2024

    Journal ref: CVPR2024

  5. arXiv:2304.01721  [pdf, other]

    cs.OS

    Virtio-FPGA: a virtualization solution for SoC-attached FPGAs

    Authors: Anna Panagopoulou, Michele Paolino, Daniel Raho

    Abstract: Recently, FPGA accelerators have risen in popularity as they present a suitable way of satisfying the high-computation and low-power demands of real time applications. The modern electric transportation systems (such as aircraft, road vehicles) can greatly profit from embedded FPGAs, which incorporate both high-performance and flexibility features into a single SoC. At the same time, the virtualiz…

    Submitted 4 April, 2023; originally announced April 2023.

  6. arXiv:2211.11158  [pdf, other]

    cs.CV cs.CL

    Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification

    Authors: Yue Yang, Artemis Panagopoulou, Shenghao Zhou, Daniel Jin, Chris Callison-Burch, Mark Yatskar

    Abstract: Concept Bottleneck Models (CBM) are inherently interpretable models that factor model decisions into human-readable concepts. They allow people to easily understand why a model is failing, a critical feature for high-stakes applications. CBMs require manually specified concepts and often under-perform their black box counterparts, preventing their broad adoption. We address these shortcomings and…

    Submitted 25 April, 2023; v1 submitted 20 November, 2022; originally announced November 2022.

    Comments: Published in CVPR 2023, 18 pages, 12 figures, 16 tables

  7. arXiv:2210.12905  [pdf, other]

    cs.CL

    Visualizing the Obvious: A Concreteness-based Ensemble Model for Noun Property Prediction

    Authors: Yue Yang, Artemis Panagopoulou, Marianna Apidianaki, Mark Yatskar, Chris Callison-Burch

    Abstract: Neural language models encode rich knowledge about entities and their relationships which can be extracted from their representations using probing. Common properties of nouns (e.g., red strawberries, small ant) are, however, more challenging to extract compared to other types of knowledge because they are rarely explicitly stated in texts. We hypothesize this to mainly be the case for perceptual…

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: Findings of EMNLP 2022; The first two authors contributed equally

    Journal ref: Findings of EMNLP 2022

  8. arXiv:2111.09276  [pdf, other]

    cs.CV cs.CL

    Induce, Edit, Retrieve: Language Grounded Multimodal Schema for Instructional Video Retrieval

    Authors: Yue Yang, Joongwon Kim, Artemis Panagopoulou, Mark Yatskar, Chris Callison-Burch

    Abstract: Schemata are structured representations of complex tasks that can aid artificial intelligence by allowing models to break down complex tasks into intermediate steps. We propose a novel system that induces schemata from web videos and generalizes them to capture unseen tasks with the goal of improving video retrieval performance. Our system proceeds in three major phases: (1) Given a task with rela…

    Submitted 17 November, 2021; originally announced November 2021.

  9. arXiv:2104.05845  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.MM

    Visual Goal-Step Inference using wikiHow

    Authors: Yue Yang, Artemis Panagopoulou, Qing Lyu, Li Zhang, Mark Yatskar, Chris Callison-Burch

    Abstract: Understanding what sequence of steps are needed to complete a goal can help artificial intelligence systems reason about human activities. Past work in NLP has examined the task of goal-step inference for text. We introduce the visual analogue. We propose the Visual Goal-Step Inference (VGSI) task, where a model is given a textual goal and must choose which of four images represents a plausible st…

    Submitted 9 September, 2021; v1 submitted 12 April, 2021; originally announced April 2021.