
Showing 1–10 of 10 results for author: Papalampidi, P

  1. arXiv:2407.07726

    cs.CV cs.AI cs.CL cs.LG

    PaliGemma: A versatile 3B VLM for transfer

    Authors: Lucas Beyer, Andreas Steiner, André Susano Pinto, Alexander Kolesnikov, Xiao Wang, Daniel Salz, Maxim Neumann, Ibrahim Alabdulmohsin, Michael Tschannen, Emanuele Bugliarello, Thomas Unterthiner, Daniel Keysers, Skanda Koppula, Fangyu Liu, Adam Grycner, Alexey Gritsenko, Neil Houlsby, Manoj Kumar, Keran Rong, Julian Eisenschlos, Rishabh Kabra, Matthias Bauer, Matko Bošnjak, Xi Chen, Matthias Minderer, et al. (10 additional authors not shown)

    Abstract: PaliGemma is an open Vision-Language Model (VLM) that is based on the SigLIP-So400m vision encoder and the Gemma-2B language model. It is trained to be a versatile and broadly knowledgeable base model that is effective to transfer. It achieves strong performance on a wide variety of open-world tasks. We evaluate PaliGemma on almost 40 diverse tasks including standard VLM benchmarks, but also more…

    Submitted 10 July, 2024; originally announced July 2024.

  2. arXiv:2402.05861

    cs.CV

    Memory Consolidation Enables Long-Context Video Understanding

    Authors: Ivana Balažević, Yuge Shi, Pinelopi Papalampidi, Rahma Chaabouni, Skanda Koppula, Olivier J. Hénaff

    Abstract: Most transformer-based video encoders are limited to short temporal contexts due to their quadratic complexity. While various attempts have been made to extend this context, this has often come at the cost of both conceptual and computational complexity. We propose to instead re-purpose existing pre-trained video transformers by simply fine-tuning them to attend to memories derived non-parametrica…

    Submitted 31 May, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

  3. arXiv:2312.07395

    cs.CV cs.CL

    A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames

    Authors: Pinelopi Papalampidi, Skanda Koppula, Shreya Pathak, Justin Chiu, Joe Heyward, Viorica Patraucean, Jiajun Shen, Antoine Miech, Andrew Zisserman, Aida Nematzadeh

    Abstract: Understanding long, real-world videos requires modeling of long-range visual dependencies. To this end, we explore video-first architectures, building on the common paradigm of transferring large-scale, image–text models to video via shallow temporal fusion. However, we expose two limitations to the approach: (1) decreased spatial capabilities, likely due to poor video–language alignment in stan…

    Submitted 12 December, 2023; originally announced December 2023.

  4. arXiv:2210.04829

    cs.CL

    Hierarchical3D Adapters for Long Video-to-text Summarization

    Authors: Pinelopi Papalampidi, Mirella Lapata

    Abstract: In this paper, we focus on video-to-text summarization and investigate how to best utilize multimodal information for summarizing long inputs (e.g., an hour-long TV show) into long outputs (e.g., a multi-sentence summary). We extend SummScreen (Chen et al., 2021), a dialogue summarization dataset consisting of transcripts of TV episodes with reference summaries, and create a multimodal variant by…

    Submitted 10 October, 2022; originally announced October 2022.

  5. arXiv:2202.01709

    cs.CL cs.LG

    Towards Coherent and Consistent Use of Entities in Narrative Generation

    Authors: Pinelopi Papalampidi, Kris Cao, Tomas Kocisky

    Abstract: Large pre-trained language models (LMs) have demonstrated impressive capabilities in generating long, fluent text; however, there is little to no analysis on their ability to maintain entity coherence and consistency. In this work, we focus on the end task of narrative generation and systematically analyse the long-range entity coherence and consistency in generated stories. First, we propose a se…

    Submitted 3 February, 2022; originally announced February 2022.

  6. arXiv:2111.08774

    cs.CV

    Film Trailer Generation via Task Decomposition

    Authors: Pinelopi Papalampidi, Frank Keller, Mirella Lapata

    Abstract: Movie trailers perform multiple functions: they introduce viewers to the story, convey the mood and artistic style of the film, and encourage audiences to see the movie. These diverse functions make automatic trailer generation a challenging endeavor. We decompose it into two subtasks: narrative structure identification and sentiment prediction. We model movies as graphs, where nodes are shots and…

    Submitted 16 November, 2021; originally announced November 2021.

  7. arXiv:2012.07536

    cs.CL cs.CV

    Movie Summarization via Sparse Graph Construction

    Authors: Pinelopi Papalampidi, Frank Keller, Mirella Lapata

    Abstract: We summarize full-length movies by creating shorter videos containing their most informative scenes. We explore the hypothesis that a summary can be created by assembling scenes which are turning points (TPs), i.e., key events in a movie that describe its storyline. We propose a model that identifies TP scenes by building a sparse movie graph that represents relations between scenes and is constru…

    Submitted 14 December, 2020; originally announced December 2020.

    Comments: Accepted at AAAI 2021

  8. arXiv:2004.12727

    cs.CL

    Screenplay Summarization Using Latent Narrative Structure

    Authors: Pinelopi Papalampidi, Frank Keller, Lea Frermann, Mirella Lapata

    Abstract: Most general-purpose extractive summarization models are trained on news articles, which are short and present all important information upfront. As a result, such models are biased on position and often perform a smart selection of sentences from the beginning of the document. When summarizing long narratives, which have complex structure and present information piecemeal, simple position heurist…

    Submitted 27 April, 2020; originally announced April 2020.

    Comments: Accepted to appear at ACL 2020

  9. arXiv:1908.10328

    cs.CL

    Movie Plot Analysis via Turning Point Identification

    Authors: Pinelopi Papalampidi, Frank Keller, Mirella Lapata

    Abstract: According to screenwriting theory, turning points (e.g., change of plans, major setback, climax) are crucial narrative moments within a screenplay: they define the plot structure, determine its progression and segment the screenplay into thematic units (e.g., setup, complications, aftermath). We propose the task of turning point identification in movies as a means of analyzing their narrative stru…

    Submitted 30 August, 2019; v1 submitted 27 August, 2019; originally announced August 2019.

    Comments: Accepted to appear at EMNLP 2019

  10. arXiv:1804.06659

    cs.CL

    NTUA-SLP at SemEval-2018 Task 3: Tracking Ironic Tweets using Ensembles of Word and Character Level Attentive RNNs

    Authors: Christos Baziotis, Nikos Athanasiou, Pinelopi Papalampidi, Athanasia Kolovou, Georgios Paraskevopoulos, Nikolaos Ellinas, Alexandros Potamianos

    Abstract: In this paper we present two deep-learning systems that competed at SemEval-2018 Task 3 "Irony detection in English tweets". We design and ensemble two independent models, based on recurrent neural networks (Bi-LSTM), which operate at the word and character level, in order to capture both the semantic and syntactic information in tweets. Our models are augmented with a self-attention mechanism, in…

    Submitted 18 April, 2018; originally announced April 2018.

    Comments: SemEval-2018, Task 3 "Irony detection in English tweets"