Showing 1–2 of 2 results for author: Abdessalem, A

Search v0.5.6 released 2020-02-24

arXiv:2310.19923 [pdf, other]

cs.CL cs.AI cs.LG

Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents

Authors: Michael Günther, Jackmin Ong, Isabelle Mohr, Alaeddine Abdessalem, Tanguy Abel, Mohammad Kalim Akram, Susana Guzman, Georgios Mastrapas, Saba Sturua, Bo Wang, Maximilian Werk, Nan Wang, Han Xiao

Abstract: Text embedding models have emerged as powerful tools for transforming sentences into fixed-sized feature vectors that encapsulate semantic information. While these models are essential for tasks like information retrieval, semantic clustering, and text re-ranking, most existing open-source models, especially those built on architectures like BERT, struggle to represent lengthy documents and often resort to truncation. One common approach to mitigate this challenge involves splitting documents into smaller paragraphs for embedding. However, this strategy results in a much larger set of vectors, consequently leading to increased memory consumption and computationally intensive vector searches with elevated latency. To address these challenges, we introduce Jina Embeddings 2, an open-source text embedding model capable of accommodating up to 8192 tokens. This model is designed to transcend the conventional 512-token limit and adeptly process long documents. Jina Embeddings 2 not only achieves state-of-the-art performance on a range of embedding-related tasks in the MTEB benchmark but also matches the performance of OpenAI's proprietary ada-002 model. Additionally, our experiments indicate that an extended context can enhance performance in tasks such as NarrativeQA.

Submitted 4 February, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

Comments: 14 pages

MSC Class: 68T50; ACM Class: I.2.7

arXiv:1004.4605 [pdf]

cs.OH

Video shot boundary detection using motion activity descriptor

Authors: Abdelati Malek Amel, Ben Abdelali Abdessalem, Mtibaa Abdellatif

Abstract: This paper focuses on the study of the motion activity descriptor for shot boundary detection in video sequences. We are interested in validating this descriptor with a view to its real-time implementation with reasonably high performance in shot boundary detection. The motion activity information is extracted in the uncompressed domain using the adaptive rood pattern search (ARPS) algorithm. In this context, the motion activity descriptor was applied to different video sequences.

Submitted 26 April, 2010; originally announced April 2010.

Comments: Abdelati Malek Amel, Ben Abdelali Abdessalem and Mtibaa Abdellatif, "Video shot boundary detection using motion activity descriptor", Journal of Telecommunications, Volume 2, Issue 1, p54-59, April 2010

Journal ref: Journal of Telecommunications, Volume 2, Issue 1, p54-59, April 2010
