Search | arXiv e-print repository

Small Language Models are Good Too: An Empirical Study of Zero-Shot Classification

Authors: Pierre Lepagnol, Thomas Gerald, Sahar Ghannay, Christophe Servan, Sophie Rosset

Abstract: This study is part of the debate on the efficiency of large versus small language models for text classification by prompting.We assess the performance of small language models in zero-shot text classification, challenging the prevailing dominance of large models.Across 15 datasets, our investigation benchmarks language models from 77M to 40B parameters using different architectures and scoring fu… ▽ More This study is part of the debate on the efficiency of large versus small language models for text classification by prompting.We assess the performance of small language models in zero-shot text classification, challenging the prevailing dominance of large models.Across 15 datasets, our investigation benchmarks language models from 77M to 40B parameters using different architectures and scoring functions. Our findings reveal that small models can effectively classify texts, getting on par with or surpassing their larger counterparts.We developed and shared a comprehensive open-source repository that encapsulates our methodologies. This research underscores the notion that bigger isn't always better, suggesting that resource-efficient small models may offer viable solutions for specific data classification challenges. △ Less

Submitted 17 April, 2024; originally announced April 2024.

Journal ref: LREC-COLING 2024, May 2024, TURIN, Italy

arXiv:2301.04413 [pdf, ps, other]

CoSPLADE: Contextualizing SPLADE for Conversational Information Retrieval

Authors: Nam Le Hai, Thomas Gerald, Thibault Formal, Jian-Yun Nie, Benjamin Piwowarski, Laure Soulier

Abstract: Conversational search is a difficult task as it aims at retrieving documents based not only on the current user query but also on the full conversation history. Most of the previous methods have focused on a multi-stage ranking approach relying on query reformulation, a critical intermediate step that might lead to a sub-optimal retrieval. Other approaches have tried to use a fully neural IR first… ▽ More Conversational search is a difficult task as it aims at retrieving documents based not only on the current user query but also on the full conversation history. Most of the previous methods have focused on a multi-stage ranking approach relying on query reformulation, a critical intermediate step that might lead to a sub-optimal retrieval. Other approaches have tried to use a fully neural IR first-stage, but are either zero-shot or rely on full learning-to-rank based on a dataset with pseudo-labels. In this work, leveraging the CANARD dataset, we propose an innovative lightweight learning technique to train a first-stage ranker based on SPLADE. By relying on SPLADE sparse representations, we show that, when combined with a second-stage ranker based on T5Mono, the results are competitive on the TREC CAsT 2020 and 2021 tracks. △ Less

Submitted 4 July, 2024; v1 submitted 11 January, 2023; originally announced January 2023.

Comments: Accepted at ECIR 2023

arXiv:2201.03356 [pdf, other]

Continual Learning of Long Topic Sequences in Neural Information Retrieval

Authors: Thomas Gerald, Laure Soulier

Abstract: In information retrieval (IR) systems, trends and users' interests may change over time, altering either the distribution of requests or contents to be recommended. Since neural ranking approaches heavily depend on the training data, it is crucial to understand the transfer capacity of recent IR approaches to address new domains in the long term. In this paper, we first propose a dataset based upo… ▽ More In information retrieval (IR) systems, trends and users' interests may change over time, altering either the distribution of requests or contents to be recommended. Since neural ranking approaches heavily depend on the training data, it is crucial to understand the transfer capacity of recent IR approaches to address new domains in the long term. In this paper, we first propose a dataset based upon the MSMarco corpus aiming at modeling a long stream of topics as well as IR property-driven controlled settings. We then in-depth analyze the ability of recent neural IR models while continually learning those streams. Our empirical study highlights in which particular cases catastrophic forgetting occurs (e.g., level of similarity between tasks, peculiarities on text length, and ways of learning models) to provide future directions in terms of model design. △ Less

Submitted 10 January, 2022; originally announced January 2022.

arXiv:2112.04344 [pdf, other]

Does Structure Matter? Leveraging Data-to-Text Generation for Answering Complex Information Needs

Authors: Hanane Djeddal, Thomas Gerald, Laure Soulier, Karen Pinel-Sauvagnat, Lynda Tamine

Abstract: In this work, our aim is to provide a structured answer in natural language to a complex information need. Particularly, we envision using generative models from the perspective of data-to-text generation. We propose the use of a content selection and planning pipeline which aims at structuring the answer by generating intermediate plans. The experimental evaluation is performed using the TREC Com… ▽ More In this work, our aim is to provide a structured answer in natural language to a complex information need. Particularly, we envision using generative models from the perspective of data-to-text generation. We propose the use of a content selection and planning pipeline which aims at structuring the answer by generating intermediate plans. The experimental evaluation is performed using the TREC Complex Answer Retrieval (CAR) dataset. We evaluate both the generated answer and its corresponding structure and show the effectiveness of planning-based models in comparison to a text-to-text model. △ Less

Submitted 8 December, 2021; originally announced December 2021.

Comments: 8 pages, 1 figure, ECIR 2022 short paper

arXiv:2004.04667 [pdf, other]

Geomstats: A Python Package for Riemannian Geometry in Machine Learning

Authors: Nina Miolane, Alice Le Brigant, Johan Mathe, Benjamin Hou, Nicolas Guigui, Yann Thanwerdas, Stefan Heyder, Olivier Peltre, Niklas Koep, Hadi Zaatiti, Hatem Hajri, Yann Cabanes, Thomas Gerald, Paul Chauchat, Christian Shewmake, Bernhard Kainz, Claire Donnat, Susan Holmes, Xavier Pennec

Abstract: We introduce Geomstats, an open-source Python toolbox for computations and statistics on nonlinear manifolds, such as hyperbolic spaces, spaces of symmetric positive definite matrices, Lie groups of transformations, and many more. We provide object-oriented and extensively unit-tested implementations. Among others, manifolds come equipped with families of Riemannian metrics, with associated expone… ▽ More We introduce Geomstats, an open-source Python toolbox for computations and statistics on nonlinear manifolds, such as hyperbolic spaces, spaces of symmetric positive definite matrices, Lie groups of transformations, and many more. We provide object-oriented and extensively unit-tested implementations. Among others, manifolds come equipped with families of Riemannian metrics, with associated exponential and logarithmic maps, geodesics and parallel transport. Statistics and learning algorithms provide methods for estimation, clustering and dimension reduction on manifolds. All associated operations are vectorized for batch computation and provide support for different execution backends, namely NumPy, PyTorch and TensorFlow, enabling GPU acceleration. This paper presents the package, compares it with related libraries and provides relevant code examples. We show that Geomstats provides reliable building blocks to foster research in differential geometry and statistics, and to democratize the use of Riemannian geometry in machine learning applications. The source code is freely available under the MIT license at \url{geomstats.ai}. △ Less

Submitted 7 April, 2020; originally announced April 2020.

arXiv:1907.01662 [pdf, other]

From Node Embedding To Community Embedding : A Hyperbolic Approach

Authors: Thomas Gerald, Hadi Zaatiti, Hatem Hajri, Nicolas Baskiotis, Olivier Schwander

Abstract: Detecting communities on graphs has received significant interest in recent literature. Current state-of-the-art community embedding approach called \textit{ComE} tackles this problem by coupling graph embedding with community detection. Considering the success of hyperbolic representations of graph-structured data in last years, an ongoing challenge is to set up a hyperbolic approach for the comm… ▽ More Detecting communities on graphs has received significant interest in recent literature. Current state-of-the-art community embedding approach called \textit{ComE} tackles this problem by coupling graph embedding with community detection. Considering the success of hyperbolic representations of graph-structured data in last years, an ongoing challenge is to set up a hyperbolic approach for the community detection problem. The present paper meets this challenge by introducing a Riemannian equivalent of \textit{ComE}. Our proposed approach combines hyperbolic embeddings with Riemannian K-means or Riemannian mixture models to perform community detection. We illustrate the usefulness of this framework through several experiments on real-world social networks and comparisons with \textit{ComE} and recent hyperbolic-based classification approaches. △ Less

Submitted 1 March, 2020; v1 submitted 2 July, 2019; originally announced July 2019.

Comments: This version replaces the previous one. The package generating the experimental results will be made public in the near future

arXiv:1906.09838 [pdf, other]

Binary Stochastic Representations for Large Multi-class Classification

Authors: Thomas Gerald, Aurélia Léon, Nicolas Baskiotis, Ludovic Denoyer

Abstract: Classification with a large number of classes is a key problem in machine learning and corresponds to many real-world applications like tagging of images or textual documents in social networks. If one-vs-all methods usually reach top performance in this context, these approaches suffer from a high inference complexity, linear w.r.t the number of categories. Different models based on the notion of… ▽ More Classification with a large number of classes is a key problem in machine learning and corresponds to many real-world applications like tagging of images or textual documents in social networks. If one-vs-all methods usually reach top performance in this context, these approaches suffer from a high inference complexity, linear w.r.t the number of categories. Different models based on the notion of binary codes have been proposed to overcome this limitation, achieving in a sublinear inference complexity. But they a priori need to decide which binary code to associate to which category before learning using more or less complex heuristics. We propose a new end-to-end model which aims at simultaneously learning to associate binary codes with categories, but also learning to map inputs to binary codes. This approach called Deep Stochastic Neural Codes (DSNC) keeps the sublinear inference complexity but do not need any a priori tuning. Experimental results on different datasets show the effectiveness of the approach w.r.t baseline methods. △ Less

Submitted 24 June, 2019; originally announced June 2019.

Showing 1–7 of 7 results for author: Gerald, T