-
LOCOST: State-Space Models for Long Document Abstractive Summarization
Authors:
Florian Le Bronnec,
Song Duong,
Mathieu Ravaut,
Alexandre Allauzen,
Nancy F. Chen,
Vincent Guigue,
Alberto Lumbreras,
Laure Soulier,
Patrick Gallinari
Abstract:
State-space models are a low-complexity alternative to transformers for encoding long sequences and capturing long-term dependencies. We propose LOCOST: an encoder-decoder architecture based on state-space models for conditional text generation with long context inputs. With a computational complexity of $O(L \log L)$, this architecture can handle significantly longer sequences than state-of-the-art models that are based on sparse attention patterns. We evaluate our model on a series of long document abstractive summarization tasks. The model reaches a performance level that is 93-96% comparable to the top-performing sparse transformers of the same size while saving up to 50% memory during training and up to 87% during inference. Additionally, LOCOST effectively handles input texts exceeding 600K tokens at inference time, setting new state-of-the-art results on full-book summarization and opening new perspectives for long input processing.
Submitted 25 March, 2024; v1 submitted 31 January, 2024;
originally announced January 2024.
-
Learning from Multiple Sources for Data-to-Text and Text-to-Data
Authors:
Song Duong,
Alberto Lumbreras,
Mike Gartrell,
Patrick Gallinari
Abstract:
Data-to-text (D2T) and text-to-data (T2D) are dual tasks that convert structured data, such as graphs or tables, into fluent text, and vice versa. These tasks are usually handled separately and use corpora extracted from a single source. Current systems leverage pre-trained language models fine-tuned on D2T or T2D tasks. This approach has two main limitations: first, a separate system has to be tuned for each task and source; second, learning is limited by the scarcity of available corpora. This paper considers a more general scenario where data are available from multiple heterogeneous sources. Each source, with its specific data format and semantic domain, provides a non-parallel corpus of text and structured data. We introduce a variational auto-encoder model with disentangled style and content variables that allows us to represent the diversity that stems from multiple sources of text and data. Our model is designed to handle the tasks of D2T and T2D jointly. We evaluate our model on several datasets, and show that by learning from multiple sources, our model closes the performance gap with its supervised single-source counterpart and outperforms it in some cases.
Submitted 22 February, 2023;
originally announced February 2023.
-
Non-parametric clustering over user features and latent behavioral functions with dual-view mixture models
Authors:
Alberto Lumbreras,
Julien Velcin,
Marie Guégan,
Bertrand Jouve
Abstract:
We present a dual-view mixture model to cluster users based on their features and latent behavioral functions. Every component of the mixture model represents a probability density over a feature view for observed user attributes and a behavior view for latent behavioral functions that are indirectly observed through user actions or behaviors. Our task is to infer the groups of users as well as their latent behavioral functions. We also propose a non-parametric version based on a Dirichlet Process to automatically infer the number of clusters. We test the properties and performance of the model on a synthetic dataset that represents the participation of users in the threads of an online forum. Experiments show that dual-view models outperform single-view ones when one of the views lacks information.
Submitted 18 December, 2018;
originally announced December 2018.
-
Bayesian Mean-parameterized Nonnegative Binary Matrix Factorization
Authors:
Alberto Lumbreras,
Louis Filstroff,
Cédric Févotte
Abstract:
Binary data matrices can represent many types of data such as social networks, votes, or gene expression. In some cases, the analysis of binary matrices can be tackled with nonnegative matrix factorization (NMF), where the observed data matrix is approximated by the product of two smaller nonnegative matrices. In this context, probabilistic NMF assumes a generative model where the data is usually Bernoulli-distributed. Often, a link function is used to map the factorization to the $[0,1]$ range, ensuring a valid Bernoulli mean parameter. However, link functions have the potential disadvantage of leading to uninterpretable models. Mean-parameterized NMF, on the contrary, overcomes this problem. We propose a unified framework for Bayesian mean-parameterized nonnegative binary matrix factorization models (NBMF). We analyze three models which correspond to three possible constraints that respect the mean-parameterization without the need for link functions. Furthermore, we derive a novel collapsed Gibbs sampler and a collapsed variational algorithm to infer the posterior distribution of the factors. Next, we extend the proposed models to a nonparametric setting where the number of used latent dimensions is automatically driven by the observed data. We analyze the performance of our NBMF methods on multiple datasets for different tasks such as dictionary learning and prediction of missing data. Experiments show that our methods provide results similar or superior to the state of the art, while automatically detecting the number of relevant components.
Submitted 20 June, 2020; v1 submitted 17 December, 2018;
originally announced December 2018.
-
The missing light of the Hubble Ultra Deep Field
Authors:
Alejandro Borlaff,
Ignacio Trujillo,
Javier Román,
John E. Beckman,
M. Carmen Eliche-Moral,
Raúl Infante-Sáinz,
Alejandro Lumbreras,
Rodrigo Takuro Sato Martín de Almagro,
Carlos Gómez-Guijarro,
María Cebrián,
Antonio Dorta,
Nicolás Cardiel,
Mohammad Akhlaghi,
Cristina Martínez-Lombilla
Abstract:
The Hubble Ultra Deep Field (HUDF) is the deepest region ever observed with the Hubble Space Telescope. With the main objective of unveiling the nature of galaxies up to $z \sim 7-8$, the observing and reduction strategies have focused on the properties of small and unresolved objects, rather than the outskirts of the largest objects, which are usually over-subtracted.
We aim to create a new set of WFC3/IR mosaics of the HUDF using novel techniques to preserve the properties of the low surface brightness regions. We created ABYSS: a pipeline that optimises the estimate and modelling of low-level systematic effects to obtain a robust background subtraction.
We have improved four key points in the reduction: 1) creation of new absolute sky flat fields, 2) extended persistence models, 3) dedicated sky background subtraction and 4) robust co-adding. The new mosaics successfully recover the low surface brightness structure removed in the previously published HUDF reductions.
The amount of light recovered with a mean surface brightness dimmer than $\overline{\mu}=26$ mag arcsec$^{-2}$ is equivalent to a m=19 mag source when compared to the XDF and a m=20 mag source when compared to the HUDF12. We present a set of techniques to reduce ultra-deep images ($\mu>32.5$ mag arcsec$^{-2}$, $3\sigma$ in $10\times10$ arcsec boxes) that successfully allow the detection of the low surface brightness structure of extended sources in ultra-deep surveys. The developed procedures are applicable to HST, JWST, EUCLID and many other space and ground-based observatories. We will make the final ABYSS WFC3/IR HUDF mosaics publicly available at http://www.iac.es/proyecto/abyss/.
Submitted 4 February, 2019; v1 submitted 28 September, 2018;
originally announced October 2018.
-
Closed-form Marginal Likelihood in Gamma-Poisson Matrix Factorization
Authors:
Louis Filstroff,
Alberto Lumbreras,
Cédric Févotte
Abstract:
We present novel understandings of the Gamma-Poisson (GaP) model, a probabilistic matrix factorization model for count data. We show that GaP can be rewritten free of the score/activation matrix. This gives us new insights about the estimation of the topic/dictionary matrix by maximum marginal likelihood estimation. In particular, this explains the robustness of this estimator to over-specified values of the factorization rank, especially its ability to automatically prune irrelevant dictionary columns, as empirically observed in previous work. The marginalization of the activation matrix leads in turn to a new Monte Carlo Expectation-Maximization algorithm with favorable properties.
Submitted 31 May, 2018; v1 submitted 5 January, 2018;
originally announced January 2018.
-
SMA Submillimeter Observations of HL Tau: Revealing a compact molecular outflow
Authors:
Alba M. Lumbreras,
Luis A. Zapata
Abstract:
We present archival high angular resolution ($\sim$ 2$''$) $^{12}$CO(3-2) line and continuum submillimeter observations of the young stellar object HL Tau made with the Submillimeter Array (SMA). The $^{12}$CO(3-2) line observations reveal the presence of a compact and wide opening angle bipolar outflow with a northeast and southwest orientation (P.A. = 50$^\circ$) that is associated with the optical and infrared jet emanating from HL Tau with a similar orientation. On the other hand, the 850 $\mu$m continuum emission observations exhibit a strong and compact source at the position of HL Tau that has a spatial size of $\sim$ 200 $\times$ 70 AU with a P.A. $=$ 145$^\circ$, and a dust mass of around 0.1 M$_\odot$. These physical parameters are in agreement with values obtained recently from millimeter observations. This submillimeter source is therefore related to the disk surrounding HL Tau.
Submitted 2 January, 2014;
originally announced January 2014.
-
Analyse des rôles dans les communautés virtuelles : définitions et premières expérimentations sur IMDb
Authors:
Alberto Lumbreras,
James Lanagan,
Julien Velcin,
Bertrand Jouve
Abstract:
Role analysis in online communities allows us to understand and predict users' behavior. Though several approaches have been followed, there is still a lack of generalization of their methods and results. In this paper, we discuss the ground theory of roles and search for a consistent and computable definition that allows the automatic detection of roles played by users in forum threads on the internet. We analyze the web site IMDb to illustrate the discussion.
Submitted 11 March, 2016; v1 submitted 27 September, 2013;
originally announced September 2013.