-
Does Instruction Tuning Make LLMs More Consistent?
Authors:
Constanza Fierro,
Jiaang Li,
Anders Søgaard
Abstract:
The purpose of instruction tuning is enabling zero-shot performance, but instruction tuning has also been shown to improve chain-of-thought reasoning and value alignment (Si et al., 2023). Here we consider the impact on $\textit{consistency}$, i.e., the sensitivity of language models to small perturbations in the input. We compare 10 instruction-tuned LLaMA models to the original LLaMA-7b model and show that almost across-the-board they become more consistent, both in terms of their representations and their predictions in zero-shot and downstream tasks. We explain these improvements through mechanistic analyses of factual recall.
Submitted 30 April, 2024; v1 submitted 23 April, 2024;
originally announced April 2024.
-
Learning to Plan and Generate Text with Citations
Authors:
Constanza Fierro,
Reinald Kim Amplayo,
Fantine Huot,
Nicola De Cao,
Joshua Maynez,
Shashi Narayan,
Mirella Lapata
Abstract:
The increasing demand for the deployment of LLMs in information-seeking scenarios has spurred efforts in creating verifiable systems, which generate responses to queries along with supporting evidence. In this paper, we explore the attribution capabilities of plan-based models which have been recently shown to improve the faithfulness, grounding, and controllability of generated text. We conceptualize plans as a sequence of questions which serve as blueprints of the generated content and its organization. We propose two attribution models that utilize different variants of blueprints, an abstractive model where questions are generated from scratch, and an extractive model where questions are copied from the input. Experiments on long-form question-answering show that planning consistently improves attribution quality. Moreover, the citations generated by blueprint models are more accurate compared to those obtained from LLM-based pipelines lacking a planning component.
Submitted 13 May, 2024; v1 submitted 4 April, 2024;
originally announced April 2024.
-
MuLan: A Study of Fact Mutability in Language Models
Authors:
Constanza Fierro,
Nicolas Garneau,
Emanuele Bugliarello,
Yova Kementchedjhieva,
Anders Søgaard
Abstract:
Facts are subject to contingencies and can be true or false in different circumstances. One such contingency is time, wherein some facts mutate over a given period, e.g., the president of a country or the winner of a championship. Trustworthy language models ideally identify mutable facts as such and process them accordingly. We create MuLan, a benchmark for evaluating the ability of English language models to anticipate time-contingency, covering both 1:1 and 1:N relations. We hypothesize that mutable facts are encoded differently than immutable ones, hence being easier to update. In a detailed evaluation of six popular large language models, we consistently find differences in the LLMs' confidence, representations, and update behavior, depending on the mutability of a fact. Our findings should inform future work on the injection of and induction of time-contingent knowledge to/from LLMs.
Submitted 3 April, 2024;
originally announced April 2024.
-
$μ$PLAN: Summarizing using a Content Plan as Cross-Lingual Bridge
Authors:
Fantine Huot,
Joshua Maynez,
Chris Alberti,
Reinald Kim Amplayo,
Priyanka Agrawal,
Constanza Fierro,
Shashi Narayan,
Mirella Lapata
Abstract:
Cross-lingual summarization consists of generating a summary in one language given an input document in a different language, allowing for the dissemination of relevant content across speakers of other languages. The task is challenging mainly due to the paucity of cross-lingual datasets and the compounded difficulty of summarizing and translating. This work presents $μ$PLAN, an approach to cross-lingual summarization that uses an intermediate planning step as a cross-lingual bridge. We formulate the plan as a sequence of entities capturing the summary's content and the order in which it should be communicated. Importantly, our plans abstract from surface form: using a multilingual knowledge base, we align entities to their canonical designation across languages and generate the summary conditioned on this cross-lingual bridge and the input. Automatic and human evaluation on the XWikis dataset (across four language pairs) demonstrates that our planning objective achieves state-of-the-art performance in terms of informativeness and faithfulness. Moreover, $μ$PLAN models improve the zero-shot transfer to new cross-lingual language pairs compared to baselines without a planning component.
Submitted 31 January, 2024; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Do Vision and Language Models Share Concepts? A Vector Space Alignment Study
Authors:
Jiaang Li,
Yova Kementchedjhieva,
Constanza Fierro,
Anders Søgaard
Abstract:
Large-scale pretrained language models (LMs) are said to ``lack the ability to connect utterances to the world'' (Bender and Koller, 2020), because they do not have ``mental models of the world'' (Mitchell and Krakauer, 2023). If so, one would expect LM representations to be unrelated to representations induced by vision models. We present an empirical evaluation across four families of LMs (BERT, GPT-2, OPT and LLaMA-2) and three vision model architectures (ResNet, SegFormer, and MAE). Our experiments show that LMs partially converge towards representations isomorphic to those of vision models, subject to dispersion, polysemy and frequency. This has important implications for both multi-modal processing and the LM understanding debate (Mitchell and Krakauer, 2023).
Submitted 6 July, 2024; v1 submitted 13 February, 2023;
originally announced February 2023.
-
Factual Consistency of Multilingual Pretrained Language Models
Authors:
Constanza Fierro,
Anders Søgaard
Abstract:
Pretrained language models can be queried for factual knowledge, with potential applications in knowledge base acquisition and tasks that require inference. However, for that, we need to know how reliable this knowledge is, and recent work has shown that monolingual English language models lack consistency when predicting factual knowledge, that is, they fill-in-the-blank differently for paraphrases describing the same fact. In this paper, we extend the analysis of consistency to a multilingual setting. We introduce a resource, mParaRel, and investigate (i) whether multilingual language models such as mBERT and XLM-R are more consistent than their monolingual counterparts; and (ii) if such models are equally consistent across languages. We find that mBERT is as inconsistent as English BERT in English paraphrases, but that both mBERT and XLM-R exhibit a high degree of inconsistency in English and even more so for all the other 45 languages.
Submitted 22 March, 2022;
originally announced March 2022.
-
Challenges and Strategies in Cross-Cultural NLP
Authors:
Daniel Hershcovich,
Stella Frank,
Heather Lent,
Miryam de Lhoneux,
Mostafa Abdou,
Stephanie Brandl,
Emanuele Bugliarello,
Laura Cabello Piqueras,
Ilias Chalkidis,
Ruixiang Cui,
Constanza Fierro,
Katerina Margatina,
Phillip Rust,
Anders Søgaard
Abstract:
Various efforts in the Natural Language Processing (NLP) community have been made to accommodate linguistic diversity and serve speakers of many different languages. However, it is important to acknowledge that speakers and the content they produce and require vary not just by language, but also by culture. Although language and culture are tightly linked, there are important differences. Analogous to cross-lingual and multilingual NLP, cross-cultural and multicultural NLP considers these differences in order to better serve users of NLP systems. We propose a principled framework to frame these efforts, and survey existing and potential strategies.
Submitted 18 March, 2022;
originally announced March 2022.
-
Predicting Unplanned Readmissions with Highly Unstructured Data
Authors:
Constanza Fierro,
Jorge Pérez,
Javier Mora
Abstract:
Deep learning techniques have been successfully applied to predict unplanned readmissions of patients in medical centers. The training data for these models is usually based on historical medical records that contain a significant amount of free-text from admission reports, referrals, exam notes, etc. Most of the models proposed so far are tailored to English text data and assume that electronic medical records follow standards common in developed countries. These two characteristics make them difficult to apply in developing countries that do not necessarily follow international standards for registering patient information, or that store text information in languages other than English.
In this paper we propose a deep learning architecture for predicting unplanned readmissions that consumes data that is significantly less structured compared with previous models in the literature. We use it to present the first results for this task in a large clinical dataset that mainly contains Spanish text data. The dataset is composed of almost 10 years of records in a Chilean medical center. On this dataset, our model achieves results that are comparable to some of the most recent results obtained in US medical centers for the same task (0.76 AUROC).
Submitted 5 April, 2020; v1 submitted 19 March, 2020;
originally announced March 2020.
-
New galactic star clusters discovered in the VVV survey. Candidates projected on the inner disk and bulge
Authors:
J. Borissova,
A. -N. Chené,
S. Ramírez Alegría,
Saurabh Sharma,
J. R. A. Clarke,
R. Kurtev,
I. Negueruela,
A. Marco,
P. Amigo,
D. Minniti,
E. Bica,
C. Bonatto,
M. Catelan,
C. Fierro,
D. Geisler,
M. Gromadzki,
M. Hempel,
M. M. Hanson,
V. D. Ivanov,
P. Lucas,
D. Majaess,
C. Moni Bidin,
B. Popescu,
R. K. Saito
Abstract:
VISTA Variables in the Vía Láctea (VVV) is one of six ESO Public Surveys using the 4 meter Visible and Infrared Survey Telescope for Astronomy (VISTA). The VVV survey covers the Milky Way bulge and an adjacent section of the disk, and one of the principal objectives is to search for new star clusters within previously unreachable obscured parts of the Galaxy.
The primary motivation behind this work is to discover and analyze obscured star clusters in the direction of the inner Galactic disk and bulge. Regions of the inner disk and bulge covered by the VVV survey were visually inspected using composite JHKs color images to select new cluster candidates on the basis of apparent overdensities. DR1, DR2, CASU, and PSF photometry of 10x10 arcmin fields centered on each candidate cluster were used to construct color-magnitude and color-color diagrams. Follow-up spectroscopy of the brightest members of several cluster candidates was obtained in order to clarify their nature.
We report the discovery of 58 new infrared cluster candidates. Fundamental parameters such as age, distance, and metallicity were determined for 20 of the most populous clusters.
Submitted 26 June, 2014;
originally announced June 2014.
-
Massive open star clusters using the VVV survey III: A young massive cluster at the far edge of the Galactic bar
Authors:
S. Ramírez Alegría,
J. Borissova,
A. N. Chené,
E. O'Leary,
P. Amigo,
D. Minniti,
R. K. Saito,
D. Geisler,
R. Kurtev,
M. Hempel,
M. Gromadzki,
J. R. A. Clarke,
I. Negueruela,
A. Marco,
C. Fierro,
C. Bonatto,
M. Catelan
Abstract:
Context: Young massive clusters are key to mapping the Milky Way's structure, and near-IR large area sky surveys have contributed strongly to the discovery of new obscured massive stellar clusters.
Aims: We present the third article in a series of papers focused on young and massive clusters discovered in the VVV survey. This article is dedicated to the physical characterization of VVV CL086, using part of its OB-stellar population.
Methods: We physically characterized the cluster using $JHK_S$ near-infrared photometry from ESO public survey VVV images, using the VVV-SkZ pipeline, and near-infrared $K$-band spectroscopy, following the methodology presented in the first article of the series.
Results: Individual distances for two observed stars indicate that the cluster is located at the far edge of the Galactic bar. These stars, which are probable cluster members from the statistically field-star decontaminated CMD, have spectral types between O9 and B0V. According to our analysis, this young cluster ($1.0$ Myr $<$ age $< 5.0$ Myr) is located at a distance of $11^{+5}_{-6}$ kpc, and we estimate a lower limit for the cluster total mass of $(2.8^{+1.6}_{-1.4})\cdot10^3 {M}_{\odot}$. It is likely that the cluster contains even earlier and more massive stars.
Submitted 13 March, 2014;
originally announced March 2014.
-
Exploring nervous system transcriptomes during embryogenesis and metamorphosis in Xenopus tropicalis using EST analysis
Authors:
Ana C Fierro,
Raphaël Thuret,
Laurent Coen,
Muriel Perron,
Barbara A Demeneix,
Maurice Wegnez,
Gabor Gyapay,
Jean Weissenbach,
Patrick Wincker,
André Mazabraud,
Nicolas Pollet
Abstract:
Xenopus tropicalis is an anuran amphibian species used as a model in vertebrate comparative genomics. It provides the same advantages as Xenopus laevis but is diploid and has a smaller genome of 1.7 Gbp. Therefore, X. tropicalis is more amenable to systematic transcriptome surveys. We initiated a large-scale partial cDNA sequencing project to provide a functional genomics resource on genes expressed in the nervous system during early embryogenesis and metamorphosis in X. tropicalis. A gene index was defined and analysed after the collection of over 48,785 high quality sequences. Partial cDNA sequences were obtained from an embryonic head and retina library (30,272 sequences) and from a metamorphic brain and spinal cord library (27,602 sequences). These ESTs are estimated to represent 9,693 transcripts derived from an estimated 6,000 genes. An estimated 46% of these cDNA sequences contain their start codon. Further annotation included Gene Ontology functional classification, InterPro domain analysis, alternative splicing and non-coding RNA identification. Gene expression profiles were derived from EST counts and used to define transcripts specific to metamorphic stages of development. Moreover, these ESTs allowed identification of a set of 225 polymorphic microsatellites that can be used as genetic markers. These cDNA sequences permit in silico cloning of numerous genes and will facilitate studies aimed at deciphering the roles of cognate genes expressed in the nervous system during neural development and metamorphosis. The genomic resources developed to study X. tropicalis biology will accelerate exploration of amphibian physiology and genetics.
Submitted 4 July, 2007;
originally announced July 2007.