Showing 1–12 of 12 results for author: Févry, T

  1. arXiv:2211.05100

    cs.CL

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Authors: BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, et al. (369 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access…

    Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

  2. arXiv:2202.01279

    cs.LG cs.CL

    PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts

    Authors: Stephen H. Bach, Victor Sanh, Zheng-Xin Yong, Albert Webson, Colin Raffel, Nihal V. Nayak, Abheesht Sharma, Taewoon Kim, M Saiful Bari, Thibault Fevry, Zaid Alyafeai, Manan Dey, Andrea Santilli, Zhiqing Sun, Srulik Ben-David, Canwen Xu, Gunjan Chhablani, Han Wang, Jason Alan Fries, Maged S. Al-shaibani, Shanya Sharma, Urmish Thakker, Khalid Almubarak, Xiangru Tang, Dragomir Radev, et al. (2 additional authors not shown)

    Abstract: PromptSource is a system for creating, sharing, and using natural language prompts. Prompts are functions that map an example from a dataset to a natural language input and target output. Using prompts to train and query language models is an emerging area in NLP that requires new tools that let users develop and refine these prompts collaboratively. PromptSource addresses the emergent challenges…

    Submitted 29 March, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

    Comments: ACL 2022 Demo

  3. arXiv:2110.08207

    cs.LG cs.CL

    Multitask Prompted Training Enables Zero-Shot Task Generalization

    Authors: Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Teven Le Scao, Arun Raja, Manan Dey, M Saiful Bari, Canwen Xu, Urmish Thakker, Shanya Sharma Sharma, Eliza Szczechla, Taewoon Kim, Gunjan Chhablani, Nihal Nayak, Debajyoti Datta, Jonathan Chang, Mike Tian-Jian Jiang, Han Wang, Matteo Manica, Sheng Shen, et al. (16 additional authors not shown)

    Abstract: Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., 2020). It has been hypothesized that this is a consequence of implicit multitask learning in language models' pretraining (Radford et al., 2019). Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale,…

    Submitted 17 March, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: ICLR 2022 Spotlight (with extended discussion)

  4. arXiv:2102.11972

    cs.LG cs.CL

    Do Transformer Modifications Transfer Across Implementations and Applications?

    Authors: Sharan Narang, Hyung Won Chung, Yi Tay, William Fedus, Thibault Fevry, Michael Matena, Karishma Malkan, Noah Fiedel, Noam Shazeer, Zhenzhong Lan, Yanqi Zhou, Wei Li, Nan Ding, Jake Marcus, Adam Roberts, Colin Raffel

    Abstract: The research community has proposed copious modifications to the Transformer architecture since it was introduced over three years ago, relatively few of which have seen widespread adoption. In this paper, we comprehensively evaluate many of these modifications in a shared experimental setting that covers most of the common uses of the Transformer in natural language processing. Surprisingly, we f…

    Submitted 10 September, 2021; v1 submitted 23 February, 2021; originally announced February 2021.

    Comments: To appear at EMNLP 2021 as a conference paper

  5. arXiv:2010.12821

    cs.CL cs.LG

    Rethinking embedding coupling in pre-trained language models

    Authors: Hyung Won Chung, Thibault Févry, Henry Tsai, Melvin Johnson, Sebastian Ruder

    Abstract: We re-evaluate the standard practice of sharing weights between input and output embeddings in state-of-the-art pre-trained language models. We show that decoupled embeddings provide increased modeling flexibility, allowing us to significantly improve the efficiency of parameter allocation in the input embedding of multilingual models. By reallocating the input embedding parameters in the Transfor…

    Submitted 24 October, 2020; originally announced October 2020.

  6. arXiv:2005.14253

    cs.CL cs.LG

    Empirical Evaluation of Pretraining Strategies for Supervised Entity Linking

    Authors: Thibault Févry, Nicholas FitzGerald, Livio Baldini Soares, Tom Kwiatkowski

    Abstract: In this work, we present an entity linking model which combines a Transformer architecture with large scale pretraining from Wikipedia links. Our model achieves the state-of-the-art on two commonly used entity linking datasets: 96.7% on CoNLL and 94.9% on TAC-KBP. We present detailed analyses to understand what design choices are important for entity linking, including choices of negative entity c…

    Submitted 28 May, 2020; originally announced May 2020.

    Comments: 11 pages, 8 figures, appearing at AKBC 2020

  7. arXiv:2004.07202

    cs.CL cs.LG

    Entities as Experts: Sparse Memory Access with Entity Supervision

    Authors: Thibault Févry, Livio Baldini Soares, Nicholas FitzGerald, Eunsol Choi, Tom Kwiatkowski

    Abstract: We focus on the problem of capturing declarative knowledge about entities in the learned parameters of a language model. We introduce a new model - Entities as Experts (EAE) - that can access distinct memories of the entities mentioned in a piece of text. Unlike previous efforts to integrate entity knowledge into sequence models, EAE's entity representations are learned directly from text. We show…

    Submitted 6 October, 2020; v1 submitted 15 April, 2020; originally announced April 2020.

  8. arXiv:2001.03765

    cs.CL

    Learning Cross-Context Entity Representations from Text

    Authors: Jeffrey Ling, Nicholas FitzGerald, Zifei Shan, Livio Baldini Soares, Thibault Févry, David Weiss, Tom Kwiatkowski

    Abstract: Language modeling tasks, in which words, or word-pieces, are predicted on the basis of a local context, have been very effective for learning word embeddings and context dependent representations of phrases. Motivated by the observation that efforts to code world knowledge into machine readable knowledge bases or human readable encyclopedias tend to be entity-centric, we investigate the use of a f…

    Submitted 11 January, 2020; originally announced January 2020.

  9. arXiv:1908.00615

    eess.IV cs.CV stat.ML

    Improving localization-based approaches for breast cancer screening exam classification

    Authors: Thibault Févry, Jason Phang, Nan Wu, S. Gene Kim, Linda Moy, Kyunghyun Cho, Krzysztof J. Geras

    Abstract: We trained and evaluated a localization-based deep CNN for breast cancer screening exam classification on over 200,000 exams (over 1,000,000 images). Our model achieves an AUC of 0.919 in predicting malignancy in patients undergoing breast cancer screening, reducing the error rate of the baseline (Wu et al., 2019a) by 23%. In addition, the model generates bounding boxes for benign and malignant f…

    Submitted 1 August, 2019; originally announced August 2019.

    Comments: MIDL 2019 [arXiv:1907.08612]

    Report number: MIDL/2019/ExtendedAbstract/HyxoAR_AK4

  10. arXiv:1903.08297

    cs.LG cs.CV stat.ML

    Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening

    Authors: Nan Wu, Jason Phang, Jungkyu Park, Yiqiu Shen, Zhe Huang, Masha Zorin, Stanisław Jastrzębski, Thibault Févry, Joe Katsnelson, Eric Kim, Stacey Wolfson, Ujas Parikh, Sushma Gaddam, Leng Leng Young Lin, Kara Ho, Joshua D. Weinstein, Beatriu Reig, Yiming Gao, Hildegard Toth, Kristine Pysarenko, Alana Lewin, Jiyon Lee, Krystal Airola, Eralda Mema, Stephanie Chung, et al. (7 additional authors not shown)

    Abstract: We present a deep convolutional neural network for breast cancer screening exam classification, trained and evaluated on over 200,000 exams (over 1,000,000 images). Our network achieves an AUC of 0.895 in predicting whether there is a cancer in the breast, when tested on the screening population. We attribute the high accuracy of our model to a two-stage training procedure, which allows us to use…

    Submitted 19 March, 2019; originally announced March 2019.

    Comments: MIDL 2019 [arXiv:1907.08612]

    Report number: MIDL/2019/ExtendedAbstract/SkxYez76FE

  11. arXiv:1811.01088

    cs.CL

    Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks

    Authors: Jason Phang, Thibault Févry, Samuel R. Bowman

    Abstract: Pretraining sentence encoders with language modeling and related unsupervised tasks has recently been shown to be very effective for language understanding tasks. By supplementing language model-style pretraining with further training on data-rich supervised tasks, such as natural language inference, we obtain additional performance improvements on the GLUE benchmark. Applying supplementary traini…

    Submitted 27 February, 2019; v1 submitted 2 November, 2018; originally announced November 2018.

  12. arXiv:1809.02669

    cs.CL

    Unsupervised Sentence Compression using Denoising Auto-Encoders

    Authors: Thibault Févry, Jason Phang

    Abstract: In sentence compression, the task of shortening sentences while retaining the original meaning, models tend to be trained on large corpora containing pairs of verbose and compressed sentences. To remove the need for paired corpora, we emulate a summarization task and add noise to extend sentences and train a denoising auto-encoder to recover the original, constructing an end-to-end training regime…

    Submitted 7 September, 2018; originally announced September 2018.

    Comments: CoNLL 2018