Showing 1–17 of 17 results for author: Morin, E

Searching in archive cs.
  1. arXiv:2402.16689  [pdf, other]

    cs.CL cs.AI

    Adaptation of Biomedical and Clinical Pretrained Models to French Long Documents: A Comparative Study

    Authors: Adrien Bazoge, Emmanuel Morin, Beatrice Daille, Pierre-Antoine Gourraud

    Abstract: Recently, pretrained language models based on BERT have been introduced for the French biomedical domain. Although these models have achieved state-of-the-art results on biomedical and clinical NLP tasks, they are constrained by a limited input sequence length of 512 tokens, which poses challenges when applied to clinical notes. In this paper, we present a comparative study of three adaptation str…

    Submitted 26 February, 2024; originally announced February 2024.

  2. arXiv:2402.13432  [pdf, other]

    cs.CL cs.AI cs.LG

    DrBenchmark: A Large Language Understanding Evaluation Benchmark for French Biomedical Domain

    Authors: Yanis Labrak, Adrien Bazoge, Oumaima El Khettari, Mickael Rouvier, Pacome Constant dit Beaufils, Natalia Grabar, Beatrice Daille, Solen Quiniou, Emmanuel Morin, Pierre-Antoine Gourraud, Richard Dufour

    Abstract: The biomedical domain has sparked a significant interest in the field of Natural Language Processing (NLP), which has seen substantial advancements with pre-trained language models (PLMs). However, comparing these models has proven challenging due to variations in evaluation protocols across different models. A fair solution is to aggregate diverse downstream tasks into a benchmark, allowing for t…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted at LREC-Coling 2024

    Journal ref: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

  3. arXiv:2402.10373  [pdf, other]

    cs.CL cs.AI cs.LG

    BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains

    Authors: Yanis Labrak, Adrien Bazoge, Emmanuel Morin, Pierre-Antoine Gourraud, Mickael Rouvier, Richard Dufour

    Abstract: Large Language Models (LLMs) have demonstrated remarkable versatility in recent years, offering potential applications across specialized domains such as healthcare and medicine. Despite the availability of various open-source LLMs tailored for health contexts, adapting general-purpose LLMs to the medical domain presents significant challenges. In this paper, we introduce BioMistral, an open-sourc…

    Submitted 9 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: Accepted at ACL 2024 - Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

    Journal ref: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics - Volume 1: Long Papers (ACL 2024)

  4. arXiv:2304.04280  [pdf, other]

    cs.CL cs.AI

    FrenchMedMCQA: A French Multiple-Choice Question Answering Dataset for Medical domain

    Authors: Yanis Labrak, Adrien Bazoge, Richard Dufour, Mickael Rouvier, Emmanuel Morin, Béatrice Daille, Pierre-Antoine Gourraud

    Abstract: This paper introduces FrenchMedMCQA, the first publicly available Multiple-Choice Question Answering (MCQA) dataset in French for medical domain. It is composed of 3,105 questions taken from real exams of the French medical specialization diploma in pharmacy, mixing single and multiple answers. Each instance of the dataset contains an identifier, a question, five possible answers and their manual…

    Submitted 9 April, 2023; originally announced April 2023.

    Journal ref: Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI 2022)

  5. arXiv:2304.00958  [pdf, other]

    cs.CL

    DrBERT: A Robust Pre-trained Model in French for Biomedical and Clinical domains

    Authors: Yanis Labrak, Adrien Bazoge, Richard Dufour, Mickael Rouvier, Emmanuel Morin, Béatrice Daille, Pierre-Antoine Gourraud

    Abstract: In recent years, pre-trained language models (PLMs) achieve the best performance on a wide range of natural language processing (NLP) tasks. While the first models were trained on general domain data, specialized ones have emerged to more effectively treat specific domains. In this paper, we propose an original study of PLMs in the medical domain on French language. We compare, for the first time,…

    Submitted 4 May, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

    Comments: Accepted at ACL 2023

  6. arXiv:2111.02780  [pdf]

    cs.LG

    Flood forecasting with machine learning models in an operational framework

    Authors: Sella Nevo, Efrat Morin, Adi Gerzi Rosenthal, Asher Metzger, Chen Barshai, Dana Weitzner, Dafi Voloshin, Frederik Kratzert, Gal Elidan, Gideon Dror, Gregory Begelman, Grey Nearing, Guy Shalev, Hila Noga, Ira Shavitt, Liora Yuklea, Moriah Royz, Niv Giladi, Nofar Peled Levi, Ofir Reich, Oren Gilon, Ronnie Maor, Shahar Timnat, Tal Shechter, Vladimir Anisimov, et al. (6 additional authors not shown)

    Abstract: The operational flood forecasting system by Google was developed to provide accurate real-time flood warnings to agencies and the public, with a focus on riverine floods in large, gauged rivers. It became operational in 2018 and has since expanded geographically. This forecasting system consists of four subsystems: data validation, stage forecasting, inundation modeling, and alert distribution. Ma…

    Submitted 4 November, 2021; originally announced November 2021.

    Comments: 36 pages, 10 figures, 3 tables, 1 supplementary table (9 pages)

  7. arXiv:2103.11521  [pdf, other]

    cs.LG cs.CV

    Conditional Frechet Inception Distance

    Authors: Michael Soloveitchik, Tzvi Diskin, Efrat Morin, Ami Wiesel

    Abstract: We consider distance functions between conditional distributions. We focus on the Wasserstein metric and its Gaussian case known as the Frechet Inception Distance (FID). We develop conditional versions of these metrics, analyze their relations and provide a closed form solution to the conditional FID (CFID) metric. We numerically compare the metrics in the context of performance evaluation of mode…

    Submitted 28 February, 2022; v1 submitted 21 March, 2021; originally announced March 2021.

  8. Recent Advances in End-to-End Spoken Language Understanding

    Authors: Natalia Tomashenko, Antoine Caubriere, Yannick Esteve, Antoine Laurent, Emmanuel Morin

    Abstract: This work investigates spoken language understanding (SLU) systems in the scenario when the semantic information is extracted directly from the speech signal by means of a single end-to-end neural network model. Two SLU tasks are considered: named entity recognition (NER) and semantic slot filling (SF). For these tasks, in order to improve the model performance, we explore various techniques inclu…

    Submitted 29 September, 2019; originally announced September 2019.

    Journal ref: Statistical Language and Speech Processing. SLSP 2019

  9. arXiv:1907.12878  [pdf, other]

    cs.CL

    Deep Retrieval-Based Dialogue Systems: A Short Review

    Authors: Basma El Amel Boussaha, Nicolas Hernandez, Christine Jacquin, Emmanuel Morin

    Abstract: Building dialogue systems that naturally converse with humans is being an attractive and an active research domain. Multiple systems are being designed everyday and several datasets are being available. For this reason, it is being hard to keep an up-to-date state-of-the-art. In this work, we present the latest and most relevant retrieval-based dialogue systems and the available datasets used to b…

    Submitted 30 July, 2019; originally announced July 2019.

  10. arXiv:1906.07601  [pdf, other]

    cs.CL cs.SD eess.AS

    Curriculum-based transfer learning for an effective end-to-end spoken language understanding and domain portability

    Authors: Antoine Caubrière, Natalia Tomashenko, Antoine Laurent, Emmanuel Morin, Nathalie Camelin, Yannick Estève

    Abstract: We present an end-to-end approach to extract semantic concepts directly from the speech audio signal. To overcome the lack of data available for this spoken language understanding approach, we investigate the use of a transfer learning strategy based on the principles of curriculum learning. This approach allows us to exploit out-of-domain data that can help to prepare a fully neural architecture.…

    Submitted 18 June, 2019; originally announced June 2019.

    Comments: Accepted to the INTERSPEECH 2019 conference. Submitted on March 29, 2019 (Paper submission deadline)

  11. arXiv:1805.12045  [pdf, other]

    cs.CL

    End-to-end named entity extraction from speech

    Authors: Sahar Ghannay, Antoine Caubrière, Yannick Estève, Antoine Laurent, Emmanuel Morin

    Abstract: Named entity recognition (NER) is among SLU tasks that usually extract semantic information from textual documents. Until now, NER from speech is made through a pipeline process that consists in processing first an automatic speech recognition (ASR) on the audio and then processing a NER on the ASR outputs. Such approach has some disadvantages (error propagation, metric to tune ASR systems sub-opt…

    Submitted 30 May, 2018; originally announced May 2018.

    Comments: Submitted to Interspeech 2018

    ACM Class: I.2.7

  12. arXiv:1412.4401  [pdf]

    cs.CY cs.CL

    Tools for Terminology Processing

    Authors: C. Enguehard, B. Daille, E. Morin

    Abstract: Automatic terminology processing appeared 10 years ago when electronic corpora became widely available. Such processing may be statistically or linguistically based and produces terminology resources that can be used in a number of applications: indexing, information retrieval, technology watch, etc. We present the tools that have been developed in the IRIN Institute. They all take as input texts…

    Submitted 14 December, 2014; originally announced December 2014.

    Journal ref: R. K. Arora, M. Kulkarni, H. Darbari. The Indo-European Conference on Multilingual Communications Technologies (IEMCT), Jun 2002, Pune, India. Tata McGraw-Hill, pp.218 - 229

  13. arXiv:1210.5751  [pdf]

    cs.CL

    Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking

    Authors: Estelle Delpech, Béatrice Daille, Emmanuel Morin, Claire Lemaire

    Abstract: This paper proposes a method for extracting translations of morphologically constructed terms from comparable corpora. The method is based on compositional translation and exploits translation equivalences at the morpheme-level, which allows for the generation of "fertile" translations (translation pairs in which the target term has more words than the source term). Ranking methods relying on corp…

    Submitted 21 October, 2012; originally announced October 2012.

    Comments: arXiv admin note: substantial text overlap with arXiv:1209.2400

    Journal ref: COLING 2012, Mumbai : India (2012)

  14. arXiv:1209.2400  [pdf, ps, other]

    cs.CL

    Identification of Fertile Translations in Medical Comparable Corpora: a Morpho-Compositional Approach

    Authors: Estelle Delpech, Béatrice Daille, Emmanuel Morin, Claire Lemaire

    Abstract: This paper defines a method for lexicon in the biomedical domain from comparable corpora. The method is based on compositional translation and exploits morpheme-level translation equivalences. It can generate translations for a large variety of morphologically constructed words and can also generate 'fertile' translations. We show that fertile translations increase the overall quality of the extra…

    Submitted 11 September, 2012; originally announced September 2012.

    Journal ref: AMTA, San Diego, CA : United States (2012)

  15. arXiv:0909.3028  [pdf]

    cs.CL

    Vers la reconnaissance de mini-messages manuscrits

    Authors: Emmanuel Prochasson, Emmanuel Morin, Christian Viard-Gaudin

    Abstract: Handwriting is an alternative method for entering texts which composed Short Message Services. However, a whole new language features the texts which are produced. They include for instance abbreviations and other consonantal writing which sprung up for time saving and fashion. We have collected and processed a significant number of such handwritten SMS, and used various strategies to tackle thi…

    Submitted 16 September, 2009; originally announced September 2009.

    Journal ref: Colloque International sur le Lexique et la Grammaire, Bonifacio : France (2007)

  16. arXiv:0909.3027  [pdf]

    cs.CL

    Language Models for Handwritten Short Message Services

    Authors: Emmanuel Ep Prochasson, Christian Viard-Gaudin, Emmanuel Morin

    Abstract: Handwriting is an alternative method for entering texts composing Short Message Services. However, a whole new language features the texts which are produced. They include for instance abbreviations and other consonantal writing which sprung up for time saving and fashion. We have collected and processed a significant number of such handwriting SMS, and used various strategies to tackle this cha…

    Submitted 16 September, 2009; originally announced September 2009.

    Journal ref: International Conference on Document Analysis and Recognition, Brazil (2007)

  17. Restricted Complexity, General Complexity

    Authors: Edgar Morin

    Abstract: Why has the problematic of complexity appeared so late? And why would it be justified?

    Submitted 10 October, 2006; originally announced October 2006.

    Comments: 25 pages. Presented at the Colloquium "Intelligence de la complexité : épistémologie et pragmatique", Cerisy-La-Salle, France, June 26th, 2005. Translated from French by Carlos Gershenson