Skip to main content

Showing 1–2 of 2 results for author: Marmet, A

.
  1. arXiv:2311.16079  [pdf, other

    cs.CL cs.AI cs.LG

    MEDITRON-70B: Scaling Medical Pretraining for Large Language Models

    Authors: Zeming Chen, Alejandro Hernández Cano, Angelika Romanou, Antoine Bonnet, Kyle Matoba, Francesco Salvi, Matteo Pagliardini, Simin Fan, Andreas Köpf, Amirkeivan Mohtashami, Alexandre Sallinen, Alireza Sakhaeirad, Vinitra Swamy, Igor Krawczuk, Deniz Bayazit, Axel Marmet, Syrielle Montariol, Mary-Anne Hartley, Martin Jaggi, Antoine Bosselut

    Abstract: Large language models (LLMs) can potentially democratize access to medical knowledge. While many efforts have been made to harness and improve LLMs' medical knowledge and reasoning capacities, the resulting models are either closed-source (e.g., PaLM, GPT-4) or limited in scale (<= 13B parameters), which restricts their abilities. In this work, we improve access to large-scale medical LLMs by rele… ▽ More

    Submitted 27 November, 2023; originally announced November 2023.

  2. arXiv:2211.13331  [pdf, other

    cs.CL cs.LG

    Using Focal Loss to Fight Shallow Heuristics: An Empirical Analysis of Modulated Cross-Entropy in Natural Language Inference

    Authors: Frano Rajič, Ivan Stresec, Axel Marmet, Tim Poštuvan

    Abstract: There is no such thing as a perfect dataset. In some datasets, deep neural networks discover underlying heuristics that allow them to take shortcuts in the learning process, resulting in poor generalization capability. Instead of using standard cross-entropy, we explore whether a modulated version of cross-entropy called focal loss can constrain the model so as not to use heuristics and improve ge… ▽ More

    Submitted 23 November, 2022; originally announced November 2022.

    Comments: Code available at https://github.com/m43/focal-loss-against-heuristics