Showing 1–15 of 15 results for author: Ravaut, M

  1. arXiv:2404.00699  [pdf, other]

    cs.CL

    How Much are LLMs Contaminated? A Comprehensive Survey and the LLMSanitize Library

    Authors: Mathieu Ravaut, Bosheng Ding, Fangkai Jiao, Hailin Chen, Xingxuan Li, Ruochen Zhao, Chengwei Qin, Caiming Xiong, Shafiq Joty

    Abstract: With the rise of Large Language Models (LLMs) in recent years, new opportunities are emerging, but also new challenges, and contamination is quickly becoming critical. Business applications and fundraising in AI have reached a scale at which a few percentage points gained on popular question-answering benchmarks could translate into dozens of millions of dollars, placing high pressure on model int…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: 10 pages, 1 figure, 3 tables

  2. arXiv:2401.17919  [pdf, other]

    cs.CL cs.LG

    LOCOST: State-Space Models for Long Document Abstractive Summarization

    Authors: Florian Le Bronnec, Song Duong, Mathieu Ravaut, Alexandre Allauzen, Nancy F. Chen, Vincent Guigue, Alberto Lumbreras, Laure Soulier, Patrick Gallinari

    Abstract: State-space models are a low-complexity alternative to transformers for encoding long sequences and capturing long-term dependencies. We propose LOCOST: an encoder-decoder architecture based on state-space models for conditional text generation with long context inputs. With a computational complexity of $O(L \log L)$, this architecture can handle significantly longer sequences than state-of-the-a…

    Submitted 25 March, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

    Comments: 9 pages, 5 figures, 7 tables, EACL 2024 conference

  3. arXiv:2401.14194  [pdf, other]

    cs.CL

    Parameter-Efficient Conversational Recommender System as a Language Processing Task

    Authors: Mathieu Ravaut, Hao Zhang, Lu Xu, Aixin Sun, Yong Liu

    Abstract: Conversational recommender systems (CRS) aim to recommend relevant items to users by eliciting user preference through natural language conversation. Prior work often utilizes external knowledge graphs for items' semantic information, a language model for dialogue generation, and a recommendation module for ranking relevant items. This combination of multiple components suffers from a cumbersome t…

    Submitted 24 February, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: 9 pages, 4 figures, 8 tables, EACL 2024 conference, fixed typo

  4. arXiv:2311.16989  [pdf, other]

    cs.CL

    ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up?

    Authors: Hailin Chen, Fangkai Jiao, Xingxuan Li, Chengwei Qin, Mathieu Ravaut, Ruochen Zhao, Caiming Xiong, Shafiq Joty

    Abstract: Upon its release in late 2022, ChatGPT has brought a seismic shift in the entire landscape of AI, both in research and commerce. Through instruction-tuning a large language model (LLM) with supervised fine-tuning and reinforcement learning from human feedback, it showed that a model could answer human questions and follow instructions on a broad panel of tasks. Following this success, interests in…

    Submitted 15 January, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: version v4, included latest top-performing open-sourced LLMs

  5. arXiv:2310.10570  [pdf, other]

    cs.CL

    On Context Utilization in Summarization with Large Language Models

    Authors: Mathieu Ravaut, Aixin Sun, Nancy F. Chen, Shafiq Joty

    Abstract: Large language models (LLMs) excel in abstractive summarization tasks, delivering fluent and pertinent summaries. Recent advancements have extended their capabilities to handle long-input contexts, exceeding 100k tokens. However, in question answering, language models exhibit uneven utilization of their input context. They tend to favor the initial and final segments, resulting in a U-shaped perfo…

    Submitted 14 June, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: ACL 2024. 9 pages, 7 figures, 3 tables

  6. arXiv:2308.03117  [pdf, other]

    cs.CL

    PromptSum: Parameter-Efficient Controllable Abstractive Summarization

    Authors: Mathieu Ravaut, Hailin Chen, Ruochen Zhao, Chengwei Qin, Shafiq Joty, Nancy Chen

    Abstract: Prompt tuning (PT), a parameter-efficient technique that only tunes the additional prompt embeddings while keeping the backbone pre-trained language model (PLM) frozen, has shown promising results in language understanding tasks, especially in low-resource scenarios. However, effective prompt design methods suitable for generation tasks such as summarization are still lacking. At the same time, su…

    Submitted 6 August, 2023; originally announced August 2023.

  7. A Data-centric Framework for Improving Domain-specific Machine Reading Comprehension Datasets

    Authors: Iva Bojic, Josef Halim, Verena Suharman, Sreeja Tar, Qi Chwen Ong, Duy Phung, Mathieu Ravaut, Shafiq Joty, Josip Car

    Abstract: Low-quality data can cause downstream problems in high-stakes applications. A data-centric approach emphasizes improving dataset quality to enhance model performance. High-quality datasets are needed for general-purpose Large Language Models (LLMs) training, as well as for domain-specific models, which are usually small in size as it is costly to engage a large number of domain experts for their…

    Submitted 26 May, 2023; v1 submitted 2 April, 2023; originally announced April 2023.

    Journal ref: 2023. In The Fourth Workshop on Insights from Negative Results in NLP, pages 19-32, Dubrovnik, Croatia. Association for Computational Linguistics

  8. arXiv:2212.09593  [pdf, other]

    cs.CL

    Unsupervised Summarization Re-ranking

    Authors: Mathieu Ravaut, Shafiq Joty, Nancy Chen

    Abstract: With the rise of task-specific pre-training objectives, abstractive summarization models like PEGASUS offer appealing zero-shot performance on downstream summarization tasks. However, the performance of such unsupervised models still lags significantly behind their supervised counterparts. Similarly to the supervised setup, we notice a very high variance in quality among summary candidates from th…

    Submitted 26 May, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: 9 pages, 1 figure, 10 tables, 23 appendix pages, ACL Findings 2023

  9. arXiv:2210.08779  [pdf, other]

    cs.CL

    Towards Summary Candidates Fusion

    Authors: Mathieu Ravaut, Shafiq Joty, Nancy F. Chen

    Abstract: Sequence-to-sequence deep neural models fine-tuned for abstractive summarization can achieve great performance on datasets with enough human annotations. Yet, it has been shown that they have not reached their full potential, with a wide gap between the top beam search output and the oracle beam. Recently, re-ranking methods have been proposed, to learn to select a better summary candidate. Howeve…

    Submitted 26 May, 2023; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: 4 Figures, 9 Tables, EMNLP 2022

  10. arXiv:2203.06569  [pdf, other]

    cs.CL

    SummaReranker: A Multi-Task Mixture-of-Experts Re-ranking Framework for Abstractive Summarization

    Authors: Mathieu Ravaut, Shafiq Joty, Nancy F. Chen

    Abstract: Sequence-to-sequence neural networks have recently achieved great success in abstractive summarization, especially through fine-tuning large pre-trained language models on the downstream dataset. These models are typically decoded with beam search to generate a unique summary. However, the search space is very large, and with the exposure bias, such decoding is not optimal. In this paper, we show…

    Submitted 26 May, 2023; v1 submitted 13 March, 2022; originally announced March 2022.

    Comments: 9 pages, 6 figures, 6 tables, 9 appendix pages, ACL 2022

  11. arXiv:1904.04137  [pdf, other]

    stat.AP cs.LG

    Diabetes Mellitus Forecasting Using Population Health Data in Ontario, Canada

    Authors: Mathieu Ravaut, Hamed Sadeghi, Kin Kwan Leung, Maksims Volkovs, Laura C. Rosella

    Abstract: Leveraging health administrative data (HAD) datasets for predicting the risk of chronic diseases including diabetes has gained a lot of attention in the machine learning community recently. In this paper, we use the largest health records datasets of patients in Ontario, Canada. Provided by the Institute of Clinical Evaluative Sciences (ICES), this database is age, gender and ethnicity-diverse. The…

    Submitted 8 April, 2019; originally announced April 2019.

    Comments: 18 pages, 3 figures, 8 Tables, Submitted to 2019 ML for Healthcare conference

  12. arXiv:1805.02788  [pdf, other]

    stat.ML cs.LG

    ReGAN: RE[LAX|BAR|INFORCE] based Sequence Generation using GANs

    Authors: Aparna Balagopalan, Satya Gorti, Mathieu Ravaut, Raeid Saqur

    Abstract: Generative Adversarial Networks (GANs) have seen steep ascension to the peak of ML research zeitgeist in recent years. Mostly catalyzed by its success in the domain of image generation, the technique has seen a wide range of adoption in a variety of other problem domains. Although GANs have had a lot of success in producing more realistic images than other approaches, they have only seen limited use…

    Submitted 7 May, 2018; originally announced May 2018.

  13. arXiv:1801.09136  [pdf, other]

    stat.ML

    Gradient descent revisited via an adaptive online learning rate

    Authors: Mathieu Ravaut, Satya Gorti

    Abstract: Any gradient descent optimization requires choosing a learning rate. With deeper and deeper models, tuning that learning rate can easily become tedious and does not necessarily lead to an ideal convergence. We propose a variation of the gradient descent algorithm in which the learning rate is not fixed. Instead, we learn the learning rate itself, either by another gradient descent (first-orde…

    Submitted 8 April, 2018; v1 submitted 27 January, 2018; originally announced January 2018.

  14. arXiv:1706.05461  [pdf, other]

    cs.CV

    Truly Multi-modal YouTube-8M Video Classification with Video, Audio, and Text

    Authors: Zhe Wang, Kingsley Kuan, Mathieu Ravaut, Gaurav Manek, Sibo Song, Yuan Fang, Seokhwan Kim, Nancy Chen, Luis Fernando D'Haro, Luu Anh Tuan, Hongyuan Zhu, Zeng Zeng, Ngai Man Cheung, Georgios Piliouras, Jie Lin, Vijay Chandrasekhar

    Abstract: The YouTube-8M video classification challenge requires teams to classify 0.7 million videos into one or more of 4,716 classes. In this Kaggle competition, we placed in the top 3% out of 650 participants using released video and audio features. Beyond that, we extend the original competition by including text information in the classification, making this a truly multi-modal approach with vision, a…

    Submitted 9 July, 2017; v1 submitted 16 June, 2017; originally announced June 2017.

    Comments: 8 pages, Accepted to CVPR'17 Workshop on YouTube-8M Large-Scale Video Understanding

  15. arXiv:1705.09435  [pdf, other]

    cs.CV

    Deep Learning for Lung Cancer Detection: Tackling the Kaggle Data Science Bowl 2017 Challenge

    Authors: Kingsley Kuan, Mathieu Ravaut, Gaurav Manek, Huiling Chen, Jie Lin, Babar Nazir, Cen Chen, Tse Chiang Howe, Zeng Zeng, Vijay Chandrasekhar

    Abstract: We present a deep learning framework for computer-aided lung cancer diagnosis. Our multi-stage framework detects nodules in 3D lung CAT scans, determines if each nodule is malignant, and finally assigns a cancer probability based on these results. We discuss the challenges and advantages of our framework. In the Kaggle Data Science Bowl 2017, our framework ranked 41st out of 1972 teams.

    Submitted 26 May, 2017; originally announced May 2017.