
Showing 1–7 of 7 results for author: Siriwardhana, S

Searching in archive cs.
  1. arXiv:2406.14971  [pdf, other]

    cs.CL cs.AI cs.LG

    Domain Adaptation of Llama3-70B-Instruct through Continual Pre-Training and Model Merging: A Comprehensive Evaluation

    Authors: Shamane Siriwardhana, Mark McQuade, Thomas Gauthier, Lucas Atkins, Fernando Fernandes Neto, Luke Meyers, Anneketh Vij, Tyler Odenthal, Charles Goddard, Mary MacCarthy, Jacob Solawetz

    Abstract: We conducted extensive experiments on domain adaptation of the Meta-Llama-3-70B-Instruct model on SEC data, exploring its performance on both general and domain-specific benchmarks. Our focus included continual pre-training (CPT) and model merging, aiming to enhance the model's domain-specific capabilities while mitigating catastrophic forgetting. Through this study, we evaluated the impact of int…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 8 pages, 6 figures

  2. arXiv:2403.13257  [pdf, other]

    cs.CL cs.AI cs.LG

    Arcee's MergeKit: A Toolkit for Merging Large Language Models

    Authors: Charles Goddard, Shamane Siriwardhana, Malikeh Ehghaghi, Luke Meyers, Vlad Karpukhin, Brian Benedict, Mark McQuade, Jacob Solawetz

    Abstract: The rapid expansion of the open-source language model landscape presents an opportunity to merge the competencies of these model checkpoints by combining their parameters. Advances in transfer learning, the process of fine-tuning pretrained models for specific tasks, have resulted in the development of a vast number of task-specific models, typically specialized in individual tasks and unable to uti…

    Submitted 20 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: 11 pages, 4 figures

  3. arXiv:2210.02627  [pdf, other]

    cs.CL cs.IR

    Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open Domain Question Answering

    Authors: Shamane Siriwardhana, Rivindu Weerasekera, Elliott Wen, Tharindu Kaluarachchi, Rajib Rana, Suranga Nanayakkara

    Abstract: Retrieval Augmented Generation (RAG) is a recent advancement in Open-Domain Question Answering (ODQA). RAG has only been trained and explored with a Wikipedia-based external knowledge base and is not optimized for use in other specialized domains such as healthcare and news. In this paper, we evaluate the impact of joint training of the retriever and generator components of RAG for the task of domai…

    Submitted 5 October, 2022; originally announced October 2022.

    Comments: This paper is awaiting publication in Transactions of the Association for Computational Linguistics. This is a pre-MIT Press publication version. For the associated Hugging Face Transformers code, see https://github.com/huggingface/transformers/tree/main/examples/research_projects/rag-end2end-retriever

  4. arXiv:2106.11517  [pdf, ps, other]

    cs.IR cs.CL

    Fine-tune the Entire RAG Architecture (including DPR retriever) for Question-Answering

    Authors: Shamane Siriwardhana, Rivindu Weerasekera, Elliott Wen, Suranga Nanayakkara

    Abstract: In this paper, we illustrate how to fine-tune the entire Retrieval Augmented Generation (RAG) architecture in an end-to-end manner. We highlight the main engineering challenges that needed to be addressed to achieve this objective. We also compare the end-to-end RAG architecture with the original RAG architecture and show that it outperforms the original for the task of question answering. We have open-sourced our implementation in…

    Submitted 21 June, 2021; originally announced June 2021.

    Comments: For associated code, see https://github.com/huggingface/transformers/tree/master/examples/research_projects/rag-end2end-retriever

  5. arXiv:2008.06682  [pdf, other]

    eess.AS cs.CL cs.SD

    Jointly Fine-Tuning "BERT-like" Self Supervised Models to Improve Multimodal Speech Emotion Recognition

    Authors: Shamane Siriwardhana, Andrew Reis, Rivindu Weerasekera, Suranga Nanayakkara

    Abstract: Multimodal emotion recognition from speech is an important area in affective computing. Fusing multiple data modalities and learning representations with limited amounts of labeled data is a challenging task. In this paper, we explore the use of modality-specific "BERT-like" pretrained Self Supervised Learning (SSL) architectures to represent both speech and text modalities for the task of multimo…

    Submitted 15 August, 2020; originally announced August 2020.

    Comments: Accepted to INTERSPEECH 2020

  6. arXiv:1908.06376  [pdf]

    cs.LG cs.AI cs.NE stat.ML

    VUSFA: Variational Universal Successor Features Approximator to Improve Transfer DRL for Target Driven Visual Navigation

    Authors: Shamane Siriwardhana, Rivindu Weerasekera, Denys J. C. Matthies, Suranga Nanayakkara

    Abstract: In this paper, we show how novel transfer reinforcement learning techniques can be applied to the complex task of target driven navigation using the photorealistic AI2THOR simulator. Specifically, we build on the concept of Universal Successor Features with an A3C agent. We introduce the novel architectural contribution of a Successor Feature Dependant Policy (SFDP) and adopt the concept of Variat…

    Submitted 18 August, 2019; originally announced August 2019.

  7. arXiv:1811.11312  [pdf, other]

    cs.AI cs.LG

    Target Driven Visual Navigation with Hybrid Asynchronous Universal Successor Representations

    Authors: Shamane Siriwardhana, Rivindu Weerasekera, Suranga Nanayakkara

    Abstract: Being able to navigate to a target with minimal supervision and prior knowledge is critical to creating human-like assistive agents. Prior work on map-based and map-less approaches has limited generalizability. In this paper, we present a novel approach, Hybrid Asynchronous Universal Successor Representations (HAUSR), which overcomes the problem of generalizability to new goals by adapting recent…

    Submitted 27 November, 2018; originally announced November 2018.

    Comments: Deep Reinforcement Learning Workshop, NeurIPS 2018