Showing 1–50 of 82 results for author: Chandar, S

Searching in archive cs.
  1. arXiv:2406.05918  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Why Don't Prompt-Based Fairness Metrics Correlate?

    Authors: Abdelrahman Zayed, Goncalo Mordido, Ioana Baldini, Sarath Chandar

    Abstract: The widespread use of large language models has brought up essential questions about the potential biases these models might learn. This led to the development of several metrics aimed at evaluating and mitigating these biases. In this paper, we first demonstrate that prompt-based fairness metrics exhibit poor agreement, as measured by correlation, raising important questions about the reliability…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: In Proceedings of ACL main 2024

  2. arXiv:2406.04879  [pdf, other]

    cs.CL

    A Deep Dive into the Trade-Offs of Parameter-Efficient Preference Alignment Techniques

    Authors: Megh Thakkar, Quentin Fournier, Matthew D Riemer, Pin-Yu Chen, Amal Zouaq, Payel Das, Sarath Chandar

    Abstract: Large language models are first pre-trained on trillions of tokens and then instruction-tuned or aligned to specific preferences. While pre-training remains out of reach for most researchers due to the compute required, fine-tuning has become affordable thanks to parameter-efficient methods such as LoRA and QLoRA. Alignment is known to be sensitive to the many factors involved, including the quant…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL (Main) 2024

  3. arXiv:2406.03686  [pdf, other]

    cs.LG

    BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning

    Authors: Artem Zholus, Maksim Kuznetsov, Roman Schutski, Rim Shayakhmetov, Daniil Polykovskiy, Sarath Chandar, Alex Zhavoronkov

    Abstract: Generating novel active molecules for a given protein is an extremely challenging task for generative models that requires an understanding of the complex physical interactions between the molecule and its environment. In this paper, we present a novel generative model, BindGPT which uses a conceptually simple but powerful approach to create 3D molecules within the protein's binding site. Our mode…

    Submitted 5 June, 2024; originally announced June 2024.

  4. arXiv:2405.15895  [pdf, other]

    cs.LG

    Predicting the Impact of Model Expansion through the Minima Manifold: A Loss Landscape Perspective

    Authors: Pranshu Malviya, Jerry Huang, Quentin Fournier, Sarath Chandar

    Abstract: The optimal model for a given task is often challenging to determine, requiring training multiple models from scratch which becomes prohibitive as dataset and model sizes grow. A more efficient alternative is to reuse smaller pre-trained models by expanding them, however, this is not widely adopted as how this impacts training dynamics remains poorly understood. While prior works have introduced s…

    Submitted 24 May, 2024; originally announced May 2024.

  5. arXiv:2405.05386  [pdf, other]

    cs.LG cs.CL cs.CV stat.ML

    Interpretability Needs a New Paradigm

    Authors: Andreas Madsen, Himabindu Lakkaraju, Siva Reddy, Sarath Chandar

    Abstract: Interpretability is the study of explaining models in understandable terms to humans. At present, interpretability is divided into two paradigms: the intrinsic paradigm, which believes that only models designed to be explained can be explained, and the post-hoc paradigm, which believes that black-box models can be explained. At the core of this debate is how each paradigm ensures its explanations…

    Submitted 8 May, 2024; originally announced May 2024.

  6. arXiv:2405.02749  [pdf, other]

    cs.LG

    Sub-goal Distillation: A Method to Improve Small Language Agents

    Authors: Maryam Hashemzadeh, Elias Stengel-Eskin, Sarath Chandar, Marc-Alexandre Cote

    Abstract: While Large Language Models (LLMs) have demonstrated significant promise as agents in interactive tasks, their substantial computational requirements and restricted number of calls constrain their practical utility, especially in long-horizon interactive tasks such as decision-making or in scenarios involving continuous ongoing tasks. To address these constraints, we propose a method for transferr…

    Submitted 4 May, 2024; originally announced May 2024.

  7. arXiv:2405.01684  [pdf, other]

    cs.LG cs.AI

    Intelligent Switching for Reset-Free RL

    Authors: Darshan Patil, Janarthanan Rajendran, Glen Berseth, Sarath Chandar

    Abstract: In the real world, the strong episode resetting mechanisms that are needed to train agents in simulation are unavailable. The resetting assumption limits the potential of reinforcement learning in the real world, as providing resets to an agent usually requires the creation of additional handcrafted mechanisms or human interventions. Recent work aims to train agents (forward) wit…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Published at ICLR 2024

  8. arXiv:2404.09339  [pdf, other]

    cs.CL cs.AI cs.LG

    Towards Practical Tool Usage for Continually Learning LLMs

    Authors: Jerry Huang, Prasanna Parthasarathi, Mehdi Rezagholizadeh, Sarath Chandar

    Abstract: Large language models (LLMs) show an innate skill for solving language based tasks. But insights have suggested an inability to adjust for information or task-solving skills becoming outdated, as their knowledge, stored directly within their parameters, remains static in time. Tool use helps by offloading work to systems that the LLM can access through an interface, but LLMs that use them still mu…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: 20 pages, 11 tables, 7 figures

  9. arXiv:2403.04253  [pdf, other]

    cs.LG

    Mastering Memory Tasks with World Models

    Authors: Mohammad Reza Samsami, Artem Zholus, Janarthanan Rajendran, Sarath Chandar

    Abstract: Current model-based reinforcement learning (MBRL) agents struggle with long-term dependencies. This limits their ability to effectively solve tasks involving extended time gaps between actions and outcomes, or tasks demanding the recalling of distant observations to inform current actions. To improve temporal coherence, we integrate a new family of state space models (SSMs) in world models of MBRL…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: Published as a conference paper at The International Conference on Learning Representations 2024

  10. arXiv:2401.07927  [pdf, other]

    cs.CL cs.AI cs.LG

    Are self-explanations from Large Language Models faithful?

    Authors: Andreas Madsen, Sarath Chandar, Siva Reddy

    Abstract: Instruction-tuned Large Language Models (LLMs) excel at many tasks and will even explain their reasoning, so-called self-explanations. However, convincing and wrong self-explanations can lead to unsupported confidence in LLMs, thus increasing risk. Therefore, it's important to measure if self-explanations truly reflect the model's behavior. Such a measure is called interpretability-faithfulness an…

    Submitted 16 May, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

    Comments: The 62nd Annual Meeting of the Association for Computational Linguistics

  11. arXiv:2312.15398  [pdf, other]

    cs.CL cs.CY cs.LG

    Fairness-Aware Structured Pruning in Transformers

    Authors: Abdelrahman Zayed, Goncalo Mordido, Samira Shabanian, Ioana Baldini, Sarath Chandar

    Abstract: The increasing size of large language models (LLMs) has introduced challenges in their training and inference. Removing model components is perceived as a solution to tackle the large model sizes, however, existing pruning methods solely focus on performance, without considering an essential aspect for the responsible use of LLMs: model fairness. It is crucial to address the fairness of LLMs towar…

    Submitted 23 December, 2023; originally announced December 2023.

    Comments: In Proceedings of AAAI 2024

  12. arXiv:2311.07687  [pdf, other]

    cs.CL cs.AI cs.LG

    Language Model-In-The-Loop: Data Optimal Approach to Learn-To-Recommend Actions in Text Games

    Authors: Arjun Vaithilingam Sudhakar, Prasanna Parthasarathi, Janarthanan Rajendran, Sarath Chandar

    Abstract: Large Language Models (LLMs) have demonstrated superior performance in language understanding benchmarks. CALM, a popular approach, leverages linguistic priors of LLMs -- GPT-2 -- for action candidate recommendations to improve the performance in text games in Jericho without environment-provided actions. However, CALM adapts GPT-2 with annotated human gameplays and keeps the LLM fixed during the…

    Submitted 13 November, 2023; originally announced November 2023.

  13. arXiv:2311.00913  [pdf, other]

    cs.CL

    Self-Influence Guided Data Reweighting for Language Model Pre-training

    Authors: Megh Thakkar, Tolga Bolukbasi, Sriram Ganapathy, Shikhar Vashishth, Sarath Chandar, Partha Talukdar

    Abstract: Language Models (LMs) pre-trained with self-supervision on large text corpora have become the default starting point for developing models for various NLP tasks. Once the pre-training corpus has been assembled, all data samples in the corpus are treated with equal importance during LM pre-training. However, due to varying levels of relevance and quality of data, equal importance to all the data sa…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: Accepted to EMNLP 2023

  14. arXiv:2310.15372  [pdf, other]

    cs.CL cs.AI

    EpiK-Eval: Evaluation for Language Models as Epistemic Models

    Authors: Gabriele Prato, Jerry Huang, Prasanna Parthasarathi, Shagun Sodhani, Sarath Chandar

    Abstract: In the age of artificial intelligence, the role of large language models (LLMs) is becoming increasingly central. Despite their growing prevalence, their capacity to consolidate knowledge from different training documents - a crucial ability in numerous applications - remains unexplored. This paper presents the first study examining the capability of LLMs to effectively combine such information wi…

    Submitted 22 February, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

  15. arXiv:2310.07819  [pdf, other]

    cs.CL cs.LG

    Faithfulness Measurable Masked Language Models

    Authors: Andreas Madsen, Siva Reddy, Sarath Chandar

    Abstract: A common approach to explaining NLP models is to use importance measures that express which tokens are important for a prediction. Unfortunately, such explanations are often wrong despite being persuasive. Therefore, it is essential to measure their faithfulness. One such metric is if tokens are truly important, then masking them should result in worse model performance. However, token masking int…

    Submitted 9 May, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

  16. arXiv:2308.10284  [pdf, other]

    cs.LG cs.AI cs.MA

    Towards Few-shot Coordination: Revisiting Ad-hoc Teamplay Challenge In the Game of Hanabi

    Authors: Hadi Nekoei, Xutong Zhao, Janarthanan Rajendran, Miao Liu, Sarath Chandar

    Abstract: Cooperative Multi-agent Reinforcement Learning (MARL) algorithms with Zero-Shot Coordination (ZSC) have gained significant attention in recent years. ZSC refers to the ability of agents to coordinate zero-shot (without additional interaction experience) with independently trained agents. While ZSC is crucial for cooperative MARL agents, it might not be possible for complex tasks and changing envir…

    Submitted 20 August, 2023; originally announced August 2023.

  17. arXiv:2307.16704  [pdf, other]

    cs.LG cs.AI

    Lookbehind-SAM: k steps back, 1 step forward

    Authors: Gonçalo Mordido, Pranshu Malviya, Aristide Baratin, Sarath Chandar

    Abstract: Sharpness-aware minimization (SAM) methods have gained increasing popularity by formulating the problem of minimizing both loss value and loss sharpness as a minimax objective. In this work, we increase the efficiency of the maximization and minimization parts of SAM's objective to achieve a better loss-sharpness trade-off. By taking inspiration from the Lookahead optimizer, which uses multiple de…

    Submitted 16 May, 2024; v1 submitted 31 July, 2023; originally announced July 2023.

    Comments: ICML 2024

  18. arXiv:2307.09638  [pdf, other]

    cs.LG cs.AI

    Promoting Exploration in Memory-Augmented Adam using Critical Momenta

    Authors: Pranshu Malviya, Gonçalo Mordido, Aristide Baratin, Reza Babanezhad Harikandeh, Jerry Huang, Simon Lacoste-Julien, Razvan Pascanu, Sarath Chandar

    Abstract: Adaptive gradient-based optimizers, notably Adam, have left their mark in training large-scale deep learning models, offering fast convergence and robustness to hyperparameter settings. However, they often struggle with generalization, attributed to their tendency to converge to sharp minima in the loss landscape. To address this, we propose a new memory-augmented version of Adam that encourages e…

    Submitted 17 June, 2024; v1 submitted 18 July, 2023; originally announced July 2023.

    Comments: Published in Transactions on Machine Learning Research

  19. arXiv:2306.17693  [pdf, other]

    cs.LG

    Thompson sampling for improved exploration in GFlowNets

    Authors: Jarrid Rector-Brooks, Kanika Madan, Moksh Jain, Maksym Korablyov, Cheng-Hao Liu, Sarath Chandar, Nikolay Malkin, Yoshua Bengio

    Abstract: Generative flow networks (GFlowNets) are amortized variational inference algorithms that treat sampling from a distribution over compositional objects as a sequential decision-making problem with a learnable action policy. Unlike other algorithms for hierarchical sampling that optimize a variational bound, GFlowNet algorithms can stably run off-policy, which can be advantageous for discovering mod…

    Submitted 30 June, 2023; originally announced June 2023.

    Comments: Structured Probabilistic Inference and Generative Modeling (SPIGM) workshop @ ICML 2023

  20. arXiv:2305.14775  [pdf, other]

    cs.CL cs.AI cs.LG

    Measuring the Knowledge Acquisition-Utilization Gap in Pretrained Language Models

    Authors: Amirhossein Kazemnejad, Mehdi Rezagholizadeh, Prasanna Parthasarathi, Sarath Chandar

    Abstract: While pre-trained language models (PLMs) have shown evidence of acquiring vast amounts of knowledge, it remains unclear how much of this parametric knowledge is actually usable in performing downstream tasks. We propose a systematic framework to measure parametric knowledge utilization in PLMs. Our framework first extracts knowledge from a PLM's parameters and subsequently constructs a downstream…

    Submitted 24 May, 2023; originally announced May 2023.

  21. arXiv:2305.13088  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Should We Attend More or Less? Modulating Attention for Fairness

    Authors: Abdelrahman Zayed, Goncalo Mordido, Samira Shabanian, Sarath Chandar

    Abstract: The abundance of annotated data in natural language processing (NLP) poses both opportunities and challenges. While it enables the development of high-performing models for a variety of tasks, it also poses the risk of models learning harmful biases from the data, such as gender stereotypes. In this work, we investigate the role of attention, a widely-used technique in current state-of-the-art NLP…

    Submitted 22 May, 2023; originally announced May 2023.

  22. On the Costs and Benefits of Adopting Lifelong Learning for Software Analytics -- Empirical Study on Brown Build and Risk Prediction

    Authors: Doriane Olewicki, Sarra Habchi, Mathieu Nayrolles, Mojtaba Faramarzi, Sarath Chandar, Bram Adams

    Abstract: Nowadays, software analytics tools using machine learning (ML) models to, for example, predict the risk of a code change are well established. However, as the goals of a project shift over time, and developers and their habits change, the performance of said models tends to degrade (drift) over time. Current retraining practices typically require retraining a new model from scratch on a large upda…

    Submitted 12 February, 2024; v1 submitted 16 May, 2023; originally announced May 2023.

    Journal ref: 46th International Conference on Software Engineering: Software Engineering in Practice 2024

  23. arXiv:2303.09032  [pdf, other]

    cs.LG cs.MA

    Conditionally Optimistic Exploration for Cooperative Deep Multi-Agent Reinforcement Learning

    Authors: Xutong Zhao, Yangchen Pan, Chenjun Xiao, Sarath Chandar, Janarthanan Rajendran

    Abstract: Efficient exploration is critical in cooperative deep Multi-Agent Reinforcement Learning (MARL). In this work, we propose an exploration method that effectively encourages cooperative exploration based on the idea of sequential action-computation scheme. The high-level intuition is that to perform optimism-based exploration, agents would explore cooperative strategies if each agent's optimism esti…

    Submitted 13 July, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: Accepted at UAI 2023

  24. arXiv:2303.08690  [pdf, other]

    cs.LG cs.AI

    Replay Buffer with Local Forgetting for Adapting to Local Environment Changes in Deep Model-Based Reinforcement Learning

    Authors: Ali Rahimi-Kalahroudi, Janarthanan Rajendran, Ida Momennejad, Harm van Seijen, Sarath Chandar

    Abstract: One of the key behavioral characteristics used in neuroscience to determine whether the subject of study -- be it a rodent or a human -- exhibits model-based learning is effective adaptation to local changes in the environment, a particular form of adaptivity that is the focus of this work. In reinforcement learning, however, recent work has shown that modern deep model-based reinforcement-learnin…

    Submitted 27 September, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

  25. arXiv:2302.02792  [pdf, other]

    cs.LG

    Dealing With Non-stationarity in Decentralized Cooperative Multi-Agent Deep Reinforcement Learning via Multi-Timescale Learning

    Authors: Hadi Nekoei, Akilesh Badrinaaraayanan, Amit Sinha, Mohammad Amini, Janarthanan Rajendran, Aditya Mahajan, Sarath Chandar

    Abstract: Decentralized cooperative multi-agent deep reinforcement learning (MARL) can be a versatile learning framework, particularly in scenarios where centralized training is either not possible or not practical. One of the critical challenges in decentralized deep MARL is the non-stationarity of the learning environment when multiple agents are learning concurrently. A commonly used and efficient scheme…

    Submitted 17 August, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

  26. arXiv:2211.14449  [pdf, other]

    cs.CV cs.AI cs.LG

    PatchBlender: A Motion Prior for Video Transformers

    Authors: Gabriele Prato, Yale Song, Janarthanan Rajendran, R Devon Hjelm, Neel Joshi, Sarath Chandar

    Abstract: Transformers have become one of the dominant architectures in the field of computer vision. However, there are yet several challenges when applying such architectures to video data. Most notably, these models struggle to model the temporal patterns of video data effectively. Directly targeting this issue, we introduce PatchBlender, a learnable blending function that operates over patch embeddings…

    Submitted 10 February, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

  27. arXiv:2211.11561  [pdf, other]

    cs.LG cs.AI

    SAMSON: Sharpness-Aware Minimization Scaled by Outlier Normalization for Improving DNN Generalization and Robustness

    Authors: Gonçalo Mordido, Sébastien Henwood, Sarath Chandar, François Leduc-Primeau

    Abstract: Energy-efficient deep neural network (DNN) accelerators are prone to non-idealities that degrade DNN performance at inference time. To mitigate such degradation, existing methods typically add perturbations to the DNN weights during training to simulate inference on noisy hardware. However, this often requires knowledge about the target hardware and leads to a trade-off between DNN performance and…

    Submitted 21 March, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

    Comments: Preprint

  28. arXiv:2211.11109  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Deep Learning on a Healthy Data Diet: Finding Important Examples for Fairness

    Authors: Abdelrahman Zayed, Prasanna Parthasarathi, Goncalo Mordido, Hamid Palangi, Samira Shabanian, Sarath Chandar

    Abstract: Data-driven predictive solutions predominant in commercial applications tend to suffer from biases and stereotypes, which raises equity concerns. Prediction models may discover, use, or amplify spurious correlations based on gender or other protected personal characteristics, thus discriminating against marginalized groups. Mitigating gender bias has become an important research focus in natural l…

    Submitted 24 November, 2022; v1 submitted 20 November, 2022; originally announced November 2022.

    Comments: In Proceedings of AAAI 2023

  29. arXiv:2211.05025  [pdf, other]

    cs.CL

    Local Structure Matters Most in Most Languages

    Authors: Louis Clouâtre, Prasanna Parthasarathi, Amal Zouaq, Sarath Chandar

    Abstract: Many recent perturbation studies have found unintuitive results on what does and does not matter when performing Natural Language Understanding (NLU) tasks in English. Coding properties, such as the order of words, can often be removed through shuffling without impacting downstream performances. Such insight may be used to direct future research into English NLP models. As many improvements in mul…

    Submitted 9 November, 2022; originally announced November 2022.

  30. arXiv:2211.05015  [pdf, other]

    cs.CL

    Detecting Languages Unintelligible to Multilingual Models through Local Structure Probes

    Authors: Louis Clouâtre, Prasanna Parthasarathi, Amal Zouaq, Sarath Chandar

    Abstract: Providing better language tools for low-resource and endangered languages is imperative for equitable growth. Recent progress with massively multilingual pretrained models has proven surprisingly effective at performing zero-shot transfer to a wide variety of languages. However, this transfer is not universal, with many languages not currently understood by multilingual approaches. It is estimated…

    Submitted 9 November, 2022; originally announced November 2022.

  31. arXiv:2210.15091  [pdf, other]

    cs.CV cs.LG

    Segmentation of Multiple Sclerosis Lesions across Hospitals: Learn Continually or Train from Scratch?

    Authors: Enamundram Naga Karthik, Anne Kerbrat, Pierre Labauge, Tobias Granberg, Jason Talbott, Daniel S. Reich, Massimo Filippi, Rohit Bakshi, Virginie Callot, Sarath Chandar, Julien Cohen-Adad

    Abstract: Segmentation of Multiple Sclerosis (MS) lesions is a challenging problem. Several deep-learning-based methods have been proposed in recent years. However, most methods tend to be static, that is, a single model trained on a large, specialized dataset, which does not generalize well. Instead, the model should learn across datasets arriving sequentially from different hospitals by building upon the…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: Accepted at the Medical Imaging Meets NeurIPS (MedNeurIPS) Workshop 2022

  32. arXiv:2208.02377  [pdf, other]

    cs.LG cs.AI stat.ML

    Improving Meta-Learning Generalization with Activation-Based Early-Stopping

    Authors: Simon Guiroy, Christopher Pal, Gonçalo Mordido, Sarath Chandar

    Abstract: Meta-Learning algorithms for few-shot learning aim to train neural networks capable of generalizing to novel tasks using only a few examples. Early-stopping is critical for performance, halting model training when it reaches optimal generalization to the new task distribution. Early-stopping mechanisms in Meta-Learning typically rely on measuring the model performance on labeled examples from a me…

    Submitted 3 August, 2022; originally announced August 2022.

    Comments: Accepted at CoLLAs 2022. To be published in Proceedings of Machine Learning Research (PMLR)

  33. arXiv:2207.04354  [pdf, other]

    cs.LG cs.AI

    An Introduction to Lifelong Supervised Learning

    Authors: Shagun Sodhani, Mojtaba Faramarzi, Sanket Vaibhav Mehta, Pranshu Malviya, Mohamed Abdelsalam, Janarthanan Rajendran, Sarath Chandar

    Abstract: This primer is an attempt to provide a detailed summary of the different facets of lifelong learning. We start with Chapter 2 which provides a high-level overview of lifelong learning systems. In this chapter, we discuss prominent scenarios in lifelong learning (Section 2.4), provide a high-level organization of different lifelong learning approaches (Section 2.5), enumerate the des…

    Submitted 12 July, 2022; v1 submitted 9 July, 2022; originally announced July 2022.

    Comments: Lifelong Learning Primer

  34. arXiv:2204.11464  [pdf, other]

    cs.LG cs.AI

    Towards Evaluating Adaptivity of Model-Based Reinforcement Learning Methods

    Authors: Yi Wan, Ali Rahimi-Kalahroudi, Janarthanan Rajendran, Ida Momennejad, Sarath Chandar, Harm van Seijen

    Abstract: In recent years, a growing number of deep model-based reinforcement learning (RL) methods have been introduced. The interest in deep model-based RL is not surprising, given its many potential benefits, such as higher sample efficiency and the potential for fast adaption to changes in the environment. However, we demonstrate, using an improved version of the recently introduced Local Change Adaptat…

    Submitted 25 June, 2022; v1 submitted 25 April, 2022; originally announced April 2022.

  35. arXiv:2202.00710  [pdf, other]

    cs.AI cs.LG

    Improving Sample Efficiency of Value Based Models Using Attention and Vision Transformers

    Authors: Amir Ardalan Kalantari, Mohammad Amini, Sarath Chandar, Doina Precup

    Abstract: Much of recent Deep Reinforcement Learning success is owed to the neural architecture's potential to learn and use effective internal representations of the world. While many current algorithms access a simulator to train with a large amount of data, in realistic settings, including while playing games that may be played against people, collecting experience can be quite costly. In this paper, we…

    Submitted 1 February, 2022; originally announced February 2022.

  36. arXiv:2112.09153  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    An Empirical Investigation of the Role of Pre-training in Lifelong Learning

    Authors: Sanket Vaibhav Mehta, Darshan Patil, Sarath Chandar, Emma Strubell

    Abstract: The lifelong learning paradigm in machine learning is an attractive alternative to the more prominent isolated learning scheme not only due to its resemblance to biological learning but also its potential to reduce energy waste by obviating excessive model re-training. A key challenge to this paradigm is the phenomenon of catastrophic forgetting. With the increasing popularity and success of pre-t…

    Submitted 29 August, 2023; v1 submitted 16 December, 2021; originally announced December 2021.

    Journal ref: Journal of Machine Learning Research 24 (2023) 1-50

  37. arXiv:2110.06990  [pdf, other]

    cs.LG cs.AI cs.CV

    Scaling Laws for the Few-Shot Adaptation of Pre-trained Image Classifiers

    Authors: Gabriele Prato, Simon Guiroy, Ethan Caballero, Irina Rish, Sarath Chandar

    Abstract: Empirical science of neural scaling laws is a rapidly growing area of significant importance to the future of machine learning, particularly in the light of recent breakthroughs achieved by large-scale pre-trained models such as GPT-3, CLIP and DALL-e. Accurately predicting the neural network performance with increasing resources such as data, compute and model size provides a more comprehensive e…

    Submitted 18 October, 2021; v1 submitted 13 October, 2021; originally announced October 2021.

  38. arXiv:2108.05670  [pdf, other]

    cs.LG cs.AI cs.DC

    Communication Optimization in Large Scale Federated Learning using Autoencoder Compressed Weight Updates

    Authors: Srikanth Chandar, Pravin Chandran, Raghavendra Bhat, Avinash Chakravarthi

    Abstract: Federated Learning (FL) solves many of this decade's concerns regarding data privacy and computation challenges. FL ensures no data leaves its source as the model is trained at where the data resides. However, FL comes with its own set of challenges. The communication of model weight updates in this distributed environment comes with significant network bandwidth costs. In this context, we propose…

    Submitted 12 August, 2021; originally announced August 2021.

    Comments: 7 pages, 11 figures, International Workshop on Federated and Transfer Learning for Data Sparsity and Confidentiality in Conjunction with IJCAI 2021 (FTL-IJCAI'21)

    Report number: Paper 14

  39. arXiv:2108.04840  [pdf, other]

    cs.CL cs.LG cs.NE

    Post-hoc Interpretability for Neural NLP: A Survey

    Authors: Andreas Madsen, Siva Reddy, Sarath Chandar

    Abstract: Neural networks for NLP are becoming increasingly complex and widespread, and there is a growing concern if these models are responsible to use. Explaining models helps to address the safety and ethical concerns and is essential for accountability. Interpretability serves to provide these explanations in terms that are understandable to humans. Additionally, post-hoc methods provide explanations a…

    Submitted 28 November, 2023; v1 submitted 10 August, 2021; originally announced August 2021.

    Journal ref: ACM Comput. Surv. 55, 8, Article 155 (December 2022)

  40. arXiv:2107.13955  [pdf, other]

    cs.CL cs.AI

    Local Structure Matters Most: Perturbation Study in NLU

    Authors: Louis Clouatre, Prasanna Parthasarathi, Amal Zouaq, Sarath Chandar

    Abstract: Recent research analyzing the sensitivity of natural language understanding models to word-order perturbations has shown that neural models are surprisingly insensitive to the order of words. In this paper, we investigate this phenomenon by developing order-altering perturbations on the order of words, subwords, and characters to analyze their effect on neural models' performance on language under…

    Submitted 31 March, 2022; v1 submitted 29 July, 2021; originally announced July 2021.

    Comments: 11 pages, 13 figure + appendix

  41. arXiv:2106.14503  [pdf, other]

    cs.LG cs.DC

    Weight Divergence Driven Divide-and-Conquer Approach for Optimal Federated Learning from non-IID Data

    Authors: Pravin Chandran, Raghavendra Bhat, Avinash Chakravarthi, Srikanth Chandar

    Abstract: Federated Learning allows training of data stored in distributed devices without the need for centralizing training data, thereby maintaining data privacy. Addressing the ability to handle data heterogeneity (non-identical and independent distribution or non-IID) is a key enabler for the wider deployment of Federated Learning. In this paper, we propose a novel Divide-and-Conquer training methodolo…

    Submitted 29 June, 2021; v1 submitted 28 June, 2021; originally announced June 2021.

  42. arXiv:2106.14213  [pdf, other]

    cs.LG cs.AI cs.IR

    AI based Presentation Creator With Customized Audio Content Delivery

    Authors: Muvazima Mansoor, Srikanth Chandar, Ramamoorthy Srinath

    Abstract: In this paper, we propose an architecture to solve a novel problem statement that has stemmed more so in recent times with an increase in demand for virtual content delivery due to the COVID-19 pandemic. All educational institutions, workplaces, research centers, etc. are trying to bridge the gap of communication during these socially distanced times with the use of online content delivery. The tr…

    Submitted 27 June, 2021; originally announced June 2021.

  43. arXiv:2106.10708  [pdf, other]

    cs.LG math.OC

    Memory Augmented Optimizers for Deep Learning

    Authors: Paul-Aymeric McRae, Prasanna Parthasarathi, Mahmoud Assran, Sarath Chandar

    Abstract: Popular approaches for minimizing loss in data-driven learning often involve an abstraction or an explicit retention of the history of gradients for efficient parameter updates. The aggregated history of gradients nudges the parameter updates in the right direction even when the gradients at any given step are not informative. Although the history of gradients summarized in meta-parameters or expl…

    Submitted 20 June, 2021; originally announced June 2021.

    Comments: 24 Pages. Currently under review

  44. arXiv:2106.10622  [pdf, other]

    cs.CL

    Do Encoder Representations of Generative Dialogue Models Encode Sufficient Information about the Task ?

    Authors: Prasanna Parthasarathi, Joelle Pineau, Sarath Chandar

    Abstract: Predicting the next utterance in dialogue is contingent on encoding of users' input text to generate appropriate and relevant response in data-driven approaches. Although the semantic and syntactic quality of the language generated is evaluated, more often than not, the encoded representation of input is not evaluated. As the representation of the encoder is essential for predicting the appropriat…

    Submitted 20 June, 2021; originally announced June 2021.

    Comments: Accepted at SIGDial 2021. arXiv admin note: substantial text overlap with arXiv:2008.10427

  45. arXiv:2106.10619  [pdf, other]

    cs.CL

    A Brief Study on the Effects of Training Generative Dialogue Models with a Semantic loss

    Authors: Prasanna Parthasarathi, Mohamed Abdelsalam, Joelle Pineau, Sarath Chandar

    Abstract: Neural models trained for next utterance generation in dialogue task learn to mimic the n-gram sequences in the training set with training objectives like negative log-likelihood (NLL) or cross-entropy. Such commonly used training objectives do not foster generating alternate responses to a context. But, the effects of minimizing an alternate training objective that fosters a model to generate alt…

    Submitted 20 June, 2021; originally announced June 2021.

    Comments: Accepted at SIGDial 2021

  46. arXiv:2105.05155  [pdf, other]

    cs.LG

    TAG: Task-based Accumulated Gradients for Lifelong learning

    Authors: Pranshu Malviya, Balaraman Ravindran, Sarath Chandar

    Abstract: When an agent encounters a continual stream of new tasks in the lifelong learning setting, it leverages the knowledge it gained from the earlier tasks to help learn the new tasks better. In such a scenario, identifying an efficient knowledge representation becomes a challenging problem. Most research works propose to either store a subset of examples from the past tasks in a replay buffer, dedicat…

    Submitted 29 August, 2022; v1 submitted 11 May, 2021; originally announced May 2021.

    Comments: Published at 1st Conference on Lifelong Learning Agents, 2022

  47. arXiv:2105.03075  [pdf, other]

    cs.CL cs.AI cs.LG

    A Survey of Data Augmentation Approaches for NLP

    Authors: Steven Y. Feng, Varun Gangal, Jason Wei, Sarath Chandar, Soroush Vosoughi, Teruko Mitamura, Eduard Hovy

    Abstract: Data augmentation has recently seen increased interest in NLP due to more work in low-resource domains, new tasks, and the popularity of large-scale neural networks that require large amounts of training data. Despite this recent upsurge, this area is still relatively underexplored, perhaps due to the challenges posed by the discrete nature of language data. In this paper, we present a comprehensi…

    Submitted 1 December, 2021; v1 submitted 7 May, 2021; originally announced May 2021.

    Comments: Accepted to ACL 2021 Findings. GitHub repo with paper list at https://github.com/styfeng/DataAug4NLP ; Talk at https://www.youtube.com/watch?v=kNBVesKUZCk&ab_channel=StevenFeng ; Podcast at https://www.youtube.com/watch?v=qmqyT_97Poc&ab_channel=GradientFlow and https://thedataexchange.media/data-augmentation-in-natural-language-processing

  48. arXiv:2103.03216  [pdf, other]

    cs.LG cs.AI cs.MA

    Continuous Coordination As a Realistic Scenario for Lifelong Learning

    Authors: Hadi Nekoei, Akilesh Badrinaaraayanan, Aaron Courville, Sarath Chandar

    Abstract: Current deep reinforcement learning (RL) algorithms are still highly task-specific and lack the ability to generalize to new environments. Lifelong learning (LLL), however, aims at solving multiple tasks sequentially by efficiently transferring and using knowledge between tasks. Despite a surge of interest in lifelong RL in recent years, the lack of a realistic testbed makes robust evaluation of L…

    Submitted 14 June, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

    Comments: 19 pages with supplementary materials. Added results for Lifelong RL methods and some future work. Accepted to ICML 2021

  49. arXiv:2012.12477  [pdf, other]

    cs.CV cs.AI cs.LG

    IIRC: Incremental Implicitly-Refined Classification

    Authors: Mohamed Abdelsalam, Mojtaba Faramarzi, Shagun Sodhani, Sarath Chandar

    Abstract: We introduce the "Incremental Implicitly-Refined Classification (IIRC)" setup, an extension to the class incremental learning setup where the incoming batches of classes have two granularity levels, i.e., each sample could have a high-level (coarse) label like "bear" and a low-level (fine) label like "polar bear". Only one label is provided at a time, and the model has to figure out the other lab…

    Submitted 11 January, 2021; v1 submitted 22 December, 2020; originally announced December 2020.

  50. arXiv:2010.13190  [pdf, other]

    cs.LG cs.NI stat.AP

    Machine Learning Based Network Coverage Guidance System

    Authors: Srikanth Chandar, Muvazima Mansoor, Mohina Ahmadi, Hrishikesh Badve, Deepesh Sahoo, Bharath Katragadda

    Abstract: With the advent of 4G, there has been a huge consumption of data and the availability of mobile networks has become paramount. Also, with the burst of network traffic based on user consumption, data availability and network anomalies have increased substantially. In this paper, we introduce a novel approach, to identify the regions that have poor network connectivity thereby providing feedback to…

    Submitted 25 October, 2020; originally announced October 2020.

    Comments: 5 pages, 3 figures, Submitted to ITNAC IEEE 2020, Melbourne