
Showing 1–50 of 65 results for author: Rish, I

Searching in archive cs.
  1. arXiv:2406.00153

    cs.LG

    $μ$LO: Compute-Efficient Meta-Generalization of Learned Optimizers

    Authors: Benjamin Thérien, Charles-Étienne Joseph, Boris Knyazev, Edouard Oyallon, Irina Rish, Eugene Belilovsky

    Abstract: Learned optimizers (LOs) can significantly reduce the wall-clock training time of neural networks, substantially reducing training costs. However, they often suffer from poor meta-generalization, especially when training networks larger than those seen during meta-training. To address this, we use the recently proposed Maximal Update Parametrization ($μ$P), which allows zero-shot generalization of…

    Submitted 31 May, 2024; originally announced June 2024.

  2. arXiv:2404.07377

    cs.LG cs.AI cs.CL cs.CV cs.IT

    Deep Generative Sampling in the Dual Divergence Space: A Data-efficient & Interpretative Approach for Generative AI

    Authors: Sahil Garg, Anderson Schneider, Anant Raj, Kashif Rasul, Yuriy Nevmyvaka, Sneihil Gopal, Amit Dhurandhar, Guillermo Cecchi, Irina Rish

    Abstract: Building on the remarkable achievements in generative sampling of natural images, we propose an innovative challenge, potentially overly ambitious, which involves generating samples of entire multivariate time series that resemble images. However, the statistical challenge lies in the small sample size, sometimes consisting of a few hundred subjects. This issue is especially problematic for deep g…

    Submitted 10 April, 2024; originally announced April 2024.

  3. arXiv:2403.08763

    cs.LG cs.AI cs.CL

    Simple and Scalable Strategies to Continually Pre-train Large Language Models

    Authors: Adam Ibrahim, Benjamin Thérien, Kshitij Gupta, Mats L. Richter, Quentin Anthony, Timothée Lesort, Eugene Belilovsky, Irina Rish

    Abstract: Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start the process over again once new data becomes available. A much more efficient solution is to continually pre-train these models, saving significant compute compared to re-training. However, the distribution shift induced by new data typically results in degraded performance on previous data or poor adaptati…

    Submitted 26 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  4. arXiv:2402.13368

    cs.LG cs.CV

    Unsupervised Concept Discovery Mitigates Spurious Correlations

    Authors: Md Rifat Arefin, Yan Zhang, Aristide Baratin, Francesco Locatello, Irina Rish, Dianbo Liu, Kenji Kawaguchi

    Abstract: Models prone to spurious correlations in training data often produce brittle predictions and introduce unintended biases. Addressing this challenge typically involves methods relying on prior knowledge and group annotation to remove spurious correlations, which may not be readily available in many applications. In this paper, we establish a novel connection between unsupervised object-centric lear…

    Submitted 20 February, 2024; originally announced February 2024.

  5. arXiv:2312.12868

    cs.AI q-bio.NC

    Towards Machines that Trust: AI Agents Learn to Trust in the Trust Game

    Authors: Ardavan S. Nobandegani, Irina Rish, Thomas R. Shultz

    Abstract: Widely considered a cornerstone of human morality, trust shapes many aspects of human social interactions. In this work, we present a theoretical analysis of the $\textit{trust game}$, the canonical task for studying trust in behavioral and brain sciences, along with simulation results supporting our analysis. Specifically, leveraging reinforcement learning (RL) to train our AI agents, we systemat…

    Submitted 20 December, 2023; originally announced December 2023.

  6. arXiv:2310.08278

    cs.LG cs.AI

    Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting

    Authors: Kashif Rasul, Arjun Ashok, Andrew Robert Williams, Hena Ghonia, Rishika Bhagwatkar, Arian Khorasani, Mohammad Javad Darvishi Bayazi, George Adamopoulos, Roland Riachi, Nadhir Hassen, Marin Biloš, Sahil Garg, Anderson Schneider, Nicolas Chapados, Alexandre Drouin, Valentina Zantedeschi, Yuriy Nevmyvaka, Irina Rish

    Abstract: Over the past years, foundation models have caused a paradigm shift in machine learning due to their unprecedented capabilities for zero-shot and few-shot generalization. However, despite the success of foundation models in modalities such as natural language processing and computer vision, the development of foundation models for time series forecasting has lagged behind. We present Lag-Llama, a…

    Submitted 8 February, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: First two authors contributed equally. All data, models and code used are open-source. GitHub: https://github.com/time-series-foundation-models/lag-llama

  7. arXiv:2309.14021

    cs.CL cs.AI

    LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression

    Authors: Ayush Kaushal, Tejas Vaidhya, Irina Rish

    Abstract: Low Rank Decomposition of matrix - splitting a large matrix into a product of two smaller matrix offers a means for compression that reduces the parameters of a model without sparsification, and hence delivering more speedup on modern hardware. Moreover, unlike quantization, the compressed linear layers remain fully differentiable and all the parameters trainable, while being able to leverage the…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: 9 pages

  8. Amplifying Pathological Detection in EEG Signaling Pathways through Cross-Dataset Transfer Learning

    Authors: Mohammad-Javad Darvishi-Bayazi, Mohammad Sajjad Ghaemi, Timothee Lesort, Md Rifat Arefin, Jocelyn Faubert, Irina Rish

    Abstract: Pathology diagnosis based on EEG signals and decoding brain activity holds immense importance in understanding neurological disorders. With the advancement of artificial intelligence methods and machine learning techniques, the potential for accurate data-driven diagnoses and effective treatments has grown significantly. However, applying machine learning algorithms to real-world datasets presents…

    Submitted 19 September, 2023; originally announced September 2023.

  9. arXiv:2308.04014

    cs.CL cs.LG

    Continual Pre-Training of Large Language Models: How to (re)warm your model?

    Authors: Kshitij Gupta, Benjamin Thérien, Adam Ibrahim, Mats L. Richter, Quentin Anthony, Eugene Belilovsky, Irina Rish, Timothée Lesort

    Abstract: Large language models (LLMs) are routinely pre-trained on billions of tokens, only to restart the process over again once new data becomes available. A much cheaper and more efficient solution would be to enable the continual pre-training of these models, i.e. updating pre-trained models with new data instead of re-training them from scratch. However, the distribution shift induced by novel data t…

    Submitted 6 September, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

  10. arXiv:2307.05735

    cs.LG nlin.CD physics.data-an physics.med-ph

    Effective Latent Differential Equation Models via Attention and Multiple Shooting

    Authors: Germán Abrevaya, Mahta Ramezanian-Panahi, Jean-Christophe Gagnon-Audet, Pablo Polosecki, Irina Rish, Silvina Ponce Dawson, Guillermo Cecchi, Guillaume Dumas

    Abstract: Scientific Machine Learning (SciML) is a burgeoning field that synergistically combines domain-aware and interpretable models with agnostic machine learning techniques. In this work, we introduce GOKU-UI, an evolution of the SciML generative model GOKU-nets. GOKU-UI not only broadens the original model's spectrum to incorporate other classes of differential equations, such as Stochastic Differenti…

    Submitted 14 September, 2023; v1 submitted 11 July, 2023; originally announced July 2023.

  11. arXiv:2306.14808

    cs.LG

    Maximum State Entropy Exploration using Predecessor and Successor Representations

    Authors: Arnav Kumar Jain, Lucas Lehnert, Irina Rish, Glen Berseth

    Abstract: Animals have a developed ability to explore that aids them in important tasks such as locating food, exploring for shelter, and finding misplaced items. These exploration skills necessarily track where they have been so that they can plan for finding items with relative efficiency. Contemporary exploration algorithms often learn a less efficient exploration strategy because they either condition o…

    Submitted 26 June, 2023; originally announced June 2023.

  12. arXiv:2306.13253

    cs.LG

    Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok

    Authors: Pascal Jr. Tikeng Notsawo, Hattie Zhou, Mohammad Pezeshki, Irina Rish, Guillaume Dumas

    Abstract: This paper focuses on predicting the occurrence of grokking in neural networks, a phenomenon in which perfect generalization emerges long after signs of overfitting or memorization are observed. It has been reported that grokking can only be observed with certain hyper-parameters. This makes it critical to identify the parameters that lead to grokking. However, since grokking occurs after a large…

    Submitted 28 September, 2023; v1 submitted 22 June, 2023; originally announced June 2023.

    Comments: 26 pages, 30 figures

    ACM Class: I.2.6

  13. arXiv:2304.13765

    cs.AI

    Towards ethical multimodal systems

    Authors: Alexis Roger, Esma Aïmeur, Irina Rish

    Abstract: Generative AI systems (ChatGPT, DALL-E, etc) are expanding into multiple areas of our lives, from art Rombach et al. [2021] to mental health Rob Morris and Kareem Kouddous [2022]; their rapidly growing societal impact opens new opportunities, but also raises ethical concerns. The emerging field of AI alignment aims to make AI systems reflect human values. This paper focuses on evaluating the ethic…

    Submitted 20 May, 2024; v1 submitted 26 April, 2023; originally announced April 2023.

    Comments: 5 pages, multimodal ethical dataset building, accepted in the NeurIPS 2023 MP2 workshop

    ACM Class: I.2.7

  14. arXiv:2302.01067

    cs.AI cs.LG cs.SC

    A Survey on Compositional Generalization in Applications

    Authors: Baihan Lin, Djallel Bouneffouf, Irina Rish

    Abstract: The field of compositional generalization is currently experiencing a renaissance in AI, as novel problem settings and algorithms motivated by various practical applications are being introduced, building on top of the classical compositional generalization problem. This article aims to provide a comprehensive review of top recent developments in multiple real-life applications of the compositiona…

    Submitted 2 February, 2023; originally announced February 2023.

  15. arXiv:2211.04742

    cs.LG

    Knowledge Distillation for Federated Learning: a Practical Guide

    Authors: Alessio Mora, Irene Tenison, Paolo Bellavista, Irina Rish

    Abstract: Federated Learning (FL) enables the training of Deep Learning models without centrally collecting possibly sensitive raw data. This paves the way for stronger privacy guarantees when building predictive models. The most used algorithms for FL are parameter-averaging based schemes (e.g., Federated Averaging) that, however, have well known limits: (i) Clients must implement the same model architectu…

    Submitted 9 November, 2022; originally announced November 2022.

    Comments: 9 pages, 1 figure

  16. arXiv:2210.14891

    cs.LG cs.AI

    Broken Neural Scaling Laws

    Authors: Ethan Caballero, Kshitij Gupta, Irina Rish, David Krueger

    Abstract: We present a smoothly broken power law functional form (that we refer to as a Broken Neural Scaling Law (BNSL)) that accurately models & extrapolates the scaling behaviors of deep neural networks (i.e. how the evaluation metric of interest varies as amount of compute used for training (or inference), number of model parameters, training dataset size, model input size, number of training steps, or…

    Submitted 23 July, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: Published as a conference paper at International Conference on Learning Representations (ICLR) 2023

    Journal ref: International Conference on Learning Representations (ICLR), 2023

  17. arXiv:2210.14161

    cs.CV cs.AI

    Aligning MAGMA by Few-Shot Learning and Finetuning

    Authors: Jean-Charles Layoun, Alexis Roger, Irina Rish

    Abstract: The goal of vision-language modeling is to allow models to tie language understanding with visual inputs. The aim of this paper is to evaluate and align the Visual Language Model (VLM) called Multimodal Augmentation of Generative Models through Adapter-based finetuning (MAGMA) with human values. MAGMA is a VLM that is capable of image captioning and visual question-answering. We will evaluate its…

    Submitted 18 October, 2022; originally announced October 2022.

    Comments: Accepted by the Montreal AI Symposium conference in 2022

  18. arXiv:2210.04121

    cs.AI cs.LG cs.MA q-bio.NC

    Cognitive Models as Simulators: The Case of Moral Decision-Making

    Authors: Ardavan S. Nobandegani, Thomas R. Shultz, Irina Rish

    Abstract: To achieve desirable performance, current AI systems often require huge amounts of training data. This is especially problematic in domains where collecting data is both expensive and time-consuming, e.g., where AI systems require having numerous interactions with humans, collecting feedback from them. In this work, we substantiate the idea of $\textit{cognitive models as simulators}$, which is to…

    Submitted 8 October, 2022; originally announced October 2022.

  19. arXiv:2210.03150

    cs.LG cs.AI

    Towards Out-of-Distribution Adversarial Robustness

    Authors: Adam Ibrahim, Charles Guille-Escuret, Ioannis Mitliagkas, Irina Rish, David Krueger, Pouya Bashivan

    Abstract: Adversarial robustness continues to be a major challenge for deep learning. A core issue is that robustness to one type of attack often fails to transfer to other attacks. While prior work establishes a theoretical trade-off in robustness against different $L_p$ norms, we show that there is potential for improvement against many commonly used attacks by adopting a domain generalisation approach. C…

    Submitted 26 June, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: Version of NeurIPS 2023 submission

  20. arXiv:2207.04543

    cs.LG cs.AI

    Challenging Common Assumptions about Catastrophic Forgetting

    Authors: Timothée Lesort, Oleksiy Ostapenko, Diganta Misra, Md Rifat Arefin, Pau Rodríguez, Laurent Charlin, Irina Rish

    Abstract: Building learning agents that can progressively learn and accumulate knowledge is the core goal of the continual learning (CL) research field. Unfortunately, training a model on new data usually compromises the performance on past data. In the CL literature, this effect is referred to as catastrophic forgetting (CF). CF has been largely studied, and a plethora of methods have been proposed to addr…

    Submitted 15 May, 2023; v1 submitted 10 July, 2022; originally announced July 2022.

  21. arXiv:2205.00329

    cs.LG cs.AI

    Continual Learning with Foundation Models: An Empirical Study of Latent Replay

    Authors: Oleksiy Ostapenko, Timothee Lesort, Pau Rodríguez, Md Rifat Arefin, Arthur Douillard, Irina Rish, Laurent Charlin

    Abstract: Rapid development of large-scale pre-training has resulted in foundation models that can act as effective feature extractors on a variety of downstream tasks and domains. Motivated by this, we study the efficacy of pre-trained vision models as a foundation for downstream continual learning (CL) scenarios. Our goal is twofold. First, we want to understand the compute-accuracy trade-off between CL i…

    Submitted 2 July, 2022; v1 submitted 30 April, 2022; originally announced May 2022.

  22. arXiv:2204.01640

    cs.LG cs.CV

    APP: Anytime Progressive Pruning

    Authors: Diganta Misra, Bharat Runwal, Tianlong Chen, Zhangyang Wang, Irina Rish

    Abstract: With the latest advances in deep learning, there has been a lot of focus on the online learning paradigm due to its relevance in practical settings. Although many methods have been investigated for optimal learning settings in scenarios where the data stream is continuous over time, sparse networks training in such settings have often been overlooked. In this paper, we explore the problem of train…

    Submitted 1 June, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

    Comments: 21 pages including 4 pages of references. Preprint version

  23. arXiv:2203.09978

    cs.LG stat.ML

    WOODS: Benchmarks for Out-of-Distribution Generalization in Time Series

    Authors: Jean-Christophe Gagnon-Audet, Kartik Ahuja, Mohammad-Javad Darvishi-Bayazi, Pooneh Mousavi, Guillaume Dumas, Irina Rish

    Abstract: Machine learning models often fail to generalize well under distributional shifts. Understanding and overcoming these failures have led to a research field of Out-of-Distribution (OOD) generalization. Despite being extensively studied for static computer vision tasks, OOD generalization has been underexplored for time series tasks. To shine light on this gap, we present WOODS: eight challenging op…

    Submitted 6 April, 2023; v1 submitted 18 March, 2022; originally announced March 2022.

    Comments: 47 pages, 21 figures

  24. arXiv:2201.13415

    cs.NE

    Towards Scaling Difference Target Propagation by Learning Backprop Targets

    Authors: Maxence Ernoult, Fabrice Normandin, Abhinav Moudgil, Sean Spinney, Eugene Belilovsky, Irina Rish, Blake Richards, Yoshua Bengio

    Abstract: The development of biologically-plausible learning algorithms is important for understanding learning in the brain, but most of them fail to scale-up to real-world tasks, limiting their potential as explanations for learning by real brains. As such, it is important to explore learning algorithms that come with strong theoretical guarantees and can match the performance of backpropagation (BP) on c…

    Submitted 31 January, 2022; originally announced January 2022.

  25. arXiv:2201.11986

    cs.LG cs.AI

    Gradient Masked Averaging for Federated Learning

    Authors: Irene Tenison, Sai Aravind Sreeramadas, Vaikkunth Mugunthan, Edouard Oyallon, Irina Rish, Eugene Belilovsky

    Abstract: Federated learning (FL) is an emerging paradigm that permits a large number of clients with heterogeneous data to coordinate learning of a unified global model without the need to share data amongst each other. A major challenge in federated learning is the heterogeneity of data across client, which can degrade the performance of standard FL algorithms. Standard FL algorithms involve averaging of…

    Submitted 14 November, 2023; v1 submitted 28 January, 2022; originally announced January 2022.

  26. arXiv:2112.07066

    cs.LG

    Continual Learning In Environments With Polynomial Mixing Times

    Authors: Matthew Riemer, Sharath Chandra Raparthy, Ignacio Cases, Gopeshh Subbaraj, Maximilian Puelma Touzel, Irina Rish

    Abstract: The mixing time of the Markov chain induced by a policy limits performance in real-world continual learning scenarios. Yet, the effect of mixing times on learning in continual reinforcement learning (RL) remains underexplored. In this paper, we characterize problems that are of long-term interest to the development of continual RL, which we call scalable MDPs, through the lens of mixing times. In…

    Submitted 13 October, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

    Comments: Accepted at NeurIPS 2022

  27. arXiv:2110.09419

    cs.LG

    Compositional Attention: Disentangling Search and Retrieval

    Authors: Sarthak Mittal, Sharath Chandra Raparthy, Irina Rish, Yoshua Bengio, Guillaume Lajoie

    Abstract: Multi-head, key-value attention is the backbone of the widely successful Transformer model and its variants. This attention mechanism uses multiple parallel key-value attention blocks (called heads), each performing two fundamental computations: (1) search - selection of a relevant entity from a set via query-key interactions, and (2) retrieval - extraction of relevant features from the selected e…

    Submitted 13 February, 2022; v1 submitted 18 October, 2021; originally announced October 2021.

  28. arXiv:2110.06990

    cs.LG cs.AI cs.CV

    Scaling Laws for the Few-Shot Adaptation of Pre-trained Image Classifiers

    Authors: Gabriele Prato, Simon Guiroy, Ethan Caballero, Irina Rish, Sarath Chandar

    Abstract: Empirical science of neural scaling laws is a rapidly growing area of significant importance to the future of machine learning, particularly in the light of recent breakthroughs achieved by large-scale pre-trained models such as GPT-3, CLIP and DALL-e. Accurately predicting the neural network performance with increasing resources such as data, compute and model size provides a more comprehensive e…

    Submitted 18 October, 2021; v1 submitted 13 October, 2021; originally announced October 2021.

  29. arXiv:2108.12461

    cs.LG math.OC math.ST stat.ML

    Approximate Bayesian Optimisation for Neural Networks

    Authors: Nadhir Hassen, Irina Rish

    Abstract: A body of work has been done to automate machine learning algorithm to highlight the importance of model choice. Automating the process of choosing the best forecasting model and its corresponding parameters can result to improve a wide range of real-world applications. Bayesian optimisation (BO) uses a blackbox optimisation methods to propose solutions according to an exploration-exploitation tra…

    Submitted 31 August, 2021; v1 submitted 27 August, 2021; originally announced August 2021.

    Comments: 9 pages with 4 pages supplementary materials

  30. arXiv:2108.01005

    cs.LG

    Sequoia: A Software Framework to Unify Continual Learning Research

    Authors: Fabrice Normandin, Florian Golemo, Oleksiy Ostapenko, Pau Rodriguez, Matthew D Riemer, Julio Hurtado, Khimya Khetarpal, Ryan Lindeborg, Lucas Cecchi, Timothée Lesort, Laurent Charlin, Irina Rish, Massimo Caccia

    Abstract: The field of Continual Learning (CL) seeks to develop algorithms that accumulate knowledge and skills over time through interaction with non-stationary environments. In practice, a plethora of evaluation procedures (settings) and algorithmic solutions (methods) exist, each with their own potentially disjoint set of assumptions. This variety makes measuring progress in CL difficult. We propose a ta…

    Submitted 5 June, 2023; v1 submitted 2 August, 2021; originally announced August 2021.

  31. arXiv:2107.09539

    cs.LG eess.SP

    Parametric Scattering Networks

    Authors: Shanel Gauthier, Benjamin Thérien, Laurent Alsène-Racicot, Muawiz Chaudhary, Irina Rish, Eugene Belilovsky, Michael Eickenberg, Guy Wolf

    Abstract: The wavelet scattering transform creates geometric invariants and deformation stability. In multiple signal domains, it has been shown to yield more discriminative representations compared to other non-learned representations and to outperform learned representations in certain tasks, particularly on limited labeled data and highly structured signals. The wavelet filters used in the scattering tra…

    Submitted 15 August, 2022; v1 submitted 20 July, 2021; originally announced July 2021.

    ACM Class: F.2.2; I.2.7

  32. arXiv:2106.06607

    cs.LG stat.ML

    Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization

    Authors: Kartik Ahuja, Ethan Caballero, Dinghuai Zhang, Jean-Christophe Gagnon-Audet, Yoshua Bengio, Ioannis Mitliagkas, Irina Rish

    Abstract: The invariance principle from causality is at the heart of notable approaches such as invariant risk minimization (IRM) that seek to address out-of-distribution (OOD) generalization failures. Despite the promising theory, invariance principle-based approaches fail in common classification tasks, where invariant (causal) features capture all the information about the label. Are these failures due t…

    Submitted 20 November, 2022; v1 submitted 11 June, 2021; originally announced June 2021.

  33. arXiv:2106.02266

    cs.LG cs.AI

    SAND-mask: An Enhanced Gradient Masking Strategy for the Discovery of Invariances in Domain Generalization

    Authors: Soroosh Shahtalebi, Jean-Christophe Gagnon-Audet, Touraj Laleh, Mojtaba Faramarzi, Kartik Ahuja, Irina Rish

    Abstract: A major bottleneck in the real-world applications of machine learning models is their failure in generalizing to unseen domains whose data distribution is not i.i.d to the training domains. This failure often stems from learning non-generalizable features in the training domains that are spuriously correlated with the label of data. To address this shortcoming, there has been a growing surge of in…

    Submitted 25 September, 2021; v1 submitted 4 June, 2021; originally announced June 2021.

  34. arXiv:2106.01834

    cs.LG cs.AI

    Continual Learning in Deep Networks: an Analysis of the Last Layer

    Authors: Timothée Lesort, Thomas George, Irina Rish

    Abstract: We study how different output layer parameterizations of a deep neural network affects learning and forgetting in continual learning settings. The following three effects can cause catastrophic forgetting in the output layer: (1) weights modifications, (2) interference, and (3) projection drift. In this paper, our goal is to provide more insights into how changing the output layer parameterization…

    Submitted 17 August, 2022; v1 submitted 3 June, 2021; originally announced June 2021.

  35. arXiv:2104.10322

    cs.LG

    Gradient Masked Federated Optimization

    Authors: Irene Tenison, Sreya Francis, Irina Rish

    Abstract: Federated Averaging (FedAVG) has become the most popular federated learning algorithm due to its simplicity and low communication overhead. We use simple examples to show that FedAVG has the tendency to sew together the optima across the participating clients. These sewed optima exhibit poor generalization when used on a new client with new data distribution. Inspired by the invariance principles…

    Submitted 20 April, 2021; originally announced April 2021.

    ACM Class: I.2.0

    Journal ref: ICLR 2021 Distributed and Private Machine Learning (DPML) Workshop

  36. arXiv:2104.06557

    cs.LG cs.AI cs.CR

    Towards Causal Federated Learning For Enhanced Robustness and Privacy

    Authors: Sreya Francis, Irene Tenison, Irina Rish

    Abstract: Federated Learning is an emerging privacy-preserving distributed machine learning approach to building a shared model by performing distributed training locally on participating devices (clients) and aggregating the local models into a global one. As this approach prevents data collection and aggregation, it helps in reducing associated privacy risks to a great extent. However, the data samples ac…

    Submitted 13 April, 2021; originally announced April 2021.

    ACM Class: I.2.0

    Journal ref: ICLR 2021 Distributed and Private Machine Learning (DPML) Workshop

  37. arXiv:2104.01678

    cs.LG cs.AI

    Understanding Continual Learning Settings with Data Distribution Drift Analysis

    Authors: Timothée Lesort, Massimo Caccia, Irina Rish

    Abstract: Classical machine learning algorithms often assume that the data are drawn i.i.d. from a stationary probability distribution. Recently, continual learning emerged as a rapidly growing area of machine learning where this assumption is relaxed, i.e. where the data distribution is non-stationary and changes over time. This paper represents the state of data distribution by a context variable $c$. A d…

    Submitted 10 July, 2022; v1 submitted 4 April, 2021; originally announced April 2021.

  38. arXiv:2012.13490

    cs.LG cs.AI

    Towards Continual Reinforcement Learning: A Review and Perspectives

    Authors: Khimya Khetarpal, Matthew Riemer, Irina Rish, Doina Precup

    Abstract: In this article, we aim to provide a literature review of different formulations and approaches to continual reinforcement learning (RL), also known as lifelong or non-stationary RL. We begin by discussing our perspective on why RL is a natural fit for studying continual learning. We then provide a taxonomy of different continual RL formulations by mathematically characterizing two key properties…

    Submitted 11 November, 2022; v1 submitted 24 December, 2020; originally announced December 2020.

    Comments: Journal of Artificial Intelligence Research (JAIR)

  39. arXiv:2010.16004

    cs.CY cs.LG cs.MA cs.SI

    COVI-AgentSim: an Agent-based Model for Evaluating Methods of Digital Contact Tracing

    Authors: Prateek Gupta, Tegan Maharaj, Martin Weiss, Nasim Rahaman, Hannah Alsdurf, Abhinav Sharma, Nanor Minoyan, Soren Harnois-Leblanc, Victor Schmidt, Pierre-Luc St. Charles, Tristan Deleu, Andrew Williams, Akshay Patel, Meng Qu, Olexa Bilaniuk, Gaétan Marceau Caron, Pierre Luc Carrier, Satya Ortiz-Gagné, Marc-Andre Rousseau, David Buckeridge, Joumana Ghosn, Yang Zhang, Bernhard Schölkopf, Jian Tang, Irina Rish, et al. (4 additional authors not shown)

    Abstract: The rapid global spread of COVID-19 has led to an unprecedented demand for effective methods to mitigate the spread of the disease, and various digital contact tracing (DCT) methods have emerged as a component of the solution. In order to make informed public health choices, there is a need for tools which allow evaluation and comparison of DCT methods. We introduce an agent-based compartmental si…

    Submitted 29 October, 2020; originally announced October 2020.

  40. arXiv:2010.12536

    cs.LG cs.AI cs.MA cs.SI

    Predicting Infectiousness for Proactive Contact Tracing

    Authors: Yoshua Bengio, Prateek Gupta, Tegan Maharaj, Nasim Rahaman, Martin Weiss, Tristan Deleu, Eilif Muller, Meng Qu, Victor Schmidt, Pierre-Luc St-Charles, Hannah Alsdurf, Olexa Bilanuik, David Buckeridge, Gáetan Marceau Caron, Pierre-Luc Carrier, Joumana Ghosn, Satya Ortiz-Gagne, Chris Pal, Irina Rish, Bernhard Schölkopf, Abhinav Sharma, Jian Tang, Andrew Williams

    Abstract: The COVID-19 pandemic has spread rapidly worldwide, overwhelming manual contact tracing in many countries and resulting in widespread lockdowns for emergency containment. Large-scale digital contact tracing (DCT) has emerged as a potential solution to resume economic and social activity while minimizing spread of the virus. Various DCT methods have been proposed, each making trade-offs between pri…

    Submitted 23 October, 2020; originally announced October 2020.

  41. arXiv:2010.09473

    cs.LG cs.AI

    Double-Linear Thompson Sampling for Context-Attentive Bandits

    Authors: Djallel Bouneffouf, Raphaël Féraud, Sohini Upadhyay, Yasaman Khazaeni, Irina Rish

    Abstract: In this paper, we analyze and extend an online learning framework known as Context-Attentive Bandit, motivated by various practical applications, from medical diagnosis to dialog systems, where due to observation costs only a small subset of a potentially large number of context variables can be observed at each iteration;however, the agent has a freedom to choose which variables to observe. We de…

    Submitted 15 October, 2020; originally announced October 2020.

    Comments: arXiv admin note: text overlap with arXiv:1906.09384

  42. arXiv:2006.04621  [pdf, other]

    cs.LG stat.ML

    Adversarial Feature Desensitization

    Authors: Pouya Bashivan, Reza Bayat, Adam Ibrahim, Kartik Ahuja, Mojtaba Faramarzi, Touraj Laleh, Blake Aaron Richards, Irina Rish

    Abstract: Neural networks are known to be vulnerable to adversarial attacks -- slight but carefully constructed perturbations of the inputs which can drastically impair the network's performance. Many defense methods have been proposed for improving robustness of deep networks by training them on adversarially perturbed inputs. However, these models often remain vulnerable to new types of attacks not seen d…

    Submitted 4 January, 2022; v1 submitted 8 June, 2020; originally announced June 2020.

    Comments: Accepted at NeurIPS 2021

  43. arXiv:2005.08502  [pdf, other]

    cs.CR cs.AI cs.CY

    COVI White Paper

    Authors: Hannah Alsdurf, Edmond Belliveau, Yoshua Bengio, Tristan Deleu, Prateek Gupta, Daphne Ippolito, Richard Janda, Max Jarvie, Tyler Kolody, Sekoul Krastev, Tegan Maharaj, Robert Obryk, Dan Pilat, Valerie Pisano, Benjamin Prud'homme, Meng Qu, Nasim Rahaman, Irina Rish, Jean-Francois Rousseau, Abhinav Sharma, Brooke Struck, Jian Tang, Martin Weiss, Yun William Yu

    Abstract: The SARS-CoV-2 (Covid-19) pandemic has caused significant strain on public health institutions around the world. Contact tracing is an essential tool to change the course of the Covid-19 pandemic. Manual contact tracing of Covid-19 cases has significant challenges that limit the ability of public health authorities to minimize community infections. Personalized peer-to-peer contact tracing through…

    Submitted 27 July, 2020; v1 submitted 18 May, 2020; originally announced May 2020.

    Comments: 64 pages, 1 figure

  44. arXiv:2005.04544  [pdf, other]

    cs.AI cs.LG q-bio.NC stat.ML

    Unified Models of Human Behavioral Agents in Bandits, Contextual Bandits and RL

    Authors: Baihan Lin, Guillermo Cecchi, Djallel Bouneffouf, Jenna Reinen, Irina Rish

    Abstract: Artificial behavioral agents are often evaluated based on their consistent behaviors and performance to take sequential actions in an environment to maximize some notion of cumulative reward. However, human decision making in real life usually involves different strategies and behavioral trajectories that lead to the same empirical outcome. Motivated by clinical literature of a wide range of neuro…

    Submitted 27 December, 2021; v1 submitted 9 May, 2020; originally announced May 2020.

    Comments: Proceedings of HBAI 2020. This article supersedes and extends our work arXiv:1706.02897 (MAB) and arXiv:1906.11286 (RL) into the Contextual Bandit (CB) framework. It generalizes extensively into multi-armed bandits, contextual bandits and RL settings to create a unified framework of human behavioral agents

    Journal ref: In Human Brain and Artificial Intelligence (pp. 14-33). Springer 2021

  45. arXiv:2004.00161  [pdf, other]

    cs.CV cs.LG eess.IV

    Towards Lifelong Self-Supervision For Unpaired Image-to-Image Translation

    Authors: Victor Schmidt, Makesh Narsimhan Sreedhar, Mostafa ElAraby, Irina Rish

    Abstract: Unpaired Image-to-Image Translation (I2IT) tasks often suffer from lack of data, a problem which self-supervised learning (SSL) has recently been very popular and successful at tackling. Leveraging auxiliary tasks such as rotation prediction or generative colorization, SSL can produce better and more robust representations in a low data regime. Training such tasks along an I2IT task is however com…

    Submitted 31 March, 2020; originally announced April 2020.

  46. arXiv:2003.05856  [pdf, other]

    cs.AI cs.LG

    Online Fast Adaptation and Knowledge Accumulation: a New Approach to Continual Learning

    Authors: Massimo Caccia, Pau Rodriguez, Oleksiy Ostapenko, Fabrice Normandin, Min Lin, Lucas Caccia, Issam Laradji, Irina Rish, Alexandre Lacoste, David Vazquez, Laurent Charlin

    Abstract: Continual learning studies agents that learn from streams of tasks without forgetting previous ones while adapting to new ones. Two recent continual-learning scenarios have opened new avenues of research. In meta-continual learning, the model is pre-trained to minimize catastrophic forgetting of previous tasks. In continual-meta learning, the aim is to train agents for faster remembering of previo…

    Submitted 20 January, 2021; v1 submitted 12 March, 2020; originally announced March 2020.

    Journal ref: NeurIPS 2020

  47. arXiv:1906.11286  [pdf, other]

    cs.LG cs.AI cs.MA q-bio.NC stat.ML

    A Story of Two Streams: Reinforcement Learning Models from Human Behavior and Neuropsychiatry

    Authors: Baihan Lin, Guillermo Cecchi, Djallel Bouneffouf, Jenna Reinen, Irina Rish

    Abstract: Drawing inspiration from behavioral studies of human decision making, we propose here a more general and flexible parametric framework for reinforcement learning that extends standard Q-learning to a two-stream model for processing positive and negative rewards, and allows incorporating a wide range of reward-processing biases -- an important component of human decision making which can help u…

    Submitted 14 April, 2020; v1 submitted 20 June, 2019; originally announced June 2019.

    Comments: Published in AAMAS 2020 as a full paper. This article supersedes our work arXiv:1706.02897 into RL setting and extends extensively into RL games, cognitive modeling, and gambling tasks in lifelong learning setting

  48. arXiv:1904.10040  [pdf, ps, other]

    cs.LG stat.ML

    A Survey on Practical Applications of Multi-Armed and Contextual Bandits

    Authors: Djallel Bouneffouf, Irina Rish

    Abstract: In recent years, the multi-armed bandit (MAB) framework has attracted a lot of attention in various applications, from recommender systems and information retrieval to healthcare and finance, due to its stellar performance combined with certain attractive properties, such as learning from less feedback. The multi-armed bandit field is currently flourishing, as novel problem settings and algorithms mot…

    Submitted 2 April, 2019; originally announced April 2019.

    Comments: under review by IJCAI 2019 Survey

  49. arXiv:1904.09330  [pdf, other]

    cs.NE

    Continual Learning with Self-Organizing Maps

    Authors: Pouya Bashivan, Martin Schrimpf, Robert Ajemian, Irina Rish, Matthew Riemer, Yuhai Tu

    Abstract: Despite remarkable successes achieved by modern neural networks in a wide range of applications, these networks perform best in domain-specific stationary environments where they are trained only once on large-scale controlled data repositories. When exposed to non-stationary learning environments, current neural networks tend to forget what they had previously learned, a phenomenon known as catast…

    Submitted 19 April, 2019; originally announced April 2019.

    Comments: Continual Learning Workshop - NeurIPS 2018

  50. arXiv:1810.11910  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning to Learn without Forgetting by Maximizing Transfer and Minimizing Interference

    Authors: Matthew Riemer, Ignacio Cases, Robert Ajemian, Miao Liu, Irina Rish, Yuhai Tu, Gerald Tesauro

    Abstract: Lack of performance when it comes to continual learning over non-stationary distributions of data remains a major challenge in scaling neural network learning to more human realistic settings. In this work we propose a new conceptualization of the continual learning problem in terms of a temporally symmetric trade-off between transfer and interference that can be optimized by enforcing gradient al…

    Submitted 2 May, 2019; v1 submitted 28 October, 2018; originally announced October 2018.

    Comments: ICLR 2019