Search | arXiv e-print repository

Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks

Authors: MohammadReza Davari, Eugene Belilovsky

Abstract: The rapid development of AI systems has been greatly influenced by the emergence of foundation models. A common approach for targeted problems involves fine-tuning these pre-trained foundation models for specific target tasks, resulting in a rapid spread of models fine-tuned across a diverse array of tasks. This work focuses on the problem of merging multiple fine-tunings of the same foundation mo… ▽ More The rapid development of AI systems has been greatly influenced by the emergence of foundation models. A common approach for targeted problems involves fine-tuning these pre-trained foundation models for specific target tasks, resulting in a rapid spread of models fine-tuned across a diverse array of tasks. This work focuses on the problem of merging multiple fine-tunings of the same foundation model derived from a spectrum of auxiliary tasks. We introduce a new simple method, Model Breadcrumbs, which consists of a sparsely defined set of weights that carve out a trajectory within the weight space of a pre-trained model, enhancing task performance when traversed. These breadcrumbs are constructed by subtracting the weights from a pre-trained model before and after fine-tuning, followed by a sparsification process that eliminates weight outliers and negligible perturbations. Our experiments demonstrate the effectiveness of Model Breadcrumbs to simultaneously improve performance across multiple tasks. This contribution aligns with the evolving paradigm of updatable machine learning, reminiscent of the collaborative principles underlying open-source software development, fostering a community-driven effort to reliably update machine learning models. Our method is shown to be more efficient and unlike previous proposals does not require hyperparameter tuning for each new task added. Through extensive experimentation involving various models, tasks, and modalities we establish that integrating Model Breadcrumbs offers a simple, efficient, and highly effective approach for constructing multi-task models and facilitating updates to foundation models. △ Less

Submitted 11 December, 2023; originally announced December 2023.

arXiv:2305.12542 [pdf, other]

ToxBuster: In-game Chat Toxicity Buster with BERT

Authors: Zachary Yang, Yasmine Maricar, MohammadReza Davari, Nicolas Grenon-Godbout, Reihaneh Rabbany

Abstract: Detecting toxicity in online spaces is challenging and an ever more pressing problem given the increase in social media and gaming consumption. We introduce ToxBuster, a simple and scalable model trained on a relatively large dataset of 194k lines of game chat from Rainbow Six Siege and For Honor, carefully annotated for different kinds of toxicity. Compared to the existing state-of-the-art, ToxBu… ▽ More Detecting toxicity in online spaces is challenging and an ever more pressing problem given the increase in social media and gaming consumption. We introduce ToxBuster, a simple and scalable model trained on a relatively large dataset of 194k lines of game chat from Rainbow Six Siege and For Honor, carefully annotated for different kinds of toxicity. Compared to the existing state-of-the-art, ToxBuster achieves 82.95% (+7) in precision and 83.56% (+57) in recall. This improvement is obtained by leveraging past chat history and metadata. We also study the implication towards real-time and post-game moderation as well as the model transferability from one game to another. △ Less

Submitted 21 May, 2023; originally announced May 2023.

Comments: 11 pages, 3 figures

arXiv:2303.14771 [pdf, other]

Prototype-Sample Relation Distillation: Towards Replay-Free Continual Learning

Authors: Nader Asadi, MohammadReza Davari, Sudhir Mudur, Rahaf Aljundi, Eugene Belilovsky

Abstract: In Continual learning (CL) balancing effective adaptation while combating catastrophic forgetting is a central challenge. Many of the recent best-performing methods utilize various forms of prior task data, e.g. a replay buffer, to tackle the catastrophic forgetting problem. Having access to previous task data can be restrictive in many real-world scenarios, for example when task data is sensitive… ▽ More In Continual learning (CL) balancing effective adaptation while combating catastrophic forgetting is a central challenge. Many of the recent best-performing methods utilize various forms of prior task data, e.g. a replay buffer, to tackle the catastrophic forgetting problem. Having access to previous task data can be restrictive in many real-world scenarios, for example when task data is sensitive or proprietary. To overcome the necessity of using previous tasks' data, in this work, we start with strong representation learning methods that have been shown to be less prone to forgetting. We propose a holistic approach to jointly learn the representation and class prototypes while maintaining the relevance of old class prototypes and their embedded similarities. Specifically, samples are mapped to an embedding space where the representations are learned using a supervised contrastive loss. Class prototypes are evolved continually in the same latent space, enabling learning and prediction at any point. To continually adapt the prototypes without kee** any prior task data, we propose a novel distillation loss that constrains class prototypes to maintain relative similarities as compared to new task data. This method yields state-of-the-art performance in the task-incremental setting, outperforming methods relying on large amounts of data, and provides strong performance in the class-incremental setting without using any stored data points. △ Less

Submitted 6 June, 2023; v1 submitted 26 March, 2023; originally announced March 2023.

Comments: Accepted at ICML 2023

arXiv:2210.16156 [pdf, other]

Reliability of CKA as a Similarity Measure in Deep Learning

Authors: MohammadReza Davari, Stefan Horoi, Amine Natik, Guillaume Lajoie, Guy Wolf, Eugene Belilovsky

Abstract: Comparing learned neural representations in neural networks is a challenging but important problem, which has been approached in different ways. The Centered Kernel Alignment (CKA) similarity metric, particularly its linear variant, has recently become a popular approach and has been widely used to compare representations of a network's different layers, of architecturally similar networks trained… ▽ More Comparing learned neural representations in neural networks is a challenging but important problem, which has been approached in different ways. The Centered Kernel Alignment (CKA) similarity metric, particularly its linear variant, has recently become a popular approach and has been widely used to compare representations of a network's different layers, of architecturally similar networks trained differently, or of models with different architectures trained on the same data. A wide variety of conclusions about similarity and dissimilarity of these various representations have been made using CKA. In this work we present analysis that formally characterizes CKA sensitivity to a large class of simple transformations, which can naturally occur in the context of modern machine learning. This provides a concrete explanation of CKA sensitivity to outliers, which has been observed in past works, and to transformations that preserve the linear separability of the data, an important generalization attribute. We empirically investigate several weaknesses of the CKA similarity metric, demonstrating situations in which it gives unexpected or counter-intuitive results. Finally we study approaches for modifying representations to maintain functional behaviour while changing the CKA value. Our results illustrate that, in many cases, the CKA value can be easily manipulated without substantial changes to the functional behaviour of the models, and call for caution when leveraging activation alignment metrics. △ Less

Submitted 16 November, 2022; v1 submitted 28 October, 2022; originally announced October 2022.

arXiv:2203.13381 [pdf, other]

Probing Representation Forgetting in Supervised and Unsupervised Continual Learning

Authors: MohammadReza Davari, Nader Asadi, Sudhir Mudur, Rahaf Aljundi, Eugene Belilovsky

Abstract: Continual Learning research typically focuses on tackling the phenomenon of catastrophic forgetting in neural networks. Catastrophic forgetting is associated with an abrupt loss of knowledge previously learned by a model when the task, or more broadly the data distribution, being trained on changes. In supervised learning problems this forgetting, resulting from a change in the model's representat… ▽ More Continual Learning research typically focuses on tackling the phenomenon of catastrophic forgetting in neural networks. Catastrophic forgetting is associated with an abrupt loss of knowledge previously learned by a model when the task, or more broadly the data distribution, being trained on changes. In supervised learning problems this forgetting, resulting from a change in the model's representation, is typically measured or observed by evaluating the decrease in old task performance. However, a model's representation can change without losing knowledge about prior tasks. In this work we consider the concept of representation forgetting, observed by using the difference in performance of an optimal linear classifier before and after a new task is introduced. Using this tool we revisit a number of standard continual learning benchmarks and observe that, through this lens, model representations trained without any explicit control for forgetting often experience small representation forgetting and can sometimes be comparable to methods which explicitly control for forgetting, especially in longer task sequences. We also show that representation forgetting can lead to new insights on the effect of model capacity and loss function used in continual learning. Based on our results, we show that a simple yet competitive approach is to learn representations continually with standard supervised contrastive learning while constructing prototypes of class samples when queried on old samples. △ Less

Submitted 5 April, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

Comments: Accepted at CVPR 2022

arXiv:2008.02017 [pdf, ps, other]

doi 10.1145/3343413.3378010

The Role of Word-Eye-Fixations for Query Term Prediction

Authors: Masoud Davari, Daniel Hienert, Dagmar Kern, Stefan Dietze

Abstract: Throughout the search process, the user's gaze on inspected SERPs and websites can reveal his or her search interests. Gaze behavior can be captured with eye tracking and described with word-eye-fixations. Word-eye-fixations contain the user's accumulated gaze fixation duration on each individual word of a web page. In this work, we analyze the role of word-eye-fixations for predicting query terms… ▽ More Throughout the search process, the user's gaze on inspected SERPs and websites can reveal his or her search interests. Gaze behavior can be captured with eye tracking and described with word-eye-fixations. Word-eye-fixations contain the user's accumulated gaze fixation duration on each individual word of a web page. In this work, we analyze the role of word-eye-fixations for predicting query terms. We investigate the relationship between a range of in-session features, in particular, gaze data, with the query terms and train models for predicting query terms. We use a dataset of 50 search sessions obtained through a lab study in the social sciences domain. Using established machine learning models, we can predict query terms with comparably high accuracy, even with only little training data. Feature analysis shows that the categories Fixation, Query Relevance and Session Topic contain the most effective features for our task. △ Less

Submitted 5 August, 2020; originally announced August 2020.

Journal ref: In CHIIR 2020, Proceedings of the 2020 Conference on Human Information Interaction and Retrieval, March 2020, Pages 422-426

arXiv:2006.03399 [pdf, ps, other]

Single-machine scheduling with an external resource

Authors: Dirk Briskorn, Morteza Davari, Jannik Matuschke

Abstract: This paper studies the complexity of single-machine scheduling with an external resource, which is rented for a non-interrupted period. Jobs that need this external resource are executed only when the external resource is available. There is a cost associated with the scheduling of jobs and a cost associated with the duration of the renting period of the external resource. We look at four classes… ▽ More This paper studies the complexity of single-machine scheduling with an external resource, which is rented for a non-interrupted period. Jobs that need this external resource are executed only when the external resource is available. There is a cost associated with the scheduling of jobs and a cost associated with the duration of the renting period of the external resource. We look at four classes of problems with an external resource: a class of problems where the renting period is budgeted and the scheduling cost needs to be minimized, a class of problems where the scheduling cost is budgeted and the renting period needs to be minimized, a class of two-objective problems where both, the renting period and the scheduling cost, are to be minimized, and a class of problems where a linear combination of the scheduling cost and the renting period is minimized. We provide a thorough complexity analysis (NP-hardness proofs and (pseudo-)polynomial algorithms) for different members of these four classes. △ Less

Submitted 11 September, 2020; v1 submitted 5 June, 2020; originally announced June 2020.

arXiv:1909.06736 [pdf, other]

Identifying Multiple Interaction Events from Tactile Data during Robot-Human Object Transfer

Authors: Mohammad-Javad Davari, Michael Hegedus, Kamal Gupta, Mehran Mehrandezh

Abstract: During a robot to human object handover task, several intended or unintended events may occur with the object - it may be pulled, pushed, bumped or simply held - by the human receiver. We show that it is possible to differentiate between these events solely via tactile sensors. Training data from tactile sensors were recorded during interaction of human subjects with the object held by a 3-finger… ▽ More During a robot to human object handover task, several intended or unintended events may occur with the object - it may be pulled, pushed, bumped or simply held - by the human receiver. We show that it is possible to differentiate between these events solely via tactile sensors. Training data from tactile sensors were recorded during interaction of human subjects with the object held by a 3-finger robotic hand. A Bag of Words approach was used to automatically extract effective features from the tactile data. A Support Vector Machine was used to distinguish between the four events with over 95 percent average accuracy. △ Less

Submitted 15 September, 2019; originally announced September 2019.

Comments: 7 pages, accepted for 2019 IEEE ROMAN

arXiv:1904.11018 [pdf, other]

Toponym Identification in Epidemiology Articles - A Deep Learning Approach

Authors: MohammadReza Davari, Leila Kosseim, Tien D. Bui

Abstract: When analyzing the spread of viruses, epidemiologists often need to identify the location of infected hosts. This information can be found in public databases, such as GenBank, however, information provided in these databases are usually limited to the country or state level. More fine-grained localization information requires phylogeographers to manually read relevant scientific articles. In this… ▽ More When analyzing the spread of viruses, epidemiologists often need to identify the location of infected hosts. This information can be found in public databases, such as GenBank, however, information provided in these databases are usually limited to the country or state level. More fine-grained localization information requires phylogeographers to manually read relevant scientific articles. In this work we propose an approach to automate the process of place name identification from medical (epidemiology) articles. The focus of this paper is to propose a deep learning based model for toponym detection and experiment with the use of external linguistic features and domain specific information. The model was evaluated using a collection of 105 epidemiology articles from PubMed Central provided by the recent SemEval task 12. Our best detection model achieves an F1 score of $80.13\%$, a significant improvement compared to the state of the art of $69.84\%$. These results underline the importance of domain specific embedding as well as specific linguistic features in toponym detection in medical journals. △ Less

Submitted 27 April, 2019; v1 submitted 24 April, 2019; originally announced April 2019.

Comments: 12 pages. pre-print from Proceedings of CICLing 2019: 20th International Conference on Computational Linguistics and Intelligent Text Processing

Showing 1–9 of 9 results for author: Davari, M