Search | arXiv e-print repository

Fisher Mask Nodes for Language Model Merging

Authors: Thennal D K, Ganesh Nathan, Suchithra M S

Abstract: Fine-tuning pre-trained models provides significant advantages in downstream performance. The ubiquitous nature of pre-trained models such as BERT and its derivatives in natural language processing has also led to a proliferation of task-specific fine-tuned models. As these models typically only perform one task well, additional training or ensembling is required in multi-task scenarios. The growing field of model merging provides a solution, dealing with the challenge of combining multiple task-specific models into a single multi-task model. In this study, we introduce a novel model merging method for Transformers, combining insights from previous work in Fisher-weighted averaging and the use of Fisher information in model pruning. Utilizing the Fisher information of mask nodes within the Transformer architecture, we devise a computationally efficient weighted-averaging scheme. Our method exhibits a regular and significant performance increase across various models in the BERT family, outperforming full-scale Fisher-weighted averaging at a fraction of the computational cost, with baseline performance improvements of up to +6.5 and a speedup between 57.4x and 321.7x across models. Our results demonstrate the potential of our method in current multi-task learning environments and suggest its scalability and adaptability to new model architectures and learning scenarios.

Submitted 3 May, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

Comments: Accepted at LREC-COLING 2024
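The baseline this abstract compares against, full-scale Fisher-weighted averaging, merges each parameter as a Fisher-weighted mean over the task-specific models. A minimal sketch of that averaging rule follows, assuming diagonal Fisher estimates have already been computed for every model. The function name and the dict-of-lists layout are hypothetical, and this is not the authors' mask-node method.

```python
from typing import Dict, List

# Hypothetical layout: parameter name -> flattened list of values.
Params = Dict[str, List[float]]


def fisher_weighted_average(params: List[Params], fishers: List[Params]) -> Params:
    """Merge models parameter-wise: theta_j = sum_i F_ij * theta_ij / sum_i F_ij.

    params[i][name]  : model i's flattened tensor `name`
    fishers[i][name] : model i's diagonal Fisher estimates, same length
    """
    merged: Params = {}
    for name in params[0]:
        values = []
        for j in range(len(params[0][name])):
            num = sum(f[name][j] * p[name][j] for p, f in zip(params, fishers))
            den = sum(f[name][j] for f in fishers)
            if den > 0:
                values.append(num / den)
            else:
                # No model carries Fisher mass for this entry: use a plain mean.
                values.append(sum(p[name][j] for p in params) / len(params))
        merged[name] = values
    return merged


models = [{"w": [1.0, 3.0]}, {"w": [3.0, 1.0]}]
fisher = [{"w": [1.0, 0.0]}, {"w": [1.0, 2.0]}]
print(fisher_weighted_average(models, fisher))  # {'w': [2.0, 1.0]}
```

In this sketch every parameter carries its own Fisher weight, which is what makes the full-scale approach expensive; the abstract attributes its speedup to using the Fisher information of mask nodes instead.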

arXiv:2205.11117

PyRelationAL: a python library for active learning research and development

Authors: Paul Scherer, Thomas Gaudelet, Alison Pouplin, Alice Del Vecchio, Suraj M S, Oliver Bolton, Jyothish Soman, Jake P. Taylor-King, Lindsay Edwards

Abstract: In constrained real-world scenarios, where it may be challenging or costly to generate data, disciplined methods for acquiring informative new data points are of fundamental importance for the efficient training of machine learning (ML) models. Active learning (AL) is a sub-field of ML focused on the development of methods to iteratively and economically acquire data through strategically querying new data points that are the most useful for a particular task. Here, we introduce PyRelationAL, an open source library for AL research. We describe a modular toolkit that is compatible with diverse ML frameworks (e.g. PyTorch, scikit-learn, TensorFlow, JAX). Furthermore, the library implements a wide range of published methods and provides API access to wide-ranging benchmark datasets and AL task configurations based on existing literature. The library is supplemented by an expansive set of tutorials, demos, and documentation to help users get started. PyRelationAL is maintained using modern software engineering practices, with an inclusive contributor code of conduct, to promote long-term library quality and utilisation. PyRelationAL is available under a permissive Apache licence on PyPI and at https://github.com/RelationRx/pyrelational.

Submitted 17 February, 2023; v1 submitted 23 May, 2022; originally announced May 2022.

Comments: Updated paper reflecting 1.0.0 release
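The abstract above describes active learning as iteratively querying the data points most useful for a task. One widely used acquisition rule is least-confidence uncertainty sampling; the plain-Python sketch below illustrates that rule only. The function name is hypothetical, and the code does not use or reproduce PyRelationAL's API.

```python
from typing import List, Sequence


def least_confidence_query(probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k unlabelled points the model is least confident about.

    probs[i] is the model's predicted class distribution for unlabelled point i.
    Least-confidence score: 1 - max_c p(c | x); a higher score means more uncertain.
    """
    scores = [1.0 - max(p) for p in probs]
    # Rank pool indices from most to least uncertain and keep the top k.
    ranked = sorted(range(len(probs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


pool_probs = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3], [0.5, 0.5]]
print(least_confidence_query(pool_probs, k=2))  # [3, 1]
```

In a full loop, the returned indices would be sent for labelling, moved from the unlabelled pool into the training set, and the model retrained before the next query.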

arXiv:2202.04202

RECOVER: sequential model optimization platform for combination drug repurposing identifies novel synergistic compounds in vitro

Authors: Paul Bertin, Jarrid Rector-Brooks, Deepak Sharma, Thomas Gaudelet, Andrew Anighoro, Torsten Gross, Francisco Martinez-Pena, Eileen L. Tang, Suraj M S, Cristian Regep, Jeremy Hayter, Maksym Korablyov, Nicholas Valiante, Almer van der Sloot, Mike Tyers, Charles Roberts, Michael M. Bronstein, Luke L. Lairson, Jake P. Taylor-King, Yoshua Bengio

Abstract: For large libraries of small molecules, exhaustive combinatorial chemical screens become infeasible to perform when considering a range of disease models, assay conditions, and dose ranges. Deep learning models have achieved state-of-the-art results in silico for the prediction of synergy scores. However, databases of drug combinations are biased towards synergistic agents and these results do not necessarily generalise out of distribution. We employ a sequential model optimization search utilising a deep learning model to quickly discover synergistic drug combinations active against a cancer cell line, requiring substantially less screening than an exhaustive evaluation. Our small-scale wet lab experiments account for the evaluation of only ~5% of the total search space. After only 3 rounds of ML-guided in vitro experimentation (including a calibration round), we find that the set of drug pairs queried is enriched for highly synergistic combinations; two additional rounds of ML-guided experiments were performed to ensure reproducibility of trends. Remarkably, we rediscover drug combinations later confirmed to be under study within clinical trials. Moreover, we find that drug embeddings generated using only structural information begin to reflect mechanisms of action.
Prior in silico benchmarking suggests we can enrich search queries by a factor of ~5-10x for highly synergistic drug combinations by using sequential rounds of evaluation when compared to random selection, or by a factor of >3x when using a pretrained model selecting all drug combinations at a single time point.

Submitted 2 March, 2023; v1 submitted 6 February, 2022; originally announced February 2022.

arXiv:1911.08426

DOI: 10.30534/ijatcse/2019/84842019

A Study on various state of the art of the Art Face Recognition System using Deep Learning Techniques

Authors: Sukhada Chokkadi, Sannidhan M S, Sudeepa K B, Abhir Bhandary

Abstract: Given the very large data repositories now available and access to highly advanced hardware, facial identification systems have evolved enormously over the past few decades. Sketch recognition is one of the most important areas to have evolved, and it has become an integral component adopted by law enforcement agencies in current forensic science practice. Matching derived sketches to face photographs is a difficult task, because the sketches are produced from an eyewitness's verbal description of the crime scene and, owing to natural human error, may lack sensitive details that are present in the photograph. Much of the research carried out in this area has used recognition systems built on traditional feature extraction and classification models. Only very recently have a few research works focused on deep learning techniques, taking advantage of learned models for feature extraction and classification to overcome potential domain challenges. The first part of this review paper focuses on deep learning techniques used in face recognition and matching, which have improved the accuracy of face recognition through training on huge datasets. The paper also includes a survey of different techniques used to match composite sketches to human images, including the component-based representation approach and automatic composite sketch recognition techniques, among others.

Submitted 19 November, 2019; originally announced November 2019.

Journal ref: International Journal of Advanced Trends in Computer Science and Engineering, 8(4), July-August 2019, 1590

arXiv:1312.2323

Architectural Pattern of Health Care System Using GSM Networks

Authors: Meiappane. A, Dr. V. Prasanna Venkatesan, Selva Murugan. S, Arun. A, Ramachandran. A

Abstract: Large-scale networked environments, such as the Internet, possess the characteristics of centralised data, centralised access and centralised control; this gives the user a powerful mechanism for building and integrating large repositories of centralised information from a diverse set of resources. However, centralised network systems developed on GSM networks for hospital information systems or health care information portals are still in their infancy. The shortcomings of currently available tools have made the use of mobile devices more appealing. In mobile computing, issues such as low bandwidth, high-latency wireless networks, loss or degradation of wireless connections, and network errors or failures need to be dealt with. Other issues to be addressed include system adaptability, reliability, robustness, extensibility, flexibility, and maintainability. The GSM approach has emerged as the most viable approach for developing intelligent software applications for wireless mobile devices in a centralised environment, offering a higher bandwidth of 900 MHz for transmission. The e-healthcare system that we have developed provides support for physicians, nurses, pharmacists and other healthcare professionals, as well as for patients and medical devices used to monitor patients. In this paper, we present the architecture and the demonstration prototype.

Submitted 9 December, 2013; originally announced December 2013.

Comments: 7 pages

Journal ref: (IJCTE), ISSN: 1793-8201. vol. 3, no. 1, pp. 64-70, February 2011

Showing 1–5 of 5 results for author: S, S M