-
Progress and Challenges for the Application of Machine Learning for Neglected Tropical Diseases
Authors:
Chung Yuen Khew,
Rahmad Akbar,
Norfarhan Mohd. Assaad
Abstract:
Neglected tropical diseases (NTDs) continue to affect the livelihood of individuals in countries in the Southeast Asia and Western Pacific region. These long-standing diseases have caused devastating health problems and economic decline for people in low- and middle-income (developing) countries. An estimated 1.7 billion people worldwide suffer from one or more NTDs annually, which puts approximately one in five individuals at risk for NTDs. In addition to their health and social impact, NTDs inflict a significant financial burden on patients and close relatives, and they are responsible for billions of dollars in revenue lost to reduced labor productivity in developing countries alone. There is an urgent need to improve control, eradication, and elimination efforts against NTDs. This can be achieved by using machine learning tools to strengthen surveillance, prediction, and detection programs, and to combat NTDs through the discovery of new therapeutics against these pathogens. This review surveys current applications of machine learning tools for NTDs and the challenges in advancing the state of the art of NTD surveillance, management, and treatment.
Submitted 2 December, 2022;
originally announced December 2022.
-
ImmunoLingo: Linguistics-based formalization of the antibody language
Authors:
Mai Ha Vu,
Philippe A. Robert,
Rahmad Akbar,
Bartlomiej Swiatczak,
Geir Kjetil Sandve,
Dag Trygve Truslew Haug,
Victor Greiff
Abstract:
Apparent parallels between natural language and biological sequences have led to a recent surge in the application of deep language models (LMs) to the analysis of antibody and other biological sequences. However, the lack of a rigorous linguistic formalization of biological sequence languages, which would define basic components such as the lexicon (i.e., the discrete units of the language) and grammar (i.e., the rules that link sequence well-formedness, structure, and meaning), has led to largely domain-unspecific applications of LMs that do not take into account the underlying structure of the biological sequences studied. A linguistic formalization, on the other hand, establishes linguistically informed and thus domain-adapted components for LM applications. It would facilitate a better understanding of how differences and similarities between natural language and biological sequences influence the quality of LMs, which is crucial for the design of interpretable models with extractable sequence-function relationship rules, such as those underlying the antibody specificity prediction problem. Deciphering the rules of antibody specificity is crucial to accelerating rational and in silico biotherapeutic drug design. Here, we formalize the properties of the antibody language and thereby establish a foundation not only for the application of linguistic tools in adaptive immune receptor analysis but also for systematic immunolinguistic studies of immune receptor specificity in general.
Submitted 29 November, 2022; v1 submitted 26 September, 2022;
originally announced September 2022.
-
Linguistically inspired roadmap for building biologically reliable protein language models
Authors:
Mai Ha Vu,
Rahmad Akbar,
Philippe A. Robert,
Bartlomiej Swiatczak,
Victor Greiff,
Geir Kjetil Sandve,
Dag Trygve Truslew Haug
Abstract:
Deep neural-network-based language models (LMs) are increasingly applied to large-scale protein sequence data to predict protein function. However, being largely black-box models and thus challenging to interpret, current protein LM approaches do not contribute to a fundamental understanding of sequence-function mappings, hindering rule-based biotherapeutic drug development. We argue that guidance drawn from linguistics, a field specialized in analytical rule extraction from natural language data, can aid with building more interpretable protein LMs that are more likely to learn relevant domain-specific rules. Differences between protein sequence data and linguistic sequence data require the integration of more domain-specific knowledge in protein LMs compared to natural language LMs. Here, we provide a linguistics-based roadmap for protein LM pipeline choices with regard to training data, tokenization, token embedding, sequence embedding, and model interpretation. Incorporating linguistic ideas into protein LMs enables the development of next-generation interpretable machine-learning models with the potential of uncovering the biological mechanisms underlying sequence-function relationships.
Submitted 28 April, 2023; v1 submitted 3 July, 2022;
originally announced July 2022.
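The roadmap above lists tokenization as one protein LM pipeline choice. The minimal Python sketch below makes that choice concrete by contrasting residue-level tokens with contiguous k-mer tokens and building a toy vocabulary. The function names, the default k = 3, and the example fragment "CARDYW" are illustrative assumptions. They are not choices taken from the paper.

```python
def residue_tokens(seq: str) -> list[str]:
    """Residue-level tokenization: one token per amino acid."""
    return list(seq)


def kmer_tokens(seq: str, k: int = 3, stride: int = 1) -> list[str]:
    """Contiguous k-mers starting every `stride` residues.

    stride=1 gives overlapping k-mers; stride=k gives non-overlapping ones.
    A trailing fragment shorter than k is dropped.
    """
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(token_lists: list[list[str]]) -> dict[str, int]:
    """Assign integer ids to tokens in order of first appearance."""
    vocab: dict[str, int] = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


if __name__ == "__main__":
    fragment = "CARDYW"  # hypothetical CDRH3-like fragment
    print(residue_tokens(fragment))     # ['C', 'A', 'R', 'D', 'Y', 'W']
    print(kmer_tokens(fragment, 3, 1))  # ['CAR', 'ARD', 'RDY', 'DYW']
    print(kmer_tokens(fragment, 3, 3))  # ['CAR', 'DYW']
```

The two schemes trade off differently. Residue tokens keep a tiny, fixed lexicon of about 20 symbols. Overlapping k-mers enlarge the lexicon and encode local context directly in each token.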
-
AntBO: Towards Real-World Automated Antibody Design with Combinatorial Bayesian Optimisation
Authors:
Asif Khan,
Alexander I. Cowen-Rivers,
Antoine Grosnit,
Derrick-Goh-Xin Deik,
Philippe A. Robert,
Victor Greiff,
Eva Smorodina,
Puneet Rawat,
Kamil Dreczkowski,
Rahmad Akbar,
Rasul Tutunov,
Dany Bou-Ammar,
Jun Wang,
Amos Storkey,
Haitham Bou-Ammar
Abstract:
Antibodies are canonically Y-shaped multimeric proteins capable of highly specific molecular recognition. The CDRH3 region, located at the tip of the variable chains of an antibody, dominates antigen-binding specificity. Therefore, designing optimal antigen-specific CDRH3 regions is a priority for developing therapeutic antibodies. However, the combinatorial nature of CDRH3 sequence space makes it impossible to search for an optimal binding sequence exhaustively and efficiently using computational approaches. Here, we present AntBO: a combinatorial Bayesian optimisation framework enabling efficient in silico design of the CDRH3 region. Ideally, antibodies should have both high target specificity and good developability. To achieve this goal, we introduce a CDRH3 trust region that restricts the search to sequences with favourable developability scores. For benchmarking, AntBO uses the Absolut! software suite as a black-box oracle to score the target specificity and affinity of designed antibodies in silico in an unconstrained fashion [robert2021one]. Experiments performed on the 159 discretised antigens used in Absolut! demonstrate the benefit of AntBO in designing CDRH3 regions with diverse biophysical properties. In under 200 calls to the black-box oracle, AntBO can suggest antibody sequences that outperform both the best binding sequence drawn from 6.9 million experimentally obtained CDRH3s and a commonly used genetic algorithm baseline. Additionally, AntBO finds very-high-affinity CDRH3 sequences in only 38 protein designs whilst requiring no domain knowledge. We conclude that AntBO brings automated antibody design methods closer to what is practically viable for in vitro experimentation.
Submitted 14 October, 2022; v1 submitted 29 January, 2022;
originally announced January 2022.
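The AntBO entry above describes a developability-constrained trust region. That idea can be sketched as a proposal step that mutates an incumbent CDRH3 within a Hamming-distance radius and keeps only candidates that pass a developability filter. This is a hypothetical sketch, not AntBO's implementation. It omits the Bayesian surrogate model and acquisition function entirely. The filter rules are assumptions chosen for illustration: net charge within ±2 (counting K/R as +1 and D/E as −1), no N-X-S/T glycosylation motif with X ≠ P, and no residue repeated more than five times in a row. All names are also assumptions.

```python
import random
import re
from typing import Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumed N-glycosylation motif: N, then any residue except P, then S or T.
GLYCO_MOTIF = re.compile(r"N[^P][ST]")


def net_charge(seq: str) -> int:
    """Crude net charge: K/R count +1, D/E count -1 (histidine ignored)."""
    return sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")


def longest_run(seq: str) -> int:
    """Length of the longest stretch of one repeated residue."""
    longest, run, prev = 0, 0, None
    for a in seq:
        run = run + 1 if a == prev else 1
        prev = a
        longest = max(longest, run)
    return longest


def is_developable(seq: str, charge_bound: int = 2, max_repeat: int = 5) -> bool:
    """Hypothetical developability filter (all thresholds are assumptions)."""
    return (
        abs(net_charge(seq)) <= charge_bound
        and GLYCO_MOTIF.search(seq) is None
        and longest_run(seq) <= max_repeat
    )


def hamming(a: str, b: str) -> int:
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def propose_in_trust_region(center: str, radius: int, rng: random.Random,
                            max_tries: int = 1000) -> Optional[str]:
    """Sample a developable sequence within `radius` substitutions of `center`."""
    radius = min(radius, len(center))
    if radius < 1:
        raise ValueError("radius must be at least 1")
    for _ in range(max_tries):
        seq = list(center)
        n_mut = rng.randint(1, radius)
        # Mutate n_mut distinct positions, each to a different residue,
        # so the Hamming distance to `center` is exactly n_mut.
        for pos in rng.sample(range(len(seq)), n_mut):
            seq[pos] = rng.choice(AMINO_ACIDS.replace(seq[pos], ""))
        candidate = "".join(seq)
        if is_developable(candidate):
            return candidate
    return None  # no developable candidate found within max_tries


if __name__ == "__main__":
    rng = random.Random(0)
    incumbent = "CARDYWGQGT"  # hypothetical starting CDRH3
    cand = propose_in_trust_region(incumbent, radius=3, rng=rng)
    if cand is not None:
        print(cand, hamming(cand, incumbent), is_developable(cand))
```

In a full Bayesian optimisation loop, candidates like these would be scored by an acquisition function over a surrogate model. The oracle would then be queried only for the most promising ones, and the radius would grow or shrink with success.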
-
Comparison of downscaling techniques for high resolution soil moisture mapping
Authors:
Sabah Sabaghy,
Jeffrey Walker,
Luigi Renzullo,
Ruzbeh Akbar,
Steven Chan,
Julian Chaubell,
Narendra Das,
R. Scott Dunbar,
Dara Entekhabi,
Anouk Gevaert,
Thomas Jackson,
Olivier Merlin,
Mahta Moghaddam,
**zheng Peng,
Jeffrey Piepmeier,
Maria Piles,
Gerard Portal,
Christoph Rudiger,
Vivien Stefan,
Xiaoling Wu,
Nan Ye,
Simon Yueh
Abstract:
Soil moisture impacts exchanges of water, energy, and carbon fluxes between the land surface and the atmosphere. Passive microwave remote sensing at L-band can capture spatial and temporal patterns of soil moisture in the landscape. Both ESA and NASA have launched L-band radiometers, in the form of the SMOS and SMAP satellites respectively, to monitor soil moisture globally every 3 days at about 40 km resolution. However, their coarse scale restricts the range of applications. SMAP included an L-band radar to downscale the radiometer soil moisture to 9 km, but the radar failed after 3 months. This initial approach can therefore no longer be used to develop a consistent long-term soil moisture product across the two missions. Existing optical-, radiometer-, and oversampling-based downscaling methods could be an alternative to the radar-based approach for delivering such data. Nevertheless, retrieval of a consistent high-resolution soil moisture product remains a challenge, and there has been no comprehensive inter-comparison of the alternative approaches. This research undertakes an assessment of the different downscaling approaches using the SMAPEx-4 field campaign data.
Submitted 6 December, 2020;
originally announced December 2020.
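Downscaling methods need to keep fine-scale estimates consistent with the coarse retrieval. As a generic illustration of one way to do this, the sketch below redistributes one coarse soil-moisture value over its sub-pixels using a fine-scale proxy (for example, an optical index) and a linear sensitivity. The formula, the slope value, and the proxy values are assumptions. This is not any of the specific methods compared in the paper.

```python
def downscale_coarse_pixel(coarse_sm: float, fine_proxy: list[float],
                           slope: float) -> list[float]:
    """Redistribute one coarse soil-moisture value over its fine sub-pixels.

    fine_i = coarse_sm + slope * (proxy_i - mean(proxy))

    The anomaly terms sum to zero, so the fine values average back to
    coarse_sm (up to floating-point rounding).
    """
    if not fine_proxy:
        raise ValueError("need at least one fine-scale proxy value")
    mean_proxy = sum(fine_proxy) / len(fine_proxy)
    return [coarse_sm + slope * (p - mean_proxy) for p in fine_proxy]


if __name__ == "__main__":
    # Hypothetical inputs: a 0.25 m^3/m^3 coarse retrieval and a normalized
    # proxy where higher values mean drier surfaces, hence a negative slope.
    fine = downscale_coarse_pixel(0.25, [0.2, 0.4, 0.6, 0.8], slope=-0.2)
    print([round(v, 3) for v in fine])      # [0.31, 0.27, 0.23, 0.19]
    print(round(sum(fine) / len(fine), 6))  # 0.25
```

Because the anomaly term averages to zero, the sub-pixel values stay consistent with the coarse pixel. Clipping the results to a physical range afterwards would break that property.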
-
Augmenting adaptive immunity: progress and challenges in the quantitative engineering and analysis of adaptive immune receptor repertoires
Authors:
Alex J. Brown,
Igor Snapkov,
Rahmad Akbar,
Milena Pavlović,
Enkelejda Miho,
Geir K. Sandve,
Victor Greiff
Abstract:
The adaptive immune system is a natural diagnostic and therapeutic. It recognizes threats before clinical symptoms manifest and neutralizes antigen with exquisite specificity. Recognition specificity and broad reactivity are enabled by adaptive B- and T-cell receptors: the immune receptor repertoire. The human immune system, however, is not omnipotent. Our natural defense system sometimes loses the battle to parasites and microbes and even turns against us in the case of cancer and autoimmune and inflammatory disease. A long-standing dream of immunoengineers has therefore been to mechanistically understand how the immune system sees, reacts to, and remembers antigens. Only very recently have experimental and computational methods achieved sufficient quantitative resolution to start querying and engineering adaptive immunity with great precision. Specifically, these innovations have been applied with the greatest fervency and success in immunotherapy, autoimmunity, and vaccine design. The work here highlights advances, challenges, and future directions of quantitative approaches. These approaches seek to advance the fundamental understanding of immunological phenomena and to reverse engineer the immune system to produce auspicious biopharmaceutical drugs and immunodiagnostics. Our review indicates that the merger of fundamental immunology, computational immunology, and digital biotechnology minimizes black-box engineering, thereby advancing both immunological knowledge and immunoengineering methodologies.
Submitted 8 April, 2019; v1 submitted 8 April, 2019;
originally announced April 2019.
-
A Green Enterprise Computing Architecture for Developing Countries
Authors:
Rabia Akbar,
Tahir Azim
Abstract:
Developing countries often have access to limited energy resources, which frequently results in power cuts and failures. During these power cuts, enterprises rely on backup power sources such as uninterruptible power supplies (UPS) and electric generators. This paper proposes AnywareDC, an architecture that builds on recent work on Anyware to reduce energy utilization in the presence of such intermittent power supplies. Anyware reduces energy usage by providing enterprise users with laptops instead of desktops, while maintaining performance using a central compute cluster. Our basic insight is that during power cuts, only the routers and the cluster need to be powered: the laptops can continue to run on their own batteries. This reduces both energy usage and UPS load, allowing the UPS to supply power for longer and thus also saving generator fuel costs. Simulations show that this architecture reduces energy usage by up to 80% compared to one not using Anyware, and by up to 20% compared to Anyware.
Submitted 4 February, 2016;
originally announced February 2016.
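The AnywareDC abstract rests on the insight that, during an outage, only the routers and the cluster draw on the UPS. Simple runtime arithmetic illustrates this, with runtime equal to stored energy divided by load. All wattages, the user count, and the UPS capacity below are hypothetical and are not figures from the paper's simulations.

```python
def outage_load_w(n_users: int, per_user_w: float, infrastructure_w: float) -> float:
    """Load the UPS must carry during a power cut."""
    return n_users * per_user_w + infrastructure_w


def ups_runtime_h(capacity_wh: float, load_w: float) -> float:
    """Ideal runtime: stored energy divided by load (ignores inverter losses)."""
    if load_w <= 0:
        raise ValueError("load must be positive")
    return capacity_wh / load_w


if __name__ == "__main__":
    CAPACITY_WH = 5000.0  # hypothetical UPS capacity
    USERS = 50            # hypothetical number of users

    # Desktops: every desktop plus the routers must stay on the UPS.
    desktop_load = outage_load_w(USERS, per_user_w=100.0, infrastructure_w=50.0)
    # Laptop-based setup: laptops run on their own batteries,
    # so only the routers (50 W) and the cluster (400 W) draw UPS power.
    laptop_load = outage_load_w(USERS, per_user_w=0.0, infrastructure_w=50.0 + 400.0)

    print(desktop_load, round(ups_runtime_h(CAPACITY_WH, desktop_load), 2))  # 5050.0 0.99
    print(laptop_load, round(ups_runtime_h(CAPACITY_WH, laptop_load), 2))    # 450.0 11.11
```

The per-user term dominates the desktop case. Moving that load onto laptop batteries is what stretches UPS runtime, although the actual savings depend on real device power draws and outage patterns.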
-
Lepton Polarization Asymmetries of $H\to\gamma\tau^+\tau^-$ Decays in Standard Model
Authors:
Rabia Akbar,
Ishtiaq Ahmed,
M. Jamil Aslam
Abstract:
Recently, the CMS and ATLAS collaborations at the LHC announced a Higgs-like particle with mass near $125$ GeV. To explore its intrinsic properties, different observables need to be measured precisely at the LHC for various decay channels of the Higgs. In this context, we calculate the final-state lepton polarization asymmetries, namely the single lepton polarization asymmetries ($P_i$) and double lepton polarization asymmetries ($P_{ij}$), in the SM for the radiative semileptonic Higgs decay $H\to\gamma\tau^+\tau^-$. In the phenomenological analysis of these lepton polarization asymmetries, both tree- and loop-level diagrams are considered, and these diagrams are found to give important contributions to the asymmetries. Interestingly, the tree-level diagrams contribute separately to $P_{ij}$, whereas such separate contributions are missing in the calculations of $P_i$ and the lepton forward-backward asymmetries ($A_{FB}$). Like other observables such as the decay rate and the lepton forward-backward asymmetries, the $\tau$-lepton polarization asymmetries would be interesting observables. The experimental study of these observables will provide fertile ground to explore the intrinsic properties of the SM Higgs boson and its dynamics. It will also help to extract signatures of possible new physics beyond the SM.
Submitted 4 January, 2014;
originally announced January 2014.
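For reference, this literature conventionally defines the single and double lepton polarization asymmetries as normalized differences of differential decay widths. In each width, the lepton spin is projected along a unit vector $\mathbf{e}_i$, with $i = L, N, T$ (longitudinal, normal, transverse). The notation below, including the kinematic variable $\hat s$, is the standard form assumed for illustration. The paper's own conventions may differ.

```latex
% Single polarization asymmetry of the \tau^{\mp}, i = L, N, T
P_i^{(\mp)}(\hat s) =
  \frac{\dfrac{d\Gamma(\mathbf{n}^{\mp}=\mathbf{e}_i^{\mp})}{d\hat s}
        - \dfrac{d\Gamma(\mathbf{n}^{\mp}=-\mathbf{e}_i^{\mp})}{d\hat s}}
       {\dfrac{d\Gamma(\mathbf{n}^{\mp}=\mathbf{e}_i^{\mp})}{d\hat s}
        + \dfrac{d\Gamma(\mathbf{n}^{\mp}=-\mathbf{e}_i^{\mp})}{d\hat s}}

% Double polarization asymmetry, spins of \tau^- along e_i and \tau^+ along e_j
P_{ij}(\hat s) =
  \frac{\left[\dfrac{d\Gamma(\mathbf{e}_i^{-},\mathbf{e}_j^{+})}{d\hat s}
              - \dfrac{d\Gamma(\mathbf{e}_i^{-},-\mathbf{e}_j^{+})}{d\hat s}\right]
        - \left[\dfrac{d\Gamma(-\mathbf{e}_i^{-},\mathbf{e}_j^{+})}{d\hat s}
              - \dfrac{d\Gamma(-\mathbf{e}_i^{-},-\mathbf{e}_j^{+})}{d\hat s}\right]}
       {\left[\dfrac{d\Gamma(\mathbf{e}_i^{-},\mathbf{e}_j^{+})}{d\hat s}
              + \dfrac{d\Gamma(\mathbf{e}_i^{-},-\mathbf{e}_j^{+})}{d\hat s}\right]
        + \left[\dfrac{d\Gamma(-\mathbf{e}_i^{-},\mathbf{e}_j^{+})}{d\hat s}
              + \dfrac{d\Gamma(-\mathbf{e}_i^{-},-\mathbf{e}_j^{+})}{d\hat s}\right]}
```

Both quantities lie between $-1$ and $1$. The denominator of $P_{ij}$ is simply the unpolarized differential width summed over all four spin configurations.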
-
Biologically Inspired Execution Framework for Vulnerable Workflow Systems
Authors:
Sohail Safdar,
Mohd. Fadzil B. Hassan,
Muhammad Aasim Qureshi,
Rehan Akbar
Abstract:
The main objective of this research is to introduce a biologically inspired execution framework for workflow systems under threat from an intrusion attack. Usually, vulnerable systems need to be stopped and put into a wait state to ensure data security and privacy while they are being recovered. This research instead ensures the availability of services and data to the end user while keeping data security, privacy, and integrity intact. To achieve these goals, the behavior of chameleons and the concept of hibernation are considered in combination. Hence, workflow systems become more robust through biologically inspired methods and remain safely available to business consumers even in a vulnerable state.
Submitted 2 November, 2009;
originally announced November 2009.
-
A O(E) Time Shortest Path Algorithm For Non Negative Weighted Undirected Graphs
Authors:
Muhammad Aasim Qureshi,
Dr. Fadzil B. Hassan,
Sohail Safdar,
Rehan Akbar
Abstract:
In most shortest-path problems, such as vehicle routing and network routing, we only need an efficient path between two points, a source and a destination. It is not necessary to calculate the shortest path from the source to all other nodes. This paper concentrates on this idea and presents an algorithm for calculating the shortest path for (i) nonnegative weighted undirected graphs and (ii) unweighted undirected graphs. The algorithm completes its execution in O(E) time for all graphs except a few in which a longer path (in terms of number of edges) from the source to some node makes it the best selection for that node. The main advantage of the algorithm is its simplicity: it does not need complex data structures for its implementation.
Submitted 2 November, 2009;
originally announced November 2009.
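The source-to-destination idea in the abstract above can be illustrated for the unweighted case only. The sketch uses a standard breadth-first search that stops as soon as the destination is discovered. This is a textbook BFS with O(V + E) worst-case cost, not the paper's algorithm. The function names and the adjacency-dict representation are assumptions.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Optional, Tuple


def undirected(edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, List[Hashable]]:
    """Build an adjacency dict for an undirected graph from an edge list."""
    adj: Dict[Hashable, List[Hashable]] = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def bfs_shortest_path(adj: Dict[Hashable, List[Hashable]],
                      source: Hashable, target: Hashable) -> Optional[List[Hashable]]:
    """Fewest-edge path from source to target, or None if unreachable.

    BFS discovers nodes in nondecreasing distance order, so the first time
    `target` is discovered its parent chain is a shortest path; the search
    stops there instead of exploring the rest of the graph.
    """
    if source == target:
        return [source]
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr in parent:
                continue  # already discovered
            parent[nbr] = node
            if nbr == target:
                # Walk parent pointers back to the source, then reverse.
                path = [target]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(nbr)
    return None


if __name__ == "__main__":
    g = undirected([("s", "a"), ("a", "b"), ("b", "t"), ("s", "c"), ("c", "t")])
    print(bfs_shortest_path(g, "s", "t"))  # ['s', 'c', 't']
```

For nonnegative edge weights, the usual textbook counterpart is Dijkstra's algorithm with an early exit at the destination. That approach needs a priority queue, which is the kind of data structure the paper says its algorithm avoids.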