-
HGTDR: Advancing Drug Repurposing with Heterogeneous Graph Transformers
Authors:
Ali Gharizadeh,
Karim Abbasi,
Amin Ghareyazi,
Mohammad R. K. Mofrad,
Hamid R. Rabiee
Abstract:
Motivation: Drug repurposing is a viable solution for reducing the time and cost associated with drug development. However, thus far, the proposed drug repurposing approaches have yet to meet expectations. Therefore, it is crucial to offer a systematic approach for drug repurposing to achieve cost savings and enhance human lives. In recent years, using biological network-based methods for drug repurposing has generated promising results. Nevertheless, these methods have limitations. Primarily, the scope of these methods is generally limited concerning the size and variety of data they can effectively handle. Another issue arises from the treatment of heterogeneous data, which needs to be addressed or converted into homogeneous data, leading to a loss of information. A significant drawback is that most of these approaches lack end-to-end functionality, necessitating manual implementation and expert knowledge in certain stages. Results: We propose a new solution, HGTDR (Heterogeneous Graph Transformer for Drug Repurposing), to address the challenges associated with drug repurposing. HGTDR is a three-step approach for knowledge graph-based drug repurposing: 1) constructing a heterogeneous knowledge graph, 2) utilizing a heterogeneous graph transformer network, and 3) computing relationship scores using a fully connected network. By leveraging HGTDR, users gain the ability to manipulate input graphs, extract information from diverse entities, and obtain their desired output. In the evaluation step, we demonstrate that HGTDR performs comparably to previous methods. Furthermore, we review medical studies to validate our method's top ten drug repurposing suggestions, which have exhibited promising results. We also demonstrated HGTDR's capability to predict other types of relations through numerical and experimental validation, such as drug-protein and disease-protein interrelations.
Submitted 18 May, 2024; v1 submitted 12 May, 2024;
originally announced May 2024.
-
Role of Pore Dilation in Molecular Transport through the Nuclear Pore Complex: Insights from Polymer Scaling Theory
Authors:
Atsushi Matsuda,
Mohammad R. K. Mofrad
Abstract:
Recent studies have suggested that the Nuclear Pore Complex (NPC) plays a significant role in mechanotransduction. When a force is exerted, the NPC's diameter widens, leading to an increased molecular flux into the nucleus. In this study, we sought to further explore this phenomenon and quantitatively assess the impact of pore dilation on molecular transport through the NPC. Utilizing the scaling theory of polymers, we developed a theoretical model to examine the relationship between pore size and the molecular transport rate. Our model posits that the mesh structure inside the pore, formed by FG-Nups, significantly influences the transport rate. Consequently, we propose that the transport rate is exponentially related to the pore size. To validate our model, we conducted extensive Brownian dynamics simulations. Our results demonstrated that the model accurately represents the transport dynamics except for exceptionally small molecules. For these molecules, the local mesh structure becomes less significant, and instead, they perceive the global structure of the pore. We also identified a critical threshold value, which allows for an estimation of whether a given molecule falls within the scope of our model. Our findings provide valuable insights into the dynamics of molecular transport in the NPC and pave the way for future research on the NPC's role in mechanotransduction.
Submitted 16 July, 2023;
originally announced July 2023.
-
Beyond the Hype: Assessing the Performance, Trustworthiness, and Clinical Suitability of GPT3.5
Authors:
Salmonn Talebi,
Elizabeth Tong,
Mohammad R. K. Mofrad
Abstract:
The use of large language models (LLMs) in healthcare is gaining popularity, but their practicality and safety in clinical settings have not been thoroughly assessed. In high-stakes environments like medical settings, trust and safety are critical issues for LLMs. To address these concerns, we present an approach to evaluate the performance and trustworthiness of a GPT3.5 model for medical image protocol assignment. We compare it with a fine-tuned BERT model and a radiologist. In addition, we have a radiologist review the GPT3.5 output to evaluate its decision-making process. Our evaluation dataset consists of 4,700 physician entries across 11 imaging protocol classes spanning the entire head. Our findings suggest that the GPT3.5 performance falls behind BERT and a radiologist. However, GPT3.5 outperforms BERT in its ability to explain its decision, detect relevant word indicators, and model calibration. Furthermore, by analyzing the explanations of GPT3.5 for misclassifications, we reveal systematic errors that need to be resolved to enhance its safety and suitability for clinical use.
Submitted 27 June, 2023;
originally announced June 2023.
-
Electronic and magnetic properties of perovskite selenite and tellurite compounds: CoSeO$_3$, NiSeO$_3$, CoTeO$_3$ and NiTeO$_3$
Authors:
A. Rafi M. Iasir,
Todd Lombardi,
Qiangsheng Lu,
Amir M. Mofrad,
Mitchel Vaninger,
Xiaoqian Zhang,
David J. Singh
Abstract:
Selenium and tellurium are among the few elements that form $AB$O$_3$ perovskite structures with a four valent ion in the $A$ site. This leads to highly distorted structures and unusual magnetic behavior. Here we investigate the Co and Ni selenite and tellurite compounds, CoSeO$_3$, CoTeO$_3$, NiSeO$_3$ and NiTeO$_3$ using first principles calculations. We find an interplay of crystal field and Jahn-Teller distortions that underpin the electronic and magnetic properties. While all compounds are predicted to show an insulating G-type antiferromagnetic ground state, there is a considerable difference in the anisotropy of the exchange interactions between the Ni and Co compounds. This is related to the Jahn-Teller distortion. Finally, we observe that these four compounds show characteristics generally associated with Mott insulators, even when described at the level of standard density functional theory. These are then dense bulk band or Slater, Mott-type insulators.
Submitted 22 October, 2019;
originally announced October 2019.
-
ProDyn0: Inferring calponin homology domain stretching behavior using graph neural networks
Authors:
Ali Madani,
Cyna Shirazinejad,
Jia Rui Ong,
Hengameh Shams,
Mohammad Mofrad
Abstract:
Graph neural networks are a quickly emerging field for non-Euclidean data that leverage the inherent graphical structure to predict node, edge, and global-level properties of a system. Protein properties cannot easily be understood as a simple sum of their parts (i.e., amino acids); therefore, understanding their dynamical properties in the context of graphs is attractive for revealing how perturbations to their structure can affect their global function. To tackle this problem, we generate a database of 2020 mutated calponin homology (CH) domains undergoing large-scale separation in molecular dynamics. To predict the mechanosensitive force response, we develop neural message passing networks and residual gated graph convnets which predict the protein dependent force separation at 86.63 percent, 81.59 kJ/mol/nm MAE, and 76.99 psec MAE for force mode classification, max force magnitude, and max force time, respectively, which is significantly better than non-graph-based deep learning techniques. Towards uniting geometric learning techniques and biophysical observables, we premiere our simulation database as a benchmark dataset for further development/evaluation of graph neural network architectures.
Submitted 21 October, 2019;
originally announced October 2019.
-
Partitioning Graphs for the Cloud using Reinforcement Learning
Authors:
Mohammad Hasanzadeh Mofrad,
Rami Melhem,
Mohammad Hammoud
Abstract:
In this paper, we propose Revolver, a parallel graph partitioning algorithm capable of partitioning large-scale graphs on a single shared-memory machine. Revolver employs an asynchronous processing framework, which leverages reinforcement learning and label propagation to adaptively partition a graph. In addition, it adopts a vertex-centric view of the graph where each vertex is assigned an autonomous agent responsible for selecting a suitable partition for it, distributing thereby the computation across all vertices. The intuition behind using a vertex-centric view is that it naturally fits the graph partitioning problem, which entails that a graph can be partitioned using local information provided by each vertex's neighborhood. We fully implemented and comprehensively tested Revolver using nine real-world graphs. Our results show that Revolver is scalable and can outperform three popular and state-of-the-art graph partitioners via producing comparable localized partitions, yet without sacrificing the load balance across partitions.
Submitted 17 July, 2019; v1 submitted 15 July, 2019;
originally announced July 2019.
-
UniSent: Universal Adaptable Sentiment Lexica for 1000+ Languages
Authors:
Ehsaneddin Asgari,
Fabienne Braune,
Benjamin Roth,
Christoph Ringlstetter,
Mohammad R. K. Mofrad
Abstract:
In this paper, we introduce UniSent, universal sentiment lexica for $1000+$ languages. Sentiment lexica are vital for sentiment analysis in the absence of document-level annotations, a very common scenario for low-resource languages. To the best of our knowledge, UniSent is the largest sentiment resource to date in terms of the number of covered languages, including many low-resource ones. In this work, we use a massively parallel Bible corpus to project sentiment information from English to other languages for sentiment analysis on Twitter data. We introduce a method called DomDrift to mitigate the huge domain mismatch between Bible and Twitter by a confidence weighting scheme that uses domain-specific embeddings to compare the nearest neighbors for a candidate sentiment word in the source (Bible) and target (Twitter) domain. We evaluate the quality of UniSent in a subset of languages for which manually created ground truth was available: Macedonian, Czech, German, Spanish, and French. We show that the quality of UniSent is comparable to manually created sentiment resources when it is used as the sentiment seed for the task of word sentiment prediction on top of embedding representations. In addition, we show that emoticon sentiments could be reliably predicted in the Twitter domain using only UniSent and monolingual embeddings in German, Spanish, French, and Italian. With the publication of this paper, we release the UniSent sentiment lexica.
Submitted 28 November, 2019; v1 submitted 21 April, 2019;
originally announced April 2019.
-
A Bi-population Particle Swarm Optimizer for Learning Automata based Slow Intelligent System
Authors:
Mohammad Hasanzadeh Mofrad,
S. K. Chang
Abstract:
Particle Swarm Optimization (PSO) is an Evolutionary Algorithm (EA) that utilizes a swarm of particles to solve an optimization problem. Slow Intelligence System (SIS) is a learning framework which slowly learns the solution to a problem by performing a series of operations. Moreover, Learning Automata (LA) are minuscule but effective decision-making entities which are best suited to act as a controller component. In this paper, we combine two isolated populations of PSO to forge the Adaptive Intelligence Optimizer (AIO), which harnesses the advantages of a bi-population PSO to escape from local minima and avoid premature convergence. Furthermore, using the rich framework of SIS and the control theory from which LA are derived, we find a natural match between SIS and LA, where acting slowly is the pillar of both of them. Both SIS and LA need time to converge to the optimal decision, and this enables AIO to outperform standard PSO on evolutionary optimization benchmark functions.
Submitted 2 April, 2018;
originally announced April 2018.
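As a point of reference for the abstract above, the following is a minimal sketch of a standard single-population PSO, the baseline that the abstract says AIO is compared against. It is not the bi-population AIO algorithm and includes no Learning Automata controller. The function names (`pso_minimize`, `sphere`) and all parameter values (inertia, acceleration coefficients, swarm size) are illustrative assumptions, not taken from the paper.

```python
import random


def sphere(x):
    """Simple benchmark objective: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def pso_minimize(objective, dim, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard PSO sketch (hypothetical parameters, not the paper's AIO)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # personal best positions
    best_val = [objective(p) for p in pos]    # personal best values
    g = min(range(n_particles), key=best_val.__getitem__)
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]
    history = [gbest_val]                     # global best value per iteration
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + cognitive pull + social pull.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                # Move and clamp to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < gbest_val:
                    gbest_val, gbest_pos = val, pos[i][:]
        history.append(gbest_val)
    return gbest_pos, gbest_val, history


if __name__ == "__main__":
    position, value, _ = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
    print(position, value)
```

A bi-population variant would, under the abstract's description, run two such swarms in isolation and let a controller decide how they interact; that logic is deliberately left out here because the abstract does not specify it.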
-
Fast and accurate classification of echocardiograms using deep learning
Authors:
Ali Madani,
Ramy Arnaout,
Mohammad Mofrad,
Rima Arnaout
Abstract:
Echocardiography is essential to modern cardiology. However, human interpretation limits high throughput analysis, limiting echocardiography from reaching its full clinical and research potential for precision medicine. Deep learning is a cutting-edge machine-learning technique that has been useful in analyzing medical images but has not yet been widely applied to echocardiography, partly due to the complexity of echocardiograms' multi view, multi modality format. The essential first step toward comprehensive computer assisted echocardiographic interpretation is determining whether computers can learn to recognize standard views. To this end, we anonymized 834,267 transthoracic echocardiogram (TTE) images from 267 patients (20 to 96 years, 51 percent female, 26 percent obese) seen between 2000 and 2017 and labeled them according to standard views. Images covered a range of real world clinical variation. We built a multilayer convolutional neural network and used supervised learning to simultaneously classify 15 standard views. Eighty percent of data used was randomly chosen for training and 20 percent reserved for validation and testing on never seen echocardiograms. Using multiple images from each clip, the model classified among 12 video views with 97.8 percent overall test accuracy without overfitting. Even on single low resolution images, test accuracy among 15 views was 91.7 percent versus 70.2 to 83.5 percent for board-certified echocardiographers. Confusional matrices, occlusion experiments, and saliency mapping showed that the model finds recognizable similarities among related views and classifies using clinically relevant image features. In conclusion, deep neural networks can classify essential echocardiographic views simultaneously and with high accuracy. Our results provide a foundation for more complex deep learning assisted echocardiographic interpretation.
Submitted 26 June, 2017;
originally announced June 2017.
-
Leveraging Intel SGX to Create a Nondisclosure Cryptographic library
Authors:
Mohammad Hasanzadeh Mofrad,
Adam Lee
Abstract:
Enforcing integrity and confidentiality of users' application code and data is a challenging mission that any software developer working on an online production grade service faces. Since cryptology is not a widely understood subject, people on the cutting edge of research and industry are always seeking new technologies to naturally expand the security of their programs and systems. Intel Software Guard Extension (Intel SGX) is an Intel technology for developers who are looking to protect their software binaries from plausible attacks using hardware instructions. Intel SGX puts sensitive code and data into CPU-hardened protected regions called enclaves. In this project we leverage Intel SGX to produce a secure cryptographic library which keeps the generated keys inside an enclave, restricting use and dissemination of confidential cryptographic keys. Using enclaves to store the keys, we maintain a small Trusted Computing Base (TCB) where we also perform computation on temporary buffers to and from untrusted application code. As a proof of concept, we implemented hashes and symmetric encryption algorithms inside the enclave, where we stored hashes, Initialization Vectors (IVs) and random keys, and open sourced the code (https://github.com/hmofrad/CryptoEnclave).
Submitted 2 April, 2018; v1 submitted 12 May, 2017;
originally announced May 2017.
-
Comparing Fifty Natural Languages and Twelve Genetic Languages Using Word Embedding Language Divergence (WELD) as a Quantitative Measure of Language Distance
Authors:
Ehsaneddin Asgari,
Mohammad R. K. Mofrad
Abstract:
We introduce a new measure of distance between languages based on word embedding, called word embedding language divergence (WELD). WELD is defined as divergence between unified similarity distribution of words between languages. Using such a measure, we perform language comparison for fifty natural languages and twelve genetic languages. Our natural language dataset is a collection of sentence-aligned parallel corpora from bible translations for fifty languages spanning a variety of language families. Although we use parallel corpora, which guarantees having the same content in all languages, interestingly in many cases languages within the same family cluster together. In addition to natural languages, we perform language comparison for the coding regions in the genomes of 12 different organisms (4 plants, 6 animals, and two human subjects). Our result confirms a significant high-level difference in the genetic language model of humans/animals versus plants. The proposed method is a step toward defining a quantitative measure of similarity between languages, with applications in languages classification, genre identification, dialect identification, and evaluation of translations.
Submitted 28 April, 2016;
originally announced April 2016.
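The abstract above defines WELD as a divergence between similarity distributions of words in two languages, without naming the divergence. As a rough illustration only, the sketch below uses the Jensen-Shannon divergence between two discrete distributions; the choice of Jensen-Shannon, and the function names `kl_divergence` and `js_divergence`, are assumptions made here for illustration and are not stated in the abstract.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    Assumed stand-in for the divergence in WELD; p and q are probability
    vectors over the same support (e.g. binned word-pair similarities).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Mixture distribution; m_i > 0 wherever p_i > 0 or q_i > 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    lang_a = [0.7, 0.2, 0.1]   # hypothetical similarity distribution, language A
    lang_b = [0.5, 0.3, 0.2]   # hypothetical similarity distribution, language B
    print(js_divergence(lang_a, lang_b))
```

Jensen-Shannon is symmetric and bounded by ln 2, which makes it convenient for building a pairwise distance matrix over many languages; whether it matches the paper's exact definition should be checked against the full text.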
-
A New Approach for Scalable Analysis of Microbial Communities
Authors:
Ehsaneddin Asgari,
Kiavash Garakani,
Mohammad R. K. Mofrad
Abstract:
Microbial communities play important roles in the function and maintenance of various biosystems, ranging from the human body to the environment. Current methods for analysis of microbial communities are typically based on taxonomic phylogenetic alignment using 16S rRNA metagenomic or Whole Genome Sequencing data. In typical characterizations of microbial communities, studies deal with billions of microbial sequences, aligning them to a phylogenetic tree. We introduce a new approach for the efficient analysis of microbial communities. Our new reference-free analysis technique is based on n-gram sequence analysis of 16S rRNA data and reduces the processing data size dramatically (by 105 fold), without requiring taxonomic alignment. The proposed approach is applied to characterize phenotypic microbial community differences in different settings. Specifically, we applied this approach in classification of microbial communities across different body sites, characterization of oral microbiomes associated with healthy and diseased individuals, and classification of microbial communities longitudinally during the development of infants. Different dimensionality reduction methods are introduced that offer a more scalable analysis framework, while minimizing the loss in classification accuracies. Among dimensionality reduction techniques, we propose a continuous vector representation for microbial communities, which can widely be used for deep learning applications in microbial informatics.
Submitted 1 December, 2015;
originally announced December 2015.
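The reference-free n-gram analysis mentioned in the abstract above can be illustrated with a small sketch: each sample's reads are reduced to a normalized frequency profile of overlapping n-grams (k-mers), with no alignment step. The function name `kmer_profile`, the default `k=3`, and the toy reads are hypothetical choices for illustration; the abstract does not state the n-gram length or normalization the authors used.

```python
from collections import Counter


def kmer_profile(reads, k=3):
    """Pool all reads of one sample into normalized overlapping k-mer frequencies.

    Sketch only: k and the normalization are assumptions, not the paper's settings.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        # Slide a window of length k over the read, one base at a time.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}  # no read was long enough to contain a k-mer
    return {kmer: n / total for kmer, n in counts.items()}


if __name__ == "__main__":
    sample = ["ACGT", "ACGA"]  # toy reads, not real 16S rRNA data
    print(kmer_profile(sample, k=3))
```

A profile like this has at most 4^k entries per sample regardless of how many reads were sequenced, which is one way a fixed-size representation can shrink the data before classification or dimensionality reduction.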
-
ProtVec: A Continuous Distributed Representation of Biological Sequences
Authors:
Ehsaneddin Asgari,
Mohammad R. K. Mofrad
Abstract:
We introduce a new representation and feature extraction method for biological sequences. Named bio-vectors (BioVec) to refer to biological sequences in general with protein-vectors (ProtVec) for proteins (amino-acid sequences) and gene-vectors (GeneVec) for gene sequences, this representation can be widely used in applications of deep learning in proteomics and genomics. In the present paper, we focus on protein-vectors that can be utilized in a wide array of bioinformatics investigations such as family classification, protein visualization, structure prediction, disordered protein identification, and protein-protein interaction prediction. In this method, we adopt artificial neural network approaches and represent a protein sequence with a single dense n-dimensional vector. To evaluate this method, we apply it in classification of 324,018 protein sequences obtained from Swiss-Prot belonging to 7,027 protein families, where an average family classification accuracy of 93% ± 0.06% is obtained, outperforming existing family classification methods. In addition, we use ProtVec representation to predict disordered proteins from structured proteins. Two databases of disordered sequences are used: the DisProt database as well as a database featuring the disordered regions of nucleoporins rich with phenylalanine-glycine repeats (FG-Nups). Using support vector machine classifiers, FG-Nup sequences are distinguished from structured protein sequences found in Protein Data Bank (PDB) with a 99.8% accuracy, and unstructured DisProt sequences are differentiated from structured DisProt sequences with 100.0% accuracy. These results indicate that by only providing sequence data for various proteins into this model, accurate information about protein structure can be determined.
Submitted 26 May, 2016; v1 submitted 17 March, 2015;
originally announced March 2015.