-
SAF: Smart Aggregation Framework for Revealing Atoms Importance Rank and Improving Prediction Rates in Drug Discovery
Authors:
Ronen Taub,
Yonatan Savir
Abstract:
Machine learning, and representation learning in particular, has the potential to facilitate drug discovery by screening a large chemical space in silico. A successful approach for representing molecules is to treat them as a graph and utilize graph neural networks. One of the key limitations of such methods is the necessity to represent compounds with different numbers of atoms, which requires ag…
▽ More
Machine learning, and representation learning in particular, has the potential to facilitate drug discovery by screening a large chemical space in silico. A successful approach for representing molecules is to treat them as a graph and utilize graph neural networks. One of the key limitations of such methods is the necessity to represent compounds with different numbers of atoms, which requires aggregating the atom's information. Common aggregation operators, such as averaging, result in loss of information at the atom level. In this work, we propose a novel aggregating approach where each atom is weighted non-linearly using the Boltzmann distribution with a hyperparameter analogous to temperature. We show that using this weighted aggregation improves the ability of the gold standard message-passing neural network to predict antibiotic activity. Moreover, by changing the temperature hyperparameter, our approach can reveal the atoms that are important for activity prediction in a smooth and consistent way, thus providing a novel, regulated attention mechanism for graph neural networks. We further validate our method by showing that it recapitulates the functional group in beta-Lactam antibiotics. The ability of our approach to rank the atoms' importance for a desired function can be used within any graph neural network to provide interpretability of the results and predictions at the node level.
△ Less
Submitted 12 September, 2023;
originally announced October 2023.
-
Harnessing Digital Pathology And Causal Learning To Improve Eosinophilic Esophagitis Dietary Treatment Assignment
Authors:
Eliel Aknin,
Ariel Larey,
Julie M. Caldwell,
Margaret H. Collins,
Juan P. Abonia,
Seema S. Aceves,
Nicoleta C. Arva,
Mirna Chehade,
Evan S. Dellon,
Nirmala Gonsalves,
Sandeep K. Gupta,
John Leung,
Kathryn A. Peterson,
Tetsuo Shoda,
Jonathan M. Spergel,
Marc E. Rothenberg,
Yonatan Savir
Abstract:
Eosinophilic esophagitis (EoE) is a chronic, food antigen-driven, allergic inflammatory condition of the esophagus associated with elevated esophageal eosinophils. EoE is a top cause of chronic dysphagia after GERD. Diagnosis of EoE relies on counting eosinophils in histological slides, a manual and time-consuming task that limits the ability to extract complex patient-dependent features. The trea…
▽ More
Eosinophilic esophagitis (EoE) is a chronic, food antigen-driven, allergic inflammatory condition of the esophagus associated with elevated esophageal eosinophils. EoE is a top cause of chronic dysphagia after GERD. Diagnosis of EoE relies on counting eosinophils in histological slides, a manual and time-consuming task that limits the ability to extract complex patient-dependent features. The treatment of EoE includes medication and food elimination. A personalized food elimination plan is crucial for engagement and efficiency, but previous attempts failed to produce significant results. In this work, on the one hand, we utilize AI for inferring histological features from the entire biopsy slide, features that cannot be extracted manually. On the other hand, we develop causal learning models that can process this wealth of data. We applied our approach to the 'Six-Food vs. One-Food Eosinophilic Esophagitis Diet Study', where 112 symptomatic adults aged 18-60 years with active EoE were assigned to either a six-food elimination diet (6FED) or a one-food elimination diet (1FED) for six weeks. Our results show that the average treatment effect (ATE) of the 6FED treatment compared with the 1FED treatment is not significant, that is, neither diet was superior to the other. We examined several causal models and show that the best treatment strategy was obtained using T-learner with two XGBoost modules. While 1FED only and 6FED only provide improvement for 35%-38% of the patients, which is not significantly different from a random treatment assignment, our causal model yields a significantly better improvement rate of 58.4%. This study illustrates the significance of AI in enhancing treatment planning by analyzing molecular features' distribution in histological slides through causal learning. Our approach can be harnessed for other conditions that rely on histology for diagnosis and treatment.
△ Less
Submitted 16 April, 2023;
originally announced April 2023.
-
Symbiotic Message Passing Model for Transfer Learning between Anti-Fungal and Anti-Bacterial Domains
Authors:
Ronen Taub,
Tanya Wasserman,
Yonatan Savir
Abstract:
Machine learning, and representation learning in particular, has the potential to facilitate drug discovery by screening billions of compounds. For example, a successful approach is representing the molecules as a graph and utilizing graph neural networks (GNN). Yet, these approaches still require experimental measurements of thousands of compounds to construct a proper training set. While in some…
▽ More
Machine learning, and representation learning in particular, has the potential to facilitate drug discovery by screening billions of compounds. For example, a successful approach is representing the molecules as a graph and utilizing graph neural networks (GNN). Yet, these approaches still require experimental measurements of thousands of compounds to construct a proper training set. While in some domains it is easier to acquire experimental data, in others it might be more limited. For example, it is easier to test the compounds on bacteria than perform in-vivo experiments. Thus, a key question is how to utilize information from a large available dataset together with a small subset of compounds where both domains are measured to predict compounds' effect on the second, experimentally less available domain. Current transfer learning approaches for drug discovery, including training of pre-trained modules or meta-learning, have limited success. In this work, we develop a novel method, named Symbiotic Message Passing Neural Network (SMPNN), for merging graph-neural-network models from different domains. Using routing new message passing lanes between them, our approach resolves some of the potential conflicts between the different domains, and implicit constraints induced by the larger datasets. By collecting public data and performing additional high-throughput experiments, we demonstrate the advantage of our approach by predicting anti-fungal activity from anti-bacterial activity. We compare our method to the standard transfer learning approach and show that SMPNN provided better and less variable performances. Our approach is general and can be used to facilitate information transfer between any two domains such as different organisms, different organelles, or different environments.
△ Less
Submitted 14 April, 2023;
originally announced April 2023.
-
Between Generating Noise and Generating Images: Noise in the Correct Frequency Improves the Quality of Synthetic Histopathology Images for Digital Pathology
Authors:
Nati Daniel,
Eliel Aknin,
Ariel Larey,
Yoni Peretz,
Guy Sela,
Yael Fisher,
Yonatan Savir
Abstract:
Artificial intelligence and machine learning techniques have the promise to revolutionize the field of digital pathology. However, these models demand considerable amounts of data, while the availability of unbiased training data is limited. Synthetic images can augment existing datasets, to improve and validate AI algorithms. Yet, controlling the exact distribution of cellular features within the…
▽ More
Artificial intelligence and machine learning techniques have the promise to revolutionize the field of digital pathology. However, these models demand considerable amounts of data, while the availability of unbiased training data is limited. Synthetic images can augment existing datasets, to improve and validate AI algorithms. Yet, controlling the exact distribution of cellular features within them is still challenging. One of the solutions is harnessing conditional generative adversarial networks that take a semantic mask as an input rather than a random noise. Unlike other domains, outlining the exact cellular structure of tissues is hard, and most of the input masks depict regions of cell types. However, using polygon-based masks introduce inherent artifacts within the synthetic images - due to the mismatch between the polygon size and the single-cell size. In this work, we show that introducing random single-pixel noise with the appropriate spatial frequency into a polygon semantic mask can dramatically improve the quality of the synthetic images. We used our platform to generate synthetic images of immunohistochemistry-treated lung biopsies. We test the quality of the images using a three-fold validation procedure. First, we show that adding the appropriate noise frequency yields 87% of the similarity metrics improvement that is obtained by adding the actual single-cell features. Second, we show that the synthetic images pass the Turing test. Finally, we show that adding these synthetic images to the train set improves AI performance in terms of PD-L1 semantic segmentation performances. Our work suggests a simple and powerful approach for generating synthetic data on demand to unbias limited datasets to improve the algorithms' accuracy and validate their robustness.
△ Less
Submitted 13 February, 2023;
originally announced February 2023.
-
DEPAS: De-novo Pathology Semantic Masks using a Generative Model
Authors:
Ariel Larey,
Nati Daniel,
Eliel Aknin,
Yael Fisher,
Yonatan Savir
Abstract:
The integration of artificial intelligence into digital pathology has the potential to automate and improve various tasks, such as image analysis and diagnostic decision-making. Yet, the inherent variability of tissues, together with the need for image labeling, lead to biased datasets that limit the generalizability of algorithms trained on them. One of the emerging solutions for this challenge i…
▽ More
The integration of artificial intelligence into digital pathology has the potential to automate and improve various tasks, such as image analysis and diagnostic decision-making. Yet, the inherent variability of tissues, together with the need for image labeling, lead to biased datasets that limit the generalizability of algorithms trained on them. One of the emerging solutions for this challenge is synthetic histological images. However, debiasing real datasets require not only generating photorealistic images but also the ability to control the features within them. A common approach is to use generative methods that perform image translation between semantic masks that reflect prior knowledge of the tissue and a histological image. However, unlike other image domains, the complex structure of the tissue prevents a simple creation of histology semantic masks that are required as input to the image translation model, while semantic masks extracted from real images reduce the process's scalability. In this work, we introduce a scalable generative model, coined as DEPAS, that captures tissue structure and generates high-resolution semantic masks with state-of-the-art quality. We demonstrate the ability of DEPAS to generate realistic semantic maps of tissue for three types of organs: skin, prostate, and lung. Moreover, we show that these masks can be processed using a generative image translation model to produce photorealistic histology images of two types of cancer with two different types of staining techniques. Finally, we harness DEPAS to generate multi-label semantic masks that capture different cell types distributions and use them to produce histological images with on-demand cellular features. Overall, our work provides a state-of-the-art solution for the challenging task of generating synthetic histological images while controlling their semantic information in a scalable way.
△ Less
Submitted 13 February, 2023;
originally announced February 2023.
-
Harnessing Artificial Intelligence to Infer Novel Spatial Biomarkers for the Diagnosis of Eosinophilic Esophagitis
Authors:
Ariel Larey,
Eliel Aknin,
Nati Daniel,
Garrett A. Osswald,
Julie M. Caldwell,
Mark Rochman,
Tanya Wasserman,
Margaret H. Collins,
Nicoleta C. Arva,
Guang-Yu Yang,
Marc E. Rothenberg,
Yonatan Savir
Abstract:
Eosinophilic esophagitis (EoE) is a chronic allergic inflammatory condition of the esophagus associated with elevated esophageal eosinophils. Second only to gastroesophageal reflux disease, EoE is one of the leading causes of chronic refractory dysphagia in adults and children. EoE diagnosis requires enumerating the density of esophageal eosinophils in esophageal biopsies, a somewhat subjective ta…
▽ More
Eosinophilic esophagitis (EoE) is a chronic allergic inflammatory condition of the esophagus associated with elevated esophageal eosinophils. Second only to gastroesophageal reflux disease, EoE is one of the leading causes of chronic refractory dysphagia in adults and children. EoE diagnosis requires enumerating the density of esophageal eosinophils in esophageal biopsies, a somewhat subjective task that is time-consuming, thus reducing the ability to process the complex tissue structure. Previous artificial intelligence (AI) approaches that aimed to improve histology-based diagnosis focused on recapitulating identification and quantification of the area of maximal eosinophil density. However, this metric does not account for the distribution of eosinophils or other histological features, over the whole slide image. Here, we developed an artificial intelligence platform that infers local and spatial biomarkers based on semantic segmentation of intact eosinophils and basal zone distributions. Besides the maximal density of eosinophils (referred to as Peak Eosinophil Count [PEC]) and a maximal basal zone fraction, we identify two additional metrics that reflect the distribution of eosinophils and basal zone fractions. This approach enables a decision support system that predicts EoE activity and classifies the histological severity of EoE patients. We utilized a cohort that includes 1066 biopsy slides from 400 subjects to validate the system's performance and achieved a histological severity classification accuracy of 86.70%, sensitivity of 84.50%, and specificity of 90.09%. Our approach highlights the importance of systematically analyzing the distribution of biopsy features over the entire slide and paves the way towards a personalized decision support system that will assist not only in counting cells but can also potentially improve diagnosis and provide treatment prediction.
△ Less
Submitted 26 May, 2022;
originally announced May 2022.
-
PECNet: A Deep Multi-Label Segmentation Network for Eosinophilic Esophagitis Biopsy Diagnostics
Authors:
Nati Daniel,
Ariel Larey,
Eliel Aknin,
Garrett A. Osswald,
Julie M. Caldwell,
Mark Rochman,
Margaret H. Collins,
Guang-Yu Yang,
Nicoleta C. Arva,
Kelley E. Capocelli,
Marc E. Rothenberg,
Yonatan Savir
Abstract:
Background. Eosinophilic esophagitis (EoE) is an allergic inflammatory condition of the esophagus associated with elevated numbers of eosinophils. Disease diagnosis and monitoring requires determining the concentration of eosinophils in esophageal biopsies, a time-consuming, tedious and somewhat subjective task currently performed by pathologists. Methods. Herein, we aimed to use machine learning…
▽ More
Background. Eosinophilic esophagitis (EoE) is an allergic inflammatory condition of the esophagus associated with elevated numbers of eosinophils. Disease diagnosis and monitoring requires determining the concentration of eosinophils in esophageal biopsies, a time-consuming, tedious and somewhat subjective task currently performed by pathologists. Methods. Herein, we aimed to use machine learning to identify, quantitate and diagnose EoE. We labeled more than 100M pixels of 4345 images obtained by scanning whole slides of H&E-stained sections of esophageal biopsies derived from 23 EoE patients. We used this dataset to train a multi-label segmentation deep network. To validate the network, we examined a replication cohort of 1089 whole slide images from 419 patients derived from multiple institutions. Findings. PECNet segmented both intact and not-intact eosinophils with a mean intersection over union (mIoU) of 0.93. This segmentation was able to quantitate intact eosinophils with a mean absolute error of 0.611 eosinophils and classify EoE disease activity with an accuracy of 98.5%. Using whole slide images from the validation cohort, PECNet achieved an accuracy of 94.8%, sensitivity of 94.3%, and specificity of 95.14% in reporting EoE disease activity. Interpretation. We have developed a deep learning multi-label semantic segmentation network that successfully addresses two of the main challenges in EoE diagnostics and digital pathology, the need to detect several types of small features simultaneously and the ability to analyze whole slides efficiently. Our results pave the way for an automated diagnosis of EoE and can be utilized for other conditions with similar challenges.
△ Less
Submitted 2 March, 2021;
originally announced March 2021.
-
Machine learning approach for biopsy-based identification of eosinophilic esophagitis reveals importance of global features
Authors:
Tomer Czyzewski,
Nati Daniel,
Mark Rochman,
Julie M. Caldwell,
Garrett A. Osswald,
Margaret H. Collins,
Marc E. Rothenberg,
Yonatan Savir
Abstract:
Goal: Eosinophilic esophagitis (EoE) is an allergic inflammatory condition characterized by eosinophil accumulation in the esophageal mucosa. EoE diagnosis includes a manual assessment of eosinophil levels in mucosal biopsies - a time-consuming, laborious task that is difficult to standardize. One of the main challenges in automating this process, like many other biopsy-based diagnostics, is detec…
▽ More
Goal: Eosinophilic esophagitis (EoE) is an allergic inflammatory condition characterized by eosinophil accumulation in the esophageal mucosa. EoE diagnosis includes a manual assessment of eosinophil levels in mucosal biopsies - a time-consuming, laborious task that is difficult to standardize. One of the main challenges in automating this process, like many other biopsy-based diagnostics, is detecting features that are small relative to the size of the biopsy. Results: In this work, we utilized hematoxylin- and eosin-stained slides from esophageal biopsies from patients with active EoE and control subjects to develop a platform based on a deep convolutional neural network (DCNN) that can classify esophageal biopsies with an accuracy of 85%, sensitivity of 82.5%, and specificity of 87%. Moreover, by combining several downscaling and crop** strategies, we show that some of the features contributing to the correct classification are global rather than specific, local features. Conclusions: We report the ability of artificial intelligence to identify EoE using computer vision analysis of esophageal biopsy slides. Further, the DCNN features associated with EoE are based on not only local eosinophils but also global histologic changes. Our approach can be used for other conditions that rely on biopsy-based histologic diagnostics.
△ Less
Submitted 13 January, 2021;
originally announced January 2021.
-
Optimal Design of a Molecular Recognizer: Molecular Recognition as a Bayesian Signal Detection Problem
Authors:
Yonatan Savir,
Tsvi Tlusty
Abstract:
Numerous biological functions-such as enzymatic catalysis, the immune response system, and the DNA-protein regulatory network-rely on the ability of molecules to specifically recognize target molecules within a large pool of similar competitors in a noisy biochemical environment. Using the basic framework of signal detection theory, we treat the molecular recognition process as a signal detection…
▽ More
Numerous biological functions-such as enzymatic catalysis, the immune response system, and the DNA-protein regulatory network-rely on the ability of molecules to specifically recognize target molecules within a large pool of similar competitors in a noisy biochemical environment. Using the basic framework of signal detection theory, we treat the molecular recognition process as a signal detection problem and examine its overall performance. Thus, we evaluate the optimal properties of a molecular recognizer in the presence of competition and noise. Our analysis reveals that the optimal design undergoes a "phase transition" as the structural properties of the molecules and interaction energies between them vary. In one phase, the recognizer should be complementary in structure to its target (like a lock and a key), while in the other, conformational changes upon binding, which often accompany molecular recognition, enhance recognition quality. Using this framework, the abundance of conformational changes may be explained as a result of increasing the fitness of the recognizer. Furthermore, this analysis may be used in future design of artificial signal processing devices based on biomolecules.
△ Less
Submitted 26 July, 2010;
originally announced July 2010.
-
Molecular Recognition as an Information Channel: The Role of Conformational Changes
Authors:
Yonatan Savir,
Tsvi Tlusty
Abstract:
Molecular recognition, which is essential in processing information in biological systems, takes place in a crowded noisy biochemical environment and requires the recognition of a specific target within a background of various similar competing molecules. We consider molecular recognition as a transmission of information via a noisy channel and use this analogy to gain insights on the optimal, or…
▽ More
Molecular recognition, which is essential in processing information in biological systems, takes place in a crowded noisy biochemical environment and requires the recognition of a specific target within a background of various similar competing molecules. We consider molecular recognition as a transmission of information via a noisy channel and use this analogy to gain insights on the optimal, or fittest, molecular recognizer. We focus on the optimal structural properties of the molecules such as flexibility and conformation. We show that conformational changes upon binding, which often occur during molecular recognition, may optimize the detection performance of the recognizer. We thus suggest a generic design principle termed 'conformational proofreading' in which deformation enhances detection. We evaluate the optimal flexibility of the molecular recognizer, which is analogous to the stochasticity in a decision unit. In some scenarios, a flexible recognizer, i.e., a stochastic decision unit, performs better than a rigid, deterministic one. As a biological example, we discuss conformational changes during homologous recombination, the process of genetic exchange between two DNA strands.
△ Less
Submitted 26 July, 2010;
originally announced July 2010.