Showing 1–12 of 12 results for author: Tropsha, A

  1. arXiv:2406.01825  [pdf, other]

    cs.LG cs.AI

    EMOE: Expansive Matching of Experts for Robust Uncertainty Based Rejection

    Authors: Yunni Qu, James Wellnitz, Alexander Tropsha, Junier Oliva

    Abstract: Expansive Matching of Experts (EMOE) is a novel method that utilizes support-expanding, extrapolatory pseudo-labeling to improve prediction and uncertainty based rejection on out-of-distribution (OOD) points. We propose an expansive data augmentation technique that generates OOD instances in a latent space, and an empirical trial based approach to filter out augmented expansive points for pseudo-l…

    Submitted 4 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  2. arXiv:2403.10478  [pdf, other]

    q-bio.QM

    An Improved Metric and Benchmark for Assessing the Performance of Virtual Screening Models

    Authors: Michael Brocidiacono, Konstantin I. Popov, Alexander Tropsha

    Abstract: Structure-based virtual screening (SBVS) is a key workflow in computational drug discovery. SBVS models are assessed by measuring the enrichment of known active molecules over decoys in retrospective screens. However, the standard formula for enrichment cannot estimate model performance on very large libraries. Additionally, current screening benchmarks cannot easily be used with machine learning…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: 10 pages, 4 figures, and 4 tables. The source code is available at https://github.com/molecularmodelinglab/bigbind

  3. arXiv:2402.07970  [pdf, other]

    cs.IR cs.LG

    Utilizing Low-Dimensional Molecular Embeddings for Rapid Chemical Similarity Search

    Authors: Kathryn E. Kirchoff, James Wellnitz, Joshua E. Hochuli, Travis Maxfield, Konstantin I. Popov, Shawn Gomez, Alexander Tropsha

    Abstract: Nearest neighbor-based similarity searching is a common task in chemistry, with notable use cases in drug discovery. Yet, some of the most commonly used approaches for this task still leverage a brute-force approach. In practice this can be computationally costly and overly time-consuming, due in part to the sheer size of modern chemical databases. Previous computational advancements for this task…

    Submitted 12 February, 2024; originally announced February 2024.

  4. arXiv:2310.02744  [pdf, other]

    cs.LG

    SALSA: Semantically-Aware Latent Space Autoencoder

    Authors: Kathryn E. Kirchoff, Travis Maxfield, Alexander Tropsha, Shawn M. Gomez

    Abstract: In deep learning for drug discovery, chemical data are often represented as simplified molecular-input line-entry system (SMILES) sequences which allow for straightforward implementation of natural language processing methodologies, one being the sequence-to-sequence autoencoder. However, we observe that training an autoencoder solely on SMILES is insufficient to learn molecular representations th…

    Submitted 4 October, 2023; originally announced October 2023.

  5. arXiv:2308.06347  [pdf, other]

    stat.ME q-bio.BM

    The N-ary in the Coal Mine: Avoiding Mixture Model Failure with Proper Validation

    Authors: Travis Maxfield, Joshua Hochuli, James Wellnitz, Cleber Melo-Filho, Konstantin I. Popov, Eugene Muratov, Alex Tropsha

    Abstract: Modeling the properties of chemical mixtures is a difficult but important part of any modeling process intended to be applicable to the often messy and impure phenomena of everyday life, including food and environmental safety, healthcare, etc. Part of this difficulty stems from the increased complexity of designing suitable model validation schemes for mixture data, a fact which has been elucidat…

    Submitted 11 August, 2023; originally announced August 2023.

    Comments: 22 pages, 1 figure

  6. arXiv:2307.12090  [pdf, other]

    q-bio.QM

    PLANTAIN: Diffusion-inspired Pose Score Minimization for Fast and Accurate Molecular Docking

    Authors: Michael Brocidiacono, Konstantin I. Popov, David Ryan Koes, Alexander Tropsha

    Abstract: Molecular docking aims to predict the 3D pose of a small molecule in a protein binding site. Traditional docking methods predict ligand poses by minimizing a physics-inspired scoring function. Recently, a diffusion model has been proposed that iteratively refines a ligand pose. We combine these two approaches by training a pose scoring function in a diffusion-inspired manner. In our method, PLANTA…

    Submitted 25 July, 2023; v1 submitted 22 July, 2023; originally announced July 2023.

    Comments: Camera-ready submission to ICML CompBio workshop. 5 pages and 1 figure

  7. arXiv:2011.07959  [pdf]

    cs.IR cs.CL cs.LG

    Text Mining to Identify and Extract Novel Disease Treatments From Unstructured Datasets

    Authors: Rahul Yedida, Saad Mohammad Abrar, Cleber Melo-Filho, Eugene Muratov, Rada Chirkova, Alexander Tropsha

    Abstract: Objective: We aim to learn potential novel cures for diseases from unstructured text sources. More specifically, we seek to extract drug-disease pairs of potential cures to diseases by a simple reasoning over the structure of spoken text. Materials and Methods: We use Google Cloud to transcribe podcast episodes of an NPR radio show. We then build a pipeline for systematically pre-processing the…

    Submitted 22 October, 2020; originally announced November 2020.

    Comments: initial submission

  8. arXiv:1712.00422  [pdf, other]

    cond-mat.mtrl-sci

    The AFLOW Fleet for Materials Discovery

    Authors: Cormac Toher, Corey Oses, David Hicks, Eric Gossett, Frisco Rose, Pinku Nath, Demet Usanmaz, Denise C. Ford, Eric Perim, Camilo E. Calderon, Jose J. Plata, Yoav Lederer, Michal Jahnátek, Wahyu Setyawan, Shidong Wang, Junkai Xue, Kevin Rasch, Roman V. Chepulskii, Richard H. Taylor, Geena Gomez, Harvey Shi, Andrew R. Supka, Rabih Al Rahal Al Orabi, Priya Gopal, Frank T. Cerasoli, et al. (26 additional authors not shown)

    Abstract: The traditional paradigm for materials discovery has been recently expanded to incorporate substantial data driven research. With the intent to accelerate the development and the deployment of new technologies, the AFLOW Fleet for computational materials design automates high-throughput first principles calculations, and provides tools for data verification and dissemination for a broad community…

    Submitted 1 December, 2017; originally announced December 2017.

    Comments: 14 pages, 8 figures

  9. arXiv:1711.10907  [pdf]

    cs.AI cs.LG stat.ML

    Deep Reinforcement Learning for De-Novo Drug Design

    Authors: Mariya Popova, Olexandr Isayev, Alexander Tropsha

    Abstract: We propose a novel computational strategy for de novo design of molecules with desired properties termed ReLeaSE (Reinforcement Learning for Structural Evolution). Based on deep and reinforcement learning approaches, ReLeaSE integrates two deep neural networks - generative and predictive - that are trained separately but employed jointly to generate novel targeted chemical libraries. ReLeaSE emplo…

    Submitted 31 May, 2018; v1 submitted 29 November, 2017; originally announced November 2017.

    Journal ref: Science Advances, 2018, vol. 4, no. 7, eaap7885

  10. arXiv:1711.10744  [pdf, other]

    cond-mat.mtrl-sci physics.comp-ph

    AFLOW-ML: A RESTful API for machine-learning predictions of materials properties

    Authors: Eric Gossett, Cormac Toher, Corey Oses, Olexandr Isayev, Fleur Legrain, Frisco Rose, Eva Zurek, Jesús Carrete, Natalio Mingo, Alexander Tropsha, Stefano Curtarolo

    Abstract: Machine learning approaches, enabled by the emergence of comprehensive databases of materials properties, are becoming a fruitful direction for materials analysis. As a result, a plethora of models have been constructed and trained on existing data to predict properties of new systems. These powerful methods allow researchers to target studies only at interesting materials - neglec…

    Submitted 29 November, 2017; originally announced November 2017.

    Comments: 10 pages, 2 figures

  11. arXiv:1608.04782  [pdf, other]

    cond-mat.mtrl-sci

    Universal Fragment Descriptors for Predicting Electronic Properties of Inorganic Crystals

    Authors: Olexandr Isayev, Corey Oses, Cormac Toher, Eric Gossett, Stefano Curtarolo, Alexander Tropsha

    Abstract: Historically, materials discovery has been driven by a laborious trial-and-error process. The growth of materials databases and emerging informatics approaches finally offer the opportunity to transform this practice into data- and knowledge-driven rational design. By using data from the AFLOW repository for high-throughput ab-initio calculations, we have generated Quantitative Materials Structure…

    Submitted 24 March, 2017; v1 submitted 16 August, 2016; originally announced August 2016.

    Comments: 14 pages, 7 figures

  12. Local kernel canonical correlation analysis with application to virtual drug screening

    Authors: Daniel Samarov, J. S. Marron, Yufeng Liu, Christopher Grulke, Alexander Tropsha

    Abstract: Drug discovery is the process of identifying compounds which have potentially meaningful biological activity. A major challenge that arises is that the number of compounds to search over can be quite large, sometimes numbering in the millions, making experimental testing intractable. For this reason computational methods are employed to filter out those compounds which do not exhibit strong biolog…

    Submitted 15 February, 2012; originally announced February 2012.

    Comments: Published at http://dx.doi.org/10.1214/11-AOAS472 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS472

    Journal ref: Annals of Applied Statistics 2011, Vol. 5, No. 3, 2169-2196