Showing 1–19 of 19 results for author: Thost, V

  1. arXiv:2403.08147  [pdf, other]

    cs.LG q-bio.BM

    Representing Molecules as Random Walks Over Interpretable Grammars

    Authors: Michael Sun, Minghao Guo, Weize Yuan, Veronika Thost, Crystal Elaine Owens, Aristotle Franklin Grosz, Sharvaa Selvan, Katelyn Zhou, Hassan Mohiuddin, Benjamin J Pedretti, Zachary P Smith, Jie Chen, Wojciech Matusik

    Abstract: Recent research in molecular discovery has primarily been devoted to small, drug-like molecules, leaving many similarly important applications in material design without adequate technology. These applications often rely on more complex molecular structures with fewer examples that are carefully designed using known substructures. We propose a data-efficient and interpretable model for representin…

    Submitted 2 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  2. arXiv:2311.17327  [pdf, other]

    cs.LG

    Improving Self-supervised Molecular Representation Learning using Persistent Homology

    Authors: Yuankai Luo, Lei Shi, Veronika Thost

    Abstract: Self-supervised learning (SSL) has great potential for molecular representation learning given the complexity of molecular graphs, the large amounts of unlabelled data available, the considerable cost of obtaining labels experimentally, and the hence often only small training datasets. The importance of the topic is reflected in the variety of paradigms and architectures that have been investigate…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: NeurIPS 2023

  3. arXiv:2309.01788  [pdf, other]

    cs.LG q-bio.QM

    Hierarchical Grammar-Induced Geometry for Data-Efficient Molecular Property Prediction

    Authors: Minghao Guo, Veronika Thost, Samuel W Song, Adithya Balachandran, Payel Das, Jie Chen, Wojciech Matusik

    Abstract: The prediction of molecular properties is a crucial task in the field of material and drug discovery. The potential benefits of using deep learning techniques are reflected in the wealth of recent literature. Still, these techniques are faced with a common challenge in practice: Labeled data are limited by the cost of manual extraction from literature and laborious experimentation. In this work, w…

    Submitted 4 September, 2023; originally announced September 2023.

    Comments: 22 pages, 10 figures; ICML 2023

  4. arXiv:2210.13148  [pdf, other]

    cs.LG cs.AI

    Transformers over Directed Acyclic Graphs

    Authors: Yuankai Luo, Veronika Thost, Lei Shi

    Abstract: Transformer models have recently gained popularity in graph representation learning as they have the potential to learn complex relationships beyond the ones captured by regular graph neural networks. The main research question is how to inject the structural bias of graphs into the transformer architecture, and several proposals have been made for undirected molecular graphs and, recently, also f…

    Submitted 30 October, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2023

  5. arXiv:2203.08031  [pdf, other]

    cs.LG q-bio.BM

    Data-Efficient Graph Grammar Learning for Molecular Generation

    Authors: Minghao Guo, Veronika Thost, Beichen Li, Payel Das, Jie Chen, Wojciech Matusik

    Abstract: The problem of molecular generation has received significant attention recently. Existing methods are typically based on deep neural networks and require training on large datasets with tens of thousands of samples. In practice, however, the size of class-specific chemical datasets is usually limited (e.g., dozens of samples) due to labor-intensive experimentation and data collection. This present…

    Submitted 15 March, 2022; originally announced March 2022.

    Comments: ICLR 2022 oral

  6. arXiv:2109.03341  [pdf]

    cs.AI cs.SE

    Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation

    Authors: Yufan Zhuang, Sahil Suneja, Veronika Thost, Giacomo Domeniconi, Alessandro Morari, Jim Laredo

    Abstract: Identifying vulnerable code is a precautionary measure to counter software security breaches. Tedious expert effort has been spent to build static analyzers, yet insecure patterns are barely fully enumerated. This work explores a deep learning approach to automatically learn the insecure patterns from code corpora. Because code naturally admits graph structures with parsing, we develop a novel gra…

    Submitted 7 September, 2021; originally announced September 2021.

    Comments: Submitted June 2020

  7. arXiv:2107.04894  [pdf, other]

    cs.LG

    Improving Inductive Link Prediction Using Hyper-Relational Facts

    Authors: Mehdi Ali, Max Berrendorf, Mikhail Galkin, Veronika Thost, Tengfei Ma, Volker Tresp, Jens Lehmann

    Abstract: For many years, link prediction on knowledge graphs (KGs) has been a purely transductive task, not allowing for reasoning on unseen entities. Recently, increasing efforts are put into exploring semi- and fully inductive scenarios, enabling inference over unseen and emerging entities. Still, all these approaches only consider triple-based KGs, whereas their richer counterparts, hyper-relatio…

    Submitted 10 July, 2021; originally announced July 2021.

  8. arXiv:2105.13975  [pdf, other]

    cs.LG cs.AI stat.ML

    Relation Matters in Sampling: A Scalable Multi-Relational Graph Neural Network for Drug-Drug Interaction Prediction

    Authors: Arthur Feeney, Rishabh Gupta, Veronika Thost, Rico Angell, Gayathri Chandu, Yash Adhikari, Tengfei Ma

    Abstract: Sampling is an established technique to scale graph neural networks to large graphs. Current approaches however assume the graphs to be homogeneous in terms of relations and ignore relation types, critically important in biomedical graphs. Multi-relational graphs contain various types of relations that usually come with variable frequency and have different importance for the problem at hand. We p…

    Submitted 28 May, 2021; originally announced May 2021.

  9. arXiv:2105.12655  [pdf, other]

    cs.SE cs.AI

    CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks

    Authors: Ruchir Puri, David S. Kung, Geert Janssen, Wei Zhang, Giacomo Domeniconi, Vladimir Zolotov, Julian Dolby, Jie Chen, Mihir Choudhury, Lindsey Decker, Veronika Thost, Luca Buratti, Saurabh Pujar, Shyam Ramji, Ulrich Finkler, Susan Malaika, Frederick Reiss

    Abstract: Over the last several decades, software has been woven into the fabric of every aspect of our society. As software development surges and code infrastructure of enterprise applications ages, it is now more critical than ever to increase software development productivity and modernize legacy applications. Advances in deep learning and machine learning algorithms have enabled numerous breakthroughs,…

    Submitted 29 August, 2021; v1 submitted 24 May, 2021; originally announced May 2021.

    Comments: 22 pages including references

  10. arXiv:2101.07965  [pdf, other]

    cs.LG cs.AI

    Directed Acyclic Graph Neural Networks

    Authors: Veronika Thost, Jie Chen

    Abstract: Graph-structured data ubiquitously appears in science and engineering. Graph neural networks (GNNs) are designed to exploit the relational inductive bias exhibited in graphs; they have been shown to outperform other forms of neural networks in scenarios where structure information supplements node features. The most common GNN architecture aggregates information from neighborhoods based on message…

    Submitted 2 February, 2021; v1 submitted 19 January, 2021; originally announced January 2021.

    Comments: ICLR 2021. Code is available at https://github.com/vthost/DAGNN

  11. arXiv:2006.12641  [pdf, ps, other]

    cs.CL cs.LG cs.PL

    Exploring Software Naturalness through Neural Language Models

    Authors: Luca Buratti, Saurabh Pujar, Mihaela Bornea, Scott McCarley, Yunhui Zheng, Gaetano Rossiello, Alessandro Morari, Jim Laredo, Veronika Thost, Yufan Zhuang, Giacomo Domeniconi

    Abstract: The Software Naturalness hypothesis argues that programming languages can be understood through the same techniques used in natural language processing. We explore this hypothesis through the use of a pre-trained transformer-based language model to perform code analysis tasks. Present approaches to code analysis depend heavily on features derived from the Abstract Syntax Tree (AST) while our trans…

    Submitted 24 June, 2020; v1 submitted 22 June, 2020; originally announced June 2020.

  12. arXiv:2003.09508  [pdf, ps, other]

    cs.LO

    Temporal Conjunctive Query Answering in the Extended DL-Lite Family

    Authors: Stefan Borgwardt, Veronika Thost

    Abstract: Ontology-based query answering (OBQA) augments classical query answering in databases by domain knowledge encoded in an ontology. Systems for OBQA use the ontological knowledge to infer new information that is not explicitly given in the data. Moreover, they usually employ the open-world assumption, which means that knowledge that is not stated explicitly in the data and that is not inferred is no…

    Submitted 20 March, 2020; originally announced March 2020.

  13. arXiv:2002.00423  [pdf, other]

    cs.AI cs.LG cs.LO

    An Experimental Study of Formula Embeddings for Automated Theorem Proving in First-Order Logic

    Authors: Ibrahim Abdelaziz, Veronika Thost, Maxwell Crouse, Achille Fokoue

    Abstract: Automated theorem proving in first-order logic is an active research area which is successfully supported by machine learning. While there have been various proposals for encoding logical formulas into numerical vectors -- from simple strings to more involved graph-based embeddings -- little is known about how these different encodings compare. In this paper, we study and experimentally compare pa…

    Submitted 15 March, 2020; v1 submitted 2 February, 2020; originally announced February 2020.

    Comments: 7 pages, preprint, under review

  14. arXiv:1911.06904  [pdf, other]

    cs.AI cs.LG cs.LO cs.SC

    Improving Graph Neural Network Representations of Logical Formulae with Subgraph Pooling

    Authors: Maxwell Crouse, Ibrahim Abdelaziz, Cristina Cornelio, Veronika Thost, Lingfei Wu, Kenneth Forbus, Achille Fokoue

    Abstract: Recent advances in the integration of deep learning with automated theorem proving have centered around the representation of logical formulae as inputs to deep learning systems. In particular, there has been a growing interest in adapting structure-aware neural methods to work with the underlying graph representations of logical expressions. While more effective than character and token-level app…

    Submitted 5 June, 2020; v1 submitted 15 November, 2019; originally announced November 2019.

  15. arXiv:1911.02065  [pdf, other]

    cs.AI cs.LG cs.LO

    A Deep Reinforcement Learning Approach to First-Order Logic Theorem Proving

    Authors: Maxwell Crouse, Ibrahim Abdelaziz, Bassem Makni, Spencer Whitehead, Cristina Cornelio, Pavan Kapanipathi, Kavitha Srinivas, Veronika Thost, Michael Witbrock, Achille Fokoue

    Abstract: Automated theorem provers have traditionally relied on manually tuned heuristics to guide how they perform proof search. Deep reinforcement learning has been proposed as a way to obviate the need for such heuristics, however, its deployment in automated theorem proving remains a challenge. In this paper we introduce TRAIL, a system that applies deep reinforcement learning to saturation-based theor…

    Submitted 15 September, 2020; v1 submitted 5 November, 2019; originally announced November 2019.

  16. arXiv:1911.02060  [pdf, other]

    cs.CL cs.AI

    Infusing Knowledge into the Textual Entailment Task Using Graph Convolutional Networks

    Authors: Pavan Kapanipathi, Veronika Thost, Siva Sankalp Patel, Spencer Whitehead, Ibrahim Abdelaziz, Avinash Balakrishnan, Maria Chang, Kshitij Fadnis, Chulaka Gunasekara, Bassem Makni, Nicholas Mattei, Kartik Talamadupula, Achille Fokoue

    Abstract: Textual entailment is a fundamental task in natural language processing. Most approaches for solving the problem use only the textual content present in training data. A few approaches have shown that information from external knowledge sources like knowledge graphs (KGs) can add value, in addition to the textual content, by providing background knowledge that may be critical for a task. However,…

    Submitted 21 November, 2019; v1 submitted 5 November, 2019; originally announced November 2019.

  17. arXiv:1909.07095  [pdf, other]

    cs.AI cs.LO

    RuDaS: Synthetic Datasets for Rule Learning and Evaluation Tools

    Authors: Cristina Cornelio, Veronika Thost

    Abstract: Logical rules are a popular knowledge representation language in many domains, representing background knowledge and encoding information that can be derived from given facts in a compact form. However, rule formulation is a complex process that requires deep domain expertise, and is further challenged by today's often large, heterogeneous, and incomplete knowledge graphs. Several approaches for le…

    Submitted 12 February, 2020; v1 submitted 16 September, 2019; originally announced September 2019.

  18. arXiv:1808.02055  [pdf, ps, other]

    cs.LO

    Metric Temporal Extensions of DL-Lite and Interval-Rigid Names

    Authors: Veronika Thost

    Abstract: The DL-Lite description logics allow for modeling domain knowledge on top of databases and for efficient reasoning. We focus on metric temporal extensions of DL-Lite_bool and its fragments, and study the complexity of satisfiability. In particular, we investigate the influence of rigid and interval-rigid symbols, which allow for modeling knowledge that remains valid over (some) time. We show that…

    Submitted 6 August, 2018; originally announced August 2018.

  19. arXiv:1808.01877  [pdf, ps, other]

    cs.LO

    Query Answering for Rough EL Ontologies (Extended Technical Report)

    Authors: Rafael Peñaloza, Veronika Thost, Anni-Yasmin Turhan

    Abstract: Querying large datasets with incomplete and vague data is still a challenge. Ontology-based query answering extends standard database query answering by background knowledge from an ontology to augment incomplete data. We focus on ontologies written in rough description logics (DLs), which allow to represent vague knowledge by partitioning the domain of discourse into classes of indiscernible elem…

    Submitted 6 August, 2018; originally announced August 2018.

    Comments: Extended version of a paper accepted at KR 2018