Showing 1–23 of 23 results for author: Ester, M

Searching in archive cs.
  1. arXiv:2406.10867  [pdf, other]

    cs.LG q-bio.BM

    Geometric-informed GFlowNets for Structure-Based Drug Design

    Authors: Grayson Lee, Tony Shen, Martin Ester

    Abstract: The rising cost of drug discovery and the current speed at which drugs are discovered underscore the need for more efficient structure-based drug design (SBDD) methods. We employ Generative Flow Networks (GFlowNets) to effectively explore the vast combinatorial space of drug-like molecules, which traditional virtual screening methods fail to cover. We introduce a novel modification to the G…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Accepted at MoML 2024 as Spotlight

  2. arXiv:2405.02842  [pdf, other]

    cs.LG

    IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs

    Authors: Yuzhen Mao, Martin Ester, Ke Li

    Abstract: One limitation of existing Transformer-based models is that they cannot handle very long sequences as input since their self-attention operations exhibit quadratic time and space complexity. This problem becomes especially acute when Transformers are deployed on hardware platforms equipped only with CPUs. To address this issue, we propose a novel method for accelerating self-attention at inference…

    Submitted 5 May, 2024; originally announced May 2024.

  3. arXiv:2312.10570  [pdf, other]

    cs.LG stat.ME

    Adversarially Balanced Representation for Continuous Treatment Effect Estimation

    Authors: Amirreza Kazemi, Martin Ester

    Abstract: Individual treatment effect (ITE) estimation requires adjusting for the covariate shift between populations with different treatments, and deep representation learning has shown great promise in learning a balanced representation of covariates. However, the existing methods mostly consider the scenario of binary treatments. In this paper, we consider the more practical and challenging scenario in w…

    Submitted 16 December, 2023; originally announced December 2023.

  4. arXiv:2310.03223  [pdf, other]

    cs.LG

    TacoGFN: Target-conditioned GFlowNet for Structure-based Drug Design

    Authors: Tony Shen, Seonghwan Seo, Grayson Lee, Mohit Pandey, Jason R Smith, Artem Cherkasov, Woo Youn Kim, Martin Ester

    Abstract: Searching the vast chemical space for drug-like and synthesizable molecules with high binding affinity to a protein pocket is a challenging task in drug discovery. Recently, molecular deep generative models have been introduced which promise to be more efficient than exhaustive virtual screening, by directly generating molecules based on the protein structure. However, since they learn the distrib…

    Submitted 7 April, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted at NeurIPS 2023 AID3 and at NeurIPS 2023 GenBio as Spotlight

    Journal ref: NeurIPS 2023 Generative AI and Biology (GenBio) Workshop

  5. arXiv:2208.05119  [pdf, other]

    cs.LG physics.chem-ph

    Semi-Supervised Junction Tree Variational Autoencoder for Molecular Property Prediction

    Authors: Atia Hamidizadeh, Tony Shen, Martin Ester

    Abstract: Molecular Representation Learning is essential to solving many drug discovery and computational chemistry problems. It is a challenging problem due to the complex structure of molecules and the vast chemical space. Graph representations of molecules are more expressive than traditional representations, such as molecular fingerprints. Therefore, they can improve the performance of machine learning…

    Submitted 14 January, 2023; v1 submitted 9 August, 2022; originally announced August 2022.

  6. arXiv:2207.07781  [pdf, other]

    cs.LG

    Subgroup Discovery in Unstructured Data

    Authors: Ali Arab, Dev Arora, Jialin Lu, Martin Ester

    Abstract: Subgroup discovery is a descriptive and exploratory data mining technique to identify subgroups in a population that exhibit interesting behavior with respect to a variable of interest. Subgroup discovery has numerous applications in knowledge discovery and hypothesis generation, yet it remains inapplicable for unstructured, high-dimensional data such as images. This is because subgroup discovery…

    Submitted 15 July, 2022; originally announced July 2022.

    ACM Class: I.5.0; I.2.6

  7. arXiv:2206.07579  [pdf, ps, other]

    cs.LG cs.AI

    A Comprehensive Survey on Deep Clustering: Taxonomy, Challenges, and Future Directions

    Authors: Sheng Zhou, Hongjia Xu, Zhuonan Zheng, Jiawei Chen, Zhao Li, Jiajun Bu, Jia Wu, Xin Wang, Wenwu Zhu, Martin Ester

    Abstract: Clustering is a fundamental machine learning task which has been widely studied in the literature. Classic clustering methods follow the assumption that data are represented as features in a vectorized form through various representation learning techniques. As the data become increasingly complicated and complex, the shallow (traditional) clustering methods can no longer handle the high-dimension…

    Submitted 15 June, 2022; originally announced June 2022.

    Comments: Github Repo: https://github.com/zhoushengisnoob/DeepClustering

  8. arXiv:2205.09281  [pdf, other]

    cs.LG stat.ML

    Causal Inference from Small High-dimensional Datasets

    Authors: Raquel Aoki, Martin Ester

    Abstract: Many methods have been proposed to estimate treatment effects with observational data. Often, the choice of the method considers the application's characteristics, such as type of treatment and outcome, confounding effect, and the complexity of the data. These methods implicitly assume that the sample size is large enough to train such models, especially the neural network-based estimators. What i…

    Submitted 18 May, 2022; originally announced May 2022.

    Comments: 12 pages, 3 figures

  9. arXiv:2112.07574  [pdf, other]

    cs.LG stat.ML

    Multi-treatment Effect Estimation from Biomedical Data

    Authors: Raquel Aoki, Yizhou Chen, Martin Ester

    Abstract: This work proposes the M3E2, a multi-task learning neural network model to estimate the effect of multiple treatments. In contrast to existing methods, M3E2 can handle multiple treatment effects applied simultaneously to the same unit, continuous and binary treatments, and many covariates. We compared M3E2 with three baselines in three synthetic benchmark datasets: two with multiple treatments and…

    Submitted 5 January, 2023; v1 submitted 14 December, 2021; originally announced December 2021.

    Comments: 4 figures, 10 pages

    Journal ref: Pacific Symposium on Biocomputing, 2023

  10. arXiv:2111.04936  [pdf, other]

    cs.LG cs.HC

    An Interactive Visualization Tool for Understanding Active Learning

    Authors: Zihan Wang, Jialin Lu, Oliver Snow, Martin Ester

    Abstract: Despite recent progress in artificial intelligence and machine learning, many state-of-the-art methods suffer from a lack of explainability and transparency. The ability to interpret the predictions made by machine learning models and accurately evaluate these models is crucially important. In this paper, we present an interactive visualization tool to elucidate the training process of active lear…

    Submitted 8 November, 2021; originally announced November 2021.

    Comments: NeurIPS 2021 workshop on Human-Centered AI; 7 pages; 5 figures

  11. arXiv:2011.07739  [pdf, other]

    cs.IR

    CoSam: An Efficient Collaborative Adaptive Sampler for Recommendation

    Authors: Jiawei Chen, Chengquan Jiang, Can Wang, Sheng Zhou, Yan Feng, Chun Chen, Martin Ester, Xiangnan He

    Abstract: Sampling strategies have been widely applied in many recommendation systems to accelerate model learning from implicit feedback data. A typical strategy is to draw negative instances with uniform distribution, which, however, severely affects the model's convergence, stability, and even recommendation accuracy. A promising solution for this problem is to over-sample the "difficult" (a.k.a informat…

    Submitted 16 November, 2020; originally announced November 2020.

    Comments: 21 pages, submitting to TOIS

  12. arXiv:2011.00179  [pdf, other]

    cs.LG cs.CV stat.ML

    Combining Domain-Specific Meta-Learners in the Parameter Space for Cross-Domain Few-Shot Classification

    Authors: Shuman Peng, Weilian Song, Martin Ester

    Abstract: The goal of few-shot classification is to learn a model that can classify novel classes using only a few training examples. Despite the promising results shown by existing meta-learning algorithms in solving the few-shot classification problem, there still remains an important challenge: how to generalize to unseen domains while meta-learning on multiple seen domains? In this paper, we propose an…

    Submitted 30 October, 2020; originally announced November 2020.

    Comments: Code coming soon at https://github.com/shumanpng/CosML

  13. arXiv:2009.12658  [pdf, other]

    cs.LG stat.ML

    Domain Generalization via Semi-supervised Meta Learning

    Authors: Hossein Sharifi-Noghabi, Hossein Asghari, Nazanin Mehrasa, Martin Ester

    Abstract: The goal of domain generalization is to learn from multiple source domains to generalize to unseen target domains under distribution discrepancy. Current state-of-the-art methods in this area are fully supervised, but for many real-world problems it is hardly possible to obtain enough labeled samples. In this paper, we propose the first method of domain generalization to leverage unlabeled samples…

    Submitted 30 September, 2020; v1 submitted 26 September, 2020; originally announced September 2020.

  14. arXiv:2006.04435  [pdf, other]

    cs.LG cs.AI stat.ML

    CAST: A Correlation-based Adaptive Spectral Clustering Algorithm on Multi-scale Data

    Authors: Xiang Li, Ben Kao, Caihua Shan, Dawei Yin, Martin Ester

    Abstract: We study the problem of applying spectral clustering to cluster multi-scale data, which is data whose clusters are of various sizes and densities. Traditional spectral clustering techniques discover clusters by processing a similarity matrix that reflects the proximity of objects. For multi-scale data, distance-based similarity is not effective because objects of a sparse cluster could be far apar…

    Submitted 8 June, 2020; originally announced June 2020.

  15. arXiv:2003.07952  [pdf, other]

    cs.LG stat.AP stat.ML

    ParKCa: Causal Inference with Partially Known Causes

    Authors: Raquel Aoki, Martin Ester

    Abstract: Methods for causal inference from observational data are an alternative for scenarios where collecting counterfactual data or realizing a randomized experiment is not possible. Adopting a stacking approach, our proposed method ParKCA combines the results of several causal inference methods to learn new causes in applications with some known causes and many potential causes. We validate ParKCA in t…

    Submitted 11 November, 2020; v1 submitted 17 March, 2020; originally announced March 2020.

    Comments: 12 pages, 4 figures, Pacific Symposium on Biocomputing - 2021 World Scientific Publishing Co., Singapore, http://psb.stanford.edu/

    MSC Class: 62P10

    ACM Class: I.2.1

    Journal ref: Pacific Symposium on Biocomputing - 2021 World Scientific Publishing Co., Singapore, http://psb.stanford.edu/

  16. arXiv:1911.05954  [pdf, other]

    cs.LG stat.ML

    Hierarchical Graph Pooling with Structure Learning

    Authors: Zhen Zhang, Jiajun Bu, Martin Ester, Jianfeng Zhang, Chengwei Yao, Zhi Yu, Can Wang

    Abstract: Graph Neural Networks (GNNs), which generalize deep neural networks to graph-structured data, have drawn considerable attention and achieved state-of-the-art performance in numerous graph-related tasks. However, existing GNN models mainly focus on designing graph convolution operations. The graph pooling (or downsampling) operations, which play an important role in learning hierarchical representat…

    Submitted 25 December, 2019; v1 submitted 14 November, 2019; originally announced November 2019.

    Comments: Typo corrected, reference added and code is available at https://github.com/cszhangzhen/HGP-SL

  17. arXiv:1910.12207  [pdf, other]

    cs.LG cs.AI

    An Active Approach for Model Interpretation

    Authors: Jialin Lu, Martin Ester

    Abstract: Model interpretation, or explanation of a machine learning classifier, aims to extract generalizable knowledge from a trained classifier into a human-understandable format, for various purposes such as model assessment, debugging and trust. From a computational viewpoint, it is formulated as approximating the target classifier using a simpler interpretable model, such as rule models like a decision…

    Submitted 27 October, 2019; originally announced October 2019.

    Comments: NeurIPS 2019 workshop on Human-Centric Machine Learning

  18. arXiv:1807.09741  [pdf, other]

    cs.LG stat.ML

    PADME: A Deep Learning-based Framework for Drug-Target Interaction Prediction

    Authors: Qingyuan Feng, Evgenia Dueva, Artem Cherkasov, Martin Ester

    Abstract: In silico drug-target interaction (DTI) prediction is an important and challenging problem in biomedical research with a huge potential benefit to the pharmaceutical industry and patients. Most existing methods for DTI prediction including deep learning models generally have binary endpoints, which could be an oversimplification of the problem, and those methods are typically unable to handle cold…

    Submitted 21 August, 2019; v1 submitted 25 July, 2018; originally announced July 2018.

  19. Detecting Singleton Review Spammers Using Semantic Similarity

    Authors: Vlad Sandulescu, Martin Ester

    Abstract: Online reviews have become a very important resource for consumers when making purchases. However, it is becoming more and more difficult for people to make well-informed buying decisions without being deceived by fake reviews. Prior works on the opinion spam problem mostly considered classifying fake reviews using behavioral user patterns. They focused on prolific users who write more…

    Submitted 9 September, 2016; originally announced September 2016.

    Comments: 6 pages, WWW 2015

    ACM Class: I.7.0; J.4

    Journal ref: WWW '15 Companion Proceedings of the 24th International Conference on World Wide Web, 2015, p.971-976

  20. arXiv:1605.07980  [pdf, other]

    cs.IR

    Structural Analysis of User Choices for Mobile App Recommendation

    Authors: Bin Liu, Yao Wu, Neil Zhenqiang Gong, Junjie Wu, Hui Xiong, Martin Ester

    Abstract: Advances in smartphone technology have promoted the rapid development of mobile apps. However, the availability of a huge number of mobile apps in application stores has imposed the challenge of finding the right apps to meet the user needs. Indeed, there is a critical demand for personalized app recommendations. Along this line, there are opportunities and challenges posed by two unique character…

    Submitted 25 May, 2016; originally announced May 2016.

  21. arXiv:0904.1931  [pdf]

    cs.DB cs.AI q-bio.GN

    KiWi: A Scalable Subspace Clustering Algorithm for Gene Expression Analysis

    Authors: Obi L. Griffith, Byron J. Gao, Mikhail Bilenky, Yuliya Prichyna, Martin Ester, Steven J. M. Jones

    Abstract: Subspace clustering has gained increasing popularity in the analysis of gene expression data. Among subspace cluster models, the recently introduced order-preserving sub-matrix (OPSM) has demonstrated high promise. An OPSM, essentially a pattern-based subspace cluster, is a subset of rows and columns in a data matrix for which all the rows induce the same linear ordering of columns. Existing OPS…

    Submitted 13 April, 2009; originally announced April 2009.

    Comments: International Conference on Bioinformatics and Biomedical Engineering (iCBBE), 2009

  22. arXiv:0811.4458  [pdf, other]

    cs.LG cs.AI

    Learning Class-Level Bayes Nets for Relational Data

    Authors: Oliver Schulte, Hassan Khosravi, Flavia Moser, Martin Ester

    Abstract: Many databases store data in relational format, with different types of entities and information about links between the entities. The field of statistical-relational learning (SRL) has developed a number of new statistical models for such data. In this paper we focus on learning class-level or first-order dependencies, which model the general database statistics over attributes of linked object…

    Submitted 20 October, 2009; v1 submitted 26 November, 2008; originally announced November 2008.

    Comments: 14 pages (2 column)

    Report number: TR 2008-17, School of Computing Science, Simon Fraser University

    ACM Class: I.2.6

  23. arXiv:0710.2083  [pdf, ps, other]

    cs.DB cs.LG cs.LO

    Association Rules in the Relational Calculus

    Authors: Oliver Schulte, Flavia Moser, Martin Ester, Zhiyong Lu

    Abstract: One of the most utilized data mining tasks is the search for association rules. Association rules represent significant relationships between items in transactions. We extend the concept of association rule to represent a much broader class of associations, which we refer to as entity-relationship rules. Semantically, entity-relationship rules express associations between properties of re…

    Submitted 10 October, 2007; originally announced October 2007.

    Comments: 16 pages, 13 tables

    Report number: SFU School of Computing Science, TR 2007-23