Showing 1–50 of 110 results for author: Yuan, Y

  1. arXiv:2407.03133  [pdf, other]

    cs.CY cs.AI cs.LG stat.ML

    Quantifying the Cross-sectoral Intersecting Discrepancies within Multiple Groups Using Latent Class Analysis Towards Fairness

    Authors: Yingfang Yuan, Kefan Chen, Mehdi Rizvi, Lynne Baillie, Wei Pang

    Abstract: The growing interest in fair AI development is evident. The "Leave No One Behind" initiative urges us to address multiple and intersecting forms of inequality in accessing services, resources, and opportunities, emphasising the significance of fairness in AI. This is particularly relevant as an increasing number of AI tools are applied to decision-making processes, such as resource allocation an…

    Submitted 24 May, 2024; originally announced July 2024.

  2. arXiv:2407.00099  [pdf, other]

    q-bio.NC cs.LG stat.AP

    Optimal Transport for Latent Integration with An Application to Heterogeneous Neuronal Activity Data

    Authors: Yubai Yuan, Babak Shahbaba, Norbert Fortin, Keiland Cooper, Qing Nie, Annie Qu

    Abstract: Detecting dynamic patterns of task-specific responses shared across heterogeneous datasets is an essential and challenging problem in many scientific applications in medical science and neuroscience. In our motivating example of rodent electrophysiological data, identifying the dynamical patterns in neuronal activity associated with ongoing cognitive demands and behavior is key to uncovering the n…

    Submitted 27 June, 2024; originally announced July 2024.

  3. arXiv:2405.19697  [pdf, other]

    math.OC cs.AI cs.LG stat.ML

    Bilevel reinforcement learning via the development of hyper-gradient without lower-level convexity

    Authors: Yan Yang, Bin Gao, Ya-xiang Yuan

    Abstract: Bilevel reinforcement learning (RL), which features intertwined two-level problems, has attracted growing interest recently. The inherent non-convexity of the lower-level RL problem is, however, to be an impediment to developing bilevel optimization methods. By employing the fixed point equation associated with the regularized RL, we characterize the hyper-gradient via fully first-order informatio…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 43 pages, 1 figure, 1 table

  4. arXiv:2405.17591  [pdf, other]

    stat.ME

    Individualized Dynamic Mediation Analysis Using Latent Factor Models

    Authors: Yijiao Zhang, Yubai Yuan, Yuexia Zhang, Zhongyi Zhu, Annie Qu

    Abstract: Mediation analysis plays a crucial role in causal inference as it can investigate the pathways through which treatment influences outcome. Most existing mediation analysis assumes that mediation effects are static and homogeneous within populations. However, mediation effects usually change over time and exhibit significant heterogeneity in many real-world applications. Additionally, the presence…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 25 pages, 3 figures, 3 tables

  5. arXiv:2404.03331  [pdf, other]

    math.OC cs.LG stat.ML

    LancBiO: dynamic Lanczos-aided bilevel optimization via Krylov subspace

    Authors: Bin Gao, Yan Yang, Ya-xiang Yuan

    Abstract: Bilevel optimization, with broad applications in machine learning, has an intricate hierarchical structure. Gradient-based methods have emerged as a common approach to large-scale bilevel problems. However, the computation of the hyper-gradient, which involves a Hessian inverse vector product, confines the efficiency and is regarded as a bottleneck. To circumvent the inverse, we construct a sequen…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: 35 pages, 11 figures, 1 table

  6. arXiv:2404.00776  [pdf, other]

    cs.LG cs.DB stat.ML

    PyTorch Frame: A Modular Framework for Multi-Modal Tabular Learning

    Authors: Weihua Hu, Yiwen Yuan, Zecheng Zhang, Akihiro Nitta, Kaidi Cao, Vid Kocijan, Jure Leskovec, Matthias Fey

    Abstract: We present PyTorch Frame, a PyTorch-based framework for deep learning over multi-modal tabular data. PyTorch Frame makes tabular deep learning easy by providing a PyTorch-based data structure to handle complex tabular data, introducing a model abstraction to enable modular implementation of tabular models, and allowing external foundation models to be incorporated to handle complex columns (e.g.,…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: https://github.com/pyg-team/pytorch-frame

  7. arXiv:2402.08539  [pdf]

    cs.LG stat.AP

    Intelligent Diagnosis of Alzheimer's Disease Based on Machine Learning

    Authors: Mingyang Li, Hongyu Liu, Yixuan Li, Zejun Wang, Yuan Yuan, Honglin Dai

    Abstract: This study is based on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset and aims to explore early detection and disease progression in Alzheimer's disease (AD). We employ innovative data preprocessing strategies, including the use of the random forest algorithm to fill missing data and the handling of outliers and invalid data, thereby fully mining and utilizing these limited data re…

    Submitted 13 February, 2024; originally announced February 2024.

  8. arXiv:2401.07012  [pdf]

    cs.LG eess.SY stat.ML

    An ADRC-Incorporated Stochastic Gradient Descent Algorithm for Latent Factor Analysis

    Authors: Jinli Li, Ye Yuan

    Abstract: High-dimensional and incomplete (HDI) matrix contains many complex interactions between numerous nodes. A stochastic gradient descent (SGD)-based latent factor analysis (LFA) model is remarkably effective in extracting valuable information from an HDI matrix. However, such a model commonly encounters the problem of slow convergence because a standard SGD algorithm only considers the current learni…

    Submitted 13 January, 2024; originally announced January 2024.

  9. arXiv:2312.14426  [pdf, other]

    cs.LG stat.ML

    Room Occupancy Prediction: Exploring the Power of Machine Learning and Temporal Insights

    Authors: Siqi Mao, Yaping Yuan, Yinpu Li, Ziren Wang, Yuanxin Yao, Yixin Kang

    Abstract: Energy conservation in buildings is a paramount concern to combat greenhouse gas emissions and combat climate change. The efficient management of room occupancy, involving actions like lighting control and climate adjustment, is a pivotal strategy to curtail energy consumption. In contexts where surveillance technology isn't viable, non-intrusive sensors are employed to estimate room occupancy. In…

    Submitted 21 December, 2023; originally announced December 2023.

  10. arXiv:2309.16951  [pdf, other]

    stat.ML cs.LG

    Beyond Tides and Time: Machine Learning Triumph in Water Quality

    Authors: Yinpu Li, Siqi Mao, Yaping Yuan, Ziren Wang, Yixin Kang, Yuanxin Yao

    Abstract: Water resources are essential for sustaining human livelihoods and environmental well-being. Accurate water quality prediction plays a pivotal role in effective resource management and pollution mitigation. In this study, we assess the effectiveness of five distinct predictive models (linear regression, Random Forest, XGBoost, LightGBM, and MLP neural network) in forecasting pH values within the ge…

    Submitted 6 October, 2023; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: 19 pages, 7 figures

  11. arXiv:2309.05762  [pdf, other]

    stat.ME stat.AP

    Statistical and Practical Considerations in Planning and Conduct of Dose Optimization Trials

    Authors: Ying Yuan, Heng Zhou, Suyu Liu

    Abstract: The US Food and Drug Administration launched Project Optimus with the aim of shifting the paradigm of dose-finding and selection towards identifying the optimal biological dose that offers the best balance between benefit and risk, rather than the maximum tolerated dose. However, achieving dose optimization is a challenging task that involves a variety of factors and is considerably more complicat…

    Submitted 11 September, 2023; originally announced September 2023.

  12. arXiv:2308.13033  [pdf, other]

    stat.CO

    A Strength and Sparsity Preserving Algorithm for Generating Weighted, Directed Networks with Predetermined Assortativity

    Authors: Yelie Yuan, Jun Yan, Panpan Zhang

    Abstract: Degree-preserving rewiring is a widely used technique for generating unweighted networks with given assortativity, but for weighted networks, it is unclear how an analog would preserve the strengths and other critical network features such as sparsity level. This study introduces a novel approach for rewiring weighted networks to achieve desired directed assortativity. The method utilizes a mixed…

    Submitted 24 August, 2023; originally announced August 2023.

  13. arXiv:2308.09790  [pdf, other]

    stat.ML cs.LG cs.SI

    A Two-Part Machine Learning Approach to Characterizing Network Interference in A/B Testing

    Authors: Yuan Yuan, Kristen M. Altenburger

    Abstract: The reliability of controlled experiments, commonly referred to as "A/B tests," is often compromised by network interference, where the outcomes of individual units are influenced by interactions with others. Significant challenges in this domain include the lack of accounting for complex social network structures and the difficulty in suitably characterizing network interference. To address these…

    Submitted 29 June, 2024; v1 submitted 18 August, 2023; originally announced August 2023.

    Comments: 47 pages

  14. arXiv:2308.08152  [pdf, other]

    econ.EM stat.ME

    Estimating Effects of Long-Term Treatments

    Authors: Shan Huang, Chen Wang, Yuan Yuan, Jinglong Zhao, Jingjing Zhang

    Abstract: Estimating the effects of long-term treatments in A/B testing presents a significant challenge. Such treatments -- including updates to product functions, user interface designs, and recommendation algorithms -- are intended to remain in the system for a long period after their launches. On the other hand, given the constraints of conducting long-term experiments, practitioners often rely on short…

    Submitted 16 August, 2023; originally announced August 2023.

  15. arXiv:2306.09694  [pdf, ps, other]

    math.OC cs.LG math.NA stat.ML

    Linear convergence of forward-backward accelerated algorithms without knowledge of the modulus of strong convexity

    Authors: Bowen Li, Bin Shi, Ya-xiang Yuan

    Abstract: A significant milestone in modern gradient-based optimization was achieved with the development of Nesterov's accelerated gradient descent (NAG) method. This forward-backward technique has been further advanced with the introduction of its proximal generalization, commonly known as the fast iterative shrinkage-thresholding algorithm (FISTA), which enjoys widespread application in image science and…

    Submitted 8 April, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

    Comments: 17 pages, 3 figures; To appear in SIAM Journal on Optimization

  16. arXiv:2305.12279  [pdf, other]

    stat.ME

    SAM: Self-adapting Mixture Prior to Dynamically Borrow Information from Historical Data in Clinical Trials

    Authors: Peng Yang, Yuansong Zhao, Lei Nie, Jonathon Vallejo, Ying Yuan

    Abstract: Mixture priors provide an intuitive way to incorporate historical data while accounting for potential prior-data conflict by combining an informative prior with a non-informative prior. However, pre-specifying the mixing weight for each component remains a crucial challenge. Ideally, the mixing weight should reflect the degree of prior-data conflict, which is often unknown beforehand, posing a sig…

    Submitted 8 September, 2023; v1 submitted 20 May, 2023; originally announced May 2023.

  17. arXiv:2303.16841  [pdf, other]

    cs.LG stat.ML

    Randomly Projected Convex Clustering Model: Motivation, Realization, and Cluster Recovery Guarantees

    Authors: Ziwen Wang, Yancheng Yuan, Jiaming Ma, Tieyong Zeng, Defeng Sun

    Abstract: In this paper, we propose a randomly projected convex clustering model for clustering a collection of $n$ high dimensional data points in $\mathbb{R}^d$ with $K$ hidden clusters. Compared to the convex clustering model for clustering original data with dimension $d$, we prove that, under some mild conditions, the perfect recovery of the cluster membership assignments of the convex clustering model…

    Submitted 29 March, 2023; originally announced March 2023.

  18. arXiv:2302.09612  [pdf, other]

    stat.ME stat.AP

    Design and Sample Size Determination for Multiple-dose Randomized Phase II Trials for Dose Optimization

    Authors: Peng Yang, Daniel Li, Ruitao Lin, Bo Huang, Ying Yuan

    Abstract: The conventional more-is-better dose selection paradigm, which targets the maximum tolerated dose (MTD), is not suitable for the development of targeted therapies and immunotherapies as the efficacy of these novel therapies may not increase with the dose. The U.S. Food and Drug Administration (FDA) has launched Project Optimus "to reform the dose optimization and dose selection paradigm in oncolog…

    Submitted 29 August, 2023; v1 submitted 19 February, 2023; originally announced February 2023.

  19. arXiv:2302.05513  [pdf, other]

    stat.ME

    De-confounding causal inference using latent multiple-mediator pathways

    Authors: Yubai Yuan, Annie Qu

    Abstract: Causal effect estimation from observational data is one of the essential problems in causal inference. However, most estimation methods rely on the strong assumption that all confounders are observed, which is impractical and untestable in the real world. We develop a mediation analysis framework inferring the latent confounder for debiasing both direct and indirect causal effects. Specifically, w…

    Submitted 10 February, 2023; originally announced February 2023.

  20. Crowdsourcing Utilizing Subgroup Structure of Latent Factor Modeling

    Authors: Qi Xu, Yubai Yuan, Junhui Wang, Annie Qu

    Abstract: Crowdsourcing has emerged as an alternative solution for collecting large scale labels. However, the majority of recruited workers are not domain experts, so their contributed labels could be noisy. In this paper, we propose a two-stage model to predict the true labels for multicategory classification tasks in crowdsourcing. In the first stage, we fit the observed labels with a latent factor model…

    Submitted 5 February, 2023; originally announced February 2023.

  21. Generating General Preferential Attachment Networks with R Package wdnet

    Authors: Yelie Yuan, Tiandong Wang, Jun Yan, Panpan Zhang

    Abstract: Preferential attachment (PA) network models have a wide range of applications in various scientific disciplines. Efficient generation of large-scale PA networks helps uncover their structural properties and facilitate the development of associated analytical methodologies. Existing software packages only provide limited functions for this purpose with restricted configurations and efficiency. We p…

    Submitted 15 October, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

    Comments: 19 pages, 4 figures

    Journal ref: J. data sci. 21(2023), no. 3, 538-556

  22. arXiv:2212.06319  [pdf, ps, other]

    math.OC cs.LG math.ST stat.ML

    Linear Convergence of ISTA and FISTA

    Authors: Bowen Li, Bin Shi, Ya-xiang Yuan

    Abstract: In this paper, we revisit the class of iterative shrinkage-thresholding algorithms (ISTA) for solving the linear inverse problem with sparse representation, which arises in signal and image processing. It is shown in the numerical experiment to deblur an image that the convergence behavior in the logarithmic-scale ordinate tends to be linear instead of logarithmic, approximating to be flat. Making…

    Submitted 14 January, 2023; v1 submitted 12 December, 2022; originally announced December 2022.

    Comments: 16 pages, 4 figures

  23. arXiv:2211.02046  [pdf]

    stat.ME

    Seamless Phase 2-3 Design: A Useful Strategy to Reduce the Sample Size for Dose Optimization

    Authors: Liyun Jiang, Ying Yuan

    Abstract: The traditional more-is-better dose selection paradigm, developed based on cytotoxic chemotherapeutics, is often problematic when applied to the development of novel molecularly targeted agents (e.g., kinase inhibitors, monoclonal antibodies, and antibody-drug conjugates). The US Food and Drug Administration (FDA) initiated Project Optimus to reform the dose optimization and dose selection paradig…

    Submitted 3 November, 2022; originally announced November 2022.

  24. arXiv:2211.01610  [pdf, ps, other]

    math.OC cs.LG math.ST stat.ML

    Proximal Subgradient Norm Minimization of ISTA and FISTA

    Authors: Bowen Li, Bin Shi, Ya-xiang Yuan

    Abstract: For first-order smooth optimization, the research on the acceleration phenomenon has a long-time history. Until recently, the mechanism leading to acceleration was not successfully uncovered by the gradient correction term and its equivalent implicit-velocity form. Furthermore, based on the high-resolution differential equation framework with the corresponding emerging techniques, phase-space repr…

    Submitted 3 November, 2022; originally announced November 2022.

    Comments: 17 pages, 4 figures

  25. arXiv:2210.17426  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Trade-off Between Efficiency and Consistency for Removal-based Explanations

    Authors: Yifan Zhang, Haowei He, Zhiquan Tan, Yang Yuan

    Abstract: In the current landscape of explanation methodologies, most predominant approaches, such as SHAP and LIME, employ removal-based techniques to evaluate the impact of individual features by simulating various scenarios with specific features omitted. Nonetheless, these methods primarily emphasize efficiency in the original context, often resulting in general inconsistencies. In this paper, we demons…

    Submitted 20 October, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2023. Code: https://github.com/trusty-ai/efficient-consistent-explanation

  26. arXiv:2210.00173  [pdf, other]

    cs.LG cs.AI stat.ME stat.ML

    Predictive Inference with Feature Conformal Prediction

    Authors: Jiaye Teng, Chuan Wen, Dinghuai Zhang, Yoshua Bengio, Yang Gao, Yang Yuan

    Abstract: Conformal prediction is a distribution-free technique for establishing valid prediction intervals. Although conventionally people conduct conformal prediction in the output space, this is not the only possibility. In this paper, we propose feature conformal prediction, which extends the scope of conformal prediction to semantic feature spaces by leveraging the inductive bias of deep representation…

    Submitted 8 April, 2023; v1 submitted 30 September, 2022; originally announced October 2022.

    Comments: Published as a conference paper at ICLR 2023

  27. arXiv:2209.01655  [pdf, other]

    stat.ME

    DROID: Dose-ranging Approach to Optimizing Dose in Oncology Drug Development

    Authors: Beibei Guo, Ying Yuan

    Abstract: In the era of targeted therapy, there has been increasing concern about the development of oncology drugs based on the "more is better" paradigm, developed decades ago for chemotherapy. Recently, the US Food and Drug Administration (FDA) initiated Project Optimus to reform the dose optimization and dose selection paradigm in oncology drug development. To accommodate this paradigm shifting, we prop…

    Submitted 4 September, 2022; originally announced September 2022.

  28. arXiv:2209.00383  [pdf, other]

    cs.CV stat.ML

    TokenCut: Segmenting Objects in Images and Videos with Self-supervised Transformer and Normalized Cut

    Authors: Yangtao Wang, Xi Shen, Yuan Yuan, Yuming Du, Maomao Li, Shell Xu Hu, James L Crowley, Dominique Vaufreydaz

    Abstract: In this paper, we describe a graph-based algorithm that uses the features obtained by a self-supervised transformer to detect and segment salient objects in images and videos. With this approach, the image patches that compose an image or video are organised into a fully connected graph, where the edge between each pair of patches is labeled with a similarity score between patches using features l…

    Submitted 5 December, 2023; v1 submitted 1 September, 2022; originally announced September 2022.

    Comments: arXiv admin note: text overlap with arXiv:2202.11539

  29. A State Transition Model for Mobile Notifications via Survival Analysis

    Authors: Yiping Yuan, Jing Zhang, Shaunak Chatterjee, Shipeng Yu, Romer Rosales

    Abstract: Mobile notifications have become a major communication channel for social networking services to keep users informed and engaged. As more mobile applications push notifications to users, they constantly face decisions on what to send, when and how. A lack of research and methodology commonly leads to heuristic decision making. Many notifications arrive at an inappropriate moment or introduce too m…

    Submitted 7 July, 2022; originally announced July 2022.

    Comments: 9 pages, 7 figures. Published in WSDM '19

    ACM Class: I.2.6

    Journal ref: WSDM 2019 Pages 123-131

  30. arXiv:2207.03029  [pdf, other]

    cs.LG stat.ML

    Multi-objective Optimization of Notifications Using Offline Reinforcement Learning

    Authors: Prakruthi Prabhakar, Yiping Yuan, Guangyu Yang, Wensheng Sun, Ajith Muralidharan

    Abstract: Mobile notification systems play a major role in a variety of applications to communicate, send alerts and reminders to the users to inform them about news, events or messages. In this paper, we formulate the near-real-time notification decision problem as a Markov Decision Process where we optimize for multiple objectives in the rewards. We propose an end-to-end offline reinforcement learning fra…

    Submitted 6 July, 2022; originally announced July 2022.

    Comments: 9 pages, 6 figures, to be published in KDD '22

    ACM Class: I.2.6

  31. arXiv:2206.01515  [pdf, other]

    cs.LG cs.AI stat.ML

    Understanding Deep Learning via Decision Boundary

    Authors: Shiye Lei, Fengxiang He, Yancheng Yuan, Dacheng Tao

    Abstract: This paper discovers that the neural network with lower decision boundary (DB) variability has better generalizability. Two new notions, algorithm DB variability and $(ε, η)$-data DB variability, are proposed to measure the decision boundary variability from the algorithm and data perspectives. Extensive experiments show significant negative correlations between the decision boundary variability a…

    Submitted 24 December, 2023; v1 submitted 3 June, 2022; originally announced June 2022.

    Comments: Accepted by IEEE TNNLS

  32. arXiv:2205.15638  [pdf, other]

    cs.LG cs.DS stat.ME

    Differentiable Invariant Causal Discovery

    Authors: Yu Wang, An Zhang, Xiang Wang, Yancheng Yuan, Xiangnan He, Tat-Seng Chua

    Abstract: Learning causal structure from observational data is a fundamental challenge in machine learning. However, the majority of commonly used differentiable causal discovery methods are non-identifiable, turning this problem into a continuous optimization task prone to data biases. In many real-life situations, data is collected from different environments, in which the functional relations remain cons…

    Submitted 29 September, 2022; v1 submitted 31 May, 2022; originally announced May 2022.

    Comments: 22 pages, 11 figures

  33. arXiv:2202.11539  [pdf, other]

    cs.CV stat.ML

    Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut

    Authors: Yangtao Wang, Xi Shen, Shell Hu, Yuan Yuan, James Crowley, Dominique Vaufreydaz

    Abstract: Transformers trained with self-supervised learning using self-distillation loss (DINO) have been shown to produce attention maps that highlight salient foreground objects. In this paper, we demonstrate a graph-based approach that uses the self-supervised transformer features to discover an object from an image. Visual tokens are viewed as nodes in a weighted graph with edges representing a connect…

    Submitted 24 March, 2022; v1 submitted 23 February, 2022; originally announced February 2022.

    Journal ref: CVPR 2022 - Conference on Computer Vision and Pattern Recognition, Jun 2022, New Orleans, United States

  34. arXiv:2202.06054  [pdf, other]

    cs.LG math.ST stat.ML

    Towards Data-Algorithm Dependent Generalization: a Case Study on Overparameterized Linear Regression

    Authors: Jing Xu, Jiaye Teng, Yang Yuan, Andrew Chi-Chih Yao

    Abstract: One of the major open problems in machine learning is to characterize generalization in the overparameterized regime, where most traditional generalization bounds become inconsistent even for overparameterized linear regression. In many scenarios, this failure can be attributed to obscuring the crucial interplay between the training algorithm and the underlying data distribution. This paper demons…

    Submitted 21 November, 2023; v1 submitted 12 February, 2022; originally announced February 2022.

  35. arXiv:2201.03451  [pdf, other]

    stat.CO

    An Efficient Algorithm for Generating Directed Networks with Predetermined Assortativity Measures

    Authors: Tiandong Wang, Jun Yan, Yelie Yuan, Panpan Zhang

    Abstract: Assortativity coefficients are important metrics to analyze both directed and undirected networks. In general, it is not guaranteed that the fitted model will always agree with the assortativity coefficients in the given network, and the structure of directed networks is more complicated than the undirected ones. Therefore, we provide a remedy by proposing a degree-preserving rewiring algorithm, c…

    Submitted 10 January, 2022; originally announced January 2022.

  36. arXiv:2201.00068  [pdf, other]

    stat.ME stat.AP

    Bayesian Nonparametric Common Atoms Regression for Generating Synthetic Controls in Clinical Trials

    Authors: Noirrit Kiran Chandra, Abhra Sarkar, John F. de Groot, Ying Yuan, Peter Müller

    Abstract: The availability of electronic health records (EHR) has opened opportunities to supplement increasingly expensive and difficult to carry out randomized controlled trials (RCT) with evidence from readily available real world data. In this paper, we use EHR data to construct synthetic control arms for treatment-only single arm trials. We propose a novel nonparametric Bayesian common atoms mixture mo…

    Submitted 6 May, 2023; v1 submitted 31 December, 2021; originally announced January 2022.

  37. arXiv:2112.10880  [pdf, other]

    stat.ME stat.AP

    BOP2-DC: Bayesian optimal phase II designs with dual-criterion decision making

    Authors: Yujie Zhao, Daniel Li, Rong Liu, Ying Yuan

    Abstract: The conventional phase II trial design paradigm is to make the go/no-go decision based on the hypothesis testing framework. Statistical significance itself alone, however, may not be sufficient to establish that the drug is clinically effective enough to warrant confirmatory phase III trials. We propose the Bayesian optimal phase II trial design with dual-criterion decision making (BOP2-DC), which…

    Submitted 20 December, 2021; originally announced December 2021.

  38. arXiv:2111.05265  [pdf, other]

    cs.SI cs.LG stat.ML

    High-order joint embedding for multi-level link prediction

    Authors: Yubai Yuan, Annie Qu

    Abstract: Link prediction infers potential links from observed networks, and is one of the essential problems in network analyses. In contrast to traditional graph representation modeling which only predicts two-way pairwise relations, we propose a novel tensor-based joint network embedding approach on simultaneously encoding pairwise links and hyperlinks onto a latent space, which captures the dependency b…

    Submitted 7 November, 2021; originally announced November 2021.

    Comments: 35 pages

  39. arXiv:2111.04871  [pdf, other]

    stat.ML cs.LG

    Query-augmented Active Metric Learning

    Authors: Yujia Deng, Yubai Yuan, Haoda Fu, Annie Qu

    Abstract: In this paper we propose an active metric learning method for clustering with pairwise constraints. The proposed method actively queries the label of informative instance pairs, while estimating underlying metrics by incorporating unlabeled instance pairs, which leads to a more accurate and efficient clustering process. In particular, we augment the queried constraints by generating more pairwise…

    Submitted 8 November, 2021; originally announced November 2021.

  40. arXiv:2106.11312  [pdf, other]

    cs.CY cs.LG stat.ML

    Feedback Shaping: A Modeling Approach to Nurture Content Creation

    Authors: Ye Tu, Chun Lo, Yiping Yuan, Shaunak Chatterjee

    Abstract: Social media platforms bring together content creators and content consumers through recommender systems like newsfeed. The focus of such recommender systems has thus far been primarily on modeling the content consumer preferences and optimizing for their experience. However, it is equally critical to nurture content creation by prioritizing the creators' interests, as quality content forms the se…

    Submitted 21 June, 2021; originally announced June 2021.

    Journal ref: KDD 2019: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

  41. arXiv:2106.06153  [pdf, other]

    cs.LG cs.AI stat.ML

    Towards Understanding Generalization via Decomposing Excess Risk Dynamics

    Authors: Jiaye Teng, Jianhao Ma, Yang Yuan

    Abstract: Generalization is one of the fundamental issues in machine learning. However, traditional techniques like uniform convergence may be unable to explain generalization under overparameterization. As alternative approaches, techniques based on stability analyze the training dynamics and derive algorithm-dependent generalization bounds. Unfortunately, the stability-based bounds are still far from expl…

    Submitted 19 March, 2022; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: Accepted by ICLR 2022

  42. arXiv:2103.04556  [pdf, other]

    cs.DS cs.AI cs.LG stat.ML

    T-SCI: A Two-Stage Conformal Inference Algorithm with Guaranteed Coverage for Cox-MLP

    Authors: Jiaye Teng, Zeren Tan, Yang Yuan

    Abstract: It is challenging to deal with censored data, where we only have access to the incomplete information of survival time instead of its exact value. Fortunately, under linear predictor assumption, people can obtain guaranteed coverage for the confidence band of survival time using methods like Cox Regression. However, when relaxing the linear assumption with neural networks (e.g., Cox-MLP (Katzman e…

    Submitted 14 April, 2021; v1 submitted 8 March, 2021; originally announced March 2021.

  43. arXiv:2103.03471  [pdf, other]

    math.ST eess.SP stat.ML

    Joint Network Topology Inference via Structured Fusion Regularization

    Authors: Yanli Yuan, De Wen Soh, Xiao Yang, Kun Guo, Tony Q. S. Quek

    Abstract: Joint network topology inference represents a canonical problem of jointly learning multiple graph Laplacian matrices from heterogeneous graph signals. In such a problem, a widely employed assumption is that of a simple common component shared among multiple networks. However, in practice, a more intricate topological pattern, comprising simultaneously of sparse, homogeneity and heterogeneity comp…

    Submitted 8 July, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

  44. arXiv:2101.05389  [pdf, other]

    stat.AP

    Assortativity measures for weighted and directed networks

    Authors: Yelie Yuan, Jun Yan, Panpan Zhang

    Abstract: Assortativity measures the tendency of a vertex in a network being connected by other vertexes with respect to some vertex-specific features. Classical assortativity coefficients are defined for unweighted and undirected networks with respect to vertex degree. We propose a class of assortativity coefficients that capture the assortative characteristics and structure of weighted and directed networ…

    Submitted 13 January, 2021; originally announced January 2021.

  45. arXiv:2012.02378  [pdf, ps, other]

    stat.ME stat.AP

    Optimal Bayesian hierarchical model to accelerate the development of tissue-agnostic drugs and basket trials

    Authors: Liyun Jiang, Lei Nie, Fangrong Yan, Ying Yuan

    Abstract: Tissue-agnostic trials enroll patients based on their genetic biomarkers, not tumor type, in an attempt to determine if a new drug can successfully treat disease conditions based on biomarkers. The Bayesian hierarchical model (BHM) provides an attractive approach to design phase II tissue-agnostic trials by allowing information borrowing across multiple disease types. In this article, we elucidate…

    Submitted 3 December, 2020; originally announced December 2020.

  46. arXiv:2011.12508  [pdf, other]

    cs.LG stat.ML

    Causal inference using deep neural networks

    Authors: Ye Yuan, Xueying Ding, Ziv Bar-Joseph

    Abstract: Causal inference from observation data is a core problem in many scientific fields. Here we present a general supervised deep learning framework that infers causal interactions by transforming the input vectors to an image-like representation for every pair of inputs. Given a training dataset we first construct a normalized empirical probability density distribution (NEPDF) matrix. We then train a…

    Submitted 24 November, 2020; originally announced November 2020.

  47. arXiv:2011.10020  [pdf]

    stat.AP

    Modelling fertility potential in survivors of childhood cancer: An introduction to modern statistical and computational methods

    Authors: L. Yu, Z. Lu, P. C. Nathan, S. Mostoufi-Moab, Y. Yuan

    Abstract: Statistical and computational methods are widely used in today's scientific studies. Using a female fertility potential in childhood cancer survivors as an example, we illustrate how these methods can be used to extract insight regarding biological processes from noisy observational data in order to inform decision making. We start by contextualizing the computational methods with the working exam…

    Submitted 19 November, 2020; originally announced November 2020.

    Comments: 15 pages, 9 figures and 1 table

  48. arXiv:2011.06446  [pdf, other]

    stat.CO cs.LG math.NA

    Subgroup-based Rank-1 Lattice Quasi-Monte Carlo

    Authors: Yueming Lyu, Yuan Yuan, Ivor W. Tsang

    Abstract: Quasi-Monte Carlo (QMC) is an essential tool for integral approximation, Bayesian inference, and sampling for simulation in science, etc. In the QMC area, the rank-1 lattice is important due to its simple operation, and nice properties for point set construction. However, the construction of the generating vector of the rank-1 lattice is usually time-consuming because of an exhaustive computer sea…

    Submitted 28 October, 2020; originally announced November 2020.

    Comments: NeurIPS 2020

  49. Causal Network Motifs: Identifying Heterogeneous Spillover Effects in A/B Tests

    Authors: Yuan Yuan, Kristen M. Altenburger, Farshad Kooti

    Abstract: Randomized experiments, or "A/B" tests, remain the gold standard for evaluating the causal effect of a policy intervention or product change. However, experimental settings, such as social networks, where users are interacting and influencing one another, may violate conventional assumptions of no interference for credible causal inference. Existing solutions to the network setting include account…

    Submitted 15 February, 2021; v1 submitted 19 October, 2020; originally announced October 2020.

    Comments: 12 pages; to appear in the Web Conference (WWW) 2021

  50. arXiv:2010.09822  [pdf, other]

    stat.ME

    Is the new model better? One metric says yes, but the other says no. Which metric do I use?

    Authors: Qian M. Zhou, Zhe Lu, Russell J. Brooke, Melissa M Hudson, Yan Yuan

    Abstract: Incremental value (IncV) evaluates the performance change from an existing risk model to a new model. It is one of the key considerations in deciding whether a new risk model performs better than the existing one. Problems arise when different IncV metrics contradict each other. For example, compared with a prescribed-dose model, an ovarian-dose model for predicting acute ovarian failure has a sli…

    Submitted 15 December, 2020; v1 submitted 19 October, 2020; originally announced October 2020.

    Comments: 25 pages, 6 figures, 1 table. Compared to Version 1, the title and overall structure of the manuscript have been changed significantly