Showing 1–50 of 394 results for author: Chen, J

Searching in archive stat.
  1. arXiv:2407.00224

    cs.CV stat.AP

    Multimodal Prototyping for cancer survival prediction

    Authors: Andrew H. Song, Richard J. Chen, Guillaume Jaume, Anurag J. Vaidya, Alexander S. Baras, Faisal Mahmood

    Abstract: Multimodal survival methods combining gigapixel histology whole-slide images (WSIs) and transcriptomic profiles are particularly promising for patient prognostication and stratification. Current approaches involve tokenizing the WSIs into smaller patches (>10,000 patches) and transcriptomics into gene groups, which are then integrated using a Transformer for predicting outcomes. However, this proc…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: ICML 2024

  2. arXiv:2406.16988

    cs.LG stat.ML

    MD tree: a model-diagnostic tree grown on loss landscape

    Authors: Yefan Zhou, Jianlong Chen, Qinxue Cao, Konstantin Schürholt, Yaoqing Yang

    Abstract: This paper considers "model diagnosis", which we formulate as a classification problem. Given a pre-trained neural network (NN), the goal is to predict the source of failure from a set of failure modes (such as a wrong hyperparameter, inadequate model size, and insufficient data) without knowing the training configuration of the pre-trained NN. The conventional diagnosis approach uses training and…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: ICML 2024, first two authors contributed equally

  3. arXiv:2406.12531

    cs.LG stat.ML

    TREE: Tree Regularization for Efficient Execution

    Authors: Lena Schmid, Daniel Biebert, Christian Hakert, Kuan-Hsun Chen, Michel Lang, Markus Pauly, Jian-Jia Chen

    Abstract: The rise of machine learning methods on heavily resource constrained devices requires not only the choice of a suitable model architecture for the target platform, but also the optimization of the chosen model with regard to execution time consumption for inference in order to optimally utilize the available resources. Random forests and decision trees are shown to be a suitable model for such a s…

    Submitted 18 June, 2024; originally announced June 2024.

  4. arXiv:2406.01252

    cs.CL cs.AI stat.ML

    Towards Scalable Automated Alignment of LLMs: A Survey

    Authors: Boxi Cao, Keming Lu, Xinyu Lu, Jiawei Chen, Mengjie Ren, Hao Xiang, Peilin Liu, Yaojie Lu, Ben He, Xianpei Han, Le Sun, Hongyu Lin, Bowen Yu

    Abstract: Alignment is the most critical step in building large language models (LLMs) that meet human needs. With the rapid development of LLMs gradually surpassing human capabilities, traditional alignment methods based on human-annotation are increasingly unable to meet the scalability demands. Therefore, there is an urgent need to explore new sources of automated alignment signals and technical approach…

    Submitted 3 June, 2024; originally announced June 2024.

  5. arXiv:2405.17490

    cs.LG stat.ML

    Revisit, Extend, and Enhance Hessian-Free Influence Functions

    Authors: Ziao Yang, Han Yue, Jian Chen, Hongfu Liu

    Abstract: Influence functions serve as crucial tools for assessing sample influence in model interpretation, subset training set selection, noisy label detection, and more. By employing the first-order Taylor extension, influence functions can estimate sample influence without the need for expensive model retraining. However, applying influence functions directly to deep models presents challenges, primaril…

    Submitted 24 May, 2024; originally announced May 2024.

  6. arXiv:2405.16672

    stat.ML cs.LG stat.ME

    Transfer Learning Under High-Dimensional Graph Convolutional Regression Model for Node Classification

    Authors: Jiachen Chen, Danyang Huang, Liyuan Wang, Kathryn L. Lunetta, Debarghya Mukherjee, Huimin Cheng

    Abstract: Node classification is a fundamental task, but obtaining node classification labels can be challenging and expensive in many real-world scenarios. Transfer learning has emerged as a promising solution to address this challenge by leveraging knowledge from source domains to enhance learning in a target domain. Existing transfer learning methods for node classification primarily focus on integrating…

    Submitted 26 May, 2024; originally announced May 2024.

  7. arXiv:2405.15885

    cs.LG stat.ML

    Diffusion Bridge Implicit Models

    Authors: Kaiwen Zheng, Guande He, Jianfei Chen, Fan Bao, Jun Zhu

    Abstract: Denoising diffusion bridge models (DDBMs) are a powerful variant of diffusion models for interpolating between two arbitrary paired distributions given as endpoints. Despite their promising performance in tasks like image translation, DDBMs require a computationally intensive sampling process that involves the simulation of a (stochastic) differential equation through hundreds of network evaluatio…

    Submitted 24 May, 2024; originally announced May 2024.

  8. arXiv:2405.12437

    stat.AP

    Considerations for Single-Arm Trials to Support Accelerated Approval of Oncology Drugs

    Authors: Feinan Lu, Tao Wang, Ying Lu, Jie Chen

    Abstract: In the last two decades, single-arm trials (SATs) have been effectively used to study anticancer therapies in well-defined patient populations using durable response rates as an objective and interpretable clinical endpoints. With a growing trend of regulatory accelerated approval (AA) requiring randomized controlled trials (RCTs), some confusions have arisen about the roles of SATs in AA. This pa…

    Submitted 20 May, 2024; originally announced May 2024.

  9. arXiv:2405.11643

    cs.CV cs.LG stat.AP

    Morphological Prototyping for Unsupervised Slide Representation Learning in Computational Pathology

    Authors: Andrew H. Song, Richard J. Chen, Tong Ding, Drew F. K. Williamson, Guillaume Jaume, Faisal Mahmood

    Abstract: Representation learning of pathology whole-slide images (WSIs) has primarily relied on weak supervision with Multiple Instance Learning (MIL). However, the slide representations resulting from this approach are highly tailored to specific clinical tasks, which limits their expressivity and generalization, particularly in scenarios with limited data. Instead, we hypothesize that morphologi…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: CVPR 2024

  10. arXiv:2405.10490

    stat.ME cs.AI cs.IR cs.LG math.OC

    Neural Optimization with Adaptive Heuristics for Intelligent Marketing System

    Authors: Changshuai Wei, Benjamin Zelditch, Joyce Chen, Andre Assuncao Silva T Ribeiro, Jingyi Kenneth Tay, Borja Ocejo Elizondo, Keerthi Selvaraj, Aman Gupta, Licurgo Benemann De Almeida

    Abstract: Computational marketing has become increasingly important in today's digital world, facing challenges such as massive heterogeneous data, multi-channel customer journeys, and limited marketing budgets. In this paper, we propose a general framework for marketing AI systems, the Neural Optimization with Adaptive Heuristics (NOAH) framework. NOAH is the first general framework for marketing optimizat…

    Submitted 25 June, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: KDD 2024

    ACM Class: G.3; G.1.6; I.2

  11. arXiv:2401.14343

    cs.LG cs.CY stat.ML

    Class-attribute Priors: Adapting Optimization to Heterogeneity and Fairness Objective

    Authors: Xuechen Zhang, Mingchen Li, Jiasi Chen, Christos Thrampoulidis, Samet Oymak

    Abstract: Modern classification problems exhibit heterogeneities across individual classes: Each class may have unique attributes, such as sample size, label quality, or predictability (easy vs difficult), and variable importance at test-time. Without care, these heterogeneities impede the learning process, most notably, when optimizing fairness objectives. Confirming this, under a gaussian mixture setting,…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: 15 pages, 8 figures

  12. arXiv:2401.06687

    cs.CL cs.LG stat.ME

    Proximal Causal Inference With Text Data

    Authors: Jacob M. Chen, Rohit Bhattacharya, Katherine A. Keith

    Abstract: Recent text-based causal methods attempt to mitigate confounding bias by estimating proxies of confounding variables that are partially or imperfectly measured from unstructured text data. These approaches, however, assume analysts have supervised labels of the confounders given text for a subset of instances, a constraint that is sometimes infeasible due to data privacy or annotation costs. In th…

    Submitted 21 May, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: 26 pages

  13. arXiv:2401.03482

    cs.LG stat.ML

    Uncertainty Quantification on Clinical Trial Outcome Prediction

    Authors: Tianyi Chen, Yingzhou Lu, Nan Hao, Capucine Van Rechem, Jintai Chen, Tianfan Fu

    Abstract: The importance of uncertainty quantification is increasingly recognized in the diverse field of machine learning. Accurately assessing model prediction uncertainty can help provide deeper understanding and confidence for researchers and practitioners. This is especially critical in medical diagnosis and drug discovery areas, where reliable predictions directly impact research quality and patient h…

    Submitted 18 June, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

  14. arXiv:2312.14572

    math.OC stat.ML

    Semidefinite Relaxations of the Gromov-Wasserstein Distance

    Authors: Junyu Chen, Binh T. Nguyen, Yong Sheng Soh

    Abstract: The Gromov-Wasserstein (GW) distance is a variant of the optimal transport problem that allows one to match objects between incomparable spaces. At its core, the GW distance is specified as the solution of a non-convex quadratic program and is not known to be tractable to solve. In particular, existing solvers for the GW distance are only able to find locally optimal solutions. In this work, we pr…

    Submitted 26 December, 2023; v1 submitted 22 December, 2023; originally announced December 2023.

  15. arXiv:2312.13331

    stat.ME stat.AP

    A Bayesian Spatial Berkson error approach to estimate small area opioid mortality rates accounting for population-at-risk uncertainty

    Authors: Emily N Peterson, Rachel C. Nethery, Jarvis T. Chen, Loni P. Tabb, Brent A. Coull, Frederic B. Piel, Lance A Waller

    Abstract: Monitoring small-area geographical population trends in opioid mortality has large scale implications to informing preventative resource allocation. A common approach to obtain small area estimates of opioid mortality is to use a standard disease mapping approach in which population-at-risk estimates are treated as fixed and known. Assuming fixed populations ignores the uncertainty surrounding sma…

    Submitted 20 December, 2023; originally announced December 2023.

  16. arXiv:2312.04648

    stat.ML cs.LG

    Enhancing Polynomial Chaos Expansion Based Surrogate Modeling using a Novel Probabilistic Transfer Learning Strategy

    Authors: Wyatt Bridgman, Uma Balakrishnan, Reese Jones, Jiefu Chen, Xuqing Wu, Cosmin Safta, Yueqin Huang, Mohammad Khalil

    Abstract: In the field of surrogate modeling, polynomial chaos expansion (PCE) allows practitioners to construct inexpensive yet accurate surrogates to be used in place of the expensive forward model simulations. For black-box simulations, non-intrusive PCE allows the construction of these surrogates using a set of simulation response evaluations. In this context, the PCE coefficients can be obtained using…

    Submitted 7 December, 2023; originally announced December 2023.

  17. arXiv:2312.00540

    cs.LG cs.AI stat.ML

    Target-agnostic Source-free Domain Adaptation for Regression Tasks

    Authors: Tianlang He, Zhiqiu Xia, Jierun Chen, Haoliang Li, S. -H. Gary Chan

    Abstract: Unsupervised domain adaptation (UDA) seeks to bridge the domain gap between the target and source using unlabeled target data. Source-free UDA removes the requirement for labeled source data at the target to preserve data privacy and storage. However, work on source-free UDA assumes knowledge of domain gap distribution, and hence is limited to either target-aware or classification task. To overcom…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: Accepted by ICDE 2024

  18. arXiv:2312.00296

    cs.LG stat.ML

    Towards Aligned Canonical Correlation Analysis: Preliminary Formulation and Proof-of-Concept Results

    Authors: Biqian Cheng, Evangelos E. Papalexakis, Jia Chen

    Abstract: Canonical Correlation Analysis (CCA) has been widely applied to jointly embed multiple views of data in a maximally correlated latent space. However, the alignment between various data perspectives, which is required by traditional approaches, is unclear in many practical cases. In this work we propose a new framework Aligned Canonical Correlation Analysis (ACCA), to address this challenge by iter…

    Submitted 7 December, 2023; v1 submitted 30 November, 2023; originally announced December 2023.

    Comments: 4 pages, 7 figures, KDD SoCal symposium 2023 (extended version)

  19. arXiv:2311.08677

    cs.LG cs.DC cs.IT stat.ML

    Federated Learning for Sparse Principal Component Analysis

    Authors: Sin Cheng Ciou, Pin Jui Chen, Elvin Y. Tseng, Yuh-Jye Lee

    Abstract: In the rapidly evolving realm of machine learning, algorithm effectiveness often faces limitations due to data quality and availability. Traditional approaches grapple with data sharing due to legal and privacy concerns. The federated learning framework addresses this challenge. Federated learning is a decentralized approach where model training occurs on client sides, preserving privacy by keepin…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: 11 pages, 7 figures, 1 table. Accepted by IEEE BigData 2023, Sorrento, Italy

  20. arXiv:2311.03289

    stat.ME

    Batch effect correction with sample remeasurement in highly confounded case-control studies

    Authors: Hanxuan Ye, Xianyang Zhang, Chen Wang, Ellen L. Goode, Jun Chen

    Abstract: Batch effects are pervasive in biomedical studies. One approach to address the batch effects is repeatedly measuring a subset of samples in each batch. These remeasured samples are used to estimate and correct the batch effects. However, rigorous statistical methods for batch effect correction with remeasured samples are severely under-developed. In this study, we developed a framework for batch e…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: 45 pages

  21. arXiv:2310.20498

    cs.LG cond-mat.stat-mech quant-ph stat.ML

    Generative Learning of Continuous Data by Tensor Networks

    Authors: Alex Meiburg, Jing Chen, Jacob Miller, Raphaëlle Tihon, Guillaume Rabusseau, Alejandro Perdomo-Ortiz

    Abstract: Beyond their origin in modeling many-body quantum systems, tensor networks have emerged as a promising class of models for solving machine learning problems, notably in unsupervised generative learning. While possessing many desirable features arising from their quantum-inspired nature, tensor network generative models have previously been largely restricted to binary or categorical data, limiting…

    Submitted 31 October, 2023; originally announced October 2023.

    Comments: 21 pages, 15 figures

  22. arXiv:2310.20294

    stat.ME

    Robust nonparametric regression based on deep ReLU neural networks

    Authors: Juntong Chen

    Abstract: In this paper, we consider robust nonparametric regression using deep neural networks with ReLU activation function. While several existing theoretically justified methods are geared towards robustness against identical heavy-tailed noise distributions, the rise of adversarial attacks has emphasized the importance of safeguarding estimation procedures against systematic contamination. We approach…

    Submitted 31 October, 2023; originally announced October 2023.

    Comments: 40 pages

  23. arXiv:2310.18430

    stat.ML cs.LG

    MCRAGE: Synthetic Healthcare Data for Fairness

    Authors: Keira Behal, Jiayi Chen, Caleb Fikes, Sophia Xiao

    Abstract: In the field of healthcare, electronic health records (EHR) serve as crucial training data for developing machine learning models for diagnosis, treatment, and the management of healthcare resources. However, medical datasets are often imbalanced in terms of sensitive attributes such as race/ethnicity, gender, and age. Machine learning models trained on class-imbalanced EHR datasets perform signif…

    Submitted 20 March, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

    Comments: Keywords: synthetic electronic health records, conditional denoising diffusion probabilistic model, healthcare AI, tabular data, fairness, synthetic data. This paper is the result of work completed at the 2023 Emory University Department of Mathematics REU/RET program under the direction of Project Advisor Dr. Xi Yuanzhe. This work is sponsored by NSF DMS 2051019

  24. arXiv:2310.17023

    stat.ML cs.LG

    On the Identifiability and Interpretability of Gaussian Process Models

    Authors: Jiawen Chen, Wancen Mu, Yun Li, Didong Li

    Abstract: In this paper, we critically examine the prevalent practice of using additive mixtures of Matérn kernels in single-output Gaussian process (GP) models and explore the properties of multiplicative mixtures of Matérn kernels for multi-output GP models. For the single-output case, we derive a series of theoretical results showing that the smoothness of a mixture of Matérn kernels is determined by the…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

    MSC Class: 62M30

  25. arXiv:2310.03758

    eess.SP cs.IT cs.LG stat.ML

    A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing

    Authors: Junren Chen, Jonathan Scarlett, Michael K. Ng, Zhaoqiang Liu

    Abstract: In generative compressed sensing (GCS), we want to recover a signal $\mathbf{x}^* \in \mathbb{R}^n$ from $m$ measurements ($m\ll n$) using a generative prior $\mathbf{x}^*\in G(\mathbb{B}_2^k(r))$, where $G$ is typically an $L$-Lipschitz continuous generative model and $\mathbb{B}_2^k(r)$ represents the radius-$r$ $\ell_2$-ball in $\mathbb{R}^k$. Under nonlinear measurements, most prior results ar…

    Submitted 9 October, 2023; v1 submitted 25 September, 2023; originally announced October 2023.

    Comments: Accepted to NeurIPS 2023

  26. arXiv:2309.12162

    stat.ME cs.LG econ.EM math.ST

    Optimal Conditional Inference in Adaptive Experiments

    Authors: Jiafeng Chen, Isaiah Andrews

    Abstract: We study batched bandit experiments and consider the problem of inference conditional on the realized stopping time, assignment probabilities, and target parameter, where all of these may be chosen adaptively using information up to the last batch of the experiment. Absent further restrictions on the experiment, we show that inference using only the results of the last batch is optimal. When the a…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: An extended abstract of this paper was presented at CODE@MIT 2021

  27. arXiv:2309.09878

    stat.ME stat.AP

    Anomaly Detection in Spatio-Temporal Data: Theory and Application

    Authors: Ji Chen

    Abstract: This paper provides an overview of three notable approaches for detecting anomalies in spatio-temporal data. The three review methods are selected from the framework of multivariate statistical process control (SPC), scan statistics, and tensor decomposition. For each method, we first demonstrate its technical intricacies and then apply it to a real-world dataset, which is 300 images of solar acti…

    Submitted 18 September, 2023; originally announced September 2023.

  28. arXiv:2309.09103

    stat.ME econ.EM math.ST

    Optimal Estimation under a Semiparametric Density Ratio Model

    Authors: Archer Gong Zhang, Jiahua Chen

    Abstract: In many statistical and econometric applications, we gather individual samples from various interconnected populations that undeniably exhibit common latent structures. Utilizing a model that incorporates these latent structures for such data enhances the efficiency of inferences. Recently, many researchers have been adopting the semiparametric density ratio model (DRM) to address the presence of…

    Submitted 16 September, 2023; originally announced September 2023.

  29. arXiv:2309.09032

    cs.IT cs.LG eess.SP stat.ML

    Solving Quadratic Systems with Full-Rank Matrices Using Sparse or Generative Priors

    Authors: Junren Chen, Shuai Huang, Michael K. Ng, Zhaoqiang Liu

    Abstract: The problem of recovering a signal $\boldsymbol{x} \in \mathbb{R}^n$ from a quadratic system $\{y_i=\boldsymbol{x}^\top\boldsymbol{A}_i\boldsymbol{x},\ i=1,\ldots,m\}$ with full-rank matrices $\boldsymbol{A}_i$ frequently arises in applications such as unassigned distance geometry and sub-wavelength imaging. With i.i.d. standard Gaussian matrices $\boldsymbol{A}_i$, this paper addresses the high-d…

    Submitted 16 September, 2023; originally announced September 2023.

  30. arXiv:2309.02426

    stat.ML cs.LG

    Monotone Tree-Based GAMI Models by Adapting XGBoost

    Authors: Linwei Hu, Soroush Aramideh, Jie Chen, Vijayan N. Nair

    Abstract: Recent papers have used machine learning architecture to fit low-order functional ANOVA models with main effects and second-order interactions. These GAMI (GAM + Interaction) models are directly interpretable as the functional main effects and interactions can be easily plotted and visualized. Unfortunately, it is not easy to incorporate the monotonicity requirement into the existing GAMI models b…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: 12 pages

  31. arXiv:2308.16059

    stat.ML cs.IT cs.LG

    A Parameter-Free Two-Bit Covariance Estimator with Improved Operator Norm Error Rate

    Authors: Junren Chen, Michael K. Ng

    Abstract: A covariance matrix estimator using two bits per entry was recently developed by Dirksen, Maly and Rauhut [Annals of Statistics, 50(6), pp. 3538-3562]. The estimator achieves near minimax rate for general sub-Gaussian distributions, but also suffers from two downsides: theoretically, there is an essential gap on operator norm error between their estimator and sample covariance when the diagonal of…

    Submitted 30 August, 2023; originally announced August 2023.

    Comments: 24 pages, 2 figures

  32. Deep Generative Imputation Model for Missing Not At Random Data

    Authors: Jialei Chen, Yuanbo Xu, Pengyang Wang, Yongjian Yang

    Abstract: Data analysis usually suffers from the Missing Not At Random (MNAR) problem, where the cause of the value missing is not fully observed. Compared to the naive Missing Completely At Random (MCAR) problem, it is more in line with the realistic scenario whereas more complex and challenging. Existing statistical methods model the MNAR mechanism by different decomposition of the joint distribution of t…

    Submitted 16 August, 2023; originally announced August 2023.

  33. arXiv:2307.15802

    stat.AP

    Gender Inclusive Methods in Studies of STEM Practitioners

    Authors: Kaitlin Rasmussen, Jocelyne Chen, Rebecca L. Colquhoun, Sophia Frentz, Laurel Hiatt, Aiden James Kosciesza, Charlotte Olsen, Theo J. O'Neill, Vic Zamloot, Beckett E. Strauss

    Abstract: Gender inequity is one of the biggest challenges facing the STEM workforce. While there are many studies that look into gender disparities within STEM and academia, the majority of these have been designed and executed by those unfamiliar with research in sociology and gender studies. They adopt a normative view of gender as a binary choice of 'male' or 'female,' leaving individuals whose genders…

    Submitted 28 July, 2023; originally announced July 2023.

    Comments: 33 pages (15 pages plus references), 1 table. Follows work of arXiv:1907.04893 and arXiv:1907.04893

  34. arXiv:2307.11685

    q-fin.TR cs.LG stat.ML

    Towards Generalizable Reinforcement Learning for Trade Execution

    Authors: Chuheng Zhang, Yitong Duan, Xiaoyu Chen, Jianyu Chen, Jian Li, Li Zhao

    Abstract: Optimized trade execution is to sell (or buy) a given amount of assets in a given time with the lowest possible trading cost. Recently, reinforcement learning (RL) has been applied to optimized trade execution to learn smarter policies from market data. However, we find that many existing RL methods exhibit considerable overfitting which prevents them from real deployment. In this paper, we provid…

    Submitted 11 May, 2023; originally announced July 2023.

    Comments: Accepted by IJCAI-23

  35. arXiv:2307.06442

    cs.LG cs.DC cs.MA stat.ML

    On Collaboration in Distributed Parameter Estimation with Resource Constraints

    Authors: Yu-Zhen Janice Chen, Daniel S. Menasché, Don Towsley

    Abstract: We study sensor/agent data collection and collaboration policies for parameter estimation, accounting for resource constraints and correlation between observations collected by distinct sensors/agents. Specifically, we consider a group of sensors/agents each samples from different variables of a multivariate Gaussian distribution and has different estimation objectives, and we formulate a sensor/a…

    Submitted 12 July, 2023; originally announced July 2023.

  36. arXiv:2307.05050

    stat.AP

    Considerations for Master Protocols Using External Controls

    Authors: Jie Chen, Xiaoyun, Li, Chengxing, Lu, Sammy Yuan, Godwin Yung, Jingjing Ye, Hong Tian, Jianchang Lin

    Abstract: There has been an increasing use of master protocols in oncology clinical trials because of its efficiency and flexibility to accelerate cancer drug development. Depending on the study objective and design, a master protocol trial can be a basket trial, an umbrella trial, a platform trial, or any other form of trials in which multiple investigational products and/or subpopulations are studied unde…

    Submitted 10 November, 2023; v1 submitted 11 July, 2023; originally announced July 2023.

  37. arXiv:2307.00190

    stat.AP

    Estimands in Real-World Evidence Studies

    Authors: Jie Chen, Daniel Scharfstein, Hongwei Wang, Binbing Yu, Yang Song, Weili He, John Scott, Xiwu Lin, Hana Lee

    Abstract: A Real-World Evidence (RWE) Scientific Working Group (SWG) of the American Statistical Association Biopharmaceutical Section (ASA BIOP) has been reviewing statistical considerations for the generation of RWE to support regulatory decision-making. As part of the effort, the working group is addressing estimands in RWE studies. Constructing the right estimand -- the target of estimation -- which ref…

    Submitted 30 June, 2023; originally announced July 2023.

  38. arXiv:2307.00189

    stat.ME stat.AP

    A Direct Approach to Simultaneous Tests of Superiority and Noninferiority with Multiple Endpoints

    Authors: Wenfeng Chen, Naiqing Zhao, Guoyou Qin, Jie Chen

    Abstract: Simultaneous tests of superiority and non-inferiority hypotheses on multiple endpoints are often performed in clinical trials to demonstrate that a new treatment is superior over a control on at least one endpoint and non-inferior on the remaining endpoints. Existing methods tackle this problem by testing the superiority and non-inferiority hypotheses separately and control the Type I error rate e…

    Submitted 29 September, 2023; v1 submitted 30 June, 2023; originally announced July 2023.

  39. arXiv:2306.11312

    cs.DS cs.LG stat.ML

    Data Structures for Density Estimation

    Authors: Anders Aamand, Alexandr Andoni, Justin Y. Chen, Piotr Indyk, Shyam Narayanan, Sandeep Silwal

    Abstract: We study statistical/computational tradeoffs for the following density estimation problem: given $k$ distributions $v_1, \ldots, v_k$ over a discrete domain of size $n$, and sampling access to a distribution $p$, identify $v_i$ that is "close" to $p$. Our main result is the first data structure that, given a sublinear (in $n$) number of samples from $p$, identifies $v_i$ in time sublinear in $k$.…

    Submitted 20 June, 2023; originally announced June 2023.

    Comments: To appear at ICML'23

  40. arXiv:2306.08854

    cs.LG cs.AI stat.CO stat.ML

    A Gromov--Wasserstein Geometric View of Spectrum-Preserving Graph Coarsening

    Authors: Yifan Chen, Rentian Yao, Yun Yang, Jie Chen

    Abstract: Graph coarsening is a technique for solving large-scale graph problems by working on a smaller version of the original graph, and possibly interpolating the results back to the original graph. It has a long history in scientific computing and has recently gained popularity in machine learning, particularly in methods that preserve the graph spectrum. This work studies graph coarsening from a diffe…

    Submitted 15 June, 2023; originally announced June 2023.

    Comments: To appear at ICML 2023. Code is available at https://github.com/ychen-stat-ml/GW-Graph-Coarsening

  41. arXiv:2306.07566

    stat.ML cs.LG

    Learning under Selective Labels with Data from Heterogeneous Decision-makers: An Instrumental Variable Approach

    Authors: Jian Chen, Zhehao Li, Xiaojie Mao

    Abstract: We study the problem of learning with selectively labeled data, which arises when outcomes are only partially labeled due to historical decision-making. The labeled data distribution may substantially differ from the full population, especially when the historical decisions and the target outcome can be simultaneously affected by some unobserved factors. Consequently, learning with only the labele…

    Submitted 23 June, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

  42. arXiv:2306.05511

    stat.ME

    Causal Inference With Outcome-Dependent Missingness And Self-Censoring

    Authors: Jacob M Chen, Daniel Malinsky, Rohit Bhattacharya

    Abstract: We consider missingness in the context of causal inference when the outcome of interest may be missing. If the outcome directly affects its own missingness status, i.e., it is "self-censoring", this may lead to severely biased causal effect estimates. Miao et al. [2015] proposed the shadow variable method to correct for bias due to self-censoring; however, verifying the required model assumptions…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: 15 pages. In proceedings of the 39th Conference on Uncertainty in Artificial Intelligence

  43. arXiv:2305.19575

    math.OC cs.LG stat.ML

    On the Linear Convergence of Policy Gradient under Hadamard Parameterization

    Authors: Jiacai Liu, Jinchi Chen, Ke Wei

    Abstract: The convergence of deterministic policy gradient under the Hadamard parameterization is studied in the tabular setting and the linear convergence of the algorithm is established. To this end, we first show that the error decreases at an $O(\frac{1}{k})$ rate for all the iterations. Based on this result, we further show that the algorithm has a faster local linear convergence rate after $k_0$ itera…

    Submitted 25 November, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

  44. arXiv:2305.17284  [pdf, other]

    cs.LG stat.ML

    GC-Flow: A Graph-Based Flow Network for Effective Clustering

    Authors: Tianchun Wang, Farzaneh Mirzazadeh, Xiang Zhang, Jie Chen

    Abstract: Graph convolutional networks (GCNs) are \emph{discriminative models} that directly model the class posterior $p(y|\mathbf{x})$ for semi-supervised classification of graph data. While being effective, as a representation learning approach, the node representations extracted from a GCN often miss useful information for effective clustering, because the objectives are different. In this work, we desi…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: ICML 2023. Code is available at https://github.com/xztcwang/GCFlow

  45. arXiv:2305.15670  [pdf]

    stat.ML cs.LG

    Interpretable Machine Learning based on Functional ANOVA Framework: Algorithms and Comparisons

    Authors: Linwei Hu, Vijayan N. Nair, Agus Sudjianto, Aijun Zhang, Jie Chen

    Abstract: In the early days of machine learning (ML), the emphasis was on developing complex algorithms to achieve best predictive performance. To understand and explain the model results, one had to rely on post hoc explainability techniques, which are known to have limitations. Recently, with the recognition that interpretability is just as important, researchers are compromising on small increases in pre…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: 24 pages, 15 figures. arXiv admin note: substantial text overlap with arXiv:2207.06950

  46. arXiv:2304.14109  [pdf, other]

    stat.ML cs.LG

    The Structurally Complex with Additive Parent Causality (SCARY) Dataset

    Authors: Jarry Chen, Haytham M. Fayek

    Abstract: Causal datasets play a critical role in advancing the field of causality. However, existing datasets often lack the complexity of real-world issues such as selection bias, unfaithful data, and confounding. To address this gap, we propose a new synthetic causal dataset, the Structurally Complex with Additive paRent causalitY (SCARY) dataset, which includes the following features. The dataset compri…

    Submitted 27 April, 2023; originally announced April 2023.

    Comments: 5 pages, 5 figures, accepted to CLeaR (Causal Learning and Reasoning) 2023

  47. arXiv:2304.06254  [pdf, other]

    stat.ML cs.GT

    Fair Grading Algorithms for Randomized Exams

    Authors: Jiale Chen, Jason Hartline, Onno Zoeter

    Abstract: This paper studies grading algorithms for randomized exams. In a randomized exam, each student is asked a small number of random questions from a large question bank. The predominant grading rule is simple averaging, i.e., calculating grades by averaging scores on the questions each student is asked, which is fair ex-ante, over the randomized questions, but not fair ex-post, on the realized questi…

    Submitted 13 April, 2023; originally announced April 2023.

  48. arXiv:2304.04240  [pdf]

    stat.ML cs.LG

    Data-driven multinomial random forest

    Authors: Junhao Chen, Xueli Wang

    Abstract: In this article, we strengthen the proof methods of some previously weakly consistent variants of random forests into strongly consistent proof methods, and improve the data utilization of these variants, in order to obtain better theoretical properties and experimental performance. In addition, based on the multinomial random forest (MRF) and Bernoulli random forest (BRF), we propose a data-drive…

    Submitted 9 April, 2023; originally announced April 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2211.15154

  49. arXiv:2303.13218  [pdf, other]

    econ.EM stat.ME

    Functional-Coefficient Quantile Regression for Panel Data with Latent Group Structure

    Authors: Xiaorong Yang, Jia Chen, Degui Li, Runze Li

    Abstract: This paper considers estimating functional-coefficient models in panel quantile regression with individual effects, allowing the cross-sectional and temporal dependence for large panel observations. A latent group structure is imposed on the heterogenous quantile regression models so that the number of nonparametric functional coefficients to be estimated can be reduced considerably. With the prel…

    Submitted 23 March, 2023; originally announced March 2023.

  50. arXiv:2303.09766  [pdf, other]

    stat.ME

    A New Covariate Selection Strategy for High Dimensional Data in Causal Effect Estimation with Multivariate Treatments

    Authors: Juan Chen, Yingchun Zhou

    Abstract: Selection of covariates is crucial in the estimation of average treatment effects given observational data with high or even ultra-high dimensional pretreatment variables. Existing methods for this problem typically assume sparse linear models for both outcome and univariate treatment, and cannot handle situations with ultra-high dimensional covariates. In this paper, we propose a new covariate se…

    Submitted 17 March, 2023; originally announced March 2023.