Showing 1–50 of 180 results for author: Ma, X

  1. arXiv:2407.02178  [pdf]

    stat.ME

    Reverse time-to-death as time-scale in time-to-event analysis for studies of advanced illness and palliative care

    Authors: Yin Bun Cheung, Xiangmei Ma, Isha Chaudhry, Nan Liu, Qingyuan Zhuang, Grace Meijuan Yang, Chetna Malhotra, Eric Andrew Finkelstein

    Abstract: Background: Incidence of adverse outcome events rises as patients with advanced illness approach end-of-life. Exposures that tend to occur near end-of-life, e.g., use of wheelchair, oxygen therapy and palliative care, may therefore be found associated with the incidence of the adverse outcomes. We propose a strategy for time-to-event analysis to mitigate the time-varying confounding. Methods: We p…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 22 pages (including 2 tables and 2 figures)

  2. arXiv:2405.16730  [pdf, other]

    cs.LG cs.AI stat.AP

    Latent Energy-Based Odyssey: Black-Box Optimization via Expanded Exploration in the Energy-Based Latent Space

    Authors: Peiyu Yu, Dinghuai Zhang, Hengzhi He, Xiaojian Ma, Ruiyao Miao, Yifan Lu, Yasi Zhang, Deqian Kong, Ruiqi Gao, Jianwen Xie, Guang Cheng, Ying Nian Wu

    Abstract: Offline Black-Box Optimization (BBO) aims at optimizing a black-box function using the knowledge from a pre-collected offline dataset of function values and corresponding input designs. However, the high-dimensional and highly-multimodal input design space of black-box function pose inherent challenges for most existing methods that model and operate directly upon input designs. These issues inclu…

    Submitted 26 May, 2024; originally announced May 2024.

  3. arXiv:2405.16564  [pdf, ps, other]

    stat.ML cs.LG stat.ME

    Contextual Linear Optimization with Bandit Feedback

    Authors: Yichun Hu, Nathan Kallus, Xiaojie Mao, Yanchen Wu

    Abstract: Contextual linear optimization (CLO) uses predictive observations to reduce uncertainty in random cost coefficients and thereby improve average-cost performance. An example is a stochastic shortest path with random edge costs (e.g., traffic) and predictive features (e.g., lagged traffic, weather). Existing work on CLO assumes the data has fully observed cost coefficient vectors, but in many applic…

    Submitted 26 May, 2024; originally announced May 2024.

  4. arXiv:2405.07138  [pdf, other]

    stat.ME

    Large-dimensional Robust Factor Analysis with Group Structure

    Authors: Yong He, Xiaoyang Ma, Xingheng Wang, Yalin Wang

    Abstract: In this paper, we focus on exploiting the group structure for large-dimensional factor models, which captures the homogeneous effects of common factors on individuals within the same group. In view of the fact that datasets in macroeconomics and finance are typically heavy-tailed, we propose to identify the unknown group structure using the agglomerative hierarchical clustering algorithm and an in…

    Submitted 11 May, 2024; originally announced May 2024.

  5. arXiv:2404.19242  [pdf, other]

    cs.CV eess.IV stat.ME

    A Minimal Set of Parameters Based Depth-Dependent Distortion Model and Its Calibration Method for Stereo Vision Systems

    Authors: Xin Ma, Puchen Zhu, Xiao Li, Xiaoyin Zheng, Jianshu Zhou, Xuchen Wang, Kwok Wai Samuel Au

    Abstract: Depth position highly affects lens distortion, especially in close-range photography, which limits the measurement accuracy of existing stereo vision systems. Moreover, traditional depth-dependent distortion models and their calibration methods have remained complicated. In this work, we propose a minimal set of parameters based depth-dependent distortion model (MDM), which considers the radial an…

    Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted for publication in IEEE Transactions on Instrumentation and Measurement

  6. arXiv:2402.14840  [pdf, other]

    cs.CL cs.AI stat.AP

    RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning

    Authors: Congyun Jin, Ming Zhang, Xiaowei Ma, Li Yujiao, Yingbo Wang, Yabo Jia, Yuliang Du, Tao Sun, Haowen Wang, Cong Fan, Jinjie Gu, Chenfei Chi, Xiangguo Lv, Fangzhou Li, Wei Xue, Yiran Huang

    Abstract: Recent advancements in Large Language Models (LLMs) and Large Multi-modal Models (LMMs) have shown potential in various medical applications, such as Intelligent Medical Diagnosis. Although impressive results have been achieved, we find that existing benchmarks do not reflect the complexity of real medical reports and specialized in-depth reasoning capabilities. In this work, we introduced RJUA-Me…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: 15 pages, 13 figures

  7. arXiv:2402.03954  [pdf, other]

    stat.ME stat.ML

    Mixed Matrix Completion in Complex Survey Sampling under Heterogeneous Missingness

    Authors: Xiaojun Mao, Hengfang Wang, Zhonglei Wang, Shu Yang

    Abstract: Modern surveys with large sample sizes and growing mixed-type questionnaires require robust and scalable analysis methods. In this work, we consider recovering a mixed dataframe matrix, obtained by complex survey sampling, with entries following different canonical exponential distributions and subject to heterogeneous missingness. To tackle this challenging task, we propose a two-stage procedure:…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: Journal of Computational and Graphical Statistics, 2023

  8. arXiv:2401.10474  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    LDReg: Local Dimensionality Regularized Self-Supervised Learning

    Authors: Hanxun Huang, Ricardo J. G. B. Campello, Sarah Monazam Erfani, Xingjun Ma, Michael E. Houle, James Bailey

    Abstract: Representations learned via self-supervised learning (SSL) can be susceptible to dimensional collapse, where the learned representation subspace is of extremely low dimensionality and thus fails to represent the full data distribution and modalities. Dimensional collapse also known as the "underfilling" phenomenon is one of the major causes of degraded performance on downstream tasks. Previous wor…

    Submitted 14 March, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: ICLR 2024

  9. arXiv:2401.02203  [pdf, other]

    stat.ML cs.LG

    Robust bilinear factor analysis based on the matrix-variate $t$ distribution

    Authors: Xuan Ma, Jianhua Zhao, Changchun Shang, Fen Jiang, Philip L. H. Yu

    Abstract: Factor Analysis based on multivariate $t$ distribution ($t$fa) is a useful robust tool for extracting common factors on heavy-tailed or contaminated data. However, $t$fa is only applicable to vector data. When $t$fa is applied to matrix data, it is common to first vectorize the matrix observations. This introduces two challenges for $t$fa: (i) the inherent matrix structure of the data is broken, a…

    Submitted 4 January, 2024; originally announced January 2024.

  10. arXiv:2401.01294  [pdf, other]

    stat.ML cs.LG stat.ME

    Efficient Sparse Least Absolute Deviation Regression with Differential Privacy

    Authors: Weidong Liu, Xiaojun Mao, Xiaofei Zhang, Xin Zhang

    Abstract: In recent years, privacy-preserving machine learning algorithms have attracted increasing attention because of their important applications in many scientific fields. However, in the literature, most privacy-preserving algorithms demand learning objectives to be strongly convex and Lipschitz smooth, which thus cannot cover a wide class of robust loss functions (e.g., quantile/least absolute loss).…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: IEEE Transactions on Information Forensics and Security, 2024

    MSC Class: 62J07

  11. arXiv:2312.10563  [pdf, other]

    stat.ME math.ST

    Mediation Analysis with Mendelian Randomization and Efficient Multiple GWAS Integration

    Authors: Rita Qiuran Lyu, Chong Wu, Xinwei Ma, Jingshen Wang

    Abstract: Mediation analysis is a powerful tool for studying causal pathways between exposure, mediator, and outcome variables of interest. While classical mediation analysis using observational data often requires strong and sometimes unrealistic assumptions, such as unconfoundedness, Mendelian Randomization (MR) avoids unmeasured confounding bias by employing genetic variations as instrumental variables.…

    Submitted 17 May, 2024; v1 submitted 16 December, 2023; originally announced December 2023.

  12. arXiv:2312.06883  [pdf, other]

    stat.ME

    Adaptive Experiments Toward Learning Treatment Effect Heterogeneity

    Authors: Waverly Wei, Xinwei Ma, Jingshen Wang

    Abstract: Understanding treatment effect heterogeneity has become an increasingly popular task in various fields, as it helps design personalized advertisements in e-commerce or targeted treatment in biomedical studies. However, most of the existing work in this research area focused on either analyzing observational data based on strong causal assumptions or conducting post hoc analyses of randomized contr…

    Submitted 13 December, 2023; v1 submitted 11 December, 2023; originally announced December 2023.

  13. arXiv:2312.05593  [pdf, ps, other]

    econ.EM stat.ME

    Economic Forecasts Using Many Noises

    Authors: Yuan Liao, Xinjie Ma, Andreas Neuhierl, Zhentao Shi

    Abstract: This paper addresses a key question in economic forecasting: does pure noise truly lack predictive power? Economists typically conduct variable selection to eliminate noises from predictors. Yet, we prove a compelling result that in most economic forecasts, the inclusion of noises in predictions yields greater benefits than its exclusion. Furthermore, if the total number of predictors is not suffi…

    Submitted 11 December, 2023; v1 submitted 9 December, 2023; originally announced December 2023.

  14. arXiv:2311.17605  [pdf, other]

    stat.AP stat.ME

    Improving the Balance of Unobserved Covariates From Information Theory in Multi-Arm Randomization with Unequal Allocation Ratio

    Authors: Xingjian Ma, Yang Liu

    Abstract: Multi-arm randomization has increasingly widespread applications recently and it is also crucial to ensure that the distributions of important observed covariates as well as the potential unobserved covariates are similar and comparable among all the treatment. However, the theoretical properties of unobserved covariates imbalance in multi-arm randomization with unequal allocation ratio remains un…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: 60 pages, 3 figures

  15. arXiv:2310.16290  [pdf, other]

    stat.ME econ.EM

    Fair Adaptive Experiments

    Authors: Waverly Wei, Xinwei Ma, Jingshen Wang

    Abstract: Randomized experiments have been the gold standard for assessing the effectiveness of a treatment or policy. The classical complete randomization approach assigns treatments based on a prespecified probability and may lead to inefficient use of data. Adaptive experiments improve upon complete randomization by sequentially learning and updating treatment assignment probabilities. However, their app…

    Submitted 24 October, 2023; originally announced October 2023.

  16. arXiv:2310.13969  [pdf, ps, other]

    stat.ML cs.LG

    Distributed Linear Regression with Compositional Covariates

    Authors: Yue Chao, Lei Huang, Xuejun Ma

    Abstract: With the availability of extraordinarily huge data sets, solving the problems of distributed statistical methodology and computing for such data sets has become increasingly crucial in the big data area. In this paper, we focus on the distributed sparse penalized linear log-contrast model in massive compositional data. In particular, two distributed optimization techniques under centralized and de…

    Submitted 21 October, 2023; originally announced October 2023.

    Comments: 35 pages, 2 figures

    MSC Class: 62-08; ACM Class: G.3

  17. arXiv:2310.05495  [pdf, other]

    cs.LG stat.ML

    On the Convergence of Federated Averaging under Partial Participation for Over-parameterized Neural Networks

    Authors: Xin Liu, Wei li, Dazhi Zhan, Yu Pan, Xin Ma, Yu Ding, Zhisong Pan

    Abstract: Federated learning (FL) is a widely employed distributed paradigm for collaboratively training machine learning models from multiple clients without sharing local data. In practice, FL encounters challenges in dealing with partial client participation due to the limited bandwidth, intermittent connection and strict synchronized delay. Simultaneously, there exist few theoretical convergence guarant…

    Submitted 2 February, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

  18. arXiv:2310.05166  [pdf, other]

    cs.LG stat.ML

    A Corrected Expected Improvement Acquisition Function Under Noisy Observations

    Authors: Han Zhou, Xingchen Ma, Matthew B Blaschko

    Abstract: Sequential maximization of expected improvement (EI) is one of the most widely used policies in Bayesian optimization because of its simplicity and ability to handle noisy observations. In particular, the improvement function often uses the best posterior mean as the best incumbent in noisy settings. However, the uncertainty associated with the incumbent solution is often neglected in many analyti…

    Submitted 13 November, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

  19. arXiv:2310.03218  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning Energy-Based Prior Model with Diffusion-Amortized MCMC

    Authors: Peiyu Yu, Yaxuan Zhu, Sirui Xie, Xiaojian Ma, Ruiqi Gao, Song-Chun Zhu, Ying Nian Wu

    Abstract: Latent space Energy-Based Models (EBMs), also known as energy-based priors, have drawn growing interests in the field of generative modeling due to its flexibility in the formulation and strong modeling power of the latent space. However, the common practice of learning latent space EBMs with non-convergent short-run MCMC for prior and posterior sampling is hindering the model from further progres…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  20. arXiv:2308.03545  [pdf]

    stat.AP

    A Causal Inference Approach to Eliminate the Impacts of Interfering Factors on Traffic Performance Evaluation

    Authors: Xiaobo Ma, Abolfazl Karimpour, Yao-Jan Wu

    Abstract: Before and after study frameworks are widely adopted to evaluate the effectiveness of transportation policies and emerging technologies. However, many factors such as seasonal factors, holidays, and lane closure might interfere with the evaluation process by inducing variation in traffic volume during the before and after periods. In practice, limited effort has been made to eliminate the effects…

    Submitted 7 August, 2023; originally announced August 2023.

  21. arXiv:2307.13793  [pdf, ps, other]

    stat.ME cs.LG econ.EM math.ST stat.ML

    Source Condition Double Robust Inference on Functionals of Inverse Problems

    Authors: Andrew Bennett, Nathan Kallus, Xiaojie Mao, Whitney Newey, Vasilis Syrgkanis, Masatoshi Uehara

    Abstract: We consider estimation of parameters defined as linear functionals of solutions to linear inverse problems. Any such parameter admits a doubly robust representation that depends on the solution to a dual linear inverse problem, where the dual solution can be thought as a generalization of the inverse propensity function. We provide the first source condition double robust inference method that ens…

    Submitted 25 July, 2023; originally announced July 2023.

  22. arXiv:2307.08079  [pdf, other]

    stat.ML cs.LG stat.ME

    Flexible and efficient spatial extremes emulation via variational autoencoders

    Authors: Likun Zhang, Xiaoyu Ma, Christopher K. Wikle, Raphaël Huser

    Abstract: Many real-world processes have complex tail dependence structures that cannot be characterized using classical Gaussian processes. More flexible spatial extremes models exhibit appealing extremal dependence properties but are often exceedingly prohibitive to fit and simulate from in high dimensions. In this paper, we aim to push the boundaries on computation and modeling of high-dimensional spatia…

    Submitted 9 May, 2024; v1 submitted 16 July, 2023; originally announced July 2023.

    Comments: 30 pages, 8 figures

    MSC Class: 68T07 (Primary); 60G70; 62H11 (Secondary)

  23. arXiv:2306.10395  [pdf, other]

    stat.ML cs.LG

    Distributed Semi-Supervised Sparse Statistical Inference

    Authors: Jiyuan Tu, Weidong Liu, Xiaojun Mao, Mingyue Xu

    Abstract: The debiased estimator is a crucial tool in statistical inference for high-dimensional model parameters. However, constructing such an estimator involves estimating the high-dimensional inverse Hessian matrix, incurring significant computational costs. This challenge becomes particularly acute in distributed setups, where traditional methods necessitate computing a debiased estimator on every mach…

    Submitted 15 December, 2023; v1 submitted 17 June, 2023; originally announced June 2023.

    Comments: IEEE Transactions on Information Theory, 2023

  24. arXiv:2306.07566  [pdf, other]

    stat.ML cs.LG

    Learning under Selective Labels with Data from Heterogeneous Decision-makers: An Instrumental Variable Approach

    Authors: Jian Chen, Zhehao Li, Xiaojie Mao

    Abstract: We study the problem of learning with selectively labeled data, which arises when outcomes are only partially labeled due to historical decision-making. The labeled data distribution may substantially differ from the full population, especially when the historical decisions and the target outcome can be simultaneously affected by some unobserved factors. Consequently, learning with only the labele…

    Submitted 23 June, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

  25. arXiv:2305.10934  [pdf, ps, other]

    econ.TH econ.EM stat.ME

    Context-Dependent Heterogeneous Preferences: A Comment on Barseghyan and Molinari (2023)

    Authors: Matias D. Cattaneo, Xinwei Ma, Yusufcan Masatlioglu

    Abstract: Barseghyan and Molinari (2023) give sufficient conditions for semi-nonparametric point identification of parameters of interest in a mixture model of decision-making under risk, allowing for unobserved heterogeneity in utility functions and limited consideration. A key assumption in the model is that the heterogeneity of risk preferences is unobservable but context-independent. In this comment, we…

    Submitted 18 May, 2023; originally announced May 2023.

  26. arXiv:2305.04140  [pdf, other]

    stat.ME

    A Nonparametric Mixed-Effects Mixture Model for Patterns of Clinical Measurements Associated with COVID-19

    Authors: Xiaoran Ma, Wensheng Guo, Mengyang Gu, Len Usvyat, Peter Kotanko, Yuedong Wang

    Abstract: Some patients with COVID-19 show changes in signs and symptoms such as temperature and oxygen saturation days before being positively tested for SARS-CoV-2, while others remain asymptomatic. It is important to identify these subgroups and to understand what biological and clinical predictors are related to these subgroups. This information will provide insights into how the immune system may respo…

    Submitted 31 May, 2024; v1 submitted 6 May, 2023; originally announced May 2023.

  27. arXiv:2304.02022  [pdf, other]

    cs.LG stat.ME

    Online Joint Assortment-Inventory Optimization under MNL Choices

    Authors: Yong Liang, Xiaojie Mao, Shiyuan Wang

    Abstract: We study an online joint assortment-inventory optimization problem, in which we assume that the choice behavior of each customer follows the Multinomial Logit (MNL) choice model, and the attraction parameters are unknown a priori. The retailer makes periodic assortment and inventory decisions to dynamically learn from the realized demands about the attraction parameters while maximizing the expect…

    Submitted 4 April, 2023; originally announced April 2023.

  28. arXiv:2303.11536  [pdf, other]

    cs.LG cs.AI cs.CV math.ST stat.ML

    Indeterminate Probability Neural Network

    Authors: Tao Yang, Chuang Liu, Xiaofeng Ma, Weijia Lu, Ning Wu, Bingyang Li, Zhifei Yang, Peng Liu, Lin Sun, Xiaodong Zhang, Can Zhang

    Abstract: We propose a new general model called IPNN - Indeterminate Probability Neural Network, which combines neural network and probability theory together. In the classical probability theory, the calculation of probability is based on the occurrence of events, which is hardly used in current neural networks. In this paper, we propose a new general probability theory, which is an extension of classical…

    Submitted 20 March, 2023; originally announced March 2023.

    Comments: 13 pages

  29. arXiv:2302.10470  [pdf, other]

    stat.ME math.ST

    Breaking the Winner's Curse in Mendelian Randomization: Rerandomized Inverse Variance Weighted Estimator

    Authors: Xinwei Ma, Jingshen Wang, Chong Wu

    Abstract: Developments in genome-wide association studies and the increasing availability of summary genetic association data have made the application of two-sample Mendelian Randomization (MR) with summary data increasingly popular. Conventional two-sample MR methods often employ the same sample for selecting relevant genetic variants and for constructing final causal estimates. Such a practice often lead…

    Submitted 21 February, 2023; originally announced February 2023.

  30. arXiv:2302.09834  [pdf, other]

    stat.ML cs.LG stat.ME

    Transductive Matrix Completion with Calibration for Multi-Task Learning

    Authors: Hengfang Wang, Yasi Zhang, Xiaojun Mao, Zhonglei Wang

    Abstract: Multi-task learning has attracted much attention due to growing multi-purpose research with multiple related data sources. Moreover, transduction with matrix completion is a useful method in multi-label learning. In this paper, we propose a transductive matrix completion algorithm that incorporates a calibration constraint for the features under the multi-task learning framework. The proposed algo…

    Submitted 20 February, 2023; originally announced February 2023.

    Comments: Accepted by IEEE ICASSP 2023

  31. arXiv:2302.07437  [pdf, other]

    stat.ML cs.LG

    Bridging the Usability Gap: Theoretical and Methodological Advances for Spectral Learning of Hidden Markov Models

    Authors: Xiaoyuan Ma, Jordan Rodu

    Abstract: The Baum-Welch (B-W) algorithm is the most widely accepted method for inferring hidden Markov models (HMM). However, it is prone to getting stuck in local optima, and can be too slow for many real-time applications. Spectral learning of HMMs (SHMM), based on the method of moments (MOM) has been proposed in the literature to overcome these obstacles. Despite its promises, asymptotic theory for SHMM…

    Submitted 29 April, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

  32. arXiv:2302.05404  [pdf, ps, other]

    stat.ML cs.LG econ.EM math.ST stat.ME

    Minimax Instrumental Variable Regression and $L_2$ Convergence Guarantees without Identification or Closedness

    Authors: Andrew Bennett, Nathan Kallus, Xiaojie Mao, Whitney Newey, Vasilis Syrgkanis, Masatoshi Uehara

    Abstract: In this paper, we study nonparametric estimation of instrumental variable (IV) regressions. Recently, many flexible machine learning methods have been developed for instrumental variable estimation. However, these methods have at least one of the following limitations: (1) restricting the IV regression to be uniquely identified; (2) only obtaining estimation error rates in terms of pseudometrics (…

    Submitted 10 February, 2023; originally announced February 2023.

    Comments: Under review

  33. arXiv:2301.11721  [pdf, other]

    stat.ML cs.AI cs.LG

    Single-Trajectory Distributionally Robust Reinforcement Learning

    Authors: Zhipeng Liang, Xiaoteng Ma, Jose Blanchet, Jiheng Zhang, Zhengyuan Zhou

    Abstract: As a framework for sequential decision-making, Reinforcement Learning (RL) has been regarded as an essential component leading to Artificial General Intelligence (AGI). However, RL is often criticized for having the same training environment as the test one, which also hinders its application in the real world. To mitigate this problem, Distributionally Robust RL (DRRL) is proposed to improve the…

    Submitted 27 January, 2023; originally announced January 2023.

    Comments: First two authors contribute equally

  34. arXiv:2211.09295  [pdf, other]

    stat.ML cs.LG

    Testing for context-dependent changes in neural encoding in naturalistic experiments

    Authors: Yenho Chen, Carl W. Harris, Xiaoyu Ma, Zheng Li, Francisco Pereira, Charles Y. Zheng

    Abstract: We propose a decoding-based approach to detect context effects on neural codes in longitudinal neural recording data. The approach is agnostic to how information is encoded in neural activity, and can control for a variety of possible confounding factors present in the data. We demonstrate our approach by determining whether it is possible to decode location encoding from prefrontal cortex in the…

    Submitted 16 November, 2022; originally announced November 2022.

    Comments: 39 pages, 13 figures

  35. arXiv:2211.07429  [pdf, other]

    q-bio.NC cs.LG eess.IV stat.CO stat.ME

    Accounting for Temporal Variability in Functional Magnetic Resonance Imaging Improves Prediction of Intelligence

    Authors: Yang Li, Xin Ma, Raj Sunderraman, Shihao Ji, Suprateek Kundu

    Abstract: Neuroimaging-based prediction methods for intelligence and cognitive abilities have seen a rapid development in literature. Among different neuroimaging modalities, prediction based on functional connectivity (FC) has shown great promise. Most literature has focused on prediction using static FC, but there are limited investigations on the merits of such analysis compared to prediction based on dy…

    Submitted 14 December, 2022; v1 submitted 11 November, 2022; originally announced November 2022.

  36. arXiv:2210.15008  [pdf, other]

    stat.ME math.ST stat.ML

    High-dimensional Measurement Error Models for Lipschitz Loss

    Authors: Xin Ma, Suprateek Kundu

    Abstract: Recently emerging large-scale biomedical data pose exciting opportunities for scientific discoveries. However, the ultrahigh dimensionality and non-negligible measurement errors in the data may create difficulties in estimation. There are limited methods for high-dimensional covariates with measurement error, that usually require knowledge of the noise distribution and focus on linear or generaliz…

    Submitted 26 October, 2022; originally announced October 2022.

  37. arXiv:2209.13855  [pdf, ps, other]

    stat.ME stat.AP stat.CO

    Nonparametric augmented probability weighting with sparsity

    Authors: Xin He, Xiaojun Mao, Zhonglei Wang

    Abstract: Nonresponse frequently arises in practice, and simply ignoring it may lead to erroneous inference. Besides, the number of collected covariates may increase as the sample size in modern statistics, so parametric imputation or propensity score weighting usually leads to inefficiency without consideration of sparsity. In this paper, we propose a nonparametric imputation method with sparse learning by…

    Submitted 28 September, 2022; originally announced September 2022.

  38. arXiv:2209.06620  [pdf, other]

    cs.LG cs.AI stat.ML

    Distributionally Robust Offline Reinforcement Learning with Linear Function Approximation

    Authors: Xiaoteng Ma, Zhipeng Liang, Jose Blanchet, Mingwen Liu, Li Xia, Jiheng Zhang, Qianchuan Zhao, Zhengyuan Zhou

    Abstract: Among the reasons hindering reinforcement learning (RL) applications to real-world problems, two factors are critical: limited data and the mismatch between the testing environment (real environment in which the policy is deployed) and the training environment (e.g., a simulator). This paper attempts to address these issues simultaneously with distributionally robust offline RL, where we learn a d…

    Submitted 27 January, 2023; v1 submitted 14 September, 2022; originally announced September 2022.

    Comments: First two authors contribute equally

  39. arXiv:2209.04419  [pdf, other]

    cs.CR cs.LG stat.ME stat.ML

    Majority Vote for Distributed Differentially Private Sign Selection

    Authors: Weidong Liu, Jiyuan Tu, Xiaojun Mao, Xi Chen

    Abstract: Privacy-preserving data analysis has become more prevalent in recent years. In this study, we propose a distributed group differentially private Majority Vote mechanism, for the sign selection problem in a distributed setup. To achieve this, we apply the iterative peeling to the stability function and use the exponential mechanism to recover the signs. For enhanced applicability, we study the priv…

    Submitted 4 June, 2024; v1 submitted 8 September, 2022; originally announced September 2022.

    Comments: 41 pages, 5 figures

  40. arXiv:2208.08291  [pdf, ps, other]

    stat.ME econ.EM math.ST stat.ML

    Inference on Strongly Identified Functionals of Weakly Identified Functions

    Authors: Andrew Bennett, Nathan Kallus, Xiaojie Mao, Whitney Newey, Vasilis Syrgkanis, Masatoshi Uehara

    Abstract: In a variety of applications, including nonparametric instrumental variable (NPIV) analysis, proximal causal inference under unmeasured confounding, and missing-not-at-random data with shadow variables, we are interested in inference on a continuous linear functional (e.g., average causal effects) of nuisance function (e.g., NPIV regression) defined by conditional moment restrictions. These nuisan…

    Submitted 30 June, 2023; v1 submitted 17 August, 2022; originally announced August 2022.

    Comments: This supersedes the previous version titled "Debiased Inference on Identified Linear Functionals of Underidentified Nuisances via Penalized Minimax Estimation"

  41. arXiv:2207.04083  [pdf, other]

    stat.ME

    Sparse additive models in high dimensions with wavelets

    Authors: Sylvain Sardy, Xiaoyu Ma

    Abstract: In multivariate regression, when covariates are numerous, it is often reasonable to assume that only a small number of them has predictive information. In some medical applications for instance, it is believed that only a few genes out of thousands are responsible for cancers. In that case, the aim is not only to propose a good fit, but also to select the relevant covariates (genes). We propose to…

    Submitted 8 July, 2022; originally announced July 2022.

    Comments: 28 pages, 3 figures and 2 tables

  42. arXiv:2206.02829  [pdf, other]

    cs.LG cs.AI stat.ML

    RORL: Robust Offline Reinforcement Learning via Conservative Smoothing

    Authors: Rui Yang, Chenjia Bai, Xiaoteng Ma, Zhaoran Wang, Chongjie Zhang, Lei Han

    Abstract: Offline reinforcement learning (RL) provides a promising direction to exploit massive amount of offline data for complex decision-making tasks. Due to the distribution shift issue, current offline RL algorithms are generally designed to be conservative in value estimation and action selection. However, such conservatism can impair the robustness of learned policies when encountering observation de…

    Submitted 22 October, 2022; v1 submitted 6 June, 2022; originally announced June 2022.

    Comments: Accepted by Advances in Neural Information Processing Systems (NeurIPS) 2022

  43. arXiv:2204.10375  [pdf, other]

    stat.CO stat.AP stat.ME

    lpcde: Estimation and Inference for Local Polynomial Conditional Density Estimators

    Authors: Matias D. Cattaneo, Rajita Chandak, Michael Jansson, Xinwei Ma

    Abstract: This paper discusses the R package lpcde, which stands for local polynomial conditional density estimation. It implements the kernel-based local polynomial smoothing methods introduced in Cattaneo, Chandak, Jansson, Ma (2024) for statistical estimation and inference of conditional distributions, densities, and derivatives thereof. The package offers mean square error optimal bandwidth selection an…

    Submitted 29 February, 2024; v1 submitted 21 April, 2022; originally announced April 2022.

  44. arXiv:2204.10359  [pdf, other]

    math.ST econ.EM stat.ME

    Boundary Adaptive Local Polynomial Conditional Density Estimators

    Authors: Matias D. Cattaneo, Rajita Chandak, Michael Jansson, Xinwei Ma

    Abstract: We begin by introducing a class of conditional density estimators based on local polynomial techniques. The estimators are boundary adaptive and easy to implement. We then study the (pointwise and) uniform statistical properties of the estimators, offering characterizations of both probability concentration and distributional approximation. In particular, we establish uniform convergence rates in…

    Submitted 17 December, 2023; v1 submitted 21 April, 2022; originally announced April 2022.

  45. arXiv:2204.09193  [pdf, other]

    stat.ME

    Functional Calibration under Non-Probability Survey Sampling

    Authors: Zhonglei Wang, Xiaojun Mao, Jae Kwang Kim

    Abstract: Non-probability sampling is prevalent in survey sampling, but ignoring its selection bias leads to erroneous inferences. We offer a unified nonparametric calibration method to estimate the sampling weights for a non-probability sample by calibrating functions of auxiliary variables in a reproducing kernel Hilbert space. The consistency and the limiting distribution of the proposed estimator are e…

    Submitted 19 April, 2022; originally announced April 2022.

  46. arXiv:2202.09667  [pdf, other]

    cs.LG math.OC math.ST stat.ML

    Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning

    Authors: Nathan Kallus, Xiaojie Mao, Kaiwen Wang, Zhengyuan Zhou

    Abstract: Off-policy evaluation and learning (OPE/L) use offline observational data to make better decisions, which is crucial in applications where online experimentation is limited. However, depending entirely on logged data, OPE/L is sensitive to environment distribution shifts -- discrepancies between the data-generating environment and that where policies are deployed. \citet{si2020distributional} prop…

    Submitted 18 July, 2022; v1 submitted 19 February, 2022; originally announced February 2022.

    Comments: Short Talk at ICML 2022

    Journal ref: Proceedings of the 39th International Conference on Machine Learning, PMLR 162:10598-10632, 2022

  47. arXiv:2202.07234  [pdf, other]

    stat.ME econ.EM stat.ML

    Long-term Causal Inference Under Persistent Confounding via Data Combination

    Authors: Guido Imbens, Nathan Kallus, Xiaojie Mao, Yuhao Wang

    Abstract: We study the identification and estimation of long-term treatment effects when both experimental and observational data are available. Since the long-term outcome is observed only after a long delay, it is not measured in the experimental data, but only recorded in the observational data. However, both types of data include observations of some short-term outcomes. In this paper, we uniquely tackl…

    Submitted 14 May, 2024; v1 submitted 15 February, 2022; originally announced February 2022.

  48. arXiv:2202.05498  [pdf, other]

    stat.ML cs.LG stat.ME

    Fast and Robust Sparsity Learning over Networks: A Decentralized Surrogate Median Regression Approach

    Authors: Weidong Liu, Xiaojun Mao, Xin Zhang

    Abstract: Decentralized sparsity learning has attracted a significant amount of attention recently due to its rapidly growing applications. To obtain robust and sparse estimators, a natural idea is to adopt the non-smooth median loss combined with an $\ell_1$ sparsity regularizer. However, most of the existing methods suffer from slow convergence performance caused by the double non-smooth objectiv…

    Submitted 11 February, 2022; originally announced February 2022.

    Comments: IEEE Transactions on Signal Processing, 2022

  49. arXiv:2201.08652  [pdf, other]

    stat.ML cs.LG

    A phase transition for finding needles in nonlinear haystacks with LASSO artificial neural networks

    Authors: Xiaoyu Ma, Sylvain Sardy, Nick Hengartner, Nikolai Bobenko, Yen Ting Lin

    Abstract: To fit sparse linear associations, a LASSO sparsity-inducing penalty with a single hyperparameter provably allows recovery of the important features (needles) with high probability in certain regimes, even if the sample size is smaller than the dimension of the input vector (haystack). More recently, learners known as artificial neural networks (ANN) have shown great successes in many machine learnin…

    Submitted 21 January, 2022; originally announced January 2022.

  50. arXiv:2201.06821  [pdf, other]

    cs.LG stat.CO

    Nonparametric Feature Selection by Random Forests and Deep Neural Networks

    Authors: Xiaojun Mao, Liuhua Peng, Zhonglei Wang

    Abstract: Random forests are a widely used machine learning algorithm, but their computational efficiency is undermined when applied to large-scale datasets with numerous instances and useless features. Herein, we propose a nonparametric feature selection algorithm that incorporates random forests and deep neural networks, and its theoretical properties are also investigated under regularity conditions. Usi…

    Submitted 18 January, 2022; originally announced January 2022.