Showing 1–50 of 123 results for author: Zhao, J

Searching in archive stat.
  1. arXiv:2407.01621  [pdf, other]

    cs.LG q-bio.QM stat.ME stat.ML

    Deciphering interventional dynamical causality from non-intervention systems

    Authors: Jifan Shi, Yang Li, Juan Zhao, Siyang Leng, Kazuyuki Aihara, Luonan Chen, Wei Lin

    Abstract: Detecting and quantifying causality is a focal topic in the fields of science, engineering, and interdisciplinary studies. However, causal studies on non-intervention systems attract much attention but remain extremely challenging. To address this challenge, we propose a framework named Interventional Dynamical Causality (IntDC) for such non-intervention systems, along with its computational crite…

    Submitted 28 June, 2024; originally announced July 2024.

  2. arXiv:2406.04150  [pdf, other]

    stat.ME stat.ML

    A novel robust meta-analysis model using the $t$ distribution for outlier accommodation and detection

    Authors: Yue Wang, Jianhua Zhao, Fen Jiang, Lei Shi, Jianxin Pan

    Abstract: Random effects meta-analysis model is an important tool for integrating results from multiple independent studies. However, the standard model is based on the assumption of normal distributions for both random effects and within-study errors, making it susceptible to outlying studies. Although robust modeling using the $t$ distribution is an appealing idea, the existing work, that explores the use…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 15 pages, 7 figures

    MSC Class: 62P10 ACM Class: I.2.6

  3. arXiv:2406.03849  [pdf]

    cs.LG stat.AP stat.ML

    A Noise-robust Multi-head Attention Mechanism for Formation Resistivity Prediction: Frequency Aware LSTM

    Authors: Yongan Zhang, Junfeng Zhao, Jian Li, Xuanran Wang, Youzhuang Sun, Yuntian Chen, Dongxiao Zhang

    Abstract: The prediction of formation resistivity plays a crucial role in the evaluation of oil and gas reservoirs, identification and assessment of geothermal energy resources, groundwater detection and monitoring, and carbon capture and storage. However, traditional well logging techniques fail to measure accurate resistivity in cased boreholes, and the transient electromagnetic method for cased borehole…

    Submitted 6 June, 2024; originally announced June 2024.

  4. arXiv:2406.00701  [pdf, other]

    math.ST stat.ME

    Profiled Transfer Learning for High Dimensional Linear Model

    Authors: Ziqian Lin, Junlong Zhao, Fang Wang, Hansheng Wang

    Abstract: We develop here a novel transfer learning methodology called Profiled Transfer Learning (PTL). The method is based on the \textit{approximate-linear} assumption between the source and target parameters. Compared with the commonly assumed \textit{vanishing-difference} assumption and \textit{low-rank} assumption in the literature, the \textit{approximate-linear} assumption is more flexible and less…

    Submitted 5 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

  5. arXiv:2404.04709  [pdf, other]

    econ.GN stat.AP

    Two-Sided Flexibility in Platforms

    Authors: Daniel Freund, Sébastien Martin, Jiayu Kamessi Zhao

    Abstract: Flexibility is a cornerstone of operations management, crucial to hedge stochasticity in product demands, service requirements, and resource allocation. In two-sided platforms, flexibility is also two-sided and can be viewed as the compatibility of agents on one side with agents on the other side. Platform actions often influence the flexibility on either the demand or the supply side. But how sho…

    Submitted 6 April, 2024; originally announced April 2024.

  6. arXiv:2403.07288  [pdf, other]

    stat.ME

    Efficient and Model-Agnostic Parameter Estimation Under Privacy-Preserving Post-randomization Data

    Authors: Qinglong Tian, Jiwei Zhao

    Abstract: Protecting individual privacy is crucial when releasing sensitive data for public use. While data de-identification helps, it is not enough. This paper addresses parameter estimation in scenarios where data are perturbed using the Post-Randomization Method (PRAM) to enhance privacy. Existing methods for parameter estimation under PRAM data suffer from limitations like being parameter-specific, mod…

    Submitted 11 March, 2024; originally announced March 2024.

  7. arXiv:2401.16410  [pdf, other]

    stat.ML cs.LG

    ReTaSA: A Nonparametric Functional Estimation Approach for Addressing Continuous Target Shift

    Authors: Hwanwoo Kim, Xin Zhang, Jiwei Zhao, Qinglong Tian

    Abstract: The presence of distribution shifts poses a significant challenge for deploying modern machine learning models in real-world applications. This work focuses on the target shift problem in a regression setting (Zhang et al., 2013; Nguyen et al., 2016). More specifically, the target variable y (also known as the response variable), which is continuous, has different marginal distributions in the tra…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted by ICLR 2024

  8. arXiv:2401.09259  [pdf, other]

    math.NA math.DS stat.ML

    Mitigating distribution shift in machine learning-augmented hybrid simulation

    Authors: Jiaxi Zhao, Qianxiao Li

    Abstract: We study the problem of distribution shift generally arising in machine-learning augmented hybrid simulation, where parts of simulation algorithms are replaced by data-driven surrogates. We first establish a mathematical framework to understand the structure of machine-learning augmented hybrid simulation problems, and the cause and effect of the associated distribution shift. We show correlations…

    Submitted 17 January, 2024; originally announced January 2024.

    MSC Class: 68T99; 65M15; 37M05

  9. arXiv:2401.07267  [pdf, other]

    stat.ME

    Inference for high-dimensional linear expectile regression with de-biased method

    Authors: Xiang Li, Yu-Ning Li, Li-Xin Zhang, Jun Zhao

    Abstract: In this paper, we address the inference problem in high-dimensional linear expectile regression. We transform the expectile loss into a weighted-least-squares form and apply a de-biased strategy to establish Wald-type tests for multiple constraints within a regularized framework. Simultaneously, we construct an estimator for the pseudo-inverse of the generalized Hessian matrix in high dimension wi…

    Submitted 14 January, 2024; originally announced January 2024.

    Comments: 34 pages

    MSC Class: 62F05; 62F12; 62J12

  10. arXiv:2401.07000  [pdf, other]

    stat.ME stat.AP

    Counterfactual Slope and Its Applications to Social Stratification

    Authors: Ang Yu, Jiwei Zhao

    Abstract: This paper addresses two prominent theses in social stratification research, the great equalizer thesis and Mare's (1980) school transition thesis. Both theses are premised on a descriptive regularity: the association between socioeconomic background and an outcome variable changes when conditioning on an intermediate treatment. The interpretation of this descriptive regularity is complicated by s…

    Submitted 13 January, 2024; originally announced January 2024.

  11. arXiv:2401.02203  [pdf, other]

    stat.ML cs.LG

    Robust bilinear factor analysis based on the matrix-variate $t$ distribution

    Authors: Xuan Ma, Jianhua Zhao, Changchun Shang, Fen Jiang, Philip L. H. Yu

    Abstract: Factor Analysis based on multivariate $t$ distribution ($t$fa) is a useful robust tool for extracting common factors on heavy-tailed or contaminated data. However, $t$fa is only applicable to vector data. When $t$fa is applied to matrix data, it is common to first vectorize the matrix observations. This introduces two challenges for $t$fa: (i) the inherent matrix structure of the data is broken, a…

    Submitted 4 January, 2024; originally announced January 2024.

  12. arXiv:2311.14220  [pdf, other]

    stat.ME cs.LG stat.ML

    Assumption-lean and Data-adaptive Post-Prediction Inference

    Authors: Jiacheng Miao, Xinran Miao, Yixuan Wu, Jiwei Zhao, Qiongshi Lu

    Abstract: A primary challenge facing modern scientific research is the limited availability of gold-standard data which can be both costly and labor-intensive to obtain. With the rapid development of machine learning (ML), scientists have relied on ML algorithms to predict these gold-standard outcomes with easily obtained covariates. However, these predicted outcomes are often used directly in subsequent st…

    Submitted 6 February, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

  13. arXiv:2311.07972  [pdf, other]

    stat.ME

    Residual Importance Weighted Transfer Learning For High-dimensional Linear Regression

    Authors: Junlong Zhao, Shengbin Zheng, Chenlei Leng

    Abstract: Transfer learning is an emerging paradigm for leveraging multiple sources to improve the statistical inference on a single target. In this paper, we propose a novel approach named residual importance weighted transfer learning (RIW-TL) for high-dimensional linear models built on penalized likelihood. Compared to existing methods such as Trans-Lasso that selects sources in an all-in-all-out manner,…

    Submitted 3 January, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

  14. arXiv:2310.07990  [pdf]

    q-bio.GN cs.IR cs.LG stat.AP

    Multi-View Variational Autoencoder for Missing Value Imputation in Untargeted Metabolomics

    Authors: Chen Zhao, Kuan-Jui Su, Chong Wu, Xuewei Cao, Qiuying Sha, Wu Li, Zhe Luo, Tian Qin, Chuan Qiu, Lan Juan Zhao, Anqi Liu, Lindong Jiang, Xiao Zhang, Hui Shen, Weihua Zhou, Hong-Wen Deng

    Abstract: Background: Missing data is a common challenge in mass spectrometry-based metabolomics, which can lead to biased and incomplete analyses. The integration of whole-genome sequencing (WGS) data with metabolomics data has emerged as a promising approach to enhance the accuracy of data imputation in metabolomics studies. Method: In this study, we propose a novel method that leverages the information f…

    Submitted 12 March, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: 19 pages, 3 figures

  15. arXiv:2309.12997  [pdf, other]

    math.PR math.NA stat.ML

    Scaling Limits of the Wasserstein information matrix on Gaussian Mixture Models

    Authors: Wuchen Li, Jiaxi Zhao

    Abstract: We consider the Wasserstein metric on the Gaussian mixture models (GMMs), which is defined as the pullback of the full Wasserstein metric on the space of smooth probability distributions with finite second moment. It derives a class of Wasserstein metrics on probability simplices over one-dimensional bounded homogeneous lattices via a scaling limit of the Wasserstein metric on GMMs. Specifically,…

    Submitted 22 September, 2023; originally announced September 2023.

    Comments: 32 pages, 3 figures

    MSC Class: 62B11; 41A60

  16. arXiv:2309.08808  [pdf, other]

    stat.ME econ.EM

    Adaptive Neyman Allocation

    Authors: Jinglong Zhao

    Abstract: In experimental design, Neyman allocation refers to the practice of allocating subjects into treated and control groups, potentially in unequal numbers proportional to their respective standard deviations, with the objective of minimizing the variance of the treatment effect estimator. This widely recognized approach increases statistical power in scenarios where the treated and control groups hav…

    Submitted 21 September, 2023; v1 submitted 15 September, 2023; originally announced September 2023.

  17. arXiv:2308.08152  [pdf, other]

    econ.EM stat.ME

    Estimating Effects of Long-Term Treatments

    Authors: Shan Huang, Chen Wang, Yuan Yuan, Jinglong Zhao, Jingjing Zhang

    Abstract: Estimating the effects of long-term treatments in A/B testing presents a significant challenge. Such treatments -- including updates to product functions, user interface designs, and recommendation algorithms -- are intended to remain in the system for a long period after their launches. On the other hand, given the constraints of conducting long-term experiments, practitioners often rely on short…

    Submitted 16 August, 2023; originally announced August 2023.

  18. arXiv:2307.12226  [pdf, other]

    cs.LG cs.AI stat.ML

    Geometry-Aware Adaptation for Pretrained Models

    Authors: Nicholas Roberts, Xintong Li, Dyah Adila, Sonia Cromp, Tzu-Heng Huang, Jitian Zhao, Frederic Sala

    Abstract: Machine learning models -- including prominent zero-shot models -- are often trained on datasets whose labels are only a small proportion of a larger label space. Such spaces are commonly equipped with a metric that relates the labels via distances between them. We propose a simple approach to exploit this information to adapt the trained model to reliably predict new classes -- or, in the case of…

    Submitted 27 November, 2023; v1 submitted 23 July, 2023; originally announced July 2023.

    Comments: NeurIPS 2023

  19. arXiv:2307.04250  [pdf, ps, other]

    stat.ME

    Doubly Flexible Estimation under Label Shift

    Authors: Seong-ho Lee, Yanyuan Ma, Jiwei Zhao

    Abstract: In studies ranging from clinical medicine to policy research, complete data are usually available from a population $\mathscr{P}$, but the quantity of interest is often sought for a related but different population $\mathscr{Q}$ which only has partial data. In this paper, we consider the setting that both outcome $Y$ and covariate ${\bf X}$ are available from $\mathscr{P}$ whereas only ${\bf X}$ i…

    Submitted 9 July, 2023; originally announced July 2023.

  20. arXiv:2307.01908  [pdf, other]

    stat.ME

    Efficient Estimation of Average Treatment Effect on the Treated under Endogenous Treatment Assignment

    Authors: Trinetri Ghosh, Menggang Yu, Jiwei Zhao

    Abstract: In this paper, we consider estimation of average treatment effect on the treated (ATT), an interpretable and relevant causal estimand to policy makers when treatment assignment is endogenous. By considering shadow variables that are unrelated to the treatment assignment but related to interested outcomes, we establish identification of the ATT. Then we focus on efficient estimation of the ATT by c…

    Submitted 4 July, 2023; originally announced July 2023.

    Comments: 34 pages, 2 figures

  21. arXiv:2307.00205  [pdf, other]

    stat.ME

    A Transparent and Nonlinear Method for Variable Selection

    Authors: Keyao Wang, Huiwen Wang, Jichang Zhao, Lihong Wang

    Abstract: Variable selection is a procedure to attain the truly important predictors from inputs. Complex nonlinear dependencies and strong coupling pose great challenges for variable selection in high-dimensional data. In addition, real-world applications have increased demands for interpretability of the selection process. A pragmatic approach should not only attain the most predictive covariates, but als…

    Submitted 30 June, 2023; originally announced July 2023.

  22. arXiv:2306.06443  [pdf, other]

    stat.ME stat.ML

    Sufficient Identification Conditions and Semiparametric Estimation under Missing Not at Random Mechanisms

    Authors: Anna Guo, Jiwei Zhao, Razieh Nabi

    Abstract: Conducting valid statistical analyses is challenging in the presence of missing-not-at-random (MNAR) data, where the missingness mechanism is dependent on the missing values themselves even conditioned on the observed data. Here, we consider a MNAR model that generalizes several prior popular MNAR models in two ways: first, it is less restrictive in terms of statistical independence assumptions im…

    Submitted 10 June, 2023; originally announced June 2023.

    Journal ref: Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI), 2023

  23. arXiv:2305.19123  [pdf, other]

    stat.ML cs.LG math.ST

    ELSA: Efficient Label Shift Adaptation through the Lens of Semiparametric Models

    Authors: Qinglong Tian, Xin Zhang, Jiwei Zhao

    Abstract: We study the domain adaptation problem with label shift in this work. Under the label shift context, the marginal distribution of the label varies across the training and testing datasets, while the conditional distribution of features given the label is the same. Traditional label shift adaptation methods either suffer from large estimation errors or require cumbersome post-prediction calibration…

    Submitted 30 May, 2023; originally announced May 2023.

  24. arXiv:2305.15545  [pdf, other]

    stat.AP

    Reconstructing Transit Vehicle Trajectory Using High-Resolution GPS Data

    Authors: Yuzhu Huang, Awad Abdelhalim, Anson Stewart, Jinhua Zhao, Haris Koutsopoulos

    Abstract: High-resolution location ("heartbeat") data of transit fleet vehicles is a relatively new data source for many transit agencies. On its surface, the heartbeat data can provide a wealth of information about all operational details of a recorded transit vehicle trip, from its location trajectory to its speed and acceleration profiles. Previous studies have mainly focused on decomposing the total tri…

    Submitted 15 August, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: 7 pages, to be published in IEEE ITSC-2023

  25. arXiv:2305.11323  [pdf, other]

    stat.ME cs.CY

    Cumulative differences between paired samples

    Authors: Isabel Kloumann, Hannah Korevaar, Chris McConnell, Mark Tygert, Jessica Zhao

    Abstract: The simplest, most common paired samples consist of observations from two populations, with each observed response from one population corresponding to an observed response from the other population at the same value of an ordinal covariate. The pair of observed responses (one from each population) at the same value of the covariate is known as a "matched pair" (with the matching based on the valu…

    Submitted 8 April, 2024; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: 19 pages, 9 figures

  26. arXiv:2303.06186  [pdf, other]

    stat.AP

    The impacts of remote work on travel: insights from nearly three years of monthly surveys

    Authors: Nicholas S. Caros, Xiaotong Guo, Yunhan Zheng, Jinhua Zhao

    Abstract: Remote work has expanded dramatically since 2020, upending longstanding travel patterns and behavior. More fundamentally, the flexibility for remote workers to choose when and where to work has created much stronger connections between travel behavior and organizational behavior. This paper uses a large and comprehensive monthly longitudinal survey over nearly three years to identify new trends in…

    Submitted 10 March, 2023; originally announced March 2023.

  27. arXiv:2303.06012  [pdf, other]

    stat.AP

    Examining the interactions between working from home, travel behavior and change in car ownership due to the impact of COVID-19

    Authors: Yunhan Zheng, Nicholas Caros, Jim Aloisi, Jinhua Zhao

    Abstract: COVID-19 has disrupted society and changed how people learn, work and live. The availability of vaccines in the spring of 2021, however, led to a gradual return of many pre-pandemic activities in Massachusetts in the fall of 2021. Leveraging data that were collected using a map-based survey tool in the Greater Boston area in the fall of 2021, this study explores changes in travel behavior due to C…

    Submitted 10 March, 2023; originally announced March 2023.

  28. arXiv:2303.04040  [pdf, other]

    cs.LG stat.AP stat.ML

    Uncertainty Quantification of Spatiotemporal Travel Demand with Probabilistic Graph Neural Networks

    Authors: Qingyi Wang, Shenhao Wang, Dingyi Zhuang, Haris Koutsopoulos, Jinhua Zhao

    Abstract: Recent studies have significantly improved the prediction accuracy of travel demand using graph neural networks. However, these studies largely ignored uncertainty that inevitably exists in travel demand prediction. To fill this gap, this study proposes a framework of probabilistic graph neural networks (Prob-GNN) to quantify the spatiotemporal uncertainty of travel demand. This Prob-GNN framework…

    Submitted 22 February, 2024; v1 submitted 7 March, 2023; originally announced March 2023.

  29. arXiv:2302.08298  [pdf, other]

    cs.LG stat.ML

    Unleashing the Potential of Acquisition Functions in High-Dimensional Bayesian Optimization

    Authors: Jiayu Zhao, Renyu Yang, Shenghao Qiu, Zheng Wang

    Abstract: Bayesian optimization (BO) is widely used to optimize expensive-to-evaluate black-box functions. BO first builds a surrogate model to represent the objective function and assesses its uncertainty. It then decides where to sample by maximizing an acquisition function (AF) based on the surrogate model. However, when dealing with high-dimensional problems, finding the global maximum of the AF becomes…

    Submitted 23 January, 2024; v1 submitted 16 February, 2023; originally announced February 2023.

    Comments: Accepted by Transactions on Machine Learning Research (TMLR)

  30. arXiv:2301.03808  [pdf, other]

    stat.AP

    Passenger Path Choice Estimation Using Smart Card Data: A Latent Class Approach with Panel Effects Across Days

    Authors: Baichuan Mo, ZhenLiang Ma, Haris N. Koutsopoulos, Jinhua Zhao

    Abstract: Understanding passengers' path choice behavior in urban rail systems is a prerequisite for effective operations and planning. This paper attempts bridging the gap by proposing a probabilistic approach to infer passengers' path choice behavior in urban rail systems using a large-scale smart card data. The model uses latent classes and panel effects to capture passengers' implicit behavior heterogen…

    Submitted 10 January, 2023; originally announced January 2023.

  31. arXiv:2301.02594  [pdf, other]

    stat.AP

    Modeling Virus Transmission Risks in Commuting with Emerging Mobility Services: A Case Study of COVID-19

    Authors: Baichuan Mo, Peyman Noursalehi, Haris N. Koutsopoulos, Jinhua Zhao

    Abstract: Commuting is an important part of daily life. With the gradual recovery from COVID-19 and more people returning to work from the office, the transmission of COVID-19 during commuting becomes a concern. Recent emerging mobility services (such as ride-hailing and bike-sharing) further deteriorate the infection risks due to shared vehicles or spaces during travel. Hence, it is important to quantify t…

    Submitted 6 January, 2023; originally announced January 2023.

  32. arXiv:2211.04915  [pdf, other]

    stat.AP

    Inferring Mobility of Care Travel Behavior From Transit Origin-Destination Data

    Authors: Daniela Shuman, Awad Abdelhalim, Anson F Stewart, Kayleigh B Campbell, Mira Patel, Ines Sanchez de Madariaga, Jinhua Zhao

    Abstract: There are substantial differences in travel behavior by gender on public transit. Studies have concluded that these differences are largely attributable to household responsibilities typically falling disproportionately on women, leading to women being more likely to utilize transit for purposes referred to by the umbrella concept of "mobility of care". In contrast to past studies that have quanti…

    Submitted 10 April, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

    Comments: Updated reference formatting and discussion points

  33. Uncertainty Quantification of Sparse Travel Demand Prediction with Spatial-Temporal Graph Neural Networks

    Authors: Dingyi Zhuang, Shenhao Wang, Haris N. Koutsopoulos, Jinhua Zhao

    Abstract: Origin-Destination (O-D) travel demand prediction is a fundamental challenge in transportation. Recently, spatial-temporal deep learning models demonstrate the tremendous potential to enhance prediction accuracy. However, few studies tackled the uncertainty and sparsity issues in fine-grained O-D matrices. This presents a serious problem, because a vast number of zeros deviate from the Gaussian as…

    Submitted 11 August, 2022; originally announced August 2022.

    Comments: Accepted by KDD 2022

  34. arXiv:2208.03291  [pdf]

    stat.AP

    Comparing Unit Trains versus Manifest Trains for the Risk of Rail Transport of Hazardous Materials -- Part II: Application and Case Study

    Authors: Di Kang, Jiaxi Zhao, C. Tyler Dick, Xiang Liu, Zheyong Bian, Steven W. Kirkpatrick, Chen-Yu Lin

    Abstract: Built upon the risk analysis methodology (presented in the part I paper), this part II paper focuses on applying this methodology. Five illustrative scenarios were used to analyze the best or worst cases and compare the transportation risk differences between service options using unit trains and manifest trains. The comparison results indicate that if all tank cars are placed at the positions wit…

    Submitted 4 July, 2022; originally announced August 2022.

  35. arXiv:2207.02113  [pdf]

    stat.AP stat.ME

    Comparing Unit Trains versus Manifest Trains for the Risk of Rail Transport of Hazardous Materials -- Part I: Risk Analysis Methodology

    Authors: Di Kang, Jiaxi Zhao, C. Tyler Dick, Xiang Liu, Zheyong Bian, Steven W. Kirkpatrick, Chen-Yu Lin

    Abstract: Transporting hazardous materials (hazmats) using tank cars has more significant economic benefits than other transportation modes. Although railway transportation is roughly four times more fuel-efficient than roadway transportation, a train derailment has greater potential to cause more disastrous consequences than a truck incident. Train types, such as unit train or manifest train (also called m…

    Submitted 4 July, 2022; originally announced July 2022.

  36. arXiv:2204.09904  [pdf, other]

    cs.HC cs.AI cs.CV cs.LG stat.ML

    Infographics Wizard: Flexible Infographics Authoring and Design Exploration

    Authors: Anjul Tyagi, Jian Zhao, Pushkar Patel, Swasti Khurana, Klaus Mueller

    Abstract: Infographics are an aesthetic visual representation of information following specific design principles of human perception. Designing infographics can be a tedious process for non-experts and time-consuming, even for professional designers. With the help of designers, we propose a semi-automated infographic framework for general structured and flow-based infographic design generation. For novice…

    Submitted 8 May, 2022; v1 submitted 21 April, 2022; originally announced April 2022.

    Comments: Preprint of the EUROVIS 22 accepted paper. arXiv admin note: substantial text overlap with arXiv:2108.11914

    ACM Class: H.5.2; I.4.6; J.5

    Journal ref: Computer Graphics Forum, 2022, 41: 121-132

  37. arXiv:2204.09086  [pdf, other]

    stat.ML cs.LG

    Choosing the number of factors in factor analysis with incomplete data via a hierarchical Bayesian information criterion

    Authors: Jianhua Zhao, Changchun Shang, Shulan Li, Ling Xin, Philip L. H. Yu

    Abstract: The Bayesian information criterion (BIC), defined as the observed data log likelihood minus a penalty term based on the sample size $N$, is a popular model selection criterion for factor analysis with complete data. This definition has also been suggested for incomplete data. However, the penalty term based on the `complete' sample size $N$ is the same no matter whether in a complete or incomplete…

    Submitted 19 April, 2022; originally announced April 2022.

    Comments: 16 pages, 4 figures

    MSC Class: 62H25 ACM Class: G.3; I.2.6

  38. arXiv:2203.01171  [pdf, other]

    cs.RO stat.ML

    Imitation of Manipulation Skills Using Multiple Geometries

    Authors: Boyang Ti, Yongsheng Gao, Jie Zhao, Sylvain Calinon

    Abstract: Daily manipulation tasks are characterized by geometric primitives related to actions and object shapes. Such geometric descriptors are poorly represented by only using Cartesian coordinate systems. In this paper, we propose a learning approach to extract the optimal representation from a dictionary of coordinate systems to encode an observed movement/behavior. This is achieved by using an extensi…

    Submitted 21 July, 2022; v1 submitted 2 March, 2022; originally announced March 2022.

  39. Regularized Bilinear Discriminant Analysis for Multivariate Time Series Data

    Authors: Jianhua Zhao, Haiye Liang, Shulan Li, Zhiji Yang, Zhen Wang

    Abstract: In recent years, the methods on matrix-based or bilinear discriminant analysis (BLDA) have received much attention. Despite their advantages, it has been reported that the traditional vector-based regularized LDA (RLDA) is still quite competitive and could outperform BLDA on some benchmark datasets. Nevertheless, it is also noted that this finding is mainly limited to image data. In this paper, we…

    Submitted 26 February, 2022; originally announced February 2022.

    Comments: 14 pages, 2 figures

    MSC Class: 68T10 ACM Class: I.5.2

  40. arXiv:2201.12936  [pdf, other]

    stat.ME econ.EM

    Pigeonhole Design: Balancing Sequential Experiments from an Online Matching Perspective

    Authors: Jinglong Zhao, Zijie Zhou

    Abstract: Practitioners and academics have long appreciated the benefits of covariate balancing when they conduct randomized experiments. For web-facing firms running online A/B tests, however, it still remains challenging in balancing covariate information when experimental subjects arrive sequentially. In this paper, we study an online experimental design problem, which we refer to as the "Online Blocking…

    Submitted 23 May, 2024; v1 submitted 30 January, 2022; originally announced January 2022.

  41. arXiv:2201.01281  [pdf, other]

    stat.AP

    The emerging spectrum of flexible work locations: implications for travel demand and carbon emissions

    Authors: Nicholas S. Caros, Xiaotong Guo, Jinhua Zhao

    Abstract: Many studies of the effect of remote work on travel demand assume that remote work takes place entirely at home. Recent evidence, however, shows that in the United States, remote workers are choosing to spend approximately one third of their remote work hours outside of the home at cafes, co-working spaces or the homes of friends and family. Commutes to these "third places" could offset much of th…

    Submitted 10 March, 2023; v1 submitted 4 January, 2022; originally announced January 2022.

  42. arXiv:2201.01229  [pdf, other]

    stat.AP

    Impact of unplanned service disruptions on urban public transit systems

    Authors: Baichuan Mo, Max Y von Franque, Haris N. Koutsopoulos, John Attanucci, Jinhua Zhao

    Abstract: This paper proposes a general unplanned incident analysis framework for public transit systems from the supply and demand sides using automated fare collection (AFC) and automated vehicle location (AVL) data. Specifically, on the supply side, we propose an incident-based network redundancy index to analyze the network's ability to provide alternative services under a specific rail disruption. The…

    Submitted 3 January, 2022; originally announced January 2022.

  43. Robust factored principal component analysis for matrix-valued outlier accommodation and detection

    Authors: Xuan Ma, Jianhua Zhao, Yue Wang

    Abstract: Principal component analysis (PCA) is a popular dimension reduction technique for vector data. Factored PCA (FPCA) is a probabilistic extension of PCA for matrix data, which can substantially reduce the number of parameters in PCA while yield satisfactory performance. However, FPCA is based on the Gaussian assumption and thereby susceptible to outliers. Although the multivariate $t$ distribution a…

    Submitted 13 December, 2021; originally announced December 2021.

    Comments: 37 pages

    MSC Class: 62H25

  44. arXiv:2110.02588  [pdf, ps, other]

    stat.ME

    Hypothesis Testing of One-Sample Mean Vector in Distributed Frameworks

    Authors: Bin Du, Junlong Zhao

    Abstract: Distributed frameworks are widely used to handle massive data, where sample size $n$ is very large, and data are often stored in $k$ different machines. For a random vector $X\in \mathbb{R}^p$ with expectation $μ$, testing the mean vector $H_0: μ=μ_0$ vs $H_1: μ\ne μ_0$ for a given vector $μ_0$ is a basic problem in statistics. The centralized test statistics require heavy communication costs, whi…

    Submitted 6 October, 2021; originally announced October 2021.

  45. arXiv:2109.12422  [pdf, other]

    stat.ML cs.LG stat.AP

    Equality of opportunity in travel behavior prediction with deep neural networks and discrete choice models

    Authors: Yunhan Zheng, Shenhao Wang, Jinhua Zhao

    Abstract: Although researchers increasingly adopt machine learning to model travel behavior, they predominantly focus on prediction accuracy, ignoring the ethical challenges embedded in machine learning algorithms. This study introduces an important missing dimension - computational fairness - to travel behavior analysis. We first operationalize computational fairness by equality of opportunity, then differ…

    Submitted 25 September, 2021; originally announced September 2021.

  46. arXiv:2108.11589  [pdf, other]

    stat.AP stat.ME

    A Statistical Inference Framework for the Minimal Clinically Important Difference

    Authors: Zehua Zhou, Leslie J. Bisson, Jiwei Zhao

    Abstract: In clinical research, the effect of a treatment or intervention is widely assessed through clinical importance, instead of statistical significance. In this paper, we propose a principled statistical inference framework for learning the minimal clinically important difference (MCID), a vital concept in assessing clinical importance. We formulate the scientific question into a novel statistical lear…

    Submitted 1 March, 2022; v1 submitted 26 August, 2021; originally announced August 2021.

    Comments: 36 Pages, 5 figures, 3 tables, submitted to Statistics in Biosciences

  47. arXiv:2108.04966  [pdf, ps, other]

    stat.ME math.ST

    Avoid Estimating the Unknown Function in a Semiparametric Nonignorable Propensity Model

    Authors: Samidha Shetty, Yanyuan Ma, Jiwei Zhao

    Abstract: We study the problem of estimating a functional or a parameter in the context where outcome is subject to nonignorable missingness. We completely avoid modeling the regression relation, while allowing the propensity to be modeled by a semiparametric logistic relation where the dependence on covariates is unspecified. We discover a surprising phenomenon in that the estimation of the parameter in th…

    Submitted 10 August, 2021; originally announced August 2021.

    Comments: 21 pages

  48. arXiv:2108.04306  [pdf, other]

    stat.ME math.ST

    Test of Significance for High-dimensional Thresholds with Application to Individualized Minimal Clinically Important Difference

    Authors: Huijie Feng, Jingyi Duan, Yang Ning, Jiwei Zhao

    Abstract: This work is motivated by learning the individualized minimal clinically important difference, a vital concept to assess clinical importance in various biomedical studies. We formulate the scientific question into a high-dimensional statistical problem where the parameter of interest lies in an individualized linear threshold. The goal is to develop a hypothesis testing procedure for the significa…

    Submitted 26 March, 2023; v1 submitted 9 August, 2021; originally announced August 2021.

  49. arXiv:2108.02196  [pdf, other]

    stat.ME econ.EM

    Synthetic Controls for Experimental Design

    Authors: Alberto Abadie, Jinglong Zhao

    Abstract: This article studies experimental design in settings where the experimental units are large aggregate entities (e.g., markets), and only one or a small number of units can be exposed to the treatment. In such settings, randomization of the treatment may result in treated and control groups with very different characteristics at baseline, inducing biases. We propose a variety of synthetic control d…

    Submitted 6 December, 2023; v1 submitted 4 August, 2021; originally announced August 2021.

  50. arXiv:2107.02043  [pdf]

    stat.AP cs.CY

    An extended watershed-based zonal statistical AHP model for flood risk estimation: Constraining runoff converging related indicators by sub-watersheds

    Authors: Hongping Zhang, Zhenfeng Shao, Jinqi Zhao, Xiao Huang, Jie Yang, Bin Hu, Wenfu Wu

    Abstract: Floods are highly uncertain events, occurring in different regions, with varying prerequisites and intensities. A highly reliable flood disaster risk map can help reduce the impact of floods for flood management, disaster decreasing, and urbanization resilience. In flood risk estimation, the widely used analytic hierarchy process (AHP) usually adopts pixel as a basic unit, it cannot capture the si…

    Submitted 5 July, 2021; originally announced July 2021.

    Comments: This paper is a research paper, it contains 40 pages and 8 figures. This paper is a modest contribution to the ongoing discussions the accuracy of flood risk estimation via AHP model improved by adopting pixels replaced with sub-watersheds as basic unit

    MSC Class: 86A05 ACM Class: H.1