
Showing 1–50 of 156 results for author: Wu, H

Searching in archive stat.
  1. arXiv:2405.17079

    stat.ML cs.LG

    Learning with User-Level Local Differential Privacy

    Authors: Puning Zhao, Li Shen, Rongfei Fan, Qingming Li, Huiwen Wu, Jiafei Wu, Zhe Liu

    Abstract: User-level privacy is important in distributed systems. Previous research primarily focuses on the central model, while the local models have received much less attention. Under the central model, user-level DP is strictly stronger than the item-level one. However, under the local model, the relationship between user-level and item-level LDP becomes more complex, and thus the analysis is crucially dif…

    Submitted 27 May, 2024; originally announced May 2024.

  2. arXiv:2404.19649

    cs.LG math.ST physics.data-an stat.ML

    Landmark Alternating Diffusion

    Authors: Sing-Yuan Yeh, Hau-Tieng Wu, Ronen Talmon, Mao-Pei Tsui

    Abstract: Alternating Diffusion (AD) is a commonly applied diffusion-based sensor fusion algorithm. While it has been successfully applied to various problems, its computational burden remains a limitation. Inspired by the landmark diffusion idea considered in the Robust and Scalable Embedding via Landmark Diffusion (ROSELAND), we propose a variation of AD, called Landmark AD (LAD), which captures the essen…

    Submitted 29 April, 2024; originally announced April 2024.

    MSC Class: 53Z50; 65D18

  3. arXiv:2401.17675

    stat.ML cs.DS cs.LG

    Convergence analysis of t-SNE as a gradient flow for point cloud on a manifold

    Authors: Seonghyeon Jeong, Hau-Tieng Wu

    Abstract: We present a theoretical foundation regarding the boundedness of the t-SNE algorithm. t-SNE employs gradient descent iteration with Kullback-Leibler (KL) divergence as the objective function, aiming to identify a set of points that closely resemble the original data points in a high-dimensional space, minimizing KL divergence. Investigating t-SNE properties such as perplexity and affinity under a…

    Submitted 31 January, 2024; originally announced January 2024.

    MSC Class: 90C26; 90C30 ACM Class: F.2.2; F.2.0; G.4

  4. arXiv:2401.03921

    stat.ML cs.LG stat.AP

    Design a Metric Robust to Complicated High Dimensional Noise for Efficient Manifold Denoising

    Authors: Hau-Tieng Wu

    Abstract: In this manuscript, we propose an efficient manifold denoiser based on landmark diffusion and optimal shrinkage under the complicated high dimensional noise and compact manifold setup. It is flexible enough to handle several setups, including a high ambient space dimension with a manifold embedding that occupies a subspace of high or low dimension, and noise that may be colored and dependent. A syste…

    Submitted 8 January, 2024; originally announced January 2024.

  5. arXiv:2401.02080

    cs.LG stat.CO stat.ML

    Energy based diffusion generator for efficient sampling of Boltzmann distributions

    Authors: Yan Wang, Ling Guo, Hao Wu, Tao Zhou

    Abstract: We introduce a novel sampler called the energy based diffusion generator for generating samples from arbitrary target distributions. The sampling model employs a structure similar to a variational autoencoder, utilizing a decoder to transform latent variables from a simple distribution into random variables approximating the target distribution, and we design an encoder based on the diffusion mode…

    Submitted 4 January, 2024; originally announced January 2024.

  6. arXiv:2310.18814

    stat.ML cs.LG

    Stability of Random Forests and Coverage of Random-Forest Prediction Intervals

    Authors: Yan Wang, Huaiqing Wu, Dan Nettleton

    Abstract: We establish stability of random forests under the mild condition that the squared response ($Y^2$) does not have a heavy tail. In particular, our analysis holds for the practical version of random forests that is implemented in popular packages like randomForest in R. Empirical results show that stability may persist even beyond our assumption and hold for heavy-tailed $Y^2$. Us…

    Submitted 28 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  7. arXiv:2309.06673

    math.NA stat.ME

    Ridge detection for nonstationary multicomponent signals with time-varying wave-shape functions and its applications

    Authors: Yan-Wei Su, Gi-Ren Liu, Yuan-Chung Sheu, Hau-Tieng Wu

    Abstract: We introduce a novel ridge detection algorithm for time-frequency (TF) analysis, particularly tailored for intricate nonstationary time series encompassing multiple non-sinusoidal oscillatory components. The algorithm is rooted in the distinctive geometric patterns that emerge in the TF domain due to such non-sinusoidal oscillations. We term this method shape-adaptive mode decomposition-ba…

    Submitted 12 September, 2023; originally announced September 2023.

  8. arXiv:2309.05878

    cs.LG math.DS physics.chem-ph physics.data-an stat.ML

    Reaction coordinate flows for model reduction of molecular kinetics

    Authors: Hao Wu, Frank Noé

    Abstract: In this work, we introduce a flow-based machine learning approach, called reaction coordinate (RC) flow, for discovery of low-dimensional kinetic models of molecular systems. The RC flow utilizes a normalizing flow to design the coordinate transformation and a Brownian dynamics model to approximate the kinetics of RC, where all model parameters can be estimated in a data-driven manner. In contrast…

    Submitted 11 September, 2023; originally announced September 2023.

  9. arXiv:2306.11979

    stat.ME

    Qini Curves for Multi-Armed Treatment Rules

    Authors: Erik Sverdrup, Han Wu, Susan Athey, Stefan Wager

    Abstract: Qini curves have emerged as an attractive and popular approach for evaluating the benefit of data-driven targeting rules for treatment allocation. We propose a generalization of the Qini curve to multiple costly treatment arms that quantifies the value of optimally selecting among both units and treatment arms at different budget levels. We develop an efficient algorithm for computing these curve…

    Submitted 23 April, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

  10. arXiv:2306.02857

    stat.AP physics.data-an

    Topological Data Analysis Assisted Automated Sleep Stage Scoring Using Airflow Signals

    Authors: Yu-Min Chung, Whitney K. Huang, Hau-Tieng Wu

    Abstract: Objective: Breathing pattern variability (BPV), as a universal physiological feature, encodes rich health information. We aim to show that high-quality automatic sleep stage scoring can be achieved from a proper quantification of BPV extracted from the single airflow signal. Methods: Topological data analysis (TDA) is applied to characterize BPV from the intrinsically nonstationary airfl…

    Submitted 5 June, 2023; originally announced June 2023.

  11. arXiv:2305.10878

    stat.ME stat.AP

    Multi-scale wavelet coherence with its applications

    Authors: Haibo Wu, MI Knight, H Ombao

    Abstract: The goal of this paper is to develop a novel statistical approach to characterize functional interactions between channels in a brain network. Wavelets are effective for capturing transient properties of non-stationary signals because they have compact support that can be compressed or stretched according to the dynamic properties of the signal. Wavelets give a multi-scale decomposition of signals…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: 43 pages, 20 figures in the paper

  12. arXiv:2305.02650

    cs.IT cs.LG stat.ML

    A Constrained BA Algorithm for Rate-Distortion and Distortion-Rate Functions

    Authors: Lingyi Chen, Shitong Wu, Wenhao Ye, Huihui Wu, Wenyi Zhang, Hao Wu, Bo Bai

    Abstract: The Blahut-Arimoto (BA) algorithm has played a fundamental role in the numerical computation of rate-distortion (RD) functions. This algorithm possesses a desirable monotonic convergence property by alternately minimizing its Lagrangian with a fixed multiplier. In this paper, we propose a novel modification of the BA algorithm, wherein the multiplier is updated through a one-dimensional root-fin…

    Submitted 18 January, 2024; v1 submitted 4 May, 2023; originally announced May 2023.

    Comments: Version_2

  13. arXiv:2211.03262

    stat.ME stat.AP

    Detecting Interference in A/B Testing with Increasing Allocation

    Authors: Kevin Han, Shuangning Li, Jialiang Mao, Han Wu

    Abstract: In the past decade, the technology industry has adopted online randomized controlled experiments (a.k.a. A/B testing) to guide product development and make business decisions. In practice, A/B tests are often implemented with increasing treatment allocation: the new treatment is gradually released to an increasing number of units through a sequence of randomized experiments. In scenarios such as e…

    Submitted 24 March, 2023; v1 submitted 6 November, 2022; originally announced November 2022.

  14. arXiv:2208.08561

    stat.ML cs.LG math.SP

    Geometric Scattering on Measure Spaces

    Authors: Joyce Chew, Matthew Hirn, Smita Krishnaswamy, Deanna Needell, Michael Perlmutter, Holly Steach, Siddharth Viswanath, Hau-Tieng Wu

    Abstract: The scattering transform is a multilayered, wavelet-based transform initially introduced as a model of convolutional neural networks (CNNs) that has played a foundational role in our understanding of these networks' stability and invariance properties. Subsequently, there has been widespread interest in extending the success of CNNs to data sets with non-Euclidean structure, such as graphs and man…

    Submitted 13 October, 2022; v1 submitted 17 August, 2022; originally announced August 2022.

    MSC Class: 68T07

  15. arXiv:2208.07601

    cs.IT stat.CO

    An Optimal Transport Approach to the Computation of the LM Rate

    Authors: Wenhao Ye, Huihui Wu, Shitong Wu, Yizhu Wang, Wenyi Zhang, Hao Wu, Bo Bai

    Abstract: Mismatch capacity characterizes the highest information rate for a channel under a prescribed decoding metric, and is thus a highly relevant fundamental performance metric when dealing with many practically important communication scenarios. Compared with the frequently used generalized mutual information (GMI), the LM rate is known to be a tighter lower bound of the mismatch capacity. The comp…

    Submitted 16 August, 2022; originally announced August 2022.

  16. arXiv:2207.03466

    stat.AP

    Data-Driven optimal shrinkage of singular values under high-dimensional noise with separable covariance structure with application

    Authors: Pei-Chun Su, Hau-Tieng Wu

    Abstract: We develop a data-driven optimal shrinkage algorithm for matrix denoising in the presence of high-dimensional noise with a separable covariance structure; that is, the noise is colored and dependent across samples. The algorithm, coined extended OptShrink (eOptShrink), depends on the asymptotic behavior of singular values and singular vectors of the random matrix associated with the noisy dat…

    Submitted 11 May, 2024; v1 submitted 7 July, 2022; originally announced July 2022.

    Comments: arXiv admin note: text overlap with arXiv:1905.13060 by other authors

  17. arXiv:2206.11880

    stat.ME

    The Effective Sample Size in Bayesian Information Criterion for Level-Specific Fixed and Random Effects Selection in a Two-Level Nested Model

    Authors: Sun-Joo Cho, Hao Wu, Matthew Naveiras

    Abstract: Popular statistical software provides the Bayesian information criterion (BIC) for multilevel models or linear mixed models. However, it has been observed that the combination of statistical literature and software documentation has led to discrepancies in the formulas of the BIC and uncertainty about the proper use of the BIC in selecting a multilevel model with respect to level-specific fixed and ran…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: 34 pages

  18. arXiv:2206.10078

    cs.LG eess.SP math.NA stat.ML

    The Manifold Scattering Transform for High-Dimensional Point Cloud Data

    Authors: Joyce Chew, Holly R. Steach, Siddharth Viswanath, Hau-Tieng Wu, Matthew Hirn, Deanna Needell, Smita Krishnaswamy, Michael Perlmutter

    Abstract: The manifold scattering transform is a deep feature extractor for data defined on a Riemannian manifold. It is one of the first examples of extending convolutional neural network-like operators to general manifolds. The initial work on this model focused primarily on its theoretical stability and invariance properties but did not provide methods for its numerical implementation except in the case…

    Submitted 21 January, 2024; v1 submitted 20 June, 2022; originally announced June 2022.

    Comments: Accepted for publication in the TAG in DS Workshop at ICML. For subsequent theoretical guarantees, please see Section 6 of arXiv:2208.08561

    MSC Class: 68T07 ACM Class: I.2.6

  19. arXiv:2205.12754

    stat.ME stat.AP

    Restricted mean survival time regression model with time-dependent covariates

    Authors: Chengfeng Zhang, Hongji Wu, Baoyi Huang, Hao Yuan, Yawen Hou, Zheng Chen

    Abstract: In clinical or epidemiological follow-up studies, methods based on time scale indicators such as the restricted mean survival time (RMST) have been developed to some extent. Compared with traditional hazard rate indicator system methods, the RMST is easier to interpret and does not require the proportional hazard assumption. To date, regression models based on the RMST are indirect or direct model…

    Submitted 25 May, 2022; originally announced May 2022.

    Comments: 24 pages, 4 tables

    Journal ref: Statistics in Medicine. 2022

  20. Application of de-shape synchrosqueezing to estimate gait cadence from a single-sensor accelerometer placed in different body locations

    Authors: Hau-Tieng Wu, Jacek Urbanek

    Abstract: Objective: Commercial and research-grade wearable devices have become increasingly popular over the past decade. Information extracted from devices using accelerometers is frequently summarized as "number of steps" (commercial devices) or "activity counts" (research-grade devices). Raw accelerometry data that can be easily extracted from accelerometers used in research, for instance ActiGraph GT…

    Submitted 5 February, 2023; v1 submitted 20 March, 2022; originally announced March 2022.

  21. arXiv:2203.02057

    stat.ML cs.LG

    Interpretable Latent Variables in Deep State Space Models

    Authors: Haoxuan Wu, David S. Matteson, Martin T. Wells

    Abstract: We introduce a new version of deep state-space models (DSSMs) that combines a recurrent neural network with a state-space framework to forecast time series data. The model estimates the observed series as functions of latent variables that evolve non-linearly through time. Due to the complexity and non-linearity inherent in DSSMs, previous works on DSSMs typically produced latent variables that ar…

    Submitted 19 May, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

  22. arXiv:2203.00820

    stat.ME cs.LG stat.AP stat.ML

    Partial Likelihood Thompson Sampling

    Authors: Han Wu, Stefan Wager

    Abstract: We consider the problem of deciding how best to target and prioritize existing vaccines that may offer protection against new variants of an infectious disease. Sequential experiments are a promising approach; however, challenges due to delayed feedback and the overall ebb and flow of disease prevalence make available methods inapplicable for this task. We present a method, partial likelihood Thom…

    Submitted 19 June, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

  23. arXiv:2202.12445

    stat.ME cs.LG

    Ensemble Method for Estimating Individualized Treatment Effects

    Authors: Kevin Wu Han, Han Wu

    Abstract: In many medical and business applications, researchers are interested in estimating individualized treatment effects using data from a randomized experiment. For example, in medical applications, doctors learn the treatment effects from clinical trials, and in technology companies, researchers learn them from A/B testing experiments. Although dozens of machine learning models have been proposed for…

    Submitted 27 February, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

  24. arXiv:2202.12431

    cs.LG math.ST stat.ME

    Thompson Sampling with Unrestricted Delays

    Authors: Han Wu, Stefan Wager

    Abstract: We investigate properties of Thompson Sampling in the stochastic multi-armed bandit problem with delayed feedback. In a setting with i.i.d. delays, we establish to our knowledge the first regret bounds for Thompson Sampling with arbitrary delay distributions, including ones with unbounded expectation. Our bounds are qualitatively comparable to the best available bounds derived via ad-hoc algorithms…

    Submitted 22 May, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

  25. arXiv:2201.08530

    stat.ML cs.LG eess.SP

    Spatiotemporal Analysis Using Riemannian Composition of Diffusion Operators

    Authors: Tal Shnitzer, Hau-Tieng Wu, Ronen Talmon

    Abstract: Multivariate time-series have become abundant in recent years, as many data-acquisition systems record information through multiple sensors simultaneously. In this paper, we assume the variables pertain to some geometry and present an operator-based approach for spatiotemporal analysis. Our approach combines three components that are often considered separately: (i) manifold learning for building…

    Submitted 20 January, 2022; originally announced January 2022.

    Comments: 48 pages, 13 figures

  26. arXiv:2201.06606

    stat.ME

    Drift vs Shift: Decoupling Trends and Changepoint Analysis

    Authors: Haoxuan Wu, Toryn L. J. Schafer, Sean Ryan, David S. Matteson

    Abstract: We introduce a new approach for decoupling trends (drift) and changepoints (shifts) in time series. Our locally adaptive model-based approach for robustly decoupling combines Bayesian trend filtering and machine learning based regularization. An over-parameterized Bayesian dynamic linear model (DLM) is first applied to characterize drift. Then a weighted penalized likelihood estimator is paired wi…

    Submitted 6 January, 2024; v1 submitted 17 January, 2022; originally announced January 2022.

  27. arXiv:2111.10940

    stat.ML cs.LG

    How do kernel-based sensor fusion algorithms behave under high dimensional noise?

    Authors: Xiucai Ding, Hau-Tieng Wu

    Abstract: We study the behavior of two kernel based sensor fusion algorithms, nonparametric canonical correlation analysis (NCCA) and alternating diffusion (AD), under the nonnull setting that the clean datasets collected from two sensors are modeled by a common low dimensional manifold embedded in a high dimensional Euclidean space and the datasets are corrupted by high dimensional noise. We establish the…

    Submitted 21 November, 2021; originally announced November 2021.

  28. arXiv:2111.03267

    cs.LG stat.ML

    Interpretable Personalized Experimentation

    Authors: Han Wu, Sarah Tan, Weiwei Li, Mia Garrard, Adam Obeng, Drew Dimmery, Shaun Singh, Hanson Wang, Daniel Jiang, Eytan Bakshy

    Abstract: Black-box heterogeneous treatment effect (HTE) models are increasingly being used to create personalized policies that assign individuals to their optimal treatments. However, they are difficult to understand, and can be burdensome to maintain in a production environment. In this paper, we present a scalable, interpretable personalized experimentation system, implemented and deployed in production…

    Submitted 5 August, 2022; v1 submitted 5 November, 2021; originally announced November 2021.

    Comments: Camera-ready version for KDD 2022. Previously titled "Distilling Heterogeneity: From Explanations of Heterogeneous Treatment Effect Models to Interpretable Policies". A short version was presented at MIT CODE 2021

  29. arXiv:2110.15013

    math.DS cs.LG math-ph physics.comp-ph stat.ML

    Deeptime: a Python library for machine learning dynamical models from time series data

    Authors: Moritz Hoffmann, Martin Scherer, Tim Hempel, Andreas Mardt, Brian de Silva, Brooke E. Husic, Stefan Klus, Hao Wu, Nathan Kutz, Steven L. Brunton, Frank Noé

    Abstract: Generation and analysis of time-series data is relevant to many quantitative fields ranging from economics to fluid mechanics. In the physical sciences, structures such as metastable and coherent sets, slow relaxation processes, collective variables, dominant transition pathways or manifolds and channels of probability flow can be of great importance for understanding and characterizing the kinetic…

    Submitted 11 December, 2021; v1 submitted 28 October, 2021; originally announced October 2021.

    Journal ref: Machine Learning: Science and Technology, Volume 3, Number 1, 2021

  30. arXiv:2106.13798

    cs.LG stat.ML

    Conjugate Energy-Based Models

    Authors: Hao Wu, Babak Esmaeili, Michael Wick, Jean-Baptiste Tristan, Jan-Willem van de Meent

    Abstract: In this paper, we propose conjugate energy-based models (CEBMs), a new class of energy-based models that define a joint density over data and latent variables. The joint density of a CEBM decomposes into an intractable distribution over data and a tractable posterior over latent variables. CEBMs have similar use cases as variational autoencoders, in the sense that they learn an unsupervised mappin…

    Submitted 25 June, 2021; originally announced June 2021.

  31. arXiv:2106.13390

    stat.AP stat.ME

    Implementation of an alternative method for assessing competing risks: restricted mean time lost

    Authors: Hongji Wu, Hao Yuan, Zi**g Yang, Yawen Hou, Zheng Chen

    Abstract: In clinical and epidemiological studies, hazard ratios are often applied to compare treatment effects between two groups for survival data. For competing risks data, the corresponding quantities of interest are cause-specific hazard ratios (cHRs) and subdistribution hazard ratios (sHRs). However, they both have some limitations related to model assumptions and clinical interpretation. Therefore, w…

    Submitted 19 December, 2021; v1 submitted 24 June, 2021; originally announced June 2021.

    Comments: The R code (crRMTL) is publicly available from Github (https://github.com/chenzgz/crRMTL.1)

    Journal ref: American Journal of Epidemiology, 2021

  32. arXiv:2106.11302

    stat.ML cs.LG

    Nested Variational Inference

    Authors: Heiko Zimmermann, Hao Wu, Babak Esmaeili, Jan-Willem van de Meent

    Abstract: We develop nested variational inference (NVI), a family of methods that learn proposals for nested importance samplers by minimizing a forward or reverse KL divergence at each level of nesting. NVI is applicable to many commonly-used importance sampling strategies and provides a mechanism for learning intermediate densities, which can serve as heuristics to guide the sampler. Our experiments appl…

    Submitted 21 June, 2021; originally announced June 2021.

  33. Dynamic prediction and analysis based on restricted mean survival time in survival analysis with nonproportional hazards

    Authors: Zi**g Yang, Hongji Wu, Yawen Hou, Hao Yuan, Zheng Chen

    Abstract: In the process of clinical diagnosis and treatment, the restricted mean survival time (RMST), which reflects the life expectancy of patients up to a specified time, can be used as an appropriate outcome measure. However, the RMST only calculates the mean survival time of patients within a period of time after the start of follow-up and may not accurately portray the change in a patient's life expe…

    Submitted 20 June, 2021; originally announced June 2021.

    Comments: 25 pages, 5 figures

    Report number: 207:106155

    Journal ref: Computer Methods and Programs in Biomedicine. 2021, 207: 106155

  34. arXiv:2106.07103

    q-fin.ST cs.LG stat.ME stat.ML

    A News-based Machine Learning Model for Adaptive Asset Pricing

    Authors: Liao Zhu, Haoxuan Wu, Martin T. Wells

    Abstract: The paper proposes a new asset pricing model, the News Embedding UMAP Selection (NEUS) model, to explain and predict the stock returns based on the financial news. Using a combination of various machine learning algorithms, we first derive a company embedding vector for each basis asset from the financial news. Then we obtain a collection of the basis assets based on their company embedding. Aft…

    Submitted 13 June, 2021; originally announced June 2021.

  35. arXiv:2106.04145

    math.OC cs.LG stat.ML

    Unbalanced Optimal Transport through Non-negative Penalized Linear Regression

    Authors: Laetitia Chapel, Rémi Flamary, Haoran Wu, Cédric Févotte, Gilles Gasso

    Abstract: This paper addresses the problem of Unbalanced Optimal Transport (UOT) in which the marginal conditions are relaxed (using weighted penalties in lieu of equality) and no additional regularization is enforced on the OT plan. In this context, we show that the corresponding optimization problem can be reformulated as a non-negative penalized linear regression problem. This reformulation allows us to…

    Submitted 8 June, 2021; originally announced June 2021.

    Comments: Laetitia Chapel and Rémi Flamary have equal contribution

  36. arXiv:2103.00668

    stat.ML cs.LG cs.PL

    Learning Proposals for Probabilistic Programs with Inference Combinators

    Authors: Sam Stites, Heiko Zimmermann, Hao Wu, Eli Sennesh, Jan-Willem van de Meent

    Abstract: We develop operators for construction of proposals in probabilistic programs, which we refer to as inference combinators. Inference combinators define a grammar over importance samplers that compose primitive operations such as application of a transition kernel and importance resampling. Proposals in these samplers can be parameterized using neural networks, which in turn can be trained by optimi…

    Submitted 16 June, 2021; v1 submitted 28 February, 2021; originally announced March 2021.

    Comments: Accepted to UAI 2021

  37. arXiv:2101.03944

    cs.LG stat.AP

    Impact of Interventional Policies Including Vaccine on Covid-19 Propagation and Socio-Economic Factors

    Authors: Haonan Wu, Rajarshi Banerjee, Indhumathi Venkatachalam, Daniel Percy-Hughes, Praveen Chougale

    Abstract: A novel coronavirus disease has emerged (later named COVID-19) and caused the world to enter a new reality, with many direct and indirect factors influencing it. Some are human-controllable (e.g. interventional policies, mobility and the vaccine); some are not (e.g. the weather). We have sought to test how a change in these human-controllable factors might influence two measures: the number of dai…

    Submitted 11 January, 2021; originally announced January 2021.

  38. arXiv:2011.10801

    stat.ML cs.LG math.PR

    Central and Non-central Limit Theorems arising from the Scattering Transform and its Neural Activation Generalization

    Authors: Gi-Ren Liu, Yuan-Chung Sheu, Hau-Tieng Wu

    Abstract: Motivated by analyzing complicated and non-stationary time series, we study a generalization of the scattering transform (ST) that includes broad neural activation functions, which is called neural activation ST (NAST). On the whole, NAST is a transform that comprises a sequence of "neural processing units", each of which applies a high pass filter to the input from the previous layer followed b…

    Submitted 21 November, 2020; originally announced November 2020.

    MSC Class: Primary 60G60; 60H05; 62M15; Secondary 35K15

  39. arXiv:2011.09437

    stat.ME

    Trend and Variance Adaptive Bayesian Changepoint Analysis & Local Outlier Scoring

    Authors: Haoxuan Wu, Toryn L. J. Schafer, David S. Matteson

    Abstract: We adaptively estimate both changepoints and local outlier processes in a Bayesian dynamic linear model with global-local shrinkage priors in a novel model we call Adaptive Bayesian Changepoints with Outliers (ABCO). We utilize a state-space approach to identify a dynamic signal in the presence of outliers and measurement error with stochastic volatility. We find that global state equation paramet…

    Submitted 13 March, 2024; v1 submitted 18 November, 2020; originally announced November 2020.

  40. arXiv:2011.01479

    math.ST cs.LG stat.ML

    Convergence of Graph Laplacian with kNN Self-tuned Kernels

    Authors: Xiuyuan Cheng, Hau-Tieng Wu

    Abstract: Kernelized Gram matrix $W$ constructed from data points $\{x_i\}_{i=1}^N$ as $W_{ij} = k_0\left( \frac{\| x_i - x_j \|^2}{\sigma^2} \right)$ is widely used in graph-based geometric data analysis and unsupervised learning. An important question is how to choose the kernel bandwidth $\sigma$, and a common practice called self-tuned kernel adaptively sets a $\sigma_i$ at each point $x_i$ by the $k$-nearest neighbor (kNN) dis…

    Submitted 9 February, 2021; v1 submitted 2 November, 2020; originally announced November 2020.

  41. arXiv:2010.15368

    stat.ME stat.AP

    Classification Accuracy and Parameter Estimation in Multilevel Contexts: A Study of Conditional Nonparametric Multilevel Latent Class Analysis

    Authors: Chi Chang, Kimberly Kelly, M. Lee Van Horn, Richard T. Houang, Joseph Gardiner, Laurie Van Egeren, Heng-Chieh Wu

    Abstract: The current research has two aims. First, to demonstrate the utility of conditional nonparametric multilevel latent class analysis (NP-MLCA) for multi-site program evaluation using an empirical dataset. Second, to investigate how classification accuracy and parameter estimation of a conditional NP-MLCA are affected by six study factors: the quality of latent class indicators, the number of latent cla…

    Submitted 30 October, 2020; v1 submitted 29 October, 2020; originally announced October 2020.

    Comments: 40 pages, 5 figures, 5 tables

  42. arXiv:2010.07242

    stat.ME math.ST stat.ML

    Graph Based Gaussian Processes on Restricted Domains

    Authors: David B Dunson, Hau-Tieng Wu, Nan Wu

    Abstract: In nonparametric regression, it is common for the inputs to fall in a restricted subset of Euclidean space. Typical kernel-based methods that do not take into account the intrinsic geometry of the domain across which observations are collected may produce sub-optimal results. In this article, we focus on solving this problem in the context of Gaussian process (GP) models, proposing a new class of…

    Submitted 2 November, 2021; v1 submitted 14 October, 2020; originally announced October 2020.

    Comments: 38 pages, 15 figures

  43. Interpretable Machine Learning for COVID-19: An Empirical Study on Severity Prediction Task

    Authors: Han Wu, Wenjie Ruan, Jiangtao Wang, Dingchang Zheng, Bei Liu, Yayuan Gen, Xiangfei Chai, Jian Chen, Kunwei Li, Shaolin Li, Sumi Helal

    Abstract: The black-box nature of machine learning models hinders the deployment of some high-accuracy models in medical diagnosis. It is risky to put one's life in the hands of models that medical researchers do not fully understand. However, through model interpretation, black-box models can promptly reveal significant biomarkers that medical practitioners may have overlooked due to the surge of infected…

    Submitted 20 October, 2021; v1 submitted 30 September, 2020; originally announced October 2020.

    Comments: Accepted by IEEE Transactions on Artificial Intelligence, 2021

    Journal ref: IEEE Transactions on Artificial Intelligence, 2021

  44. arXiv:2009.10318

    cs.LG stat.ML

    Public Health Informatics: Proposing Causal Sequence of Death Using Neural Machine Translation

    Authors: Yuanda Zhu, Ying Sha, Hang Wu, Mai Li, Ryan A. Hoffman, May D. Wang

    Abstract: Each year there are nearly 57 million deaths around the world, with over 2.7 million in the United States. Timely, accurate and complete death reporting is critical in public health, as institutions and government agencies rely on death reports to analyze vital statistics and to formulate responses to communicable diseases. Inaccurate death reporting may result in potential misdirection of public…

    Submitted 9 March, 2021; v1 submitted 22 September, 2020; originally announced September 2020.

    Comments: 11 pages, 8 figures, 8 tables. Updates: (1) Added Section II: Recent Work (2) Added three accuracy evaluation criteria (3) Re-run experiments of OpenNMT and updated the results and discussion in Section VI: Results and Discussion (4) Finished FHIR mobile app and updated Section VII: FHIR Interface (5) Revised Section VIII: Conclusion accordingly

  45. arXiv:2009.04197

    cs.LG cs.MA stat.ML

    QR-MIX: Distributional Value Function Factorisation for Cooperative Multi-Agent Reinforcement Learning

    Authors: Jian Hu, Seth Austin Harding, Haibin Wu, Siyue Hu, Shih-wei Liao

    Abstract: In Cooperative Multi-Agent Reinforcement Learning (MARL) and under the setting of Centralized Training with Decentralized Execution (CTDE), agents observe and interact with their environment locally and independently. With local observation and random sampling, the randomness in rewards and observations leads to randomness in long-term returns. Existing methods such as Value Decomposition Network…

    Submitted 23 February, 2021; v1 submitted 9 September, 2020; originally announced September 2020.

    Comments: There are some experimental errors and experimental unfairness in this paper that will seriously affect the later studies

  46. arXiv:2008.06246

    cs.LG stat.ML

    Graph Polish: A Novel Graph Generation Paradigm for Molecular Optimization

    Authors: Chaojie Ji, Yijia Zheng, Ruxin Wang, Yunpeng Cai, Hongyan Wu

    Abstract: Molecular optimization, which transforms a given input molecule X into another Y with desirable properties, is essential in molecular drug discovery. The traditional translating approaches, which generate the molecular graphs from scratch by adding substructures piece by piece, are prone to error because of the large set of candidate substructures and the large number of steps to the final target. In th…

    Submitted 14 August, 2020; originally announced August 2020.

  47. arXiv:2008.04473

    stat.ML cs.LG eess.SP

    Airflow recovery from thoracic and abdominal movements using Synchrosqueezing Transform and Locally Stationary Gaussian Process Regression

    Authors: Whitney K. Huang, Yu-Min Chung, Yu-Bo Wang, Jeff E. Mandel, Hau-Tieng Wu

    Abstract: The airflow signal encodes rich information about the respiratory system. While the gold standard for measuring airflow is to use a spirometer with an occlusive seal, this is not practical for ambulatory monitoring of patients. Advances in sensor technology have made measurement of motion of the thorax and abdomen feasible with small inexpensive devices, but estimation of airflow from these time series is…

    Submitted 10 August, 2020; originally announced August 2020.

  48. arXiv:2007.06408

    math.ST math.PR stat.ML

    Strong Uniform Consistency with Rates for Kernel Density Estimators with General Kernels on Manifolds

    Authors: Hau-Tieng Wu, Nan Wu

    Abstract: When analyzing modern machine learning algorithms, we may need to handle kernel density estimation (KDE) with intricate kernels that are not designed by the user and might even be irregular and asymmetric. To handle this emerging challenge, we provide a strong uniform consistency result with the $L^\infty$ convergence rate for KDE on Riemannian manifolds with Riemann integrable kernels (in the amb…

    Submitted 8 June, 2021; v1 submitted 13 July, 2020; originally announced July 2020.

    Comments: 50 pages

    MSC Class: 60F15; 62G07

  49. arXiv:2006.13054

    eess.SP physics.data-an stat.AP

    Unsupervised Ensembling of Multiple Software Sensors with Phase Synchronization: A Robust Approach For Electrocardiogram-derived Respiration

    Authors: Jacob McErlean, John Malik, Yu-Ting Lin, Ronen Talmon, Hau-Tieng Wu

    Abstract: Objective: We aimed to fuse the outputs of different electrocardiogram-derived respiration (EDR) algorithms to create one EDR signal that is of higher quality. Methods: We viewed each EDR algorithm as a software sensor that recorded breathing activity from a different vantage point, identified high-quality software sensors based on the respiratory signal quality index, aligned the highest-quality…

    Submitted 13 April, 2024; v1 submitted 23 June, 2020; originally announced June 2020.

  50. arXiv:2005.14346

    math.OC cs.LG stat.ML

    Consistent Second-Order Conic Integer Programming for Learning Bayesian Networks

    Authors: Simge Kucukyavuz, Ali Shojaie, Hasan Manzour, Linchuan Wei, Hao-Hsiang Wu

    Abstract: Bayesian Networks (BNs) represent conditional probability relations among a set of random variables (nodes) in the form of a directed acyclic graph (DAG), and have found diverse applications in knowledge discovery. We study the problem of learning the sparse DAG structure of a BN from continuous observational data. The central problem can be modeled as a mixed-integer program with an objective fun…

    Submitted 6 May, 2022; v1 submitted 28 May, 2020; originally announced May 2020.