Showing 1–50 of 156 results for author: Huang, H

Searching in archive stat.
  1. arXiv:2406.10499  [pdf, other]

    stat.ME stat.AP

    Functional Clustering for Longitudinal Associations between Social Determinants of Health and Stroke Mortality in the US

    Authors: Fangzhi Luo, Jianbin Tan, Donglan Zhang, Hui Huang, Ye Shen

    Abstract: Understanding longitudinally changing associations between Social determinants of health (SDOH) and stroke mortality is crucial for timely stroke management. Previous studies have revealed a significant regional disparity in the SDOH -- stroke mortality associations. However, they do not develop data-driven methods based on these longitudinal associations for regional division in stroke control. T…

    Submitted 20 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

  2. arXiv:2406.04690  [pdf, other]

    cs.LG stat.ML

    Higher-order Structure Based Anomaly Detection on Attributed Networks

    Authors: Xu Yuan, Na Zhou, Shuo Yu, Huafei Huang, Zhikui Chen, Feng Xia

    Abstract: Anomaly detection (such as telecom fraud detection and medical image detection) has attracted the increasing attention of people. The complex interaction between multiple entities widely exists in the network, which can reflect specific human behavior patterns. Such patterns can be modeled by higher-order network structures, thus benefiting anomaly detection on attributed networks. However, due to…

    Submitted 7 June, 2024; originally announced June 2024.

  3. arXiv:2406.01561  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG stat.ML

    Long and Short Guidance in Score identity Distillation for One-Step Text-to-Image Generation

    Authors: Mingyuan Zhou, Zhendong Wang, Huangjie Zheng, Hai Huang

    Abstract: Diffusion-based text-to-image generation models trained on extensive text-image pairs have shown the capacity to generate photorealistic images consistent with textual descriptions. However, a significant limitation of these models is their slow sample generation, which requires iterative refinement through the same network. In this paper, we enhance Score identity Distillation (SiD) by developing…

    Submitted 22 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  4. arXiv:2405.18108  [pdf]

    physics.class-ph stat.AP

    Simulation of Single-Phase Natural Circulation within the BEPU Framework: Sketching Scaling Uncertainty Principle by Multi-Scale CFD Approaches

    Authors: Haifu Huang, Jorge Perez, Nicolas Alpy, Marc Medale

    Abstract: In order to enhance safety, nuclear reactors in the design phase consider natural circulation as a mean to remove residual power. The simulation of this passive mechanism must be qualified between the validation range and the scope of utilization (reactor case), introducing potential physical and numerical distortion effects. In this study, we simulate the flow of liquid sodium using the TrioCFD c…

    Submitted 28 May, 2024; originally announced May 2024.

    Journal ref: Best Estimate Plus Uncertainty International Conference (BEPU 2024), May 2024, Lucca, Italy

  5. arXiv:2405.12453  [pdf, other]

    stat.CO

    One-step data-driven generative model via Schrödinger Bridge

    Authors: Hanwen Huang

    Abstract: Generating samples from a probability distribution is a fundamental task in machine learning and statistics. This article proposes a novel scheme for sampling from a distribution for which the probability density $μ({\bf x})$ for ${\bf x}\in{\mathbb{R}}^d$ is unknown, but finite independent samples are given. We focus on constructing a Schrödinger Bridge (SB) diffusion process on finite horizon…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 22 pages, 5 figures

  6. arXiv:2404.13707  [pdf, other]

    stat.ME stat.AP

    Robust inference for the unification of confidence intervals in meta-analysis

    Authors: Wei Liang, Haicheng Huang, Hongsheng Dai, Yinghui Wei

    Abstract: Traditional meta-analysis assumes that the effect sizes estimated in individual studies follow a Gaussian distribution. However, this distributional assumption is not always satisfied in practice, leading to potentially biased results. In the situation when the number of studies, denoted as K, is large, the cumulative Gaussian approximation errors from each study could make the final estimation un…

    Submitted 21 April, 2024; originally announced April 2024.

  7. arXiv:2404.06681  [pdf, other]

    cs.AI cs.LG stat.ME

    Causal Unit Selection using Tractable Arithmetic Circuits

    Authors: Haiying Huang, Adnan Darwiche

    Abstract: The unit selection problem aims to find objects, called units, that optimize a causal objective function which describes the objects' behavior in a causal context (e.g., selecting customers who are about to churn but would most likely change their mind if encouraged). While early studies focused mainly on bounding a specific class of counterfactual objective functions using data, more recent work…

    Submitted 9 April, 2024; originally announced April 2024.

  8. arXiv:2404.04057  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation

    Authors: Mingyuan Zhou, Huangjie Zheng, Zhendong Wang, Mingzhang Yin, Hai Huang

    Abstract: We introduce Score identity Distillation (SiD), an innovative data-free method that distills the generative capabilities of pretrained diffusion models into a single-step generator. SiD not only facilitates an exponentially fast reduction in Fréchet inception distance (FID) during distillation but also approaches or even exceeds the FID performance of the original teacher diffusion models. By refo…

    Submitted 24 May, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

    Comments: ICML 2024, PyTorch implementation: https://github.com/mingyuanzhou/SiD

  9. Green's matching: an efficient approach to parameter estimation in complex dynamic systems

    Authors: Jianbin Tan, Guoyu Zhang, Xueqin Wang, Hui Huang, Fang Yao

    Abstract: Parameters of differential equations are essential to characterize intrinsic behaviors of dynamic systems. Numerous methods for estimating parameters in dynamic systems are computationally and/or statistically inadequate, especially for complex systems with general-order differential operators, such as motion dynamics. This article presents Green's matching, a computationally tractable and statist…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 40 pages, 4 figures

    Journal ref: Journal of the Royal Statistical Society: Series B, 2024

  10. Nonlinear Regression Analysis

    Authors: Hsin-Hsiung Huang, Qing He

    Abstract: Nonlinear regression analysis is a popular and important tool for scientists and engineers. In this article, we introduce theories and methods of nonlinear regression and its statistical inferences using the frequentist and Bayesian statistical modeling and computation. Least squares with the Gauss-Newton method is the most widely used approach to parameters estimation. Under the assumption of nor…

    Submitted 7 February, 2024; originally announced February 2024.

  11. A Framework of Zero-Inflated Bayesian Negative Binomial Regression Models For Spatiotemporal Data

    Authors: Qing He, Hsin-Hsiung Huang

    Abstract: Spatiotemporal data analysis with massive zeros is widely used in many areas such as epidemiology and public health. We use a Bayesian framework to fit zero-inflated negative binomial models and employ a set of latent variables from Pólya-Gamma distributions to derive an efficient Gibbs sampler. The proposed model accommodates varying spatial and temporal random effects through Gaussian process pr…

    Submitted 6 February, 2024; originally announced February 2024.

    Journal ref: Journal of Statistical Planning and Inference (2024). 229, 106098

  12. Robust Sufficient Dimension Reduction via $α$-Distance Covariance

    Authors: Hsin-Hsiung Huang, Feng Yu, Teng Zhang

    Abstract: We introduce a novel sufficient dimension-reduction (SDR) method which is robust against outliers using $α$-distance covariance (dCov) in dimension-reduction problems. Under very mild conditions on the predictors, the central subspace is effectively estimated and model-free advantage without estimating link function based on the projection on the Stiefel manifold. We establish the convergence prop…

    Submitted 4 February, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  13. arXiv:2401.10474  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    LDReg: Local Dimensionality Regularized Self-Supervised Learning

    Authors: Hanxun Huang, Ricardo J. G. B. Campello, Sarah Monazam Erfani, Xingjun Ma, Michael E. Houle, James Bailey

    Abstract: Representations learned via self-supervised learning (SSL) can be susceptible to dimensional collapse, where the learned representation subspace is of extremely low dimensionality and thus fails to represent the full data distribution and modalities. Dimensional collapse also known as the "underfilling" phenomenon is one of the major causes of degraded performance on downstream tasks. Previous wor…

    Submitted 14 March, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: ICLR 2024

  14. Graphical Principal Component Analysis of Multivariate Functional Time Series

    Authors: Jianbin Tan, Decai Liang, Yongtao Guan, Hui Huang

    Abstract: In this paper, we consider multivariate functional time series with a two-way dependence structure: a serial dependence across time points and a graphical interaction among the multiple functions within each time point. We develop the notion of dynamic weak separability, a more general condition than those assumed in literature, and use it to characterize the two-way structure in multivariate func…

    Submitted 13 January, 2024; originally announced January 2024.

    Comments: Journal of the American Statistical Association (2024)

    Journal ref: Journal of the American Statistical Association (2024): 1-24

  15. arXiv:2311.17143  [pdf, other]

    astro-ph.IM astro-ph.HE cs.LG stat.ML

    Predicting the Age of Astronomical Transients from Real-Time Multivariate Time Series

    Authors: Hali Huang, Daniel Muthukrishna, Prajna Nair, Zimi Zhang, Michael Fausnaugh, Torsha Majumder, Ryan J. Foley, George R. Ricker

    Abstract: Astronomical transients, such as supernovae and other rare stellar explosions, have been instrumental in some of the most significant discoveries in astronomy. New astronomical sky surveys will soon record unprecedented numbers of transients as sparsely and irregularly sampled multivariate time series. To improve our understanding of the physical mechanisms of transients and their progenitor syste…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: 6 pages, 4 figures. Accepted at the NeurIPS 2023 Machine Learning and the Physical Sciences workshop

  16. arXiv:2310.18527  [pdf, other]

    stat.ME stat.AP stat.CO

    Multiple Imputation Method for High-Dimensional Neuroimaging Data

    Authors: Tong Lu, Chixiang Chen, Hsin-Hsiung Huang, Peter Kochunov, Elliot Hong, Shuo Chen

    Abstract: Missingness is a common issue for neuroimaging data, and neglecting it in downstream statistical analysis can introduce bias and lead to misguided inferential conclusions. It is therefore crucial to conduct appropriate statistical methods to address this issue. While multiple imputation is a popular technique for handling missing data, its application to neuroimaging data is hindered by high dimen…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: 13 pages, 5 figures

  17. arXiv:2308.12680  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    Master-slave Deep Architecture for Top-K Multi-armed Bandits with Non-linear Bandit Feedback and Diversity Constraints

    Authors: Hanchi Huang, Li Shen, Deheng Ye, Wei Liu

    Abstract: We propose a novel master-slave architecture to solve the top-$K$ combinatorial multi-armed bandits problem with non-linear bandit feedback and diversity constraints, which, to the best of our knowledge, is the first combinatorial bandits setting considering diversity constraints under bandit feedback. Specifically, to efficiently explore the combinatorial and constrained action space, we introduc…

    Submitted 24 August, 2023; originally announced August 2023.

    Comments: IEEE Transactions on Neural Networks and Learning Systems

  18. A PC-Kriging-HDMR integrated with an adaptive sequential sampling strategy for high-dimensional approximate modeling

    Authors: Yili Zhang, Hanyan Huang, Mei Xiong, Zengquan Yao

    Abstract: High-dimensional complex multi-parameter problems are prevalent in engineering, exceeding the capabilities of traditional surrogate models designed for low/medium-dimensional problems. These models face the curse of dimensionality, resulting in decreased modeling accuracy as the design parameter space expands. Furthermore, the lack of a parameter decoupling mechanism hinders the identification of…

    Submitted 3 July, 2023; originally announced July 2023.

    Comments: 17 pages with 7 figures and 9 tables

    Journal ref: International Journal of Computer Science & Information Technology (IJCSIT) Vol 15, No 3, June 2023

  19. arXiv:2306.14403  [pdf, other]

    cs.LG cs.AI stat.ML

    Anomaly Detection with Score Distribution Discrimination

    Authors: Minqi Jiang, Songqiao Han, Hailiang Huang

    Abstract: Recent studies give more attention to the anomaly detection (AD) methods that can leverage a handful of labeled anomalies along with abundant unlabeled data. These existing anomaly-informed AD methods rely on manually predefined score target(s), e.g., prior constant or margin hyperparameter(s), to realize discrimination in anomaly scores between normal and abnormal data. However, such methods woul…

    Submitted 25 June, 2023; originally announced June 2023.

    Comments: Accepted by KDD 2023. Detailed discussions can be found in https://openreview.net/forum?id=P1Worw-M1Tf&referrer=[the%20profile%20of%20Minqi%20Jiang](/profile?id=~Minqi_Jiang2)

  20. arXiv:2306.07456  [pdf]

    stat.AP stat.ME

    On the Temporal-spatial Analysis of Estimating Urban Traffic Patterns Via GPS Trace Data of Car-hailing Vehicles

    Authors: Jiannan Mao, Lan Liu, Hao Huang, Weike Lu, Kaiyu Yang, Tianli Tang, Haotian Shi

    Abstract: Car-hailing services have become a prominent data source for urban traffic studies. Extracting useful information from car-hailing trace data is essential for effective traffic management, while discrepancies between car-hailing vehicles and urban traffic should be considered. This paper proposes a generic framework for estimating and analyzing urban traffic patterns using car-hailing trace data.…

    Submitted 12 June, 2023; originally announced June 2023.

  21. arXiv:2303.12834  [pdf, other]

    quant-ph cs.AI cs.LG stat.ML

    The power and limitations of learning quantum dynamics incoherently

    Authors: Sofiene Jerbi, Joe Gibbs, Manuel S. Rudolph, Matthias C. Caro, Patrick J. Coles, Hsin-Yuan Huang, Zoë Holmes

    Abstract: Quantum process learning is emerging as an important tool to study quantum systems. While studied extensively in coherent frameworks, where the target and model system can share quantum information, less attention has been paid to whether the dynamics of quantum systems can be learned without the system and target directly interacting. Such incoherent frameworks are practically appealing since the…

    Submitted 22 March, 2023; originally announced March 2023.

    Comments: 6+9 pages, 7 figures

    Report number: LA-UR-23-22871

  22. arXiv:2303.09491  [pdf, other]

    quant-ph cs.LG stat.ML

    Challenges and Opportunities in Quantum Machine Learning

    Authors: M. Cerezo, Guillaume Verdon, Hsin-Yuan Huang, Lukasz Cincio, Patrick J. Coles

    Abstract: At the intersection of machine learning and quantum computing, Quantum Machine Learning (QML) has the potential of accelerating data analysis, especially for quantum data, with applications for quantum materials, biochemistry, and high-energy physics. Nevertheless, challenges remain regarding the trainability of QML models. Here we review current methods and applications for QML. We highlight diff…

    Submitted 16 March, 2023; originally announced March 2023.

    Comments: 14 pages, 5 figures

    Report number: LA-UR-21-31504

    Journal ref: Nature Computational Science 2, 567-576 (2022)

  23. arXiv:2302.14618  [pdf, other]

    stat.ME stat.CO

    Barycenter Estimation of Positive Semi-Definite Matrices with Bures-Wasserstein Distance

    Authors: Jingyi Zheng, Huajun Huang, Yuyan Yi, Yuexin Li, Shu-Chin Lin

    Abstract: Brain-computer interface (BCI) builds a bridge between human brain and external devices by recording brain signals and translating them into commands for devices to perform the user's imagined action. The core of the BCI system is the classifier that labels the input signals as the user's imagined action. The classifiers that directly classify covariance matrices using Riemannian geometry are wide…

    Submitted 24 February, 2023; originally announced February 2023.

  24. arXiv:2302.05686  [pdf, other]

    math.ST cs.LG stat.ML

    A High-dimensional Convergence Theorem for U-statistics with Applications to Kernel-based Testing

    Authors: Kevin H. Huang, Xing Liu, Andrew B. Duncan, Axel Gandy

    Abstract: We prove a convergence theorem for U-statistics of degree two, where the data dimension $d$ is allowed to scale with sample size $n$. We find that the limiting distribution of a U-statistic undergoes a phase transition from the non-degenerate Gaussian limit to the degenerate limit, regardless of its degeneracy and depending only on a moment ratio. A surprising consequence is that a non-degenerate…

    Submitted 2 July, 2023; v1 submitted 11 February, 2023; originally announced February 2023.

    Comments: COLT camera-ready version

  25. arXiv:2301.04204  [pdf, ps, other]

    math.OC cs.LG math.NA stat.ML

    A Newton-CG based barrier-augmented Lagrangian method for general nonconvex conic optimization

    Authors: Chuan He, Heng Huang, Zhaosong Lu

    Abstract: In this paper we consider finding an approximate second-order stationary point (SOSP) of general nonconvex conic optimization that minimizes a twice differentiable function subject to nonlinear equality constraints and also a convex conic constraint. In particular, we propose a Newton-conjugate gradient (Newton-CG) based barrier-augmented Lagrangian method for finding an approximate SOSP of this p…

    Submitted 10 January, 2023; originally announced January 2023.

    Comments: 34 pages. arXiv admin note: substantial text overlap with arXiv:2301.03139

    MSC Class: 49M05; 49M15; 68Q25; 90C26; 90C30; 90C60

  26. arXiv:2211.10541  [pdf, ps, other]

    math.ST stat.CO

    Phase transition and higher order analysis of $L_q$ regularization under dependence

    Authors: Hanwen Huang, Peng Zeng, Qinglong Yang

    Abstract: We study the problem of estimating a $k$-sparse signal ${\mbox{$β$}}_0\in{\bf R}^p$ from a set of noisy observations ${\bf y}\in{\bf R}^n$ under the model ${\bf y}={\bf X}{\mbox{$β$}}+{\bf w}$, where ${\bf X}\in{\bf R}^{n\times p}$ is the measurement matrix the row of which is drawn from distribution $N(0,{\mbox{$Σ$}})$. We consider the class of $L_q$-regularized least squares (LQLS) given by the…

    Submitted 1 December, 2022; v1 submitted 18 November, 2022; originally announced November 2022.

    Comments: 35 pages, 11 figures

  27. arXiv:2210.08231  [pdf, other]

    stat.ME

    Assessing Spatial Stationarity and Segmenting Spatial Processes into Stationary Components

    Authors: ShengLi Tzeng, Bo-Yu Chen, Hsin-Cheng Huang

    Abstract: In this research, we propose a novel technique for visualizing nonstationarity in geostatistics, particularly when confronted with a single realization of data at irregularly spaced locations. Our method hinges on formulating a statistic that tracks a stable microergodic parameter of the exponential covariance function, allowing us to address the intricate challenges of nonstationary processes tha…

    Submitted 28 August, 2023; v1 submitted 15 October, 2022; originally announced October 2022.

    Comments: 25 pages, 5 tables, 11 figures

    MSC Class: 62M30

  28. arXiv:2208.11411  [pdf, other]

    q-bio.NC cond-mat.dis-nn cond-mat.stat-mech math-ph stat.ML

    Spectrum of non-Hermitian deep-Hebbian neural networks

    Authors: Zijian Jiang, Ziming Chen, Tianqi Hou, Haiping Huang

    Abstract: Neural networks with recurrent asymmetric couplings are important to understand how episodic memories are encoded in the brain. Here, we integrate the experimental observation of wide synaptic integration window into our model of sequence retrieval in the continuous time dynamics. The model with non-normal neuron-interactions is theoretically studied by deriving a random matrix theory of the Jacob…

    Submitted 16 January, 2023; v1 submitted 24 August, 2022; originally announced August 2022.

    Comments: 65 pages, 12 figures, revised version for publication

    Journal ref: Phys. Rev. Research 5, 013090 (2023)

  29. arXiv:2208.06058  [pdf, other]

    cs.LG stat.ML

    An Accelerated Doubly Stochastic Gradient Method with Faster Explicit Model Identification

    Authors: Runxue Bao, Bin Gu, Heng Huang

    Abstract: Sparsity regularized loss minimization problems play an important role in various fields including machine learning, data mining, and modern statistics. Proximal gradient descent method and coordinate descent method are the most popular approaches to solving the minimization problem. Although existing methods can achieve implicit model identification, aka support set identification, in a finite nu…

    Submitted 11 August, 2022; originally announced August 2022.

  30. arXiv:2207.01813  [pdf, other]

    q-bio.QM stat.AP

    Stochastic Variational Methods in Generalized Hidden Semi-Markov Models to Characterize Functionality in Random Heteropolymers

    Authors: Yun Zhou, Boying Gong, Tao Jiang, Ting Xu, Haiyan Huang

    Abstract: Recent years have seen substantial advances in the development of biofunctional materials using synthetic polymers. The growing problem of elusive sequence-functionality relations for most biomaterials has driven researchers to seek more effective tools and analysis methods. In this study, statistical models are used to study sequence features of the recently reported random heteropolymers (RHP),…

    Submitted 5 July, 2022; originally announced July 2022.

  31. arXiv:2206.08353  [pdf, other]

    cs.LG stat.ML

    Towards Understanding How Machines Can Learn Causal Overhypotheses

    Authors: Eliza Kosoy, David M. Chan, Adrian Liu, Jasmine Collins, Bryanna Kaufmann, Sandy Han Huang, Jessica B. Hamrick, John Canny, Nan Rosemary Ke, Alison Gopnik

    Abstract: Recent work in machine learning and cognitive science has suggested that understanding causal information is essential to the development of intelligence. The extensive literature in cognitive science using the ``blicket detector'' environment shows that children are adept at many kinds of causal inference and learning. We propose to adapt that environment for machine learning agents. One of the k…

    Submitted 16 June, 2022; originally announced June 2022.

  32. arXiv:2206.06463  [pdf, other]

    physics.med-ph stat.AP

    A statistical reconstruction algorithm for positronium lifetime imaging using time-of-flight positron emission tomography

    Authors: Hsin-Hsiung Huang, Zheyuan Zhu, Slun Booppasiri, Zhuo Chen, Shuo Pang, Chien-Min Kao

    Abstract: Positron emission tomography (PET) is an important modality for diagnosing diseases such as cancer and Alzheimer's disease, capable of revealing the uptake of radiolabeled molecules that target specific pathological markers of the diseases. Recently, positronium lifetime imaging (PLI) that adds to traditional PET the ability to explore properties of the tissue microenvironment beyond tracer uptake…

    Submitted 9 May, 2024; v1 submitted 13 June, 2022; originally announced June 2022.

    Comments: Submitted to IEEE-TPRMS

  33. arXiv:2205.07833  [pdf, other]

    cs.LG stat.ML

    Decision Making for Hierarchical Multi-label Classification with Multidimensional Local Precision Rate

    Authors: Yuting Ye, Christine Ho, Ci-Ren Jiang, Wayne Tai Lee, Haiyan Huang

    Abstract: Hierarchical multi-label classification (HMC) has drawn increasing attention in the past few decades. It is applicable when hierarchical relationships among classes are available and need to be incorporated along with the multi-label classification whereby each object is assigned to one or more classes. There are two key challenges in HMC: i) optimizing the classification accuracy, and meanwhile i…

    Submitted 16 May, 2022; originally announced May 2022.

    Comments: 34 pages, 11 figures, 9 tables

  34. arXiv:2205.07106  [pdf, ps, other]

    stat.ML cs.LG

    Robust Regularized Low-Rank Matrix Models for Regression and Classification

    Authors: Hsin-Hsiung Huang, Feng Yu, Xing Fan, Teng Zhang

    Abstract: While matrix variate regression models have been studied in many existing works, classical statistical and computational methods for the analysis of the regression coefficient estimation are highly affected by high dimensional and noisy matrix-valued predictors. To address these issues, this paper proposes a framework of matrix variate regression models based on a rank constraint, vector regulariz…

    Submitted 14 May, 2022; originally announced May 2022.

    Comments: 26 pages, 7 figures

    MSC Class: 62J12

  35. arXiv:2204.10981  [pdf, other]

    cs.LG cs.DC stat.ML

    Distributed Dynamic Safe Screening Algorithms for Sparse Regularization

    Authors: Runxue Bao, Xidong Wu, Wenhan Xian, Heng Huang

    Abstract: Distributed optimization has been widely used as one of the most efficient approaches for model training with massive samples. However, large-scale learning problems with both massive samples and high-dimensional features widely exist in the era of big data. Safe screening is a popular technique to speed up high-dimensional models by discarding the inactive features with zero coefficients. Neverth…

    Submitted 22 April, 2022; originally announced April 2022.

  36. arXiv:2204.10268  [pdf, other]

    quant-ph cs.LG stat.ML

    Out-of-distribution generalization for learning quantum dynamics

    Authors: Matthias C. Caro, Hsin-Yuan Huang, Nicholas Ezzell, Joe Gibbs, Andrew T. Sornborger, Lukasz Cincio, Patrick J. Coles, Zoë Holmes

    Abstract: Generalization bounds are a critical tool to assess the training data requirements of Quantum Machine Learning (QML). Recent work has established guarantees for in-distribution generalization of quantum neural networks (QNNs), where training and testing data are drawn from the same data distribution. However, there are currently no results on out-of-distribution generalization in QML, where we req…

    Submitted 9 July, 2023; v1 submitted 21 April, 2022; originally announced April 2022.

    Comments: 8 pages (main body) + 18 pages (references and appendix); 4+2 figures; V3 includes additional explanations and numerical experiments in the appendix

    Report number: LA-UR-22-23623

    Journal ref: Nat Commun 14, 3751 (2023)

  37. arXiv:2203.04272  [pdf, other]

    cs.LG cs.AI stat.ME

    Policy-Based Bayesian Experimental Design for Non-Differentiable Implicit Models

    Authors: Vincent Lim, Ellen Novoseller, Jeffrey Ichnowski, Huang Huang, Ken Goldberg

    Abstract: For applications in healthcare, physics, energy, robotics, and many other fields, designing maximally informative experiments is valuable, particularly when experiments are expensive, time-consuming, or pose safety hazards. While existing approaches can sequentially design experiments based on prior observation history, many of these methods do not extend to implicit models, where simulation is po…

    Submitted 8 March, 2022; originally announced March 2022.

    Comments: 15 pages, 3 figures

  38. arXiv:2203.02433  [pdf, ps, other]

    cs.LG cs.NE math.OC stat.ML

    The Machine Learning for Combinatorial Optimization Competition (ML4CO): Results and Insights

    Authors: Maxime Gasse, Quentin Cappart, Jonas Charfreitag, Laurent Charlin, Didier Chételat, Antonia Chmiela, Justin Dumouchelle, Ambros Gleixner, Aleksandr M. Kazachkov, Elias Khalil, Pawel Lichocki, Andrea Lodi, Miles Lubin, Chris J. Maddison, Christopher Morris, Dimitri J. Papageorgiou, Augustin Parjadis, Sebastian Pokutta, Antoine Prouvost, Lara Scavuzzo, Giulia Zarpellon, Linxin Yang, Sha Lai, Akang Wang, Xiaodong Luo, et al. (16 additional authors not shown)

    Abstract: Combinatorial optimization is a well-established area in operations research and computer science. Until recently, its methods have focused on solving problem instances in isolation, ignoring that they often stem from related data distributions in practice. However, recent years have seen a surge of interest in using machine learning as a new approach for solving combinatorial problems, either dir…

    Submitted 17 March, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

    Comments: Neurips 2021 competition. arXiv admin note: text overlap with arXiv:2112.12251 by other authors

  39. arXiv:2202.11592  [pdf, ps, other]

    cs.LG stat.ML

    A Law of Robustness beyond Isoperimetry

    Authors: Yihan Wu, Heng Huang, Hongyang Zhang

    Abstract: We study the robust interpolation problem of arbitrary data distributions supported on a bounded space and propose a two-fold law of robustness. Robust interpolation refers to the problem of interpolating $n$ noisy training data points in $\mathbb{R}^d$ by a Lipschitz function. Although this problem has been well understood when the samples are drawn from an isoperimetry distribution, much remains…

    Submitted 31 May, 2023; v1 submitted 23 February, 2022; originally announced February 2022.

    Comments: To appear in ICML 2023

  40. arXiv:2202.09134  [pdf, other]

    cs.LG math.ST stat.ML

    Data Augmentation in the Underparameterized and Overparameterized Regimes

    Authors: Kevin Han Huang, Peter Orbanz, Morgane Austern

    Abstract: We provide results that exactly quantify how data augmentation affects the variance and limiting distribution of estimates, and analyze several specific models in detail. The results confirm some observations made in machine learning practice, but also lead to unexpected findings: Data augmentation may increase rather than decrease the uncertainty of estimates, such as the empirical prediction ris…

    Submitted 28 September, 2023; v1 submitted 18 February, 2022; originally announced February 2022.

    Comments: Changed title and added an analysis on the effect of augmentations on the double-descent risk curve of a high-dimensional ridgeless estimator

  41. arXiv:2111.13428  [pdf, other]

    stat.AP

    Nonstationary Spatial Modeling of Massive Global Satellite Data

    Authors: Huang Huang, Lewis R. Blake, Matthias Katzfuss, Dorit M. Hammerling

    Abstract: Earth-observing satellite instruments obtain a massive number of observations every day. For example, tens of millions of sea surface temperature (SST) observations on a global scale are collected daily by the Moderate Resolution Imaging Spectroradiometer (MODIS) instrument. Despite their size, such datasets are incomplete and noisy, necessitating spatial statistical inference to obtain complete,…

    Submitted 26 November, 2021; originally announced November 2021.

  42. arXiv:2111.13302  [pdf, ps, other]

    cond-mat.dis-nn cond-mat.stat-mech stat.ML

    Equivalence between algorithmic instability and transition to replica symmetry breaking in perceptron learning systems

    Authors: Yang Zhao, Junbin Qiu, Mingshan Xie, Haiping Huang

    Abstract: Binary perceptron is a fundamental model of supervised learning for the non-convex optimization, which is a root of the popular deep learning. Binary perceptron is able to achieve a classification of random high-dimensional data by computing the marginal probabilities of binary synapses. The relationship between the algorithmic instability and the equilibrium analysis of the model remains elusive.…

    Submitted 7 March, 2022; v1 submitted 25 November, 2021; originally announced November 2021.

    Comments: 24 pages, 2 figures, revision to journal

    Journal ref: Phys. Rev. Research 4, 023023 (2022)

  43. arXiv:2111.10734  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Deep Probability Estimation

    Authors: Sheng Liu, Aakash Kaku, Weicheng Zhu, Matan Leibovich, Sreyas Mohan, Boyang Yu, Haoxiang Huang, Laure Zanna, Narges Razavian, Jonathan Niles-Weed, Carlos Fernandez-Granda

    Abstract: Reliable probability estimation is of crucial importance in many real-world applications where there is inherent (aleatoric) uncertainty. Probability-estimation models are trained on observed outcomes (e.g. whether it has rained or not, or whether a patient has died or not), because the ground-truth probabilities of the events of interest are typically unknown. The problem is therefore analogous t…

    Submitted 11 October, 2022; v1 submitted 20 November, 2021; originally announced November 2021.

    Comments: SL, AK, WZ, ML, SM contributed equally to this work; 36 pages, 17 figures, 12 tables

    Journal ref: Proceedings of the 39th International Conference on Machine Learning, PMLR 162:13746-13781, 2022

  44. arXiv:2111.05292  [pdf, other]

    quant-ph cs.LG stat.ML

    Generalization in quantum machine learning from few training data

    Authors: Matthias C. Caro, Hsin-Yuan Huang, M. Cerezo, Kunal Sharma, Andrew Sornborger, Lukasz Cincio, Patrick J. Coles

    Abstract: Modern quantum machine learning (QML) methods involve variationally optimizing a parameterized quantum circuit on a training data set, and subsequently making predictions on a testing data set (i.e., generalizing). In this work, we provide a comprehensive study of generalization performance in QML after training on a limited number $N$ of training data points. We show that the generalization error…

    Submitted 5 September, 2022; v1 submitted 9 November, 2021; originally announced November 2021.

    Comments: 14+26 pages, 4+1 figures

    Report number: LA-UR-21-31086

    Journal ref: Nat Commun 13, 4919 (2022)

  45. arXiv:2111.04951  [pdf, other]

    cs.CL cs.AI econ.GN stat.AP

    American Hate Crime Trends Prediction with Event Extraction

    Authors: Songqiao Han, Hailiang Huang, Jiangwei Liu, Shengsheng Xiao

    Abstract: Social media platforms may provide potential space for discourses that contain hate speech, and even worse, can act as a propagation mechanism for hate crimes. The FBI's Uniform Crime Reporting (UCR) Program collects hate crime data and releases statistic report yearly. These statistics provide information in determining national hate crime trends. The statistics can also provide valuable holistic…

    Submitted 8 November, 2021; originally announced November 2021.

    Comments: 12 pages, 5 figures, 4 tables

  46. arXiv:2111.03252  [pdf, other]

    stat.ME

    Test of Weak Separability for Spatially Stationary Functional Field

    Authors: Decai Liang, Hui Huang, Yongtao Guan, Fang Yao

    Abstract: For spatially dependent functional data, a generalized Karhunen-Loève expansion is commonly used to decompose data into an additive form of temporal components and spatially correlated coefficients. This structure provides a convenient model to investigate the space-time interactions, but may not hold for complex spatio-temporal processes. In this work, we introduce the concept of weak separabilit…

    Submitted 5 November, 2021; originally announced November 2021.

  47. arXiv:2110.08005  [pdf, other]

    stat.ME stat.AP

    Spatially Adaptive Calibrations of AirBox PM$_{2.5}$ Data

    Authors: ShengLi Tzeng, Chi-Wei Lai, Hsin-Cheng Huang

    Abstract: Two networks are available to monitor PM$_{2.5}$ in Taiwan, including the Taiwan Air Quality Monitoring Network (TAQMN) and the AirBox network. The TAQMN, managed by Taiwan's Environmental Protection Administration (EPA), provides high-quality PM$_{2.5}$ measurements at $77$ monitoring stations. More recently, the AirBox network was launched, consisting of low-cost, small internet-of-things (IoT)…

    Submitted 15 October, 2021; originally announced October 2021.

    Comments: 21 pages, 9 figures

    MSC Class: 62P12

  48. arXiv:2110.03825  [pdf, other]

    cs.LG cs.CV stat.ML

    Exploring Architectural Ingredients of Adversarially Robust Deep Neural Networks

    Authors: Hanxun Huang, Yisen Wang, Sarah Monazam Erfani, Quanquan Gu, James Bailey, Xingjun Ma

    Abstract: Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks. A range of defense methods have been proposed to train adversarially robust DNNs, among which adversarial training has demonstrated promising results. However, despite preliminary understandings developed for adversarial training, it is still not clear, from the architectural perspective, what configurations can lead to…

    Submitted 22 January, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021

  49. arXiv:2110.01571  [pdf, other]

    cs.CV stat.ME

    Causal Representation Learning for Context-Aware Face Transfer

    Authors: Gege Gao, Huaibo Huang, Chaoyou Fu, Ran He

    Abstract: Human face synthesis involves transferring knowledge about the identity and identity-dependent face shape (IDFS) of a human face to target face images where the context (e.g., facial expressions, head poses, and other background factors) may change dramatically. Human faces are non-rigid, so facial expression leads to deformation of face shape, and head pose also affects the face observed in 2D im…

    Submitted 17 November, 2021; v1 submitted 4 October, 2021; originally announced October 2021.

  50. arXiv:2107.08273  [pdf, other]

    cs.LG stat.ML

    STRODE: Stochastic Boundary Ordinary Differential Equation

    Authors: Hengguan Huang, Hongfu Liu, Hao Wang, Chang Xiao, Ye Wang

    Abstract: Perception of time from sequentially acquired sensory inputs is rooted in everyday behaviors of individual organisms. Yet, most algorithms for time-series modeling fail to learn dynamics of random event timings directly from visual or audio inputs, requiring timing annotations during training that are usually unavailable for real-world applications. For instance, neuroscience perspectives on postd…

    Submitted 17 July, 2021; originally announced July 2021.

    Comments: Accepted at ICML 2021; typos corrected