Showing 1–50 of 86 results for author: Huang, C

  1. arXiv:2406.14163  [pdf, other]

    cs.DB stat.ME

    A Unified Statistical And Computational Framework For Ex-Post Harmonisation Of Aggregate Statistics

    Authors: Cynthia A. Huang

    Abstract: Ex-post harmonisation is one of many data preprocessing processes used to combine the increasingly vast and diverse sources of data available for research and analysis. Documenting provenance and ensuring the quality of multi-source datasets is vital for ensuring trustworthy scientific research and encouraging reuse of existing harmonisation efforts. However, capturing and communicating statistica… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2406.06516  [pdf, other]

    stat.ME cs.LG stat.ML

    Distribution-Free Predictive Inference under Unknown Temporal Drift

    Authors: Elise Han, Chengpiao Huang, Kaizheng Wang

    Abstract: Distribution-free prediction sets play a pivotal role in uncertainty quantification for complex statistical models. Their validity hinges on reliable calibration data, which may not be readily available as real-world environments often undergo unknown changes over time. In this paper, we propose a strategy for choosing an adaptive window and use the data therein to construct prediction sets. The w… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 25 pages, 4 figures, 6 tables

  3. arXiv:2403.05281  [pdf, other]

    stat.ML math.ST

    An Efficient Quasi-Random Sampling for Copulas

    Authors: Sumin Wang, Chenxian Huang, Yongdao Zhou, Min-Qian Liu

    Abstract: This paper examines an efficient method for quasi-random sampling of copulas in Monte Carlo computations. Traditional methods, like conditional distribution methods (CDM), have limitations when dealing with high-dimensional or implicit copulas, which refer to those that cannot be accurately represented by existing parametric copulas. Instead, this paper proposes the use of generative models, such… ▽ More

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 42 pages, 5 figures

  4. arXiv:2402.08672  [pdf, other]

    cs.LG cs.AI stat.ME

    Model Assessment and Selection under Temporal Distribution Shift

    Authors: Elise Han, Chengpiao Huang, Kaizheng Wang

    Abstract: We investigate model assessment and selection in a changing environment, by synthesizing datasets from both the current time period and historical epochs. To tackle unknown and potentially arbitrary temporal distribution shift, we develop an adaptive rolling window approach to estimate the generalization error of a given model. This strategy also facilitates the comparison between any two candidat… ▽ More

    Submitted 3 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: 26 pages, 6 figures, 4 tables

    MSC Class: 62G05 (Primary); 62J02 (Secondary)

  5. arXiv:2401.00800  [pdf, other]

    stat.ME stat.ML

    Factor Importance Ranking and Selection using Total Indices

    Authors: Chaofan Huang, V. Roshan Joseph

    Abstract: Factor importance measures the impact of each feature on output prediction accuracy. Many existing works focus on the model-based importance, but an important feature in one learning algorithm may hold little significance in another model. Hence, a factor importance measure ought to characterize the feature's predictive potential without relying on a specific prediction algorithm. Such algorithm-a… ▽ More

    Submitted 11 January, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

  6. arXiv:2312.05757  [pdf, ps, other]

    cs.LG cs.AI cs.DL cs.SI stat.ME

    Towards Human-like Perception: Learning Structural Causal Model in Heterogeneous Graph

    Authors: Tianqianjin Lin, Kaisong Song, Zhuoren Jiang, Yangyang Kang, Weikang Yuan, Xurui Li, Changlong Sun, Cui Huang, Xiaozhong Liu

    Abstract: Heterogeneous graph neural networks have become popular in various domains. However, their generalizability and interpretability are limited due to the discrepancy between their inherent inference flows and human reasoning logic or underlying causal relationships for the learning problem. This study introduces a novel solution, HG-SCM (Heterogeneous Graph as Structural Causal Model). It can mimic… ▽ More

    Submitted 9 December, 2023; originally announced December 2023.

    Comments: 28 pages, 10 figures, 6 tables, accepted by Information Processing & Management

    Journal ref: Information Processing & Management, 60 (2024) 1-21

  7. arXiv:2310.18304  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    A Stability Principle for Learning under Non-Stationarity

    Authors: Chengpiao Huang, Kaizheng Wang

    Abstract: We develop a versatile framework for statistical learning in non-stationary environments. In each time period, our approach applies a stability principle to select a look-back window that maximizes the utilization of historical data while keeping the cumulative bias within an acceptable range relative to the stochastic error. Our theory showcases the adaptability of this approach to unknown non-st… ▽ More

    Submitted 22 January, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

    Comments: 48 pages, 1 figure

    MSC Class: 68T05; 90C15

  8. arXiv:2310.07953  [pdf, other]

    stat.ME stat.CO

    Enhancing Sample Quality through Minimum Energy Importance Weights

    Authors: Chaofan Huang, V. Roshan Joseph

    Abstract: Importance sampling is a powerful tool for correcting the distributional mismatch in many statistical and machine learning problems, but in practice its performance is limited by the usage of simple proposals whose importance weights can be computed analytically. To address this limitation, Liu and Lee (2017) proposed a Black-Box Importance Sampling (BBIS) algorithm that computes the importance we… ▽ More

    Submitted 31 December, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

  9. arXiv:2309.16492  [pdf, other]

    stat.ME cs.AI stat.AP stat.ML

    Asset Bundling for Wind Power Forecasting

    Authors: Hanyu Zhang, Mathieu Tanneau, Chaofan Huang, V. Roshan Joseph, Shangkun Wang, Pascal Van Hentenryck

    Abstract: The growing penetration of intermittent, renewable generation in US power grids, especially wind and solar generation, results in increased operational uncertainty. In that context, accurate forecasts are critical, especially for wind generation, which exhibits large variability and is historically harder to predict. To overcome this challenge, this work proposes a novel Bundle-Predict-Reconcile (… ▽ More

    Submitted 28 September, 2023; originally announced September 2023.

  10. arXiv:2308.05517   

    stat.ME

    Quantile regression outcome-adaptive lasso: variable selection for causal quantile treatment effect estimation

    Authors: Yahang Liu, Kecheng Wei, Chen Huang, Yongfu Yu, Guoyou Qin

    Abstract: Quantile treatment effects (QTEs) can characterize the potentially heterogeneous causal effect of a treatment on different points of the entire outcome distribution. Propensity score (PS) methods are commonly employed for estimating QTEs in non-randomized studies. Empirical and theoretical studies have shown that insufficient and unnecessary adjustment for covariates in PS models can lead to bias… ▽ More

    Submitted 14 August, 2023; v1 submitted 10 August, 2023; originally announced August 2023.

    Comments: Need polishing

  11. arXiv:2212.05663  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    On an Interpretation of ResNets via Solution Constructions

    Authors: Changcun Huang

    Abstract: This paper first constructs a typical solution of ResNets for multi-category classifications by the principle of gate-network controls and deep-layer classifications, from which a general interpretation of the ResNet architecture is given and the performance mechanism is explained. We then use more solutions to further demonstrate the generality of that interpretation. The universal-approximation… ▽ More

    Submitted 23 December, 2022; v1 submitted 11 December, 2022; originally announced December 2022.

    Comments: v2:writing improved

    MSC Class: 68T07; ACM Class: I.2.6

  12. Adaptive Exploration and Optimization of Materials Crystal Structures

    Authors: Arvind Krishna, Huan Tran, Chaofan Huang, Rampi Ramprasad, V. Roshan Joseph

    Abstract: A central problem of materials science is to determine whether a hypothetical material is stable without being synthesized, which is mathematically equivalent to a global optimization problem on a highly non-linear and multi-modal potential energy surface (PES). This optimization problem poses multiple outstanding challenges, including the exceedingly high dimensionality of the PES and that PES mu… ▽ More

    Submitted 1 December, 2022; originally announced December 2022.

    Journal ref: INFORMS Journal on Data Science, 2023

  13. arXiv:2211.07816  [pdf, other]

    cs.LG stat.ML

    Quantifying the Impact of Label Noise on Federated Learning

    Authors: Shuqi Ke, Chao Huang, Xin Liu

    Abstract: Federated Learning (FL) is a distributed machine learning paradigm where clients collaboratively train a model using their local (human-generated) datasets. While existing studies focus on FL algorithm development to tackle data heterogeneity across clients, the important issue of data quality (e.g., label noise) in FL is overlooked. This paper aims to fill this gap by providing a quantitative stu… ▽ More

    Submitted 3 April, 2023; v1 submitted 14 November, 2022; originally announced November 2022.

    Comments: Accepted by The AAAI 2023 Workshop on Representation Learning for Responsible Human-Centric AI

  14. arXiv:2210.15060  [pdf, other]

    stat.ME stat.ML

    Optimal Sub-sampling to Boost Power of Kernel Sequential Change-point Detection

    Authors: Song Wei, Chaofan Huang

    Abstract: We present a novel scheme to boost detection power for kernel maximum mean discrepancy based sequential change-point detection procedures. Our proposed scheme features an optimal sub-sampling of the history data before the detection procedure, in order to tackle the power loss incurred by the random sub-sample from the enormous history data. We apply our proposed scheme to both Scan $B$ and Kernel… ▽ More

    Submitted 13 January, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: 5 pages

  15. arXiv:2209.12054  [pdf, other]

    stat.ML cs.LG

    From Local to Global: Spectral-Inspired Graph Neural Networks

    Authors: Ningyuan Huang, Soledad Villar, Carey E. Priebe, Da Zheng, Chengyue Huang, Lin Yang, Vladimir Braverman

    Abstract: Graph Neural Networks (GNNs) are powerful deep learning methods for Non-Euclidean data. Popular GNNs are message-passing algorithms (MPNNs) that aggregate and combine signals in a local graph neighborhood. However, shallow MPNNs tend to miss long-range signals and perform poorly on some heterophilous graphs, while deep MPNNs can suffer from issues like over-smoothing or over-squashing. To mitigate… ▽ More

    Submitted 4 November, 2022; v1 submitted 24 September, 2022; originally announced September 2022.

    Comments: Accepted for publication at the NeurIPS 2022 GLFrontiers Workshop

  16. arXiv:2209.11730  [pdf, other]

    q-bio.GN stat.AP

    BioKlustering: a web app for semi-supervised learning of maximally imbalanced genomic data

    Authors: Samuel Ozminkowski, Yuke Wu, Liule Yang, Zhiwen Xu, Luke Selberg, Chunrong Huang, Claudia Solis-Lemus

    Abstract: Summary: Accurate phenotype prediction from genomic sequences is a highly coveted task in biological and medical research. While machine-learning holds the key to accurate prediction in a variety of fields, the complexity of biological data can render many methodologies inapplicable. We introduce BioKlustering, a user-friendly open-source and publicly available web app for unsupervised and semi-su… ▽ More

    Submitted 26 September, 2022; v1 submitted 23 September, 2022; originally announced September 2022.

  17. arXiv:2204.10525  [pdf]

    stat.OT

    Research on spatial information transmission efficiency and capability of safe evacuation signs

    Authors: Ruiwen Fan, Zhangyin Dai, Shixiang Tian, Ting Xia, Hui Zhou, Congbao Huang

    Abstract: As an indispensable spatial direction information indicator for emergency evacuation, the spatial relationship between safety evacuation signs and evacuees will affect the response time of evacuees and the evacuation efficiency. This paper takes 2 kinds of common safety evacuation signs, hangtag-type and embedded, as the research object and designs space direction information transmission efficien… ▽ More

    Submitted 22 April, 2022; originally announced April 2022.

    MSC Class: 14J26; ACM Class: I.2.7

  18. arXiv:2204.05552  [pdf]

    stat.AP

    The Effects of Dynamic Learning and the Forgetting Process on an Optimizing Modelling for Full-Service Repair Pricing Contracts for Medical Devices

    Authors: Aiping Jiang, Lin Li, Xuemin Xu, David Y. C. Huang

    Abstract: In order to improve the profitability and customer service management of original equipment manufacturers (OEMs) in a market where full-service (FS) and on-call service (OS) co-exist, this article extends the optimizing modelling for pricing FS repair contracts with the effects of dynamic learning and forgetting. Along with considering autonomous learning in maintenance practice, this study also a… ▽ More

    Submitted 12 April, 2022; originally announced April 2022.

  19. Network of Low-cost Air Quality Sensor for Monitoring Indoor, Outdoor, and Personal PM2.5 Exposure in Seattle during the 2020 Wildfire Season

    Authors: Jiayang He, Ching-Hsuan Huang, Nanhsun Yuan, Elena Austin, Edmund Seto, Igor Novosselov

    Abstract: The increased frequency of wildfires in the Western United States has raised public concerns. Exposure to wildfire smoke has been linked to an increased risk of cancer and cardiorespiratory morbidity. Evidence-driven interventions can alleviate the adverse health impact of wildfire smoke. Public health guidance during wildfires is based on regional air quality data with limited spatiotemporal reso… ▽ More

    Submitted 26 March, 2022; originally announced March 2022.

  20. arXiv:2107.09730  [pdf, ps, other]

    stat.ME stat.AP

    A flexible approach for variable selection in large-scale healthcare database studies with missing covariate and outcome data

    Authors: Jung-Yi Joyce Lin, Liangyuan Hu, Chuyue Huang, Steven Lawrence, Usha Govindarajulu

    Abstract: Prior work has shown that combining bootstrap imputation with tree-based machine learning variable selection methods can provide good performances achievable on fully observed data when covariate and outcome data are missing at random (MAR). This approach however is computationally expensive, especially on large-scale datasets. We propose an inference-based method, called RR-BART, which leverages… ▽ More

    Submitted 13 April, 2022; v1 submitted 20 July, 2021; originally announced July 2021.

    Comments: 16 pages, 3 figures, 3 tables

  21. arXiv:2107.02871  [pdf, other]

    stat.ME stat.CO

    Non-Homogeneity Estimation and Universal Kriging on the Sphere

    Authors: Nicholas W. Bussberg, Jacob Shields, Chunfeng Huang

    Abstract: Kriging is a widely recognized method for making spatial predictions. On the sphere, popular methods such as ordinary kriging assume that the spatial process is intrinsically homogeneous. However, intrinsic homogeneity is too strict in many cases. This research uses intrinsic random function (IRF) theory to relax the homogeneity assumption. A key component of modeling IRF processes is estimating t… ▽ More

    Submitted 6 July, 2021; originally announced July 2021.

    Comments: 15 pages, 6 figures

  22. arXiv:2105.12894  [pdf, other]

    stat.ML cs.LG stat.CO stat.ME

    MAGI-X: Manifold-Constrained Gaussian Process Inference for Unknown System Dynamics

    Authors: Chaofan Huang, Simin Ma, Shihao Yang

    Abstract: Ordinary differential equations (ODEs), commonly used to characterize the dynamic systems, are difficult to propose in closed-form for many complicated scientific applications, even with the help of domain expert. We propose a fast and accurate data-driven method, MAGI-X, to learn the unknown dynamic from the observation data in a non-parametric fashion, without the need of any domain knowledge. U… ▽ More

    Submitted 19 October, 2021; v1 submitted 26 May, 2021; originally announced May 2021.

  23. arXiv:2105.07424  [pdf, other]

    econ.EM stat.ME

    Uniform Inference on High-dimensional Spatial Panel Networks

    Authors: Victor Chernozhukov, Chen Huang, Weining Wang

    Abstract: We propose employing a debiased-regularized, high-dimensional generalized method of moments (GMM) framework to perform inference on large-scale spatial panel networks. In particular, network structure with a flexible sparse deviation, which can be regarded either as latent or as misspecified from a predetermined adjacency matrix, is estimated using debiased machine learning approach. The theoretic… ▽ More

    Submitted 7 September, 2023; v1 submitted 16 May, 2021; originally announced May 2021.

  24. Constrained Minimum Energy Designs

    Authors: Chaofan Huang, V. Roshan Joseph, Douglas M. Ray

    Abstract: Space-filling designs are important in computer experiments, which are critical for building a cheap surrogate model that adequately approximates an expensive computer code. Many design construction techniques in the existing literature are only applicable for rectangular bounded space, but in real world applications, the input space can often be non-rectangular because of constraints on the input… ▽ More

    Submitted 24 April, 2021; originally announced April 2021.

    Comments: Submitted to Statistics and Computing

    Journal ref: Stat Comput 31, 80 (2021)

  25. arXiv:2104.11708  [pdf]

    stat.CO

    Regression Modeling for Recurrent Events Using R Package reReg

    Authors: Sy Han Chiou, Gongjun Xu, Jun Yan, Chiung-Yu Huang

    Abstract: Recurrent event analyses have found a wide range of applications in biomedicine, public health, and engineering, among others, where study subjects may experience a sequence of event of interest during follow-up. The R package reReg (Chiou and Huang 2021) offers a comprehensive collection of practical and easy-to-use tools for regression analysis of recurrent events, possibly with the presence of… ▽ More

    Submitted 20 August, 2022; v1 submitted 23 April, 2021; originally announced April 2021.

  26. arXiv:2103.10510  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Hidden Technical Debts for Fair Machine Learning in Financial Services

    Authors: Chong Huang, Arash Nourian, Kevin Griest

    Abstract: The recent advancements in machine learning (ML) have demonstrated the potential for providing a powerful solution to build complex prediction systems in a short time. However, in highly regulated industries, such as the financial technology (Fintech), people have raised concerns about the risk of ML systems discriminating against specific protected groups or individuals. To address these concerns… ▽ More

    Submitted 21 March, 2021; v1 submitted 18 March, 2021; originally announced March 2021.

    Comments: Presented at NeurIPS 2020 Fair AI in Finance Workshop

  27. arXiv:2101.10102  [pdf, other]

    cs.LG cs.AI stat.ML

    Towards Practical Robustness Analysis for DNNs based on PAC-Model Learning

    Authors: Renjue Li, Pengfei Yang, Cheng-Chao Huang, Youcheng Sun, Bai Xue, Lijun Zhang

    Abstract: To analyse local robustness properties of deep neural networks (DNNs), we present a practical framework from a model learning perspective. Based on black-box model learning with scenario optimisation, we abstract the local behaviour of a DNN via an affine model with the probably approximately correct (PAC) guarantee. From the learned model, we can infer the corresponding PAC-model robustness prope… ▽ More

    Submitted 13 April, 2022; v1 submitted 25 January, 2021; originally announced January 2021.

  28. Population Quasi-Monte Carlo

    Authors: Chaofan Huang, V. Roshan Joseph, Simon Mak

    Abstract: Monte Carlo methods are widely used for approximating complicated, multidimensional integrals for Bayesian inference. Population Monte Carlo (PMC) is an important class of Monte Carlo methods, which utilizes a population of proposals to generate weighted samples that approximate the target distribution. The generic PMC framework iterates over three steps: samples are simulated from a set of propos… ▽ More

    Submitted 26 December, 2020; originally announced December 2020.

    Comments: Submitted to Journal of Computational and Graphical Statistics

    Journal ref: Journal of Computational and Graphical Statistics (2022)

  29. arXiv:2012.09943  [pdf, other]

    stat.ML cs.LG

    Guiding Neural Network Initialization via Marginal Likelihood Maximization

    Authors: Anthony S. Tai, Chunfeng Huang

    Abstract: We propose a simple, data-driven approach to help guide hyperparameter selection for neural network initialization. We leverage the relationship between neural network and Gaussian process models having corresponding activation and covariance functions to infer the hyperparameter values desirable for model initialization. Our experiment shows that marginal likelihood maximization provides recommen… ▽ More

    Submitted 17 December, 2020; originally announced December 2020.

  30. arXiv:2012.04837  [pdf, other]

    cs.CV cs.LG stat.ML

    Deep Unsupervised Image Anomaly Detection: An Information Theoretic Framework

    Authors: Fei Ye, Huangjie Zheng, Chaoqin Huang, Ya Zhang

    Abstract: Surrogate task based methods have recently shown great promise for unsupervised image anomaly detection. However, there is no guarantee that the surrogate tasks share the consistent optimization direction with anomaly detection. In this paper, we return to a direct objective function for anomaly detection with information theory, which maximizes the distance between normal and anomalous data in te… ▽ More

    Submitted 8 December, 2020; originally announced December 2020.

  31. arXiv:2011.15007  [pdf, other]

    cs.LG cs.AI stat.ML

    RealCause: Realistic Causal Inference Benchmarking

    Authors: Brady Neal, Chin-Wei Huang, Sunand Raghupathi

    Abstract: There are many different causal effect estimators in causal inference. However, it is unclear how to choose between these estimators because there is no ground-truth for causal effects. A commonly used option is to simulate synthetic data, where the ground-truth is known. However, the best causal estimators on synthetic data are unlikely to be the best causal estimators on real data. An ideal benc… ▽ More

    Submitted 29 March, 2021; v1 submitted 30 November, 2020; originally announced November 2020.

  32. arXiv:2011.09770  [pdf, other]

    q-bio.QM stat.AP

    Freecyto: Quantized Flow Cytometry Analysis for the Web

    Authors: Nathan Wong, Daehwan Kim, Zachery Robinson, Connie Huang, Irina M. Conboy

    Abstract: Flow cytometry (FCM) is an analytic technique that is capable of detecting and recording the emission of fluorescence and light scattering of cells or particles (that are collectively called "events") in a population. A typical FCM experiment can produce a large array of data making the analysis computationally intensive. Current FCM data analysis platforms (FlowJo, etc.), while very useful, do no… ▽ More

    Submitted 19 November, 2020; originally announced November 2020.

    Comments: for associated application, see https://freecyto.com

  33. arXiv:2011.07175  [pdf, other]

    stat.AP

    Dynamic Risk Prediction Triggered by Intermediate Events Using Survival Tree Ensembles

    Authors: Yifei Sun, Sy Han Chiou, Colin O. Wu, Meghan McGarry, Chiung-Yu Huang

    Abstract: With the availability of massive amounts of data from electronic health records and registry databases, incorporating time-varying patient information to improve risk prediction has attracted great attention. To exploit the growing amount of predictor information over time, we develop a unified framework for landmark prediction using survival tree ensembles, where an updated prediction can be perf… ▽ More

    Submitted 25 August, 2022; v1 submitted 13 November, 2020; originally announced November 2020.

  34. arXiv:2009.02040  [pdf, other]

    cs.LG stat.ML

    Multivariate Time-series Anomaly Detection via Graph Attention Network

    Authors: Hang Zhao, Yujing Wang, Juanyong Duan, Congrui Huang, Defu Cao, Yunhai Tong, Bixiong Xu, Jing Bai, Jie Tong, Qi Zhang

    Abstract: Anomaly detection on multivariate time-series is of great importance in both data mining research and industrial applications. Recent approaches have achieved significant progress in this topic, but there is remaining limitations. One major limitation is that they do not capture the relationships between different time-series explicitly, resulting in inevitable false alarms. In this paper, we prop… ▽ More

    Submitted 4 September, 2020; originally announced September 2020.

    Comments: Accepted by ICDM 2020. 10 pages

  35. arXiv:2007.06758  [pdf, other]

    cs.IR cs.LG stat.ML

    Recommender Systems for the Internet of Things: A Survey

    Authors: May Altulyan, Lina Yao, Xianzhi Wang, Chaoran Huang, Salil S Kanhere, Quan Z Sheng

    Abstract: Recommendation represents a vital stage in developing and promoting the benefits of the Internet of Things (IoT). Traditional recommender systems fail to exploit ever-growing, dynamic, and heterogeneous IoT data. This paper presents a comprehensive review of the state-of-the-art recommender systems, as well as related techniques and application in the vibrant field of IoT. We discuss several limit… ▽ More

    Submitted 13 July, 2020; originally announced July 2020.

  36. arXiv:2007.04250  [pdf, other]

    cs.LG cs.CV stat.ML

    A Benchmark of Medical Out of Distribution Detection

    Authors: Tianshi Cao, Chin-Wei Huang, David Yu-Tung Hui, Joseph Paul Cohen

    Abstract: Motivation: Deep learning models deployed for use on medical tasks can be equipped with Out-of-Distribution Detection (OoDD) methods in order to avoid erroneous predictions. However it is unclear which OoDD method should be used in practice. Specific Problem: Systems trained for one particular domain of images cannot be expected to perform accurately on images of a different domain. These images s… ▽ More

    Submitted 4 August, 2020; v1 submitted 8 July, 2020; originally announced July 2020.

    Comments: Submitted to Machine Learning for Biomedical Imaging Journal (MELBA)

  37. arXiv:2006.05164  [pdf, other]

    cs.LG stat.ML

    AR-DAE: Towards Unbiased Neural Entropy Gradient Estimation

    Authors: Jae Hyun Lim, Aaron Courville, Christopher Pal, Chin-Wei Huang

    Abstract: Entropy is ubiquitous in machine learning, but it is in general intractable to compute the entropy of the distribution of an arbitrary continuous random variable. In this paper, we propose the amortized residual denoising autoencoder (AR-DAE) to approximate the gradient of the log density function, which can be used to estimate the gradient of entropy. Amortization allows us to significantly reduc… ▽ More

    Submitted 9 June, 2020; originally announced June 2020.

    Comments: accepted in ICML 2020

  38. arXiv:2004.12234  [pdf, other]

    stat.ME

    Recurrent Events Analysis With Data Collected at Informative Clinical Visits in Electronic Health Records

    Authors: Yifei Sun, Charles E. McCulloch, Kieren A. Marr, Chiung-Yu Huang

    Abstract: Although increasingly used as a data resource for assembling cohorts, electronic health records (EHRs) pose many analytic challenges. In particular, a patient's health status influences when and what data are recorded, generating sampling bias in the collected data. In this paper, we consider recurrent event analysis using EHR data. Conventional regression methods for event risk analysis usually r… ▽ More

    Submitted 25 April, 2020; originally announced April 2020.

  39. Knowledge-guided Deep Reinforcement Learning for Interactive Recommendation

    Authors: Xiaocong Chen, Chaoran Huang, Lina Yao, Xianzhi Wang, Wei Liu, Wenjie Zhang

    Abstract: Interactive recommendation aims to learn from dynamic interactions between items and users to achieve responsiveness and accuracy. Reinforcement learning is inherently advantageous for coping with dynamic environments and thus has attracted increasing attention in interactive recommendation research. Inspired by knowledge-aware recommendation, we proposed Knowledge-Guided deep Reinforcement learni… ▽ More

    Submitted 17 April, 2020; originally announced April 2020.

  40. arXiv:2002.07101  [pdf, other]

    cs.LG stat.ML

    Augmented Normalizing Flows: Bridging the Gap Between Generative Flows and Latent Variable Models

    Authors: Chin-Wei Huang, Laurent Dinh, Aaron Courville

    Abstract: In this work, we propose a new family of generative flows on an augmented data space, with an aim to improve expressivity without drastically increasing the computational cost of sampling and evaluation of a lower bound on the likelihood. Theoretically, we prove the proposed flow can approximate a Hamiltonian ODE as a universal transport map. Empirically, we demonstrate state-of-the-art performanc… ▽ More

    Submitted 17 February, 2020; originally announced February 2020.

    Comments: 27 pages, 12 figures

  41. arXiv:2002.03595  [pdf, other]

    eess.SP cs.LG stat.ML

    Representation Learning on Variable Length and Incomplete Wearable-Sensory Time Series

    Authors: Xian Wu, Chao Huang, Pablo Roblesgranda, Nitesh Chawla

    Abstract: The prevalence of wearable sensors (e.g., smart wristband) is creating unprecedented opportunities to not only inform health and wellness states of individuals, but also assess and infer personal attributes, including demographic and personality attributes. However, the data captured from wearables, such as heart rate or number of steps, present two key challenges: 1) the time series is often of v… ▽ More

    Submitted 27 May, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

  42. arXiv:2002.02090  [pdf, other]

    cs.LG cs.DC stat.ML

    Faster On-Device Training Using New Federated Momentum Algorithm

    Authors: Zhouyuan Huo, Qian Yang, Bin Gu, Lawrence Carin, Heng Huang

    Abstract: Mobile crowdsensing has gained significant attention in recent years and has become a critical paradigm for emerging Internet of Things applications. The sensing devices continuously generate a significant quantity of data, which provide tremendous opportunities to develop innovative intelligent applications. To utilize these data to train machine learning models while not compromising user privac… ▽ More

    Submitted 5 February, 2020; originally announced February 2020.

  43. arXiv:1911.09251  [pdf, other]

    cs.LG stat.ML

    AutoShrink: A Topology-aware NAS for Discovering Efficient Neural Architecture

    Authors: Tunhou Zhang, Hsin-Pai Cheng, Zhenwen Li, Feng Yan, Chengyu Huang, Hai Li, Yiran Chen

    Abstract: Resource is an important constraint when deploying Deep Neural Networks (DNNs) on mobile and edge devices. Existing works commonly adopt the cell-based search approach, which limits the flexibility of network patterns in learned cell structures. Moreover, due to the topology-agnostic nature of existing works, including both cell-based and node-based approaches, the search process is time consuming… ▽ More

    Submitted 20 November, 2019; originally announced November 2019.

    Report number: Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)

  44. arXiv:1911.08538  [pdf, other]

    cs.LG cs.SI stat.ML

    Heterogeneous Deep Graph Infomax

    Authors: Yuxiang Ren, Bo Liu, Chao Huang, Peng Dai, Liefeng Bo, Jiawei Zhang

    Abstract: Graph representation learning is to learn universal node representations that preserve both node attributes and structural information. The derived node representations can be used to serve various downstream tasks, such as node classification and node clustering. When a graph is heterogeneous, the problem becomes more challenging than the homogeneous graph node learning problem. Inspired by the e…

    Submitted 13 November, 2020; v1 submitted 19 November, 2019; originally announced November 2019.

  45. arXiv:1910.05787  [pdf, other]

    cs.LG eess.IV stat.ML

    ERNet Family: Hardware-Oriented CNN Models for Computational Imaging Using Block-Based Inference

    Authors: Chao-Tsung Huang

    Abstract: Convolutional neural networks (CNNs) demand huge DRAM bandwidth for computational imaging tasks, and block-based processing has recently been applied to greatly reduce the bandwidth. However, the induced additional computation for feature recomputing or the large SRAM for feature reusing will degrade the performance or even forbid the usage of state-of-the-art models. In this paper, we address the…

    Submitted 30 January, 2020; v1 submitted 13 October, 2019; originally announced October 2019.

    Comments: 5 pages; appearing in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020

  46. arXiv:1910.00411  [pdf, other]

    cs.LG stat.ML

    Generating Fair Universal Representations using Adversarial Models

    Authors: Peter Kairouz, Jiachun Liao, Chong Huang, Maunil Vyas, Monica Welfert, Lalitha Sankar

    Abstract: We present a data-driven framework for learning fair universal representations (FUR) that guarantee statistical fairness for any learning task that may not be known a priori. Our framework leverages recent advances in adversarial learning to allow a data holder to learn representations in which a set of sensitive attributes are decoupled from the rest of the dataset. We formulate this as a constra…

    Submitted 11 May, 2022; v1 submitted 27 September, 2019; originally announced October 2019.

    Comments: Extended version of a paper accepted to TIFS

  47. arXiv:1908.03620  [pdf, other]

    physics.comp-ph cs.LG eess.SY math.DS stat.ML

    Learning physics-based reduced-order models for a single-injector combustion process

    Authors: Renee Swischuk, Boris Kramer, Cheng Huang, Karen Willcox

    Abstract: This paper presents a physics-based data-driven method to learn predictive reduced-order models (ROMs) from high-fidelity simulations, and illustrates it in the challenging context of a single-injector combustion process. The method combines the perspectives of model reduction and machine learning. Model reduction brings in the physics of the problem, constraining the ROM predictions to lie on a s…

    Submitted 11 July, 2020; v1 submitted 9 August, 2019; originally announced August 2019.

    Journal ref: AIAA Journal 58:6, 2658-2672, 2020

  48. arXiv:1907.06637  [pdf, other]

    cs.SD cs.HC cs.LG eess.AS stat.ML

    The Bach Doodle: Approachable music composition with machine learning at scale

    Authors: Cheng-Zhi Anna Huang, Curtis Hawthorne, Adam Roberts, Monica Dinculescu, James Wexler, Leon Hong, Jacob Howcroft

    Abstract: To make music composition more approachable, we designed the first AI-powered Google Doodle, the Bach Doodle, where users can create their own melody and have it harmonized by a machine learning model Coconet (Huang et al., 2017) in the style of Bach. For users to input melodies, we designed a simplified sheet-music based interface. To support an interactive experience at scale, we re-implemented…

    Submitted 14 July, 2019; originally announced July 2019.

    Comments: Proceedings of the 18th International Society for Music Information Retrieval Conference, ISMIR 2019

  49. arXiv:1906.07159  [pdf, other]

    cs.SI cs.LG stat.ML

    vGraph: A Generative Model for Joint Community Detection and Node Representation Learning

    Authors: Fan-Yun Sun, Meng Qu, Jordan Hoffmann, Chin-Wei Huang, Jian Tang

    Abstract: This paper focuses on two fundamental tasks of graph analysis: community detection and node representation learning, which capture the global and local structures of graphs, respectively. In the current literature, these two tasks are usually independently studied while they are actually highly correlated. We propose a probabilistic generative model called vGraph to learn community membership and…

    Submitted 17 September, 2019; v1 submitted 18 June, 2019; originally announced June 2019.

    Comments: Accepted Paper at NeurIPS 2019

  50. arXiv:1906.06706  [pdf, other]

    cs.LG math.NA stat.ML

    Interpretations of Deep Learning by Forests and Haar Wavelets

    Authors: Changcun Huang

    Abstract: This paper presents a basic property of region dividing of ReLU (rectified linear unit) deep learning when new layers are successively added, by which two new perspectives of interpreting deep learning are given. The first is related to decision trees and forests; we construct a deep learning structure equivalent to a forest in classification abilities, which means that certain kinds of ReLU deep…

    Submitted 6 December, 2019; v1 submitted 16 June, 2019; originally announced June 2019.

    Comments: v2:Lemma 4 rectified; v3-v6:details refined and typos corrected; v7:descriptions revised

    MSC Class: I.2.0; ACM Class: I.2.0