Search | arXiv e-print repository

Information-Theoretic Thresholds for the Alignments of Partially Correlated Graphs

Authors: Dong Huang, Xianwen Song, Pengkun Yang

Abstract: This paper studies the problem of recovering the hidden vertex correspondence between two correlated random graphs. We propose the partially correlated Erdős-Rényi graphs model, wherein a pair of induced subgraphs with a certain number are correlated. We investigate the information-theoretic thresholds for recovering the latent correlated subgraphs and the hidden vertex correspondence. We prove th… ▽ More This paper studies the problem of recovering the hidden vertex correspondence between two correlated random graphs. We propose the partially correlated Erdős-Rényi graphs model, wherein a pair of induced subgraphs with a certain number are correlated. We investigate the information-theoretic thresholds for recovering the latent correlated subgraphs and the hidden vertex correspondence. We prove that there exists an optimal rate for partial recovery for the number of correlated nodes, above which one can correctly match a fraction of vertices and below which correctly matching any positive fraction is impossible, and we also derive an optimal rate for exact recovery. In the proof of possibility results, we propose correlated functional digraphs, which partition the edges of the intersection graph into two types of components, and bound the error probability by lower-order cumulant generating functions. The proof of impossibility results build upon the generalized Fano's inequality and the recovery thresholds settled in correlated Erdős-Rényi graphs model. △ Less

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.03683 [pdf, other]

Bayesian Power Steering: An Effective Approach for Domain Adaptation of Diffusion Models

Authors: Ding Huang, Ting Li, Jian Huang

Abstract: We propose a Bayesian framework for fine-tuning large diffusion models with a novel network structure called Bayesian Power Steering (BPS). We clarify the meaning behind adaptation from a \textit{large probability space} to a \textit{small probability space} and explore the task of fine-tuning pre-trained models using learnable modules from a Bayesian perspective. BPS extracts task-specific knowle… ▽ More We propose a Bayesian framework for fine-tuning large diffusion models with a novel network structure called Bayesian Power Steering (BPS). We clarify the meaning behind adaptation from a \textit{large probability space} to a \textit{small probability space} and explore the task of fine-tuning pre-trained models using learnable modules from a Bayesian perspective. BPS extracts task-specific knowledge from a pre-trained model's learned prior distribution. It efficiently leverages large diffusion models, differentially intervening different hidden features with a head-heavy and foot-light configuration. Experiments highlight the superiority of BPS over contemporary methods across a range of tasks even with limited amount of data. Notably, BPS attains an FID score of 10.49 under the sketch condition on the COCO17 dataset. △ Less

Submitted 5 June, 2024; originally announced June 2024.

Comments: 25 pages, 26 figures, and 4 tables

MSC Class: 62G05; 68T07

arXiv:2405.16672 [pdf, other]

Transfer Learning Under High-Dimensional Graph Convolutional Regression Model for Node Classification

Authors: Jiachen Chen, Danyang Huang, Liyuan Wang, Kathryn L. Lunetta, Debarghya Mukherjee, Huimin Cheng

Abstract: Node classification is a fundamental task, but obtaining node classification labels can be challenging and expensive in many real-world scenarios. Transfer learning has emerged as a promising solution to address this challenge by leveraging knowledge from source domains to enhance learning in a target domain. Existing transfer learning methods for node classification primarily focus on integrating… ▽ More Node classification is a fundamental task, but obtaining node classification labels can be challenging and expensive in many real-world scenarios. Transfer learning has emerged as a promising solution to address this challenge by leveraging knowledge from source domains to enhance learning in a target domain. Existing transfer learning methods for node classification primarily focus on integrating Graph Convolutional Networks (GCNs) with various transfer learning techniques. While these approaches have shown promising results, they often suffer from a lack of theoretical guarantees, restrictive conditions, and high sensitivity to hyperparameter choices. To overcome these limitations, we propose a Graph Convolutional Multinomial Logistic Regression (GCR) model and a transfer learning method based on the GCR model, called Trans-GCR. We provide theoretical guarantees of the estimate obtained under GCR model in high-dimensional settings. Moreover, Trans-GCR demonstrates superior empirical performance, has a low computational cost, and requires fewer hyperparameters than existing methods. △ Less

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2403.16773 [pdf, other]

Privacy-Protected Spatial Autoregressive Model

Authors: Danyang Huang, Ziyi Kong, Shuyuan Wu, Hansheng Wang

Abstract: Spatial autoregressive (SAR) models are important tools for studying network effects. However, with an increasing emphasis on data privacy, data providers often implement privacy protection measures that make classical SAR models inapplicable. In this study, we introduce a privacy-protected SAR model with noise-added response and covariates to meet privacy-protection requirements. However, in this… ▽ More Spatial autoregressive (SAR) models are important tools for studying network effects. However, with an increasing emphasis on data privacy, data providers often implement privacy protection measures that make classical SAR models inapplicable. In this study, we introduce a privacy-protected SAR model with noise-added response and covariates to meet privacy-protection requirements. However, in this scenario, the traditional quasi-maximum likelihood estimator becomes infeasible because the likelihood function cannot be formulated. To address this issue, we first consider an explicit expression for the likelihood function with only noise-added responses. However, the derivatives are biased owing to the noise in the covariates. Therefore, we develop techniques that can correct the biases introduced by noise. Correspondingly, a Newton-Raphson-type algorithm is proposed to obtain the estimator, leading to a corrected likelihood estimator. To further enhance computational efficiency, we introduce a corrected least squares estimator based on the idea of bias correction. These two estimation methods ensure both data security and the attainment of statistically valid estimators. Theoretical analysis of both estimators is carefully conducted, and statistical inference methods are discussed. The finite sample performances of different methods are demonstrated through extensive simulations and the analysis of a real dataset. △ Less

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.13118 [pdf, other]

Modal Analysis of Spatiotemporal Data via Multivariate Gaussian Process Regression

Authors: Jiwoo Song, Daning Huang

Abstract: Modal analysis has become an essential tool to understand the coherent structure of complex flows. The classical modal analysis methods, such as dynamic mode decomposition (DMD) and spectral proper orthogonal decomposition (SPOD), rely on a sufficient amount of data that is regularly sampled in time. However, often one needs to deal with sparse temporally irregular data, e.g., due to experimental… ▽ More Modal analysis has become an essential tool to understand the coherent structure of complex flows. The classical modal analysis methods, such as dynamic mode decomposition (DMD) and spectral proper orthogonal decomposition (SPOD), rely on a sufficient amount of data that is regularly sampled in time. However, often one needs to deal with sparse temporally irregular data, e.g., due to experimental measurements and simulation algorithm. To overcome the limitations of data scarcity and irregular sampling, we propose a novel modal analysis technique using multi-variate Gaussian process regression (MVGPR). We first establish the connection between MVGPR and the existing modal analysis techniques, DMD and SPOD, from a linear system identification perspective. Next, leveraging this connection, we develop a MVGPR-based modal analysis technique that addresses the aforementioned limitations. The capability of MVGPR is endowed by its judiciously designed kernel structure for correlation function, that is derived from the assumed linear dynamics. Subsequently, the proposed MVGPR method is benchmarked against DMD and SPOD on a range of examples, from academic and synthesized data to unsteady airfoil aerodynamics. The results demonstrate MVGPR as a promising alternative to classical modal analysis methods, especially in the scenario of scarce and temporally irregular data. △ Less

Submitted 19 March, 2024; originally announced March 2024.

Comments: 43 pages, 35 figures

arXiv:2403.11163 [pdf, ps, other]

doi 10.1080/24754269.2024.2343151

A Selective Review on Statistical Methods for Massive Data Computation: Distributed Computing, Subsampling, and Minibatch Techniques

Authors: Xuetong Li, Yuan Gao, Hong Chang, Danyang Huang, Yingying Ma, Rui Pan, Haobo Qi, Feifei Wang, Shuyuan Wu, Ke Xu, **g Zhou, Xuening Zhu, Yingqiu Zhu, Hansheng Wang

Abstract: This paper presents a selective review of statistical computation methods for massive data analysis. A huge amount of statistical methods for massive data computation have been rapidly developed in the past decades. In this work, we focus on three categories of statistical computation methods: (1) distributed computing, (2) subsampling methods, and (3) minibatch gradient techniques. The first clas… ▽ More This paper presents a selective review of statistical computation methods for massive data analysis. A huge amount of statistical methods for massive data computation have been rapidly developed in the past decades. In this work, we focus on three categories of statistical computation methods: (1) distributed computing, (2) subsampling methods, and (3) minibatch gradient techniques. The first class of literature is about distributed computing and focuses on the situation, where the dataset size is too huge to be comfortably handled by one single computer. In this case, a distributed computation system with multiple computers has to be utilized. The second class of literature is about subsampling methods and concerns about the situation, where the sample size of dataset is small enough to be placed on one single computer but too large to be easily processed by its memory as a whole. The last class of literature studies those minibatch gradient related optimization techniques, which have been extensively used for optimizing various deep learning models. △ Less

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2402.06031 [pdf, other]

An operator learning perspective on parameter-to-observable maps

Authors: Daniel Zhengyu Huang, Nicholas H. Nelsen, Margaret Trautner

Abstract: Computationally efficient surrogates for parametrized physical models play a crucial role in science and engineering. Operator learning provides data-driven surrogates that map between function spaces. However, instead of full-field measurements, often the available data are only finite-dimensional parametrizations of model inputs or finite observables of model outputs. Building on Fourier Neural… ▽ More Computationally efficient surrogates for parametrized physical models play a crucial role in science and engineering. Operator learning provides data-driven surrogates that map between function spaces. However, instead of full-field measurements, often the available data are only finite-dimensional parametrizations of model inputs or finite observables of model outputs. Building on Fourier Neural Operators, this paper introduces the Fourier Neural Map**s (FNMs) framework that is able to accommodate such finite-dimensional vector inputs or outputs. The paper develops universal approximation theorems for the method. Moreover, in many applications the underlying parameter-to-observable (PtO) map is defined implicitly through an infinite-dimensional operator, such as the solution operator of a partial differential equation. A natural question is whether it is more data-efficient to learn the PtO map end-to-end or first learn the solution operator and subsequently compute the observable from the full-field solution. A theoretical analysis of Bayesian nonparametric regression of linear functionals, which is of independent interest, suggests that the end-to-end approach can actually have worse sample complexity. Extending beyond the theory, numerical results for the FNM approximation of three nonlinear PtO maps demonstrate the benefits of the operator learning perspective that this paper adopts. △ Less

Submitted 6 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

Comments: 63 pages, 10 figures, 1 table

MSC Class: 68T07; 62G20; 65J15

arXiv:2402.01148 [pdf, other]

The Optimality of Kernel Classifiers in Sobolev Space

Authors: Jianfa Lai, Zhifan Li, Dongming Huang, Qian Lin

Abstract: Kernel methods are widely used in machine learning, especially for classification problems. However, the theoretical analysis of kernel classification is still limited. This paper investigates the statistical performances of kernel classifiers. With some mild assumptions on the conditional probability $η(x)=\mathbb{P}(Y=1\mid X=x)$, we derive an upper bound on the classification excess risk of a k… ▽ More Kernel methods are widely used in machine learning, especially for classification problems. However, the theoretical analysis of kernel classification is still limited. This paper investigates the statistical performances of kernel classifiers. With some mild assumptions on the conditional probability $η(x)=\mathbb{P}(Y=1\mid X=x)$, we derive an upper bound on the classification excess risk of a kernel classifier using recent advances in the theory of kernel regression. We also obtain a minimax lower bound for Sobolev spaces, which shows the optimality of the proposed classifier. Our theoretical results can be extended to the generalization error of overparameterized neural network classifiers. To make our theoretical results more applicable in realistic settings, we also propose a simple method to estimate the interpolation smoothness of $2η(x)-1$ and apply the method to real datasets. △ Less

Submitted 2 February, 2024; originally announced February 2024.

Comments: 21 pages, 2 figures

MSC Class: 62G08 (Primary); 68T07; 46E22 (secondary) ACM Class: G.3

arXiv:2401.10903 [pdf, other]

Application of Machine Learning in Stock Market Forecasting: A Case Study of Disney Stock

Authors: Dengxin Huang

Abstract: This document presents a stock market analysis conducted on a dataset consisting of 750 instances and 16 attributes donated in 2014-10-23. The analysis includes an exploratory data analysis (EDA) section, feature engineering, data preparation, model selection, and insights from the analysis. The Fama French 3-factor model is also utilized in the analysis. The results of the analysis are presented,… ▽ More This document presents a stock market analysis conducted on a dataset consisting of 750 instances and 16 attributes donated in 2014-10-23. The analysis includes an exploratory data analysis (EDA) section, feature engineering, data preparation, model selection, and insights from the analysis. The Fama French 3-factor model is also utilized in the analysis. The results of the analysis are presented, with linear regression being the best-performing model. △ Less

Submitted 31 December, 2023; originally announced January 2024.

Comments: 9 pages, 7 figures

arXiv:2312.08728 [pdf, other]

Mini-batch Gradient Descent with Buffer

Authors: Haobo Qi, Du Huang, Yingqiu Zhu, Danyang Huang, Hansheng Wang

Abstract: In this paper, we studied a buffered mini-batch gradient descent (BMGD) algorithm for training complex model on massive datasets. The algorithm studied here is designed for fast training on a GPU-CPU system, which contains two steps: the buffering step and the computation step. In the buffering step, a large batch of data (i.e., a buffer) are loaded from the hard drive to the graphical memory of G… ▽ More In this paper, we studied a buffered mini-batch gradient descent (BMGD) algorithm for training complex model on massive datasets. The algorithm studied here is designed for fast training on a GPU-CPU system, which contains two steps: the buffering step and the computation step. In the buffering step, a large batch of data (i.e., a buffer) are loaded from the hard drive to the graphical memory of GPU. In the computation step, a standard mini-batch gradient descent(MGD) algorithm is applied to the buffered data. Compared to traditional MGD algorithm, the proposed BMGD algorithm can be more efficient for two reasons.First, the BMGD algorithm uses the buffered data for multiple rounds of gradient update, which reduces the expensive communication cost from the hard drive to GPU memory. Second, the buffering step can be executed in parallel so that the GPU does not have to stay idle when loading new data. We first investigate the theoretical properties of BMGD algorithms under a linear regression setting.The analysis is then extended to the Polyak-Lojasiewicz loss function class. The theoretical claims about the BMGD algorithm are numerically verified by simulation studies. The practical usefulness of the proposed method is demonstrated by three image-related real data analysis. △ Less

Submitted 14 December, 2023; originally announced December 2023.

arXiv:2312.05579 [pdf, other]

Conditional Stochastic Interpolation for Generative Learning

Authors: Ding Huang, Jian Huang, Ting Li, Guohao Shen

Abstract: We propose a conditional stochastic interpolation (CSI) approach to learning conditional distributions. CSI learns probability flow equations or stochastic differential equations that transport a reference distribution to the target conditional distribution. This is achieved by first learning the drift function and the conditional score function based on conditional stochastic interpolation, which… ▽ More We propose a conditional stochastic interpolation (CSI) approach to learning conditional distributions. CSI learns probability flow equations or stochastic differential equations that transport a reference distribution to the target conditional distribution. This is achieved by first learning the drift function and the conditional score function based on conditional stochastic interpolation, which are then used to construct a deterministic process governed by an ordinary differential equation or a diffusion process for conditional sampling. In our proposed CSI model, we incorporate an adaptive diffusion term to address the instability issues arising during the training process. We provide explicit forms of the conditional score function and the drift function in terms of conditional expectations under mild conditions, which naturally lead to an nonparametric regression approach to estimating these functions. Furthermore, we establish non-asymptotic error bounds for learning the target conditional distribution via conditional stochastic interpolation in terms of KL divergence, taking into account the neural network approximation error. We illustrate the application of CSI on image generation using a benchmark image dataset. △ Less

Submitted 9 December, 2023; originally announced December 2023.

Comments: 44 pages, 4 figures

arXiv:2312.01815 [pdf, other]

Hypothesis Testing in Gaussian Graphical Models: Novel Goodness-of-Fit Tests and Conditional Randomization Tests

Authors: Xiaotong Lin, Fangqiao Tian, Dongming Huang

Abstract: We introduce novel hypothesis testing methods for Gaussian graphical models, whose foundation is an innovative algorithm that generates exchangeable copies from these models. We utilize the exchangeable copies to formulate a goodness-of-fit test, which is valid in both low and high-dimensional settings and flexible in choosing the test statistic. This test exhibits superior power performance, espe… ▽ More We introduce novel hypothesis testing methods for Gaussian graphical models, whose foundation is an innovative algorithm that generates exchangeable copies from these models. We utilize the exchangeable copies to formulate a goodness-of-fit test, which is valid in both low and high-dimensional settings and flexible in choosing the test statistic. This test exhibits superior power performance, especially in scenarios where the true precision matrix violates the null hypothesis with many small entries. Furthermore, we adapt the sampling algorithm for constructing a new conditional randomization test for the conditional independence between a response $Y$ and a vector of covariates $X$ given some other variables $Z$. Thanks to the model-X framework, this test does not require any modeling assumption about $Y$ and can utilize test statistics from advanced models. It also relaxes the assumptions of conditional randomization tests by allowing the number of unknown parameters of the distribution of $X$ to be much larger than the sample size. For both of our testing procedures, we propose several test statistics and conduct comprehensive simulation studies to demonstrate their superior performance in controlling the Type-I error and achieving high power. The usefulness of our methods is further demonstrated through three real-world applications. △ Less

Submitted 4 December, 2023; originally announced December 2023.

MSC Class: 62F03; 62H15

arXiv:2312.01411 [pdf, other]

Bayesian inference on Cox regression models using catalytic prior distributions

Authors: Weihao Li, Dongming Huang

Abstract: The Cox proportional hazards model (Cox model) is a popular model for survival data analysis. When the sample size is small relative to the dimension of the model, the standard maximum partial likelihood inference is often problematic. In this work, we propose the Cox catalytic prior distributions for Bayesian inference on Cox models, which is an extension of a general class of prior distributions… ▽ More The Cox proportional hazards model (Cox model) is a popular model for survival data analysis. When the sample size is small relative to the dimension of the model, the standard maximum partial likelihood inference is often problematic. In this work, we propose the Cox catalytic prior distributions for Bayesian inference on Cox models, which is an extension of a general class of prior distributions originally designed for stabilizing complex parametric models. The Cox catalytic prior is formulated as a weighted likelihood of the regression coefficients based on synthetic data and a surrogate baseline hazard constant. This surrogate hazard can be either provided by the user or estimated from the data, and the synthetic data are generated from the predictive distribution of a fitted simpler model. For point estimation, we derive an approximation of the marginal posterior mode, which can be computed conveniently as a regularized log partial likelihood estimator. We prove that our prior distribution is proper and the resulting estimator is consistent under mild conditions. In simulation studies, our proposed method outperforms standard maximum partial likelihood inference and is on par with existing shrinkage methods. We further illustrate the application of our method to a real dataset. △ Less

Submitted 3 December, 2023; originally announced December 2023.

Comments: 34 pages

arXiv:2310.03597 [pdf, other]

Sampling via Gradient Flows in the Space of Probability Measures

Authors: Yifan Chen, Daniel Zhengyu Huang, Jiaoyang Huang, Sebastian Reich, Andrew M Stuart

Abstract: Sampling a target probability distribution with an unknown normalization constant is a fundamental challenge in computational science and engineering. Recent work shows that algorithms derived by considering gradient flows in the space of probability measures open up new avenues for algorithm development. This paper makes three contributions to this sampling approach by scrutinizing the design com… ▽ More Sampling a target probability distribution with an unknown normalization constant is a fundamental challenge in computational science and engineering. Recent work shows that algorithms derived by considering gradient flows in the space of probability measures open up new avenues for algorithm development. This paper makes three contributions to this sampling approach by scrutinizing the design components of such gradient flows. Any instantiation of a gradient flow for sampling needs an energy functional and a metric to determine the flow, as well as numerical approximations of the flow to derive algorithms. Our first contribution is to show that the Kullback-Leibler divergence, as an energy functional, has the unique property (among all f-divergences) that gradient flows resulting from it do not depend on the normalization constant of the target distribution. Our second contribution is to study the choice of metric from the perspective of invariance. The Fisher-Rao metric is known as the unique choice (up to scaling) that is diffeomorphism invariant. As a computationally tractable alternative, we introduce a relaxed, affine invariance property for the metrics and gradient flows. In particular, we construct various affine invariant Wasserstein and Stein gradient flows. Affine invariant gradient flows are shown to behave more favorably than their non-affine-invariant counterparts when sampling highly anisotropic distributions, in theory and by using particle methods. Our third contribution is to study, and develop efficient algorithms based on Gaussian approximations of the gradient flows; this leads to an alternative to particle methods. We establish connections between various Gaussian approximate gradient flows, discuss their relation to gradient methods arising from parametric variational inference, and study their convergence properties both theoretically and numerically. △ Less

Submitted 9 March, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

Comments: Related and text overlap with arXiv:2302.11024

arXiv:2306.10915 [pdf, other]

Practical Equivariances via Relational Conditional Neural Processes

Authors: Daolang Huang, Manuel Haussmann, Ulpu Remes, ST John, Grégoire Clarté, Kevin Sebastian Luck, Samuel Kaski, Luigi Acerbi

Abstract: Conditional Neural Processes (CNPs) are a class of metalearning models popular for combining the runtime efficiency of amortized inference with reliable uncertainty quantification. Many relevant machine learning tasks, such as in spatio-temporal modeling, Bayesian Optimization and continuous control, inherently contain equivariances -- for example to translation -- which the model can exploit for… ▽ More Conditional Neural Processes (CNPs) are a class of metalearning models popular for combining the runtime efficiency of amortized inference with reliable uncertainty quantification. Many relevant machine learning tasks, such as in spatio-temporal modeling, Bayesian Optimization and continuous control, inherently contain equivariances -- for example to translation -- which the model can exploit for maximal performance. However, prior attempts to include equivariances in CNPs do not scale effectively beyond two input dimensions. In this work, we propose Relational Conditional Neural Processes (RCNPs), an effective approach to incorporate equivariances into any neural process model. Our proposed method extends the applicability and impact of equivariant neural processes to higher dimensions. We empirically demonstrate the competitive performance of RCNPs on a large array of tasks naturally containing equivariances. △ Less

Submitted 5 November, 2023; v1 submitted 19 June, 2023; originally announced June 2023.

Comments: 38 pages, 8 figures. Accepted at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

arXiv:2306.04111 [pdf, other]

Quasi-Newton Updating for Large-Scale Distributed Learning

Authors: Shuyuan Wu, Danyang Huang, Hansheng Wang

Abstract: Distributed computing is critically important for modern statistical analysis. Herein, we develop a distributed quasi-Newton (DQN) framework with excellent statistical, computation, and communication efficiency. In the DQN method, no Hessian matrix inversion or communication is needed. This considerably reduces the computation and communication complexity of the proposed method. Notably, related e… ▽ More Distributed computing is critically important for modern statistical analysis. Herein, we develop a distributed quasi-Newton (DQN) framework with excellent statistical, computation, and communication efficiency. In the DQN method, no Hessian matrix inversion or communication is needed. This considerably reduces the computation and communication complexity of the proposed method. Notably, related existing methods only analyze numerical convergence and require a diverging number of iterations to converge. However, we investigate the statistical properties of the DQN method and theoretically demonstrate that the resulting estimator is statistically efficient over a small number of iterations under mild conditions. Extensive numerical analyses demonstrate the finite sample performance. △ Less

Submitted 11 June, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

Comments: 56 pages, 3 figures

arXiv:2305.15871 [pdf, other]

Learning Robust Statistics for Simulation-based Inference under Model Misspecification

Authors: Daolang Huang, Ayush Bharti, Amauri Souza, Luigi Acerbi, Samuel Kaski

Abstract: Simulation-based inference (SBI) methods such as approximate Bayesian computation (ABC), synthetic likelihood, and neural posterior estimation (NPE) rely on simulating statistics to infer parameters of intractable likelihood models. However, such methods are known to yield untrustworthy and misleading inference outcomes under model misspecification, thus hindering their widespread applicability. I… ▽ More Simulation-based inference (SBI) methods such as approximate Bayesian computation (ABC), synthetic likelihood, and neural posterior estimation (NPE) rely on simulating statistics to infer parameters of intractable likelihood models. However, such methods are known to yield untrustworthy and misleading inference outcomes under model misspecification, thus hindering their widespread applicability. In this work, we propose the first general approach to handle model misspecification that works across different classes of SBI methods. Leveraging the fact that the choice of statistics determines the degree of misspecification in SBI, we introduce a regularized loss function that penalises those statistics that increase the mismatch between the data and the model. Taking NPE and ABC as use cases, we demonstrate the superior performance of our method on high-dimensional time-series models that are artificially misspecified. We also apply our method to real data from the field of radio propagation where the model is known to be misspecified. We show empirically that the method yields robust inference in misspecified scenarios, whilst still being accurate when the model is well-specified. △ Less

Submitted 5 October, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: 22 pages, 13 figures, Published at NeurIPS 2023

arXiv:2304.06900 [pdf, other]

Subsampling-Based Modified Bayesian Information Criterion for Large-Scale Stochastic Block Models

Authors: Jiayi Deng, Danyang Huang, Xiangyu Chang, Bo Zhang

Abstract: Identifying the number of communities is a fundamental problem in community detection, which has received increasing attention recently. However, rapid advances in technology have led to the emergence of large-scale networks in various disciplines, thereby making existing methods computationally infeasible. To address this challenge, we propose a novel subsampling-based modified Bayesian informati… ▽ More Identifying the number of communities is a fundamental problem in community detection, which has received increasing attention recently. However, rapid advances in technology have led to the emergence of large-scale networks in various disciplines, thereby making existing methods computationally infeasible. To address this challenge, we propose a novel subsampling-based modified Bayesian information criterion (SM-BIC) for identifying the number of communities in a network generated via the stochastic block model and degree-corrected stochastic block model. We first propose a node-pair subsampling method to extract an informative subnetwork from the entire network, and then we derive a purely data-driven criterion to identify the number of communities for the subnetwork. In this way, the SM-BIC can identify the number of communities based on the subsampled network instead of the entire dataset. This leads to important computational advantages over existing methods. We theoretically investigate the computational complexity and identification consistency of the SM-BIC. Furthermore, the advantages of the SM-BIC are demonstrated by extensive numerical studies. △ Less

Submitted 13 April, 2023; originally announced April 2023.

arXiv:2302.11024 [pdf, other]

Gradient Flows for Sampling: Mean-Field Models, Gaussian Approximations and Affine Invariance

Authors: Yifan Chen, Daniel Zhengyu Huang, Jiaoyang Huang, Sebastian Reich, Andrew M. Stuart

Abstract: Sampling a probability distribution with an unknown normalization constant is a fundamental problem in computational science and engineering. This task may be cast as an optimization problem over all probability measures, and an initial distribution can be evolved to the desired minimizer dynamically via gradient flows. Mean-field models, whose law is governed by the gradient flow in the space of… ▽ More Sampling a probability distribution with an unknown normalization constant is a fundamental problem in computational science and engineering. This task may be cast as an optimization problem over all probability measures, and an initial distribution can be evolved to the desired minimizer dynamically via gradient flows. Mean-field models, whose law is governed by the gradient flow in the space of probability measures, may also be identified; particle approximations of these mean-field models form the basis of algorithms. The gradient flow approach is also the basis of algorithms for variational inference, in which the optimization is performed over a parameterized family of probability distributions such as Gaussians, and the underlying gradient flow is restricted to the parameterized family. By choosing different energy functionals and metrics for the gradient flow, different algorithms with different convergence properties arise. In this paper, we concentrate on the Kullback-Leibler divergence after showing that, up to scaling, it has the unique property that the gradient flows resulting from this choice of energy do not depend on the normalization constant. For the metrics, we focus on variants of the Fisher-Rao, Wasserstein, and Stein metrics; we introduce the affine invariance property for gradient flows, and their corresponding mean-field models, determine whether a given metric leads to affine invariance, and modify it to make it affine invariant if it does not. We study the resulting gradient flows in both probability density space and Gaussian space. The flow in the Gaussian space may be understood as a Gaussian approximation of the flow. We demonstrate that the Gaussian approximation based on the metric and through moment closure coincide, establish connections between them, and study their long-time convergence properties showing the advantages of affine invariance. △ Less

Submitted 2 November, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

Comments: 82 pages, 8 figures (Welcome any feedback!)

arXiv:2302.05872 [pdf, other]

I$^2$SB: Image-to-Image Schrödinger Bridge

Authors: Guan-Horng Liu, Arash Vahdat, De-An Huang, Evangelos A. Theodorou, Weili Nie, Anima Anandkumar

Abstract: We propose Image-to-Image Schrödinger Bridge (I$^2$SB), a new class of conditional diffusion models that directly learn the nonlinear diffusion processes between two given distributions. These diffusion bridges are particularly useful for image restoration, as the degraded images are structurally informative priors for reconstructing the clean images. I$^2$SB belongs to a tractable class of Schröd… ▽ More We propose Image-to-Image Schrödinger Bridge (I$^2$SB), a new class of conditional diffusion models that directly learn the nonlinear diffusion processes between two given distributions. These diffusion bridges are particularly useful for image restoration, as the degraded images are structurally informative priors for reconstructing the clean images. I$^2$SB belongs to a tractable class of Schrödinger bridge, the nonlinear extension to score-based models, whose marginal distributions can be computed analytically given boundary pairs. This results in a simulation-free framework for nonlinear diffusions, where the I$^2$SB training becomes scalable by adopting practical techniques used in standard diffusion models. We validate I$^2$SB in solving various image restoration tasks, including inpainting, super-resolution, deblurring, and JPEG restoration on ImageNet 256x256 and show that I$^2$SB surpasses standard conditional diffusion models with more interpretable generative processes. Moreover, I$^2$SB matches the performance of inverse methods that additionally require the knowledge of the corruption operators. Our work opens up new algorithmic opportunities for develo** efficient nonlinear diffusion models on a large scale. scale. Project page and codes: https://i2sb.github.io/ △ Less

Submitted 25 May, 2023; v1 submitted 12 February, 2023; originally announced February 2023.

Comments: ICML camera ready (high-resolution figures)

arXiv:2208.14123 [pdf, other]

Catalytic Priors: Using Synthetic Data to Specify Prior Distributions in Bayesian Analysis

Authors: Dongming Huang, Feicheng Wang, Donald B. Rubin, S. C. Kou

Abstract: Catalytic prior distributions provide general, easy-to-use, and interpretable specifications of prior distributions for Bayesian analysis. They are particularly beneficial when the observed data are inadequate to stably estimate a complex target model. A catalytic prior distribution is constructed by augmenting the observed data with synthetic data that are sampled from the predictive distribution… ▽ More Catalytic prior distributions provide general, easy-to-use, and interpretable specifications of prior distributions for Bayesian analysis. They are particularly beneficial when the observed data are inadequate to stably estimate a complex target model. A catalytic prior distribution is constructed by augmenting the observed data with synthetic data that are sampled from the predictive distribution of a simpler model estimated from the observed data. We illustrate the usefulness of the catalytic prior approach using an example from labor economics. In the example, the resulting Bayesian inference reflects many important aspects of the observed data, and the estimation accuracy and predictive performance of the inference based on the catalytic prior are superior to, or comparable to, that of other commonly used prior distributions. We further explore the connection between the catalytic prior approach and a few popular regularization methods. We expect the catalytic prior approach to be useful in many applications. △ Less

Submitted 22 September, 2023; v1 submitted 30 August, 2022; originally announced August 2022.

arXiv:2205.08364 [pdf, other]

Network Gradient Descent Algorithm for Decentralized Federated Learning

Authors: Shuyuan Wu, Danyang Huang, Hansheng Wang

Abstract: We study a fully decentralized federated learning algorithm, which is a novel gradient descent algorithm executed on a communication-based network. For convenience, we refer to it as a network gradient descent (NGD) method. In the NGD method, only statistics (e.g., parameter estimates) need to be communicated, minimizing the risk of privacy. Meanwhile, different clients communicate with each other… ▽ More We study a fully decentralized federated learning algorithm, which is a novel gradient descent algorithm executed on a communication-based network. For convenience, we refer to it as a network gradient descent (NGD) method. In the NGD method, only statistics (e.g., parameter estimates) need to be communicated, minimizing the risk of privacy. Meanwhile, different clients communicate with each other directly according to a carefully designed network structure without a central master. This greatly enhances the reliability of the entire algorithm. Those nice properties inspire us to carefully study the NGD method both theoretically and numerically. Theoretically, we start with a classical linear regression model. We find that both the learning rate and the network structure play significant roles in determining the NGD estimator's statistical efficiency. The resulting NGD estimator can be statistically as efficient as the global estimator, if the learning rate is sufficiently small and the network structure is well balanced, even if the data are distributed heterogeneously. Those interesting findings are then extended to general models and loss functions. Extensive numerical studies are presented to corroborate our theoretical findings. Classical deep learning models are also presented for illustration purpose. △ Less

Submitted 5 May, 2022; originally announced May 2022.

arXiv:2204.08247 [pdf, other]

Joint Multi-view Unsupervised Feature Selection and Graph Learning

Authors: Si-Guo Fang, Dong Huang, Chang-Dong Wang, Yong Tang

Abstract: Despite significant progress, previous multi-view unsupervised feature selection methods mostly suffer from two limitations. First, they generally utilize either cluster structure or similarity structure to guide the feature selection, which neglect the possibility of a joint formulation with mutual benefits. Second, they often learn the similarity structure by either global structure learning or… ▽ More Despite significant progress, previous multi-view unsupervised feature selection methods mostly suffer from two limitations. First, they generally utilize either cluster structure or similarity structure to guide the feature selection, which neglect the possibility of a joint formulation with mutual benefits. Second, they often learn the similarity structure by either global structure learning or local structure learning, which lack the capability of graph learning with both global and local structural awareness. In light of this, this paper presents a joint multi-view unsupervised feature selection and graph learning (JMVFG) approach. Particularly, we formulate the multi-view feature selection with orthogonal decomposition, where each target matrix is decomposed into a view-specific basis matrix and a view-consistent cluster indicator. The cross-space locality preservation is incorporated to bridge the cluster structure learning in the projected space and the similarity learning (i.e., graph learning) in the original space. Further, a unified objective function is presented to enable the simultaneous learning of the cluster structure, the global and local similarity structures, and the multi-view consistency and inconsistency, upon which an alternating optimization algorithm is developed with theoretically proved convergence. Extensive experiments on a variety of real-world multi-view datasets demonstrate the superiority of our approach for both the multi-view feature selection and graph learning tasks. The code is available at https://github.com/huangdonghere/JMVFG. △ Less

Submitted 11 August, 2023; v1 submitted 18 April, 2022; originally announced April 2022.

Comments: To appear in IEEE Transactions on Emerging Topics in Computational Intelligence

arXiv:2204.07615 [pdf, other]

TabNAS: Rejection Sampling for Neural Architecture Search on Tabular Datasets

Authors: Chengrun Yang, Gabriel Bender, Hanxiao Liu, Pieter-Jan Kindermans, Madeleine Udell, Yifeng Lu, Quoc Le, Da Huang

Abstract: The best neural architecture for a given machine learning problem depends on many factors: not only the complexity and structure of the dataset, but also on resource constraints including latency, compute, energy consumption, etc. Neural architecture search (NAS) for tabular datasets is an important but under-explored problem. Previous NAS algorithms designed for image search spaces incorporate re… ▽ More The best neural architecture for a given machine learning problem depends on many factors: not only the complexity and structure of the dataset, but also on resource constraints including latency, compute, energy consumption, etc. Neural architecture search (NAS) for tabular datasets is an important but under-explored problem. Previous NAS algorithms designed for image search spaces incorporate resource constraints directly into the reinforcement learning (RL) rewards. However, for NAS on tabular datasets, this protocol often discovers suboptimal architectures. This paper develops TabNAS, a new and more effective approach to handle resource constraints in tabular NAS using an RL controller motivated by the idea of rejection sampling. TabNAS immediately discards any architecture that violates the resource constraints without training or learning from that architecture. TabNAS uses a Monte-Carlo-based correction to the RL policy gradient update to account for this extra filtering step. Results on several tabular datasets demonstrate the superiority of TabNAS over previous reward-sha** methods: it finds better models that obey the constraints. △ Less

Submitted 20 October, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

Comments: NeurIPS 2022; 30 pages, 15 figures, 7 tables

arXiv:2204.05552 [pdf]

The Effects of Dynamic Learning and the Forgetting Process on an Optimizing Modelling for Full-Service Repair Pricing Contracts for Medical Devices

Authors: Ai** Jiang, Lin Li, Xuemin Xu, David Y. C. Huang

Abstract: In order to improve the profitability and customer service management of original equipment manufacturers (OEMs) in a market where full-service (FS) and on-call service (OS) co-exist, this article extends the optimizing modelling for pricing FS repair contracts with the effects of dynamic learning and forgetting. Along with considering autonomous learning in maintenance practice, this study also a… ▽ More In order to improve the profitability and customer service management of original equipment manufacturers (OEMs) in a market where full-service (FS) and on-call service (OS) co-exist, this article extends the optimizing modelling for pricing FS repair contracts with the effects of dynamic learning and forgetting. Along with considering autonomous learning in maintenance practice, this study also analyses how induced learning and forgetting process in a workplace put impact on the pricing optimizing model of FS contracts in the portfolio of FS and OS. A numerical analysis based on real data from a medical industry proves that the enhanced FS pricing model discussed here has two main advantages: (1) It could prominently improve repair efficiency, and (2) It help OEMs gain better profits compared to the original FS model and the sole OS maintenance. Sensitivity analysis shows that if internal failure rate increases, the optimized FS price rises gradually until reaching the maximum value, and profitability to the OEM increases overall; if frequency of induced learning goes up, the optimal FS price rises after a short-term downward trend, with a stable profitability to the OEM. △ Less

Submitted 12 April, 2022; originally announced April 2022.

arXiv:2203.11572 [pdf, other]

doi 10.1109/TKDE.2023.3236698

Fast Multi-view Clustering via Ensembles: Towards Scalability, Superiority, and Simplicity

Authors: Dong Huang, Chang-Dong Wang, Jian-Huang Lai

Abstract: Despite significant progress, there remain three limitations to the previous multi-view clustering algorithms. First, they often suffer from high computational complexity, restricting their feasibility for large-scale datasets. Second, they typically fuse multi-view information via one-stage fusion, neglecting the possibilities in multi-stage fusions. Third, dataset-specific hyperparameter-tuning… ▽ More Despite significant progress, there remain three limitations to the previous multi-view clustering algorithms. First, they often suffer from high computational complexity, restricting their feasibility for large-scale datasets. Second, they typically fuse multi-view information via one-stage fusion, neglecting the possibilities in multi-stage fusions. Third, dataset-specific hyperparameter-tuning is frequently required, further undermining their practicability. In light of this, we propose a fast multi-view clustering via ensembles (FastMICE) approach. Particularly, the concept of random view groups is presented to capture the versatile view-wise relationships, through which the hybrid early-late fusion strategy is designed to enable efficient multi-stage fusions. With multiple views extended to many view groups, three levels of diversity (w.r.t. features, anchors, and neighbors, respectively) are jointly leveraged for constructing the view-sharing bipartite graphs in the early-stage fusion. Then, a set of diversified base clusterings for different view groups are obtained via fast graph partitioning, which are further formulated into a unified bipartite graph for final clustering in the late-stage fusion. Notably, FastMICE has almost linear time and space complexity, and is free of dataset-specific tuning. Experiments on 22 multi-view datasets demonstrate its advantages in scalability (for extremely large datasets), superiority (in clustering performance), and simplicity (to be applied) over the state-of-the-art. Code available: https://github.com/huangdonghere/FastMICE. △ Less

Submitted 24 January, 2023; v1 submitted 22 March, 2022; originally announced March 2022.

Comments: To appear in IEEE Transactions on Knowledge and Data Engineering

arXiv:2203.02709 [pdf, other]

A Wasserstein distance-based spectral clustering method for transaction data analysis

Authors: Yingqiu Zhu, Danyang Huang, Bo Zhang

Abstract: With the rapid development of online payment platforms, it is now possible to record massive transaction data. Clustering on transaction data significantly contributes to analyzing merchants' behavior patterns. This enables payment platforms to provide differentiated services or implement risk management strategies. However, traditional methods exploit transactions by generating low-dimensional fe… ▽ More With the rapid development of online payment platforms, it is now possible to record massive transaction data. Clustering on transaction data significantly contributes to analyzing merchants' behavior patterns. This enables payment platforms to provide differentiated services or implement risk management strategies. However, traditional methods exploit transactions by generating low-dimensional features, leading to inevitable information loss. In this study, we use the empirical cumulative distribution of transactions to characterize merchants. We adopt Wasserstein distance to measure the dissimilarity between any two merchants and propose the Wasserstein-distance-based spectral clustering (WSC) approach. Based on the similarities between merchants' transaction distributions, a graph of merchants is generated. Thus, we treat the clustering of merchants as a graph-cut problem and solve it under the framework of spectral clustering. To ensure feasibility of the proposed method on large-scale datasets with limited computational resources, we propose a subsampling method for WSC (SubWSC). The associated theoretical properties are investigated to verify the efficiency of the proposed approach. The simulations and empirical study demonstrate that the proposed method outperforms feature-based methods in finding behavior patterns of merchants. △ Less

Submitted 16 February, 2023; v1 submitted 5 March, 2022; originally announced March 2022.

arXiv:2110.13613 [pdf, other]

Subsampling Spectral Clustering for Large-Scale Social Networks

Authors: Jiayi Deng, Yi Ding, Yingqiu Zhu, Danyang Huang, Bingyi **g, Bo Zhang

Abstract: Online social network platforms such as Twitter and Sina Weibo have been extremely popular over the past 20 years. Identifying the network community of a social platform is essential to exploring and understanding the users' interests. However, the rapid development of science and technology has generated large amounts of social network data, creating great computational challenges for community d… ▽ More Online social network platforms such as Twitter and Sina Weibo have been extremely popular over the past 20 years. Identifying the network community of a social platform is essential to exploring and understanding the users' interests. However, the rapid development of science and technology has generated large amounts of social network data, creating great computational challenges for community detection in large-scale social networks. Here, we propose a novel subsampling spectral clustering algorithm to identify community structures in large-scale social networks with limited computing resources. More precisely, spectral clustering is conducted using only the information of a small subsample of the network nodes, resulting in a huge reduction in computational time. As a result, for large-scale datasets, the method can be realized even using a personal computer. Specifically, we introduce two different sampling techniques, namely simple random subsampling and degree corrected subsampling. The methodology is applied to the dataset collected from Sina Weibo, which is one of the largest Twitter-type social network platforms in China. Our method can very effectively identify the community structure of registered users. This community structure information can be applied to help Sina Weibo promote advertisements to target users and increase user activity. △ Less

Submitted 21 December, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

arXiv:2110.10210 [pdf, other]

Long Random Matrices and Tensor Unfolding

Authors: Gérard Ben Arous, Daniel Zhengyu Huang, Jiaoyang Huang

Abstract: In this paper, we consider the singular values and singular vectors of low rank perturbations of large rectangular random matrices, in the regime the matrix is "long": we allow the number of rows (columns) to grow polynomially in the number of columns (rows). We prove there exists a critical signal-to-noise ratio (depending on the dimensions of the matrix), and the extreme singular values and sing… ▽ More In this paper, we consider the singular values and singular vectors of low rank perturbations of large rectangular random matrices, in the regime the matrix is "long": we allow the number of rows (columns) to grow polynomially in the number of columns (rows). We prove there exists a critical signal-to-noise ratio (depending on the dimensions of the matrix), and the extreme singular values and singular vectors exhibit a BBP type phase transition. As a main application, we investigate the tensor unfolding algorithm for the asymmetric rank-one spiked tensor model, and obtain an exact threshold, which is independent of the procedure of tensor unfolding. If the signal-to-noise ratio is above the threshold, tensor unfolding detects the signals; otherwise, it fails to capture the signals. △ Less

Submitted 19 October, 2021; originally announced October 2021.

Comments: 29 pages, 4 figures

arXiv:2108.00543 [pdf]

Statistical Learning in Preclinical Drug Proarrhythmic Assessment

Authors: Nan Milex Xi, Yu-Yi Hsu, Qianyu Dang, Dalong Patrick Huang

Abstract: Torsades de pointes (TdP) is an irregular heart rhythm characterized by faster beat rates and potentially could lead to sudden cardiac death. Much effort has been invested in understanding the drug-induced TdP in preclinical studies. However, a comprehensive statistical learning framework that can accurately predict the drug-induced TdP risk from preclinical data is still lacking. We proposed ordi… ▽ More Torsades de pointes (TdP) is an irregular heart rhythm characterized by faster beat rates and potentially could lead to sudden cardiac death. Much effort has been invested in understanding the drug-induced TdP in preclinical studies. However, a comprehensive statistical learning framework that can accurately predict the drug-induced TdP risk from preclinical data is still lacking. We proposed ordinal logistic regression and ordinal random forest models to predict low-, intermediate-, and high-risk drugs based on datasets generated from two experimental protocols. Leave-one-drug-out cross-validation, stratified bootstrap, and permutation predictor importance were applied to estimate and interpret the model performance under uncertainty. The potential outlier drugs identified by our models are consistent with their descriptions in the literature. Our method is accurate, interpretable, and thus useable as supplemental evidence in the drug safety assessment. △ Less

Submitted 7 January, 2022; v1 submitted 1 August, 2021; originally announced August 2021.

arXiv:2009.08435 [pdf, other]

Large Norms of CNN Layers Do Not Hurt Adversarial Robustness

Authors: Youwei Liang, Dong Huang

Abstract: Since the Lipschitz properties of convolutional neural networks (CNNs) are widely considered to be related to adversarial robustness, we theoretically characterize the $\ell_1$ norm and $\ell_\infty$ norm of 2D multi-channel convolutional layers and provide efficient methods to compute the exact $\ell_1$ norm and $\ell_\infty$ norm. Based on our theorem, we propose a novel regularization method te… ▽ More Since the Lipschitz properties of convolutional neural networks (CNNs) are widely considered to be related to adversarial robustness, we theoretically characterize the $\ell_1$ norm and $\ell_\infty$ norm of 2D multi-channel convolutional layers and provide efficient methods to compute the exact $\ell_1$ norm and $\ell_\infty$ norm. Based on our theorem, we propose a novel regularization method termed norm decay, which can effectively reduce the norms of convolutional layers and fully-connected layers. Experiments show that norm-regularization methods, including norm decay, weight decay, and singular value clip**, can improve generalization of CNNs. However, they can slightly hurt adversarial robustness. Observing this unexpected phenomenon, we compute the norms of layers in the CNNs trained with three different adversarial training frameworks and surprisingly find that adversarially robust CNNs have comparable or even larger layer norms than their non-adversarially robust counterparts. Furthermore, we prove that under a mild assumption, adversarially robust classifiers can be achieved using neural networks, and an adversarially robust neural network can have an arbitrarily large Lipschitz constant. For this reason, enforcing small norms on CNN layers may be neither necessary nor effective in achieving adversarial robustness. The code is available at https://github.com/youweiliang/norm_robustness. △ Less

Submitted 15 August, 2021; v1 submitted 17 September, 2020; originally announced September 2020.

Comments: AAAI 2021, including Appendix, 15 pages, 4 figures

arXiv:2008.10208 [pdf, other]

Multi-view Graph Learning by Joint Modeling of Consistency and Inconsistency

Authors: Youwei Liang, Dong Huang, Chang-Dong Wang, Philip S. Yu

Abstract: Graph learning has emerged as a promising technique for multi-view clustering with its ability to learn a unified and robust graph from multiple views. However, existing graph learning methods mostly focus on the multi-view consistency issue, yet often neglect the inconsistency across multiple views, which makes them vulnerable to possibly low-quality or noisy datasets. To overcome this limitation… ▽ More Graph learning has emerged as a promising technique for multi-view clustering with its ability to learn a unified and robust graph from multiple views. However, existing graph learning methods mostly focus on the multi-view consistency issue, yet often neglect the inconsistency across multiple views, which makes them vulnerable to possibly low-quality or noisy datasets. To overcome this limitation, we propose a new multi-view graph learning framework, which for the first time simultaneously and explicitly models multi-view consistency and multi-view inconsistency in a unified objective function, through which the consistent and inconsistent parts of each single-view graph as well as the unified graph that fuses the consistent parts can be iteratively learned. Though optimizing the objective function is NP-hard, we design a highly efficient optimization algorithm which is able to obtain an approximate solution with linear time complexity in the number of edges in the unified graph. Furthermore, our multi-view graph learning approach can be applied to both similarity graphs and dissimilarity graphs, which lead to two graph fusion-based variants in our framework. Experiments on twelve multi-view datasets have demonstrated the robustness and efficiency of the proposed approach. △ Less

Submitted 3 July, 2021; v1 submitted 24 August, 2020; originally announced August 2020.

Comments: Preprint, under review

ACM Class: I.5.3; I.5.1

arXiv:2004.03260 [pdf, other]

Automatic, Dynamic, and Nearly Optimal Learning Rate Specification by Local Quadratic Approximation

Authors: Yingqiu Zhu, Yu Chen, Danyang Huang, Bo Zhang, Hansheng Wang

Abstract: In deep learning tasks, the learning rate determines the update step size in each iteration, which plays a critical role in gradient-based optimization. However, the determination of the appropriate learning rate in practice typically replies on subjective judgement. In this work, we propose a novel optimization method based on local quadratic approximation (LQA). In each update step, given the gr… ▽ More In deep learning tasks, the learning rate determines the update step size in each iteration, which plays a critical role in gradient-based optimization. However, the determination of the appropriate learning rate in practice typically replies on subjective judgement. In this work, we propose a novel optimization method based on local quadratic approximation (LQA). In each update step, given the gradient direction, we locally approximate the loss function by a standard quadratic function of the learning rate. Then, we propose an approximation step to obtain a nearly optimal learning rate in a computationally efficient way. The proposed LQA method has three important features. First, the learning rate is automatically determined in each update step. Second, it is dynamically adjusted according to the current loss function value and the parameter estimates. Third, with the gradient direction fixed, the proposed method leads to nearly the greatest reduction in terms of the loss function. Extensive experiments have been conducted to prove the strengths of the proposed LQA method. △ Less

Submitted 7 April, 2020; originally announced April 2020.

Comments: 10 pages, 5 figures

MSC Class: 62-08; 41A99 ACM Class: G.0; I.0

arXiv:2004.02414 [pdf, other]

Efficient Estimation for Generalized Linear Models on a Distributed System with Nonrandomly Distributed Data

Authors: Feifei Wang, Danyang Huang, Yingqiu Zhu, Hansheng Wang

Abstract: Distributed systems have been widely used in practice to accomplish data analysis tasks of huge scales. In this work, we target on the estimation problem of generalized linear models on a distributed system with nonrandomly distributed data. We develop a Pseudo-Newton-Raphson algorithm for efficient estimation. In this algorithm, we first obtain a pilot estimator based on a small random sample col… ▽ More Distributed systems have been widely used in practice to accomplish data analysis tasks of huge scales. In this work, we target on the estimation problem of generalized linear models on a distributed system with nonrandomly distributed data. We develop a Pseudo-Newton-Raphson algorithm for efficient estimation. In this algorithm, we first obtain a pilot estimator based on a small random sample collected from different Workers. Then conduct one-step updating based on the computed derivatives of log-likelihood functions in each Worker at the pilot estimator. The final one-step estimator is proved to be statistically efficient as the global estimator, even with nonrandomly distributed data. In addition, it is computationally efficient, in terms of both communication cost and storage usage. Based on the one-step estimator, we also develop a likelihood ratio test for hypothesis testing. The theoretical properties of the one-step estimator and the corresponding likelihood ratio test are investigated. The finite sample performances are assessed through simulations. Finally, an American Airline dataset is analyzed on a Spark cluster for illustration purpose. △ Less

Submitted 6 April, 2020; originally announced April 2020.

arXiv:1911.09781 [pdf, other]

Beyond Synthetic Noise: Deep Learning on Controlled Noisy Labels

Authors: Lu Jiang, Di Huang, Mason Liu, Weilong Yang

Abstract: Performing controlled experiments on noisy data is essential in understanding deep learning across noise levels. Due to the lack of suitable datasets, previous research has only examined deep learning on controlled synthetic label noise, and real-world label noise has never been studied in a controlled setting. This paper makes three contributions. First, we establish the first benchmark of contro… ▽ More Performing controlled experiments on noisy data is essential in understanding deep learning across noise levels. Due to the lack of suitable datasets, previous research has only examined deep learning on controlled synthetic label noise, and real-world label noise has never been studied in a controlled setting. This paper makes three contributions. First, we establish the first benchmark of controlled real-world label noise from the web. This new benchmark enables us to study the web label noise in a controlled setting for the first time. The second contribution is a simple but effective method to overcome both synthetic and real noisy labels. We show that our method achieves the best result on our dataset as well as on two public benchmarks (CIFAR and WebVision). Third, we conduct the largest study by far into understanding deep neural networks trained on noisy labels across different noise levels, noise types, network architectures, and training settings. The data and code are released at the following link: http://www.lujiang.info/cnlw.html △ Less

Submitted 27 August, 2020; v1 submitted 21 November, 2019; originally announced November 2019.

Comments: published at ICML 2020

arXiv:1909.04503 [pdf, other]

ArduCode: Predictive Framework for Automation Engineering

Authors: Arquimedes Canedo, Palash Goyal, Di Huang, Amit Pandey, Gustavo Quiros

Abstract: Automation engineering is the task of integrating, via software, various sensors, actuators, and controls for automating a real-world process. Today, automation engineering is supported by a suite of software tools including integrated development environments (IDE), hardware configurators, compilers, and runtimes. These tools focus on the automation code itself, but leave the automation engineer… ▽ More Automation engineering is the task of integrating, via software, various sensors, actuators, and controls for automating a real-world process. Today, automation engineering is supported by a suite of software tools including integrated development environments (IDE), hardware configurators, compilers, and runtimes. These tools focus on the automation code itself, but leave the automation engineer unassisted in their decision making. This can lead to increased time for software development because of imperfections in decision making leading to multiple iterations between software and hardware. To address this, this paper defines multiple challenges often faced in automation engineering and propose solutions using machine learning to assist engineers tackle such challenges. We show that machine learning can be leveraged to assist the automation engineer in classifying automation, finding similar code snippets, and reasoning about the hardware selection of sensors and actuators. We validate our architecture on two real datasets consisting of 2,927 Arduino projects, and 683 Programmable Logic Controller (PLC) projects. Our results show that paragraph embedding techniques can be utilized to classify automation using code snippets with precision close to human annotation, giving an F1-score of 72%. Further, we show that such embedding techniques can help us find similar code snippets with high accuracy. Finally, we use autoencoder models for hardware recommendation and achieve a p@3 of 0.79 and p@5 of 0.95. △ Less

Submitted 6 July, 2020; v1 submitted 6 September, 2019; originally announced September 2019.

arXiv:1909.02811 [pdf, other]

Graph Representation Ensemble Learning

Authors: Palash Goyal, Di Huang, Sujit Rokka Chhetri, Arquimedes Canedo, Jaya Shree, Evan Patterson

Abstract: Representation learning on graphs has been gaining attention due to its wide applicability in predicting missing links, and classifying and recommending nodes. Most embedding methods aim to preserve certain properties of the original graph in the low dimensional space. However, real world graphs have a combination of several properties which are difficult to characterize and capture by a single ap… ▽ More Representation learning on graphs has been gaining attention due to its wide applicability in predicting missing links, and classifying and recommending nodes. Most embedding methods aim to preserve certain properties of the original graph in the low dimensional space. However, real world graphs have a combination of several properties which are difficult to characterize and capture by a single approach. In this work, we introduce the problem of graph representation ensemble learning and provide a first of its kind framework to aggregate multiple graph embedding methods efficiently. We provide analysis of our framework and analyze -- theoretically and empirically -- the dependence between state-of-the-art embedding methods. We test our models on the node classification task on four real world graphs and show that proposed ensemble approaches can outperform the state-of-the-art methods by up to 8% on macro-F1. We further show that the approach is even more beneficial for underrepresented classes providing an improvement of up to 12%. △ Less

Submitted 12 September, 2019; v1 submitted 6 September, 2019; originally announced September 2019.

arXiv:1905.11669 [pdf, other]

CompactNet: Platform-Aware Automatic Optimization for Convolutional Neural Networks

Authors: Weicheng Li, Rui Wang, Zhongzhi Luan, Di Huang, Zidong Du, Yunji Chen, Depei Qian

Abstract: Convolutional Neural Network (CNN) based Deep Learning (DL) has achieved great progress in many real-life applications. Meanwhile, due to the complex model structures against strict latency and memory restriction, the implementation of CNN models on the resource-limited platforms is becoming more challenging. This work proposes a solution, called CompactNet\footnote{Project URL: \url{https://githu… ▽ More Convolutional Neural Network (CNN) based Deep Learning (DL) has achieved great progress in many real-life applications. Meanwhile, due to the complex model structures against strict latency and memory restriction, the implementation of CNN models on the resource-limited platforms is becoming more challenging. This work proposes a solution, called CompactNet\footnote{Project URL: \url{https://github.com/CompactNet/CompactNet}}, which automatically optimizes a pre-trained CNN model on a specific resource-limited platform given a specific target of inference speedup. Guided by a simulator of the target platform, CompactNet progressively trims a pre-trained network by removing certain redundant filters until the target speedup is reached and generates an optimal platform-specific model while maintaining the accuracy. We evaluate our work on two platforms of a mobile ARM CPU and a machine learning accelerator NPU (Cambricon-1A ISA) on a Huawei Mate10 smartphone. For the state-of-the-art slim CNN model made for the embedded platform, MobileNetV2, CompactNet achieves up to a 1.8x kernel computation speedup with equal or even higher accuracy for image classification tasks on the Cifar-10 dataset. △ Less

Submitted 28 May, 2019; originally announced May 2019.

arXiv:1904.10171 [pdf]

Driving Decision and Control for Autonomous Lane Change based on Deep Reinforcement Learning

Authors: Tianyu Shi, Pin Wang, Xuxin Cheng, Ching-Yao Chan, Ding Huang

Abstract: We apply Deep Q-network (DQN) with the consideration of safety during the task for deciding whether to conduct the maneuver. Furthermore, we design two similar Deep Q learning frameworks with quadratic approximator for deciding how to select a comfortable gap and just follow the preceding vehicle. Finally, a polynomial lane change trajectory is generated and Pure Pursuit Control is implemented for… ▽ More We apply Deep Q-network (DQN) with the consideration of safety during the task for deciding whether to conduct the maneuver. Furthermore, we design two similar Deep Q learning frameworks with quadratic approximator for deciding how to select a comfortable gap and just follow the preceding vehicle. Finally, a polynomial lane change trajectory is generated and Pure Pursuit Control is implemented for path tracking. We demonstrate the effectiveness of this framework in simulation, from both the decision-making and control layers. The proposed architecture also has the potential to be extended to other autonomous driving scenarios. △ Less

Submitted 30 July, 2019; v1 submitted 23 April, 2019; originally announced April 2019.

Comments: This Paper has been submitted to ITSC 2019

arXiv:1903.02806 [pdf, ps, other]

Relaxing the Assumptions of Knockoffs by Conditioning

Authors: Dongming Huang, Lucas Janson

Abstract: The recent paper Candès et al. (2018) introduced model-X knockoffs, a method for variable selection that provably and non-asymptotically controls the false discovery rate with no restrictions or assumptions on the dimensionality of the data or the conditional distribution of the response given the covariates. The one requirement for the procedure is that the covariate samples are drawn independent… ▽ More The recent paper Candès et al. (2018) introduced model-X knockoffs, a method for variable selection that provably and non-asymptotically controls the false discovery rate with no restrictions or assumptions on the dimensionality of the data or the conditional distribution of the response given the covariates. The one requirement for the procedure is that the covariate samples are drawn independently and identically from a precisely-known (but arbitrary) distribution. The present paper shows that the exact same guarantees can be made without knowing the covariate distribution fully, but instead knowing it only up to a parametric model with as many as $Ω(n^{*}p)$ parameters, where $p$ is the dimension and $n^{*}$ is the number of covariate samples (which may exceed the usual sample size $n$ of labeled samples when unlabeled samples are also available). The key is to treat the covariates as if they are drawn conditionally on their observed value for a sufficient statistic of the model. Although this idea is simple, even in Gaussian models conditioning on a sufficient statistic leads to a distribution supported on a set of zero Lebesgue measure, requiring techniques from topological measure theory to establish valid algorithms. We demonstrate how to do this for three models of interest, with simulations showing the new approach remains powerful under the weaker assumptions. △ Less

Submitted 12 June, 2020; v1 submitted 7 March, 2019; originally announced March 2019.

MSC Class: 62G10; 62B05; 62J02

arXiv:1903.01057 [pdf, other]

doi 10.1109/TKDE.2019.2903410

Ultra-Scalable Spectral Clustering and Ensemble Clustering

Authors: Dong Huang, Chang-Dong Wang, Jian-Sheng Wu, Jian-Huang Lai, Chee-Keong Kwoh

Abstract: This paper focuses on scalability and robustness of spectral clustering for extremely large-scale datasets with limited resources. Two novel algorithms are proposed, namely, ultra-scalable spectral clustering (U-SPEC) and ultra-scalable ensemble clustering (U-SENC). In U-SPEC, a hybrid representative selection strategy and a fast approximation method for K-nearest representatives are proposed for… ▽ More This paper focuses on scalability and robustness of spectral clustering for extremely large-scale datasets with limited resources. Two novel algorithms are proposed, namely, ultra-scalable spectral clustering (U-SPEC) and ultra-scalable ensemble clustering (U-SENC). In U-SPEC, a hybrid representative selection strategy and a fast approximation method for K-nearest representatives are proposed for the construction of a sparse affinity sub-matrix. By interpreting the sparse sub-matrix as a bipartite graph, the transfer cut is then utilized to efficiently partition the graph and obtain the clustering result. In U-SENC, multiple U-SPEC clusterers are further integrated into an ensemble clustering framework to enhance the robustness of U-SPEC while maintaining high efficiency. Based on the ensemble generation via multiple U-SEPC's, a new bipartite graph is constructed between objects and base clusters and then efficiently partitioned to achieve the consensus clustering result. It is noteworthy that both U-SPEC and U-SENC have nearly linear time and space complexity, and are capable of robustly and efficiently partitioning ten-million-level nonlinearly-separable datasets on a PC with 64GB memory. Experiments on various large-scale datasets have demonstrated the scalability and robustness of our algorithms. The MATLAB code and experimental data are available at https://www.researchgate.net/publication/330760669. △ Less

Submitted 5 March, 2019; v1 submitted 3 March, 2019; originally announced March 2019.

Comments: To appear in IEEE Transactions on Knowledge and Data Engineering, 2019

arXiv:1902.09757 [pdf, other]

Interaction-aware Factorization Machines for Recommender Systems

Authors: Fuxing Hong, Dongbo Huang, Ge Chen

Abstract: Factorization Machine (FM) is a widely used supervised learning approach by effectively modeling of feature interactions. Despite the successful application of FM and its many deep learning variants, treating every feature interaction fairly may degrade the performance. For example, the interactions of a useless feature may introduce noises; the importance of a feature may also differ when interac… ▽ More Factorization Machine (FM) is a widely used supervised learning approach by effectively modeling of feature interactions. Despite the successful application of FM and its many deep learning variants, treating every feature interaction fairly may degrade the performance. For example, the interactions of a useless feature may introduce noises; the importance of a feature may also differ when interacting with different features. In this work, we propose a novel model named \emph{Interaction-aware Factorization Machine} (IFM) by introducing Interaction-Aware Mechanism (IAM), which comprises the \emph{feature aspect} and the \emph{field aspect}, to learn flexible interactions on two levels. The feature aspect learns feature interaction importance via an attention network while the field aspect learns the feature interaction effect as a parametric similarity of the feature interaction vector and the corresponding field interaction prototype. IFM introduces more structured control and learns feature interaction importance in a stratified manner, which allows for more leverage in tweaking the interactions on both feature-wise and field-wise levels. Besides, we give a more generalized architecture and propose Interaction-aware Neural Network (INN) and DeepIFM to capture higher-order interactions. To further improve both the performance and efficiency of IFM, a sampling scheme is developed to select interactions based on the field aspect importance. The experimental results from two well-known datasets show the superiority of the proposed models over the state-of-the-art methods. △ Less

Submitted 26 February, 2019; originally announced February 2019.

arXiv:1811.02172 [pdf, other]

Neural Phrase-to-Phrase Machine Translation

Authors: Jiangtao Feng, Lingpeng Kong, Po-Sen Huang, Chong Wang, Da Huang, Jiayuan Mao, Kan Qiao, Dengyong Zhou

Abstract: In this paper, we propose Neural Phrase-to-Phrase Machine Translation (NP$^2$MT). Our model uses a phrase attention mechanism to discover relevant input (source) segments that are used by a decoder to generate output (target) phrases. We also design an efficient dynamic programming algorithm to decode segments that allows the model to be trained faster than the existing neural phrase-based machine… ▽ More In this paper, we propose Neural Phrase-to-Phrase Machine Translation (NP$^2$MT). Our model uses a phrase attention mechanism to discover relevant input (source) segments that are used by a decoder to generate output (target) phrases. We also design an efficient dynamic programming algorithm to decode segments that allows the model to be trained faster than the existing neural phrase-based machine translation method by Huang et al. (2018). Furthermore, our method can naturally integrate with external phrase dictionaries during decoding. Empirical experiments show that our method achieves comparable performance with the state-of-the art methods on benchmark datasets. However, when the training and testing data are from different distributions or domains, our method performs better. △ Less

Submitted 6 November, 2018; originally announced November 2018.

arXiv:1810.12544 [pdf, other]

doi 10.1109/TSMC.2018.2876202

Enhanced Ensemble Clustering via Fast Propagation of Cluster-wise Similarities

Authors: Dong Huang, Chang-Dong Wang, Hongxing Peng, Jianhuang Lai, Chee-Keong Kwoh

Abstract: Ensemble clustering has been a popular research topic in data mining and machine learning. Despite its significant progress in recent years, there are still two challenging issues in the current ensemble clustering research. First, most of the existing algorithms tend to investigate the ensemble information at the object-level, yet often lack the ability to explore the rich information at higher l… ▽ More Ensemble clustering has been a popular research topic in data mining and machine learning. Despite its significant progress in recent years, there are still two challenging issues in the current ensemble clustering research. First, most of the existing algorithms tend to investigate the ensemble information at the object-level, yet often lack the ability to explore the rich information at higher levels of granularity. Second, they mostly focus on the direct connections (e.g., direct intersection or pair-wise co-occurrence) in the multiple base clusterings, but generally neglect the multi-scale indirect relationship hidden in them. To address these two issues, this paper presents a novel ensemble clustering approach based on fast propagation of cluster-wise similarities via random walks. We first construct a cluster similarity graph with the base clusters treated as graph nodes and the cluster-wise Jaccard coefficient exploited to compute the initial edge weights. Upon the constructed graph, a transition probability matrix is defined, based on which the random walk process is conducted to propagate the graph structural information. Specifically, by investigating the propagating trajectories starting from different nodes, a new cluster-wise similarity matrix can be derived by considering the trajectory relationship. Then, the newly obtained cluster-wise similarity matrix is mapped from the cluster-level to the object-level to achieve an enhanced co-association (ECA) matrix, which is able to simultaneously capture the object-wise co-occurrence relationship as well as the multi-scale cluster-wise relationship in ensembles. Finally, two novel consensus functions are proposed to obtain the consensus clustering result. Extensive experiments on a variety of real-world datasets have demonstrated the effectiveness and efficiency of our approach. △ Less

Submitted 30 October, 2018; originally announced October 2018.

Comments: To appear in IEEE Transactions on Systems, Man, and Cybernetics: Systems. The MATLAB source code of this work is available at: http://www.researchgate.net/publication/328581758

arXiv:1810.03064 [pdf, other]

CSI-Net: Unified Human Body Characterization and Pose Recognition

Authors: Fei Wang, **song Han, Shiyuan Zhang, Xu He, Dong Huang

Abstract: We build CSI-Net, a unified Deep Neural Network~(DNN), to learn the representation of WiFi signals. Using CSI-Net, we jointly solved two body characterization problems: biometrics estimation (including body fat, muscle, water, and bone rates) and person recognition. We also demonstrated the application of CSI-Net on two distinctive pose recognition tasks: the hand sign recognition (fine-scaled act… ▽ More We build CSI-Net, a unified Deep Neural Network~(DNN), to learn the representation of WiFi signals. Using CSI-Net, we jointly solved two body characterization problems: biometrics estimation (including body fat, muscle, water, and bone rates) and person recognition. We also demonstrated the application of CSI-Net on two distinctive pose recognition tasks: the hand sign recognition (fine-scaled action of the hand) and falling detection (coarse-scaled motion of the body). △ Less

Submitted 22 January, 2019; v1 submitted 6 October, 2018; originally announced October 2018.

Comments: 14 pages, 6 figures and 10 tables

arXiv:1808.00079 [pdf, other]

Optimal Gradient Checkpoint Search for Arbitrary Computation Graphs

Authors: Jianwei Feng, Dong Huang

Abstract: Deep Neural Networks(DNNs) require huge GPU memory when training on modern image/video databases. Unfortunately, the GPU memory is physically finite, which limits the image resolutions and batch sizes that could be used in training for better DNN performance. Unlike solutions that require physically upgrade GPUs, the Gradient CheckPointing(GCP) training trades computation for more memory beyond ex… ▽ More Deep Neural Networks(DNNs) require huge GPU memory when training on modern image/video databases. Unfortunately, the GPU memory is physically finite, which limits the image resolutions and batch sizes that could be used in training for better DNN performance. Unlike solutions that require physically upgrade GPUs, the Gradient CheckPointing(GCP) training trades computation for more memory beyond existing GPU hardware. GCP only stores a subset of intermediate tensors, called Gradient Checkpoints (GCs), during forward. Then during backward, extra local forwards are conducted to compute the missing tensors. The total training memory cost becomes the sum of (1) the memory cost of the gradient checkpoints and (2) the maximum memory cost of local forwards. To achieve maximal memory cut-offs, one needs optimal algorithms to select GCs. Existing GCP approaches rely on either manual input of GCs or heuristics-based GC search on Linear Computation Graphs (LCGs), and cannot apply to Arbitrary Computation Graphs(ACGs). In this paper, we present theories and optimal algorithms on GC selection that, for the first time, are applicable to ACGs and achieve the maximal memory cut-offs. Extensive experiments show that our approach not only outperforms existing approaches (only applicable on LCGs), and is applicable to a vast family of LCG and ACG networks, such as Alexnet, VGG, ResNet, Densenet, Inception Net and highly complicated DNNs by Network Architecture Search. Our work enables GCP training on ACGs, and cuts off up-to 80% of training memory with a moderate time overhead (~30%-50%). Codes are available △ Less

Submitted 18 March, 2021; v1 submitted 31 July, 2018; originally announced August 2018.

arXiv:1807.04369 [pdf, other]

Differentially-Private "Draw and Discard" Machine Learning

Authors: Vasyl Pihur, Aleksandra Korolova, Frederick Liu, Subhash Sankuratripati, Moti Yung, Dachuan Huang, Ruogu Zeng

Abstract: In this work, we propose a novel framework for privacy-preserving client-distributed machine learning. It is motivated by the desire to achieve differential privacy guarantees in the local model of privacy in a way that satisfies all systems constraints using asynchronous client-server communication and provides attractive model learning properties. We call it "Draw and Discard" because it relies… ▽ More In this work, we propose a novel framework for privacy-preserving client-distributed machine learning. It is motivated by the desire to achieve differential privacy guarantees in the local model of privacy in a way that satisfies all systems constraints using asynchronous client-server communication and provides attractive model learning properties. We call it "Draw and Discard" because it relies on random sampling of models for load distribution (scalability), which also provides additional server-side privacy protections and improved model quality through averaging. We present the mechanics of client and server components of "Draw and Discard" and demonstrate how the framework can be applied to learning Generalized Linear models. We then analyze the privacy guarantees provided by our approach against several types of adversaries and showcase experimental results that provide evidence for the framework's viability in practical deployments. △ Less

Submitted 10 October, 2018; v1 submitted 11 July, 2018; originally announced July 2018.

arXiv:1806.04166 [pdf, other]

Learning to Decompose and Disentangle Representations for Video Prediction

Authors: Jun-Ting Hsieh, Bingbin Liu, De-An Huang, Li Fei-Fei, Juan Carlos Niebles

Abstract: Our goal is to predict future video frames given a sequence of input frames. Despite large amounts of video data, this remains a challenging task because of the high-dimensionality of video frames. We address this challenge by proposing the Decompositional Disentangled Predictive Auto-Encoder (DDPAE), a framework that combines structured probabilistic models and deep networks to automatically (i)… ▽ More Our goal is to predict future video frames given a sequence of input frames. Despite large amounts of video data, this remains a challenging task because of the high-dimensionality of video frames. We address this challenge by proposing the Decompositional Disentangled Predictive Auto-Encoder (DDPAE), a framework that combines structured probabilistic models and deep networks to automatically (i) decompose the high-dimensional video that we aim to predict into components, and (ii) disentangle each component to have low-dimensional temporal dynamics that are easier to predict. Crucially, with an appropriately specified generative model of video frames, our DDPAE is able to learn both the latent decomposition and disentanglement without explicit supervision. For the Moving MNIST dataset, we show that DDPAE is able to recover the underlying components (individual digits) and disentanglement (appearance and location) as we would intuitively do. We further demonstrate that DDPAE can be applied to the Bouncing Balls dataset involving complex interactions between multiple objects to predict the video frame directly from the pixels and recover physical states without explicit supervision. △ Less

Submitted 17 October, 2018; v1 submitted 11 June, 2018; originally announced June 2018.

arXiv:1806.00608 [pdf, other]

GamePad: A Learning Environment for Theorem Proving

Authors: Daniel Huang, Prafulla Dhariwal, Dawn Song, Ilya Sutskever

Abstract: In this paper, we introduce a system called GamePad that can be used to explore the application of machine learning methods to theorem proving in the Coq proof assistant. Interactive theorem provers such as Coq enable users to construct machine-checkable proofs in a step-by-step manner. Hence, they provide an opportunity to explore theorem proving with human supervision. We use GamePad to synthesi… ▽ More In this paper, we introduce a system called GamePad that can be used to explore the application of machine learning methods to theorem proving in the Coq proof assistant. Interactive theorem provers such as Coq enable users to construct machine-checkable proofs in a step-by-step manner. Hence, they provide an opportunity to explore theorem proving with human supervision. We use GamePad to synthesize proofs for a simple algebraic rewrite problem and train baseline models for a formalization of the Feit-Thompson theorem. We address position evaluation (i.e., predict the number of proof steps left) and tactic prediction (i.e., predict the next proof step) tasks, which arise naturally in tactic-based theorem proving. △ Less

Submitted 21 December, 2018; v1 submitted 2 June, 2018; originally announced June 2018.

arXiv:1609.06789 [pdf, ps, other]

Krigings Over Space and Time Based on Latent Low-Dimensional Structures

Authors: Da Huang, Qiwei Yao, Rongmao Zhang

Abstract: We propose a new approach to represent nonparametrically the linear dependence structure of a spatio-temporal process in terms of latent common factors. Though it is formally similar to the existing reduced rank approximation methods (Section 7.1.3 of Cressie and Wikle, 2011), the fundamental difference is that the low-dimensional structure is completely unknown in our setting, which is learned fr… ▽ More We propose a new approach to represent nonparametrically the linear dependence structure of a spatio-temporal process in terms of latent common factors. Though it is formally similar to the existing reduced rank approximation methods (Section 7.1.3 of Cressie and Wikle, 2011), the fundamental difference is that the low-dimensional structure is completely unknown in our setting, which is learned from the data collected irregularly over space but regularly over time. Furthermore a graph Laplacian is incorporated in the learning in order to take the advantage of the continuity over space, and a new aggregation method via randomly partitioning space is introduced to improve the efficiency. We do not impose any stationarity conditions over space either, as the learning is facilitated by the stationarity in time. Krigings over space and time are carried out based on the learned low-dimensional structure, which is scalable to the cases when the data are taken over a large number of locations and/or over a long time period. Asymptotic properties of the proposed methods are established. Illustration with both simulated and real data sets is also reported. △ Less

Submitted 18 March, 2018; v1 submitted 21 September, 2016; originally announced September 2016.

Comments: 35 pages, 2 figures

Showing 1–50 of 56 results for author: Huang, D