Understanding Stochastic Natural Gradient Variational Inference

Abstract: Stochastic natural gradient variational inference (NGVI) is a popular posterior inference method with applications in various probabilistic models. Despite its wide usage, little is known about the non-asymptotic convergence rate in the \emph{stochastic} setting. We aim to lessen this gap and provide a better understanding. For conjugate likelihoods, we prove the first $\mathcal{O}(\frac{1}{T})$ non-asymptotic convergence rate of stochastic NGVI. The complexity is no worse than stochastic gradient descent (a.k.a. black-box variational inference) and the rate likely has better constant dependency that leads to faster convergence in practice. For non-conjugate likelihoods, we show that stochastic NGVI with the canonical parameterization implicitly optimizes a non-convex objective. Thus, a global convergence rate of $\mathcal{O}(\frac{1}{T})$ is unlikely without some significant new understanding of optimizing the ELBO using natural gradients.

Submitted 3 June, 2024; originally announced June 2024.

Comments: ICML 2024
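
To make the conjugate case in the abstract above concrete, here is a minimal sketch of a stochastic natural-gradient update in natural parameters. The model (unit-variance Gaussian likelihood with a standard normal prior on the mean), the step size 1/t, and the function names are assumptions chosen for illustration and are not taken from the paper. For a conjugate model, a natural-gradient step reduces to a convex combination of the current natural parameters and a one-sample estimate of the exact posterior's natural parameters.

```python
import numpy as np

def stochastic_ngvi_gaussian_mean(y, num_steps, seed=0):
    """Stochastic NGVI for y_i ~ N(theta, 1) with prior theta ~ N(0, 1).

    q(theta) is parameterized by natural parameters (eta1, eta2), where
    q(theta) is proportional to exp(eta1 * theta + eta2 * theta**2).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    prior = np.array([0.0, -0.5])           # natural parameters of N(0, 1)
    lam = prior.copy()
    for t in range(1, num_steps + 1):
        i = rng.integers(n)                  # sample one data point
        # Unbiased one-sample estimate of the posterior natural parameters:
        # each likelihood term contributes (y_i, -1/2).
        target = prior + n * np.array([y[i], -0.5])
        rho = 1.0 / t                        # assumed decaying step size
        lam = (1.0 - rho) * lam + rho * target
    return lam

def to_mean_var(lam):
    """Convert natural parameters to the mean and variance of q."""
    var = -1.0 / (2.0 * lam[1])
    return lam[0] * var, var

if __name__ == "__main__":
    y = np.array([1.0, 2.0, 3.0])
    print(to_mean_var(stochastic_ngvi_gaussian_mean(y, 20000)))
```

With this step size the iterate is a running average of the one-sample targets, so it approaches the exact posterior (mean sum(y)/(n+1), variance 1/(n+1)) as the number of steps grows.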

arXiv:2405.08276 [pdf, other]

Scalable Subsampling Inference for Deep Neural Networks

Authors: Ke** Wu, Dimitris N. Politis

Abstract: Deep neural networks (DNNs) have received increasing attention in machine learning applications in the last several years. Recently, a non-asymptotic error bound has been developed to measure the performance of the fully connected DNN estimator with ReLU activation functions for estimating regression models. The paper at hand gives a small improvement on the current error bound based on the latest results on the approximation ability of DNN. More importantly, however, a non-random subsampling technique--scalable subsampling--is applied to construct a `subagged' DNN estimator. Under regularity conditions, it is shown that the subagged DNN estimator is computationally efficient without sacrificing accuracy for either estimation or prediction tasks. Beyond point estimation/prediction, we propose different approaches to build confidence and prediction intervals based on the subagged DNN estimator. In addition to being asymptotically valid, the proposed confidence/prediction intervals appear to work well in finite samples. All in all, the scalable subsampling DNN estimator offers the complete package in terms of statistical inference, i.e., (a) computational efficiency; (b) point estimation/prediction accuracy; and (c) allowing for the construction of practically useful confidence and prediction intervals.

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2402.08683 [pdf]

Order picking efficiency: A scattered storage and clustered allocation strategy in automated drug dispensing systems

Authors: Mengge Yuan, Ning Zhao, Kan Wu, Lulu Cheng

Abstract: In the smart hospital, optimizing prescription order fulfilment processes in outpatient pharmacies is crucial. Automated drug dispensing systems (ADDSs) have emerged as a promising means of streamlining these processes. These systems involve human order pickers who are assisted by ADDSs. The ADDS's robotic arm transports bins from storage locations to the input/output (I/O) points, while the pharmacist sorts the requested drugs from the bins at the I/O points. This paper focuses on coordinating the ADDS and the pharmacists to optimize the order-picking strategy. Another critical aspect of order-picking systems is the storage location assignment problem (SLAP), which determines the allocation of drugs to storage locations. In this study, we consider the ADDS as a smart warehouse and propose a two-stage scattered storage and clustered allocation (SSCA) strategy to optimize the SLAP for ADDSs. The first stage primarily adopts a scattered storage approach, and we develop a mathematical programming model to group drugs accordingly. In the second stage, we introduce a sequential alternating (SA) heuristic algorithm that takes into account the drug demand frequency and the correlation between drugs to cluster and locate them effectively. To evaluate the proposed SSCA strategy, we develop a double objective integer programming model for the order-picking problem in ADDSs to minimize the number of machines visited in prescription orders while maintaining the shortest average picking time of orders. The numerical results demonstrate that the proposed strategy can optimize the SLAP in ADDSs and significantly improve the order-picking efficiency of ADDSs in a human-robot cooperation environment.

Submitted 18 December, 2023; originally announced February 2024.

arXiv:2311.00294 [pdf, ps, other]

Multi-step ahead prediction intervals for non-parametric autoregressions via bootstrap: consistency, debiasing and pertinence

Authors: Dimitris N. Politis, Ke** Wu

Abstract: To address the difficult problem of multi-step ahead prediction of non-parametric autoregressions, we consider a forward bootstrap approach. Employing a local constant estimator, we can analyze a general type of non-parametric time series model, and show that the proposed point predictions are consistent with the true optimal predictor. We construct a quantile prediction interval that is asymptotically valid. Moreover, using a debiasing technique, we can asymptotically approximate the distribution of multi-step ahead non-parametric estimation by bootstrap. As a result, we can build bootstrap prediction intervals that are pertinent, i.e., can capture the model estimation variability, thus improving upon the standard quantile prediction intervals. Simulation studies are given to illustrate the performance of our point predictions and pertinent prediction intervals for finite samples.

Submitted 1 November, 2023; originally announced November 2023.

arXiv:2310.17137 [pdf, other]

Large-Scale Gaussian Processes via Alternating Projection

Authors: Kaiwen Wu, Jonathan Wenger, Haydn Jones, Geoff Pleiss, Jacob R. Gardner

Abstract: Training and inference in Gaussian processes (GPs) require solving linear systems with $n\times n$ kernel matrices. To address the prohibitive $\mathcal{O}(n^3)$ time complexity, recent work has employed fast iterative methods, like conjugate gradients (CG). However, as datasets increase in magnitude, the kernel matrices become increasingly ill-conditioned and still require $\mathcal{O}(n^2)$ space without partitioning. Thus, while CG increases the size of datasets GPs can be trained on, modern datasets reach scales beyond its applicability. In this work, we propose an iterative method which only accesses subblocks of the kernel matrix, effectively enabling mini-batching. Our algorithm, based on alternating projection, has $\mathcal{O}(n)$ per-iteration time and space complexity, solving many of the practical challenges of scaling GPs to very large datasets. Theoretically, we prove the method enjoys linear convergence. Empirically, we demonstrate its fast convergence in practice and robustness to ill-conditioning. On large-scale benchmark datasets with up to four million data points, our approach accelerates GP training and inference by speed-up factors up to $27\times$ and $72 \times$, respectively, compared to CG.

Submitted 8 March, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

Comments: AISTATS 2024
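
The idea of solving a kernel linear system while touching only a block of rows per step can be sketched with a plain block Gauss-Seidel (block coordinate descent) iteration. This is an illustrative stand-in under stated assumptions, not necessarily the algorithm of the paper: the function name `block_solve`, the fixed contiguous blocks, and the dense in-memory matrix are all hypothetical choices. Each step reads one `block_size x n` slab of the matrix, so for a fixed block size the per-step cost is linear in `n`.

```python
import numpy as np

def block_solve(A, b, block_size, num_epochs):
    """Solve A x = b for symmetric positive definite A, one row block at a time."""
    n = len(b)
    x = np.zeros(n)
    r = b.astype(float)                     # residual b - A x, kept up to date
    blocks = [np.arange(s, min(s + block_size, n))
              for s in range(0, n, block_size)]
    for _ in range(num_epochs):
        for idx in blocks:
            rows = A[idx, :]                # the only part of A used in this step
            # Solve the small system restricted to the current block.
            delta = np.linalg.solve(rows[:, idx], r[idx])
            x[idx] += delta
            # A is symmetric, so the columns A[:, idx] equal rows.T.
            r -= rows.T @ delta
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.uniform(0.0, 50.0, size=40)
    K = np.exp(-0.5 * (pts[:, None] - pts[None, :]) ** 2)   # RBF kernel matrix
    A = K + np.eye(40)                                      # add unit noise variance
    b = rng.normal(size=40)
    x = block_solve(A, b, block_size=8, num_epochs=300)
    print(np.max(np.abs(A @ x - b)))
```

In a large-scale setting the slab `A[idx, :]` would be computed from the kernel on demand rather than sliced from a stored matrix, which is what keeps memory linear in `n`.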

arXiv:2310.00902 [pdf, other]

DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models

Authors: Yongchan Kwon, Eric Wu, Kevin Wu, James Zou

Abstract: Quantifying the impact of training data points is crucial for understanding the outputs of machine learning models and for improving the transparency of the AI pipeline. The influence function is a principled and popular data attribution method, but its computational cost often makes it challenging to use. This issue becomes more pronounced in the setting of large language models and text-to-image models. In this work, we propose DataInf, an efficient influence approximation method that is practical for large-scale generative AI models. Leveraging an easy-to-compute closed-form expression, DataInf outperforms existing influence computation algorithms in terms of computational and memory efficiency. Our theoretical analysis shows that DataInf is particularly well-suited for parameter-efficient fine-tuning techniques such as LoRA. Through systematic empirical evaluations, we show that DataInf accurately approximates influence scores and is orders of magnitude faster than existing methods. In applications to RoBERTa-large, Llama-2-13B-chat, and stable-diffusion-v1.5 models, DataInf effectively identifies the most influential fine-tuning examples better than other approximate influence scores. Moreover, it can help to identify which data points are mislabeled.

Submitted 13 March, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: ICLR 2024

arXiv:2309.10301 [pdf, other]

Prominent Roles of Conditionally Invariant Components in Domain Adaptation: Theory and Algorithms

Authors: Keru Wu, Yuansi Chen, Wooseok Ha, Bin Yu

Abstract: Domain adaptation (DA) is a statistical learning problem that arises when the distribution of the source data used to train a model differs from that of the target data used to evaluate the model. While many DA algorithms have demonstrated considerable empirical success, blindly applying these algorithms can often lead to worse performance on new datasets. To address this, it is crucial to clarify the assumptions under which a DA algorithm has good target performance. In this work, we focus on the assumption of the presence of conditionally invariant components (CICs), which are relevant for prediction and remain conditionally invariant across the source and target data. We demonstrate that CICs, which can be estimated through conditional invariant penalty (CIP), play three prominent roles in providing target risk guarantees in DA. First, we propose a new algorithm based on CICs, importance-weighted conditional invariant penalty (IW-CIP), which has target risk guarantees beyond simple settings such as covariate shift and label shift. Second, we show that CICs help identify large discrepancies between source and target risks of other DA algorithms. Finally, we demonstrate that incorporating CICs into the domain invariant projection (DIP) algorithm can address its failure scenario caused by label-flipping features. We support our new algorithms and theoretical findings via numerical experiments on synthetic data, MNIST, CelebA, and Camelyon17 datasets.

Submitted 19 September, 2023; originally announced September 2023.

arXiv:2306.15209 [pdf, other]

doi 10.1109/JBHI.2024.3371097

Dynamic Reconfiguration of Brain Functional Network in Stroke

Authors: Kaichao Wu, Beth Jelfs, Katrina Neville, Wenzhen He, Qiang Fang

Abstract: The brain continually reorganizes its functional network to adapt to post-stroke functional impairments. Previous studies using static modularity analysis have presented global-level behavior patterns of this network reorganization. However, it is far from understood how the brain reconfigures its functional network dynamically following a stroke. This study collected resting-state functional MRI data from 15 stroke patients, divided into mild (n = 6) and severe (n = 9) subgroups based on their clinical symptoms. Additionally, 15 age-matched healthy subjects were considered as controls. By applying a multilayer network method, a dynamic modular structure was recognized based on a time-resolved function network. Then dynamic network measurements (recruitment, integration, and flexibility) were calculated to characterize the dynamic reconfiguration of post-stroke brain functional networks and thereby reveal the neural functional rebuilding process. It was found that severe patients tended to have reduced recruitment and increased between-network integration, while mild patients exhibited low network flexibility and less network integration. This severity-dependent alteration in network interaction could not be revealed by previous studies using static methods. Clinically, the obtained knowledge of the diverse patterns of dynamic adjustment in brain functional networks observed from the brain signal could help understand the underlying mechanism of the motor, speech, and cognitive functional impairments caused by stroke attacks. The proposed method not only could be used to evaluate patients' current brain status but also has the potential to provide insights into prognosis analysis and prediction.

Submitted 22 March, 2024; v1 submitted 27 June, 2023; originally announced June 2023.

Comments: Has been accepted by IEEE Journal of Biomedical and Health Informatics, doi:10.1109/JBHI.2024.3371097

Journal ref: IEEE Journal of Biomedical and Health Informatics 2024

arXiv:2306.05398 [pdf, other]

Bayesian model calibration for diblock copolymer thin film self-assembly using power spectrum of microscopy data and machine learning surrogate

Authors: Lianghao Cao, Keyi Wu, J. Tinsley Oden, Peng Chen, Omar Ghattas

Abstract: Identifying parameters of computational models from experimental data, or model calibration, is fundamental for assessing and improving the predictability and reliability of computer simulations. In this work, we propose a method for Bayesian calibration of models that predict morphological patterns of diblock copolymer (Di-BCP) thin film self-assembly while accounting for various sources of uncertainty in pattern formation and data acquisition. This method extracts the azimuthally-averaged power spectrum (AAPS) of the top-down microscopy characterization of Di-BCP thin film patterns as summary statistics for Bayesian inference of model parameters via the pseudo-marginal method. We derive the analytical and approximate form of a conditional likelihood for the AAPS of image data. We demonstrate that AAPS-based image data reduction retains the mutual information, particularly on important length scales, between image data and model parameters while being relatively agnostic to the aleatoric uncertainties associated with the random long-range disorder of Di-BCP patterns. Additionally, we propose a phase-informed prior distribution for Bayesian model calibration. Furthermore, reducing image data to AAPS enables us to efficiently build surrogate models to accelerate the proposed Bayesian model calibration procedure. We present the formulation and training of two multi-layer perceptrons for approximating the parameter-to-spectrum map, which enables fast integrated likelihood evaluations. We validate the proposed Bayesian model calibration method through numerical examples, for which the neural network surrogate delivers a fivefold reduction of the number of model simulations performed for a single calibration task.

Submitted 3 August, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

Comments: Minor changes from the original submission, including a change in the title

arXiv:2306.04126 [pdf, ps, other]

Bootstrap Prediction Inference of Non-linear Autoregressive Models

Authors: Ke** Wu, Dimitris N. Politis

Abstract: The non-linear autoregressive (NLAR) model plays an important role in modeling and predicting time series. One-step ahead prediction is straightforward using the NLAR model, but the multi-step ahead prediction is cumbersome. For instance, iterating the one-step ahead predictor is a convenient strategy for linear autoregressive (LAR) models, but it is suboptimal under NLAR. In this paper, we first propose a simulation and/or bootstrap algorithm to construct optimal point predictors under an $L_1$ or $L_2$ loss criterion. In addition, we construct bootstrap prediction intervals in the multi-step ahead prediction problem; in particular, we develop an asymptotically valid quantile prediction interval as well as a pertinent prediction interval for future values. In order to correct the undercoverage of prediction intervals with finite samples, we further employ predictive -- as opposed to fitted -- residuals in the bootstrap process. Simulation studies are also given to substantiate the finite sample performance of our methods.

Submitted 6 June, 2023; originally announced June 2023.
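
The simulation-based multi-step predictor described in the abstract above can be sketched as follows. This is a simplified illustration under assumptions that are not from the paper: an order-one model X_t = g(X_{t-1}) + e_t, a known (or already fitted) vectorized function `g`, and a hypothetical function name `bootstrap_multistep`. It simulates many future paths by resampling residuals, then reads off the mean ($L_2$ predictor), the median ($L_1$ predictor), and a quantile prediction interval. It does not account for estimation variability, so it does not produce the pertinent interval mentioned in the abstract.

```python
import numpy as np

def bootstrap_multistep(g, x_last, residuals, horizon,
                        num_paths=2000, alpha=0.05, seed=0):
    """Simulate X_{T+h} for X_t = g(X_{t-1}) + e_t by resampling residuals."""
    rng = np.random.default_rng(seed)
    paths = np.full(num_paths, float(x_last))
    for _ in range(horizon):
        # Draw one resampled residual per simulated path and step forward.
        eps = rng.choice(residuals, size=num_paths, replace=True)
        paths = g(paths) + eps
    l2_pred = paths.mean()                  # minimizes expected squared error
    l1_pred = np.median(paths)              # minimizes expected absolute error
    lower, upper = np.quantile(paths, [alpha / 2, 1 - alpha / 2])
    return l2_pred, l1_pred, (lower, upper)

if __name__ == "__main__":
    g = lambda x: 0.8 * np.tanh(x)          # an example non-linear map
    resid = np.random.default_rng(1).normal(scale=0.5, size=200)
    print(bootstrap_multistep(g, x_last=1.0, residuals=resid, horizon=3))
```

Iterating `g` on the point forecast alone would ignore the noise passing through the non-linearity, which is why the paths are simulated in full before averaging.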

arXiv:2305.15572 [pdf, other]

The Behavior and Convergence of Local Bayesian Optimization

Authors: Kaiwen Wu, Kyurae Kim, Roman Garnett, Jacob R. Gardner

Abstract: A recent development in Bayesian optimization is the use of local optimization strategies, which can deliver strong empirical performance on high-dimensional problems compared to traditional global strategies. The "folk wisdom" in the literature is that the focus on local optimization sidesteps the curse of dimensionality; however, little is known concretely about the expected behavior or convergence of Bayesian local optimization routines. We first study the behavior of the local approach, and find that the statistics of individual local solutions of Gaussian process sample paths are surprisingly good compared to what we would expect to recover from global methods. We then present the first rigorous analysis of such a Bayesian local optimization algorithm recently proposed by Müller et al. (2021), and derive convergence rates in both the noisy and noiseless settings.

Submitted 8 March, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: 27 pages; NeurIPS 2023

arXiv:2305.15349 [pdf, other]

On the Convergence of Black-Box Variational Inference

Authors: Kyurae Kim, Jisu Oh, Kaiwen Wu, Yi-An Ma, Jacob R. Gardner

Abstract: We provide the first convergence guarantee for full black-box variational inference (BBVI), also known as Monte Carlo variational inference. While preliminary investigations worked on simplified versions of BBVI (e.g., bounded domain, bounded support, only optimizing for the scale, and such), our setup does not need any such algorithmic modifications. Our results hold for log-smooth posterior densities with and without strong log-concavity and the location-scale variational family. Also, our analysis reveals that certain algorithm design choices commonly employed in practice, particularly, nonlinear parameterizations of the scale of the variational approximation, can result in suboptimal convergence rates. Fortunately, running BBVI with proximal stochastic gradient descent fixes these limitations, and thus achieves the strongest known convergence rate guarantees. We evaluate this theoretical insight by comparing proximal SGD against other standard implementations of BBVI on large-scale Bayesian inference problems.

Submitted 10 January, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: Accepted to NeurIPS'23; previous title: "Black-Box Variational Inference Converges"

arXiv:2303.10472 [pdf, ps, other]

Practical and Matching Gradient Variance Bounds for Black-Box Variational Bayesian Inference

Authors: Kyurae Kim, Kaiwen Wu, Jisu Oh, Jacob R. Gardner

Abstract: Understanding the gradient variance of black-box variational inference (BBVI) is a crucial step for establishing its convergence and developing algorithmic improvements. However, existing studies have yet to show that the gradient variance of BBVI satisfies the conditions used to study the convergence of stochastic gradient descent (SGD), the workhorse of BBVI. In this work, we show that BBVI satisfies a matching bound corresponding to the $ABC$ condition used in the SGD literature when applied to smooth and quadratically-growing log-likelihoods. Our results generalize to nonlinear covariance parameterizations widely used in the practice of BBVI. Furthermore, we show that the variance of the mean-field parameterization has provably superior dimensional dependence.

Submitted 3 June, 2023; v1 submitted 18 March, 2023; originally announced March 2023.

Comments: Accepted to ICML'23 for live oral presentation

arXiv:2302.03358 [pdf, other]

Deep-OSG: Deep Learning of Operators in Semigroup

Authors: Junfeng Chen, Kailiang Wu

Abstract: This paper proposes a novel deep learning approach for learning operators in semigroup, with applications to modeling unknown autonomous dynamical systems using time series data collected at varied time lags. It is a sequel to the previous flow map learning (FML) works [T. Qin, K. Wu, and D. Xiu, J. Comput. Phys., 395:620--635, 2019], [K. Wu and D. Xiu, J. Comput. Phys., 408:109307, 2020], and [Z. Chen, V. Churchill, K. Wu, and D. Xiu, J. Comput. Phys., 449:110782, 2022], which focused on learning a single evolution operator with a fixed time step. This paper aims to learn a family of evolution operators with variable time steps, which constitute a semigroup for an autonomous system. The semigroup property is crucial and links the system's evolutionary behaviors across varying time scales, but it was not considered in the previous works. We propose for the first time a framework of embedding the semigroup property into the data-driven learning process, through a novel neural network architecture and new loss functions. The framework is feasible, can be combined with any suitable neural networks, and is applicable to learning general autonomous ODEs and PDEs. We present rigorous error estimates and variance analysis to understand the prediction accuracy and robustness of our approach, showing the remarkable advantages of semigroup awareness in our model. Moreover, our approach allows one to arbitrarily choose the time steps for prediction and ensures that the predicted results are well self-matched and consistent. Extensive numerical experiments demonstrate that embedding the semigroup property notably reduces the data dependency of deep learning models and greatly improves the accuracy, robustness, and stability for long-time prediction.

Submitted 12 September, 2023; v1 submitted 7 February, 2023; originally announced February 2023.
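
The semigroup property mentioned in the abstract above states that evolving a state by a time step dt1 and then by dt2 gives the same result as evolving once by dt1 + dt2. The following minimal sketch checks that property numerically. The linear test system dx/dt = a x, the rate a = -0.7, and the function names are assumptions made for illustration; the paper's network architecture and loss functions are not reproduced here. A residual of this kind is the sort of quantity a semigroup-aware loss could penalize.

```python
import numpy as np

A_RATE = -0.7  # assumed decay rate of the toy system dx/dt = A_RATE * x

def exact_flow(dt, x):
    """Exact evolution operator of the toy linear ODE; satisfies the semigroup property."""
    return np.exp(A_RATE * dt) * x

def euler_flow(dt, x):
    """One explicit Euler step; an approximation that violates the property."""
    return (1.0 + A_RATE * dt) * x

def semigroup_residual(phi, dt1, dt2, x):
    """Mismatch between one step of size dt1 + dt2 and two composed steps."""
    return np.abs(phi(dt1 + dt2, x) - phi(dt2, phi(dt1, x)))

if __name__ == "__main__":
    print(semigroup_residual(exact_flow, 0.3, 0.5, 2.0))
    print(semigroup_residual(euler_flow, 0.3, 0.5, 2.0))
```

For the Euler map the residual equals a^2 * dt1 * dt2 * |x|, so it shrinks only as the time steps shrink, whereas the exact flow has zero residual up to rounding.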

arXiv:2205.09703 [pdf, other]

Extract Dynamic Information To Improve Time Series Modeling: a Case Study with Scientific Workflow

Authors: Jeeyung Kim, Mengtian **, Youkow Homma, Alex Sim, Wilko Kroeger, Kesheng Wu

Abstract: In modeling time series data, we often need to augment the existing data records to increase the modeling accuracy. In this work, we describe a number of techniques to extract dynamic information about the current state of a large scientific workflow, which could be generalized to other types of applications. The specific task to be modeled is the time needed for transferring a file from an experimental facility to a data center. The key idea of our approach is to find recent past data transfer events that match the current event in some ways. Tests showed that we could identify recent events matching some recorded properties and reduce the prediction error by about 12% compared to similar models with only static features. We additionally explored an application-specific technique to extract information about the data production process, and were able to reduce the average prediction error by 44%.

Submitted 19 May, 2022; originally announced May 2022.

arXiv:2205.06622 [pdf]

What Makes You Hold on to That Old Car? Joint Insights from Machine Learning and Multinomial Logit on Vehicle-level Transaction Decisions

Authors: Ling **, Alina Lazar, Caitlin Brown, Bingrong Sun, Venu Garikapati, Srinath Ravulaparthy, Qianmiao Chen, Alexander Sim, Kesheng Wu, Tin Ho, Thomas Wenzel, C. Anna Spurlock

Abstract: What makes you hold on to that old car? While the vast majority of household vehicles are still powered by conventional internal combustion engines, the progress of adopting emerging vehicle technologies will critically depend on how soon the existing vehicles are transacted out of the household fleet. Leveraging a nationally representative longitudinal data set, the Panel Study of Income Dynamics, this study examines how household decisions to dispose of or replace a given vehicle are: (1) influenced by the vehicle's attributes, (2) mediated by households' concurrent socio-demographic and economic attributes, and (3) triggered by key life cycle events. Coupled with a newly developed machine learning interpretation tool, TreeExplainer, we demonstrate an innovative use of machine learning models to augment traditional logit modeling to both generate behavioral insights and improve model performance. We find the two gradient-boosting-based methods, CatBoost and LightGBM, are the best performing machine learning models for this problem. The multinomial logistic model can achieve similar performance levels after its model specification is informed by TreeExplainer. Both machine learning and multinomial logit models suggest that while older vehicles are more likely to be disposed of or replaced than newer ones, such probability decreases as the vehicles serve the family longer. We find that married families, families with higher education levels, homeowners, and older families tend to keep their vehicles longer. Life events such as childbirth, residential relocation, and change of household composition and income are found to increase vehicle disposal and/or replacement. We provide additional insights on the timing of vehicle replacement or disposal; in particular, the presence of children and childbirth events are more strongly associated with vehicle replacement among younger parents.

Submitted 13 May, 2022; originally announced May 2022.

arXiv:2203.09611 [pdf, other]

doi 10.1080/13658816.2022.2053980

STICC: A multivariate spatial clustering method for repeated geographic pattern discovery with consideration of spatial contiguity

Authors: Yuhao Kang, Kunlin Wu, Song Gao, Ignavier Ng, **meng Rao, Shan Ye, Fan Zhang, Teng Fei

Abstract: Spatial clustering has been widely used for spatial data mining and knowledge discovery. An ideal multivariate spatial clustering should consider both spatial contiguity and aspatial attributes. Existing spatial clustering approaches may face challenges for discovering repeated geographic patterns with spatial contiguity maintained. In this paper, we propose a Spatial Toeplitz Inverse Covariance-Based Clustering (STICC) method that considers both attributes and spatial relationships of geographic objects for multivariate spatial clustering. A subregion is created for each geographic object serving as the basic unit when performing clustering. A Markov random field is then constructed to characterize the attribute dependencies of subregions. Using a spatial consistency strategy, nearby objects are encouraged to belong to the same cluster. To test the performance of the proposed STICC algorithm, we apply it in two use cases. The comparison results with several baseline methods show that the STICC outperforms others significantly in terms of adjusted rand index and macro-F1 score. Join count statistics is also calculated and shows that the spatial contiguity is well preserved by STICC. Such a spatial clustering method may benefit various applications in the fields of geography, remote sensing, transportation, and urban planning, etc.

Submitted 30 March, 2022; v1 submitted 17 March, 2022; originally announced March 2022.

Journal ref: International Journal of Geographical Information Science, Year 2022

arXiv:2202.12636 [pdf, other]

Learning Multi-Task Gaussian Process Over Heterogeneous Input Domains

Authors: Haitao Liu, Kai Wu, Yew-Soon Ong, Chao Bian, Xiaomo Jiang, Xiaofang Wang

Abstract: Multi-task Gaussian process (MTGP) is a well-known non-parametric Bayesian model for learning correlated tasks effectively by transferring knowledge across tasks. But current MTGPs are usually limited to the multi-task scenario defined in the same input domain, leaving no space for tackling the heterogeneous case, i.e., the features of input domains vary over tasks. To this end, this paper presents a novel heterogeneous stochastic variational linear model of coregionalization (HSVLMC) model for simultaneously learning the tasks with varied input domains. Particularly, we develop the stochastic variational framework with Bayesian calibration that (i) takes into account the effect of dimensionality reduction raised by domain mappings in order to achieve effective input alignment; and (ii) employs a residual modeling strategy to leverage the inductive bias brought by prior domain mappings for better model inference. Finally, the superiority of the proposed model against existing LMC models has been extensively verified on diverse heterogeneous multi-task cases and a practical multi-fidelity steam turbine exhaust problem.

Submitted 18 June, 2022; v1 submitted 25 February, 2022; originally announced February 2022.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2112.08601 [pdf, ps, other]

A New Model-free Prediction Method: GA-NoVaS

Authors: Ke** Wu, Sayar Karmakar

Abstract: Volatility forecasting plays an important role in the financial econometrics. Previous works in this regime are mainly based on applying various GARCH-type models. However, it is hard for people to choose a specific GARCH model which works for general cases and such traditional methods are unstable for dealing with high-volatile period or using small sample size. The newly proposed normalizing and variance stabilizing (NoVaS) method is a more robust and accurate prediction technique. This Model-free method is built by taking advantage of an inverse transformation which is based on the ARCH model. Inspired by the historic development of the ARCH to GARCH model, we propose a novel NoVaS-type method which exploits the GARCH model structure. By performing extensive data analysis, we find our model has better time-aggregated prediction performance than the current state-of-the-art NoVaS method on forecasting short and volatile data. The victory of our new method corroborates that and also opens up avenues where one can explore other NoVaS structures to improve on the existing ones or solve specific prediction problems.

Submitted 15 December, 2021; originally announced December 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2101.02273

arXiv:2109.13055 [pdf, other]

Minimax Mixing Time of the Metropolis-Adjusted Langevin Algorithm for Log-Concave Sampling

Authors: Keru Wu, Scott Schmidler, Yuansi Chen

Abstract: We study the mixing time of the Metropolis-adjusted Langevin algorithm (MALA) for sampling from a log-smooth and strongly log-concave distribution. We establish its optimal minimax mixing time under a warm start. Our main contribution is two-fold. First, for a $d$-dimensional log-concave density with condition number $κ$, we show that MALA with a warm start mixes in $\tilde O(κ\sqrt{d})$ iterations up to logarithmic factors. This improves upon the previous work on the dependency of either the condition number $κ$ or the dimension $d$. Our proof relies on comparing the leapfrog integrator with the continuous Hamiltonian dynamics, where we establish a new concentration bound for the acceptance rate. Second, we prove a spectral gap based mixing time lower bound for reversible MCMC algorithms on general state spaces. We apply this lower bound result to construct a hard distribution for which MALA requires at least $\tilde Ω(κ\sqrt{d})$ steps to mix. The lower bound for MALA matches our upper bound in terms of condition number and dimension. Finally, numerical experiments are included to validate our theoretical results.
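For readers unfamiliar with the sampler analyzed in the abstract above, the following is a minimal sketch of textbook MALA: a Langevin proposal followed by a Metropolis accept/reject step. It is not the authors' code. The function name `mala`, the step size, the starting point, and the standard Gaussian target are all illustrative assumptions.

```python
import math
import random


def mala(log_p, grad_log_p, x0, step, n_steps, seed=0):
    """Textbook MALA sketch (hypothetical helper, not from the paper)."""
    rng = random.Random(seed)
    x = list(x0)
    samples, accepted = [], 0

    def drift(z):
        # Mean of the Langevin proposal started at z: z + step * grad log p(z).
        g = grad_log_p(z)
        return [zi + step * gi for zi, gi in zip(z, g)]

    def log_q(to, mean):
        # Log density (up to a constant) of N(mean, 2 * step * I) evaluated at `to`.
        return -sum((a - b) ** 2 for a, b in zip(to, mean)) / (4.0 * step)

    for _ in range(n_steps):
        mean_x = drift(x)
        y = [m + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0) for m in mean_x]
        mean_y = drift(y)
        # Metropolis-Hastings log acceptance ratio for the asymmetric proposal.
        log_alpha = log_p(y) - log_p(x) + log_q(x, mean_y) - log_q(y, mean_x)
        if math.log(1.0 - rng.random()) < log_alpha:
            x, accepted = y, accepted + 1
        samples.append(list(x))
    return samples, accepted / n_steps


# Assumed toy target: a 2-dimensional standard Gaussian (condition number 1).
def log_p(z):
    return -0.5 * sum(v * v for v in z)


def grad_log_p(z):
    return [-v for v in z]


samples, acceptance_rate = mala(log_p, grad_log_p, [3.0, -3.0], step=0.5, n_steps=5000)
```

The accept/reject step is what distinguishes MALA from the unadjusted Langevin algorithm: it removes the discretization bias at the cost of occasionally rejecting proposals, which is why the acceptance rate plays a central role in mixing-time analyses.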

Submitted 2 October, 2022; v1 submitted 27 September, 2021; originally announced September 2021.

Comments: 63 pages, 2 figures

Journal ref: Journal of Machine Learning Research, Vol. 23, No. 270, pp. 1-63 (2022)

arXiv:2102.11027 [pdf, other]

Investigating Underlying Drivers of Variability in Residential Energy Usage Patterns with Daily Load Shape Clustering of Smart Meter Data

Authors: Ling **, C. Anna Spurlock, Sam Borgeson, Alina Lazar, Daniel Fredman, Annika Todd, Alexander Sim, Kesheng Wu

Abstract: Residential customers have traditionally not been treated as individual entities due to the high volatility in residential consumption patterns as well as a historic focus on aggregated loads from the utility and system feeder perspective. Large-scale deployment of smart meters has motivated increasing studies to explore disaggregated daily load patterns, which can reveal important heterogeneity across different time scales, weather conditions, as well as within and across individual households. This paper aims to shed light on the mechanisms by which electricity consumption patterns exhibit variability and the different constraints that may affect demand-response (DR) flexibility. We systematically evaluate the relationship between daily time-of-use patterns and their variability to external and internal influencing factors, including time scales of interest, meteorological conditions, and household characteristics by application of an improved version of the adaptive K-means clustering method to profile "household-days" of a summer peaking utility. We find that for this summer-peaking utility, outdoor temperature is the most important external driver of the load shape variability relative to seasonality and day-of-week. The top three consumption patterns represent approximately 50% of usage on the highest temperature days. The variability in summer load shapes across customers can be explained by the responsiveness of the households to outside temperature. Our results suggest that depending on the influencing factors, not all the consumption variability can be readily translated to consumption flexibility. Such information needs to be further explored in segmenting customers for better program targeting and tailoring to meet the needs of the rapidly evolving electricity grid.

Submitted 16 February, 2021; originally announced February 2021.

Comments: 11 pages, 11 figures

arXiv:2101.02273 [pdf, ps, other]

Model-free time-aggregated predictions for econometric datasets

Authors: Ke** Wu, Sayar Karmakar

Abstract: This article explores the existing normalizing and variance-stabilizing (NoVaS) method on predicting squared log-returns of financial data. First, we explore the robustness of the existing NoVaS method for long-term time-aggregated predictions. Then we develop a more parsimonious variant of the existing method. With systematic justification and extensive data analysis, our new method shows better performance than current NoVaS and standard GARCH(1,1) methods on both short- and long-term time-aggregated predictions.

Submitted 4 November, 2021; v1 submitted 6 January, 2021; originally announced January 2021.

arXiv:2011.05625 [pdf, other]

CAN: Feature Co-Action for Click-Through Rate Prediction

Authors: Weijie Bian, Kailun Wu, Lejian Ren, Qi Pi, Yu**g Zhang, Can Xiao, Xiang-Rong Sheng, Yong-Nan Zhu, Zhangming Chan, Na Mou, Xinchen Luo, Shiming Xiang, Guorui Zhou, Xiaoqiang Zhu, Hongbo Deng

Abstract: Feature interaction has been recognized as an important problem in machine learning, which is also very essential for click-through rate (CTR) prediction tasks. In recent years, Deep Neural Networks (DNNs) can automatically learn implicit nonlinear interactions from original sparse features, and therefore have been widely used in industrial CTR prediction tasks. However, the implicit feature interactions learned in DNNs cannot fully retain the complete representation capacity of the original and empirical feature interactions (e.g., cartesian product) without loss. For example, a simple attempt to learn the combination of feature A and feature B <A, B> as the explicit cartesian product representation of new features can outperform previous implicit feature interaction models including factorization machine (FM)-based models and their variations. In this paper, we propose a Co-Action Network (CAN) to approximate the explicit pairwise feature interactions without introducing too many additional parameters. More specifically, giving feature A and its associated feature B, their feature interaction is modeled by learning two sets of parameters: 1) the embedding of feature A, and 2) a Multi-Layer Perceptron (MLP) to represent feature B. The approximated feature interaction can be obtained by passing the embedding of feature A through the MLP network of feature B. We refer to such pairwise feature interaction as feature co-action, and such a Co-Action Network unit can provide a very powerful capacity to fitting complex feature interactions. Experimental results on public and industrial datasets show that CAN outperforms state-of-the-art CTR models and the cartesian product method. Moreover, CAN has been deployed in the display advertisement system in Alibaba, obtaining 12\% improvement on CTR and 8\% on Revenue Per Mille (RPM), which is a great improvement to the business.
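The co-action idea described in the abstract above (pass the embedding of feature A through a small MLP whose weights belong to feature B) can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration: the single hidden layer, the tanh activation, the flat parameter layout, and the name `co_action` are not taken from the paper.

```python
import math


def co_action(emb_a, mlp_params_b, hidden):
    """Toy co-action unit (hypothetical): a one-layer MLP owned by feature B
    is applied to the embedding of feature A.

    mlp_params_b is a flat list assumed to hold `hidden` weight rows of
    length len(emb_a), followed by `hidden` biases.
    """
    d = len(emb_a)
    assert len(mlp_params_b) == d * hidden + hidden
    out = []
    for j in range(hidden):
        weights = mlp_params_b[j * d:(j + 1) * d]   # j-th weight row
        bias = mlp_params_b[d * hidden + j]          # j-th bias
        pre_activation = sum(w * x for w, x in zip(weights, emb_a)) + bias
        out.append(math.tanh(pre_activation))        # tanh is an assumed choice
    return out


# Feature A has a 2-d embedding; feature B owns a 2-unit layer (6 parameters).
interaction = co_action([1.0, 2.0], [0.5, -0.25, 0.0, 0.0, 0.0, 1.0], hidden=2)
```

The point of this layout is that the number of learned parameters grows with the number of A values plus the number of B values, rather than with the number of (A, B) pairs as an explicit cartesian-product embedding would.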

Submitted 7 December, 2021; v1 submitted 11 November, 2020; originally announced November 2020.

Comments: WSDM 2022

MSC Class: Machine Learning (stat.ML); Information Retrieval (cs.IR); Machine Learning (cs.LG) ACM Class: I.2.6

arXiv:2011.01058 [pdf, other]

doi 10.1016/j.cma.2021.114020

Bayesian inference of heterogeneous epidemic models: Application to COVID-19 spread accounting for long-term care facilities

Authors: Peng Chen, Keyi Wu, Omar Ghattas

Abstract: We propose a high dimensional Bayesian inference framework for learning heterogeneous dynamics of a COVID-19 model, with a specific application to the dynamics and severity of COVID-19 inside and outside long-term care (LTC) facilities. We develop a heterogeneous compartmental model that accounts for the heterogeneity of the time-varying spread and severity of COVID-19 inside and outside LTC facilities, which is characterized by time-dependent stochastic processes and time-independent parameters in $\sim$1500 dimensions after discretization. To infer these parameters, we use reported data on the number of confirmed, hospitalized, and deceased cases with suitable post-processing in both a deterministic inversion approach with appropriate regularization as a first step, followed by Bayesian inversion with proper prior distributions. To address the curse of dimensionality and the ill-posedness of the high-dimensional inference problem, we propose use of a dimension-independent projected Stein variational gradient descent method, and demonstrate the intrinsic low-dimensionality of the inverse problem. We present inference results with quantified uncertainties for both New Jersey and Texas, which experienced different epidemic phases and patterns. Moreover, we also present forecasting and validation results based on the empirical posterior samples of our inference for the future trajectory of COVID-19.

Submitted 2 November, 2020; originally announced November 2020.

arXiv:2008.02883 [pdf, other]

Stronger and Faster Wasserstein Adversarial Attacks

Authors: Kaiwen Wu, Allen Houze Wang, Yaoliang Yu

Abstract: Deep models, while being extremely flexible and accurate, are surprisingly vulnerable to "small, imperceptible" perturbations known as adversarial attacks. While the majority of existing attacks focus on measuring perturbations under the $\ell_p$ metric, Wasserstein distance, which takes geometry in pixel space into account, has long been known to be a suitable metric for measuring image quality and has recently risen as a compelling alternative to the $\ell_p$ metric in adversarial attacks. However, constructing an effective attack under the Wasserstein metric is computationally much more challenging and calls for better optimization algorithms. We address this gap in two ways: (a) we develop an exact yet efficient projection operator to enable a stronger projected gradient attack; (b) we show that the Frank-Wolfe method equipped with a suitable linear minimization oracle works extremely fast under Wasserstein constraints. Our algorithms not only converge faster but also generate much stronger attacks. For instance, we decrease the accuracy of a residual network on CIFAR-10 to $3.4\%$ within a Wasserstein perturbation ball of radius $0.005$, in contrast to $65.6\%$ using the previous Wasserstein attack based on an \emph{approximate} projection operator. Furthermore, employing our stronger attacks in adversarial training significantly improves the robustness of adversarially trained models.

Submitted 6 August, 2020; originally announced August 2020.

Comments: 30 pages, accepted to ICML 2020

arXiv:2007.09762 [pdf, other]

A Theory of Multiple-Source Adaptation with Limited Target Labeled Data

Authors: Yishay Mansour, Mehryar Mohri, Jae Ro, Ananda Theertha Suresh, Ke Wu

Abstract: We present a theoretical and algorithmic study of the multiple-source domain adaptation problem in the common scenario where the learner has access only to a limited amount of labeled target data, but where the learner has at disposal a large amount of labeled data from multiple source domains. We show that a new family of algorithms based on model selection ideas benefits from very favorable guarantees in this scenario and discuss some theoretical obstacles affecting some alternative techniques. We also report the results of several experiments with our algorithms that demonstrate their practical effectiveness.

Submitted 29 October, 2020; v1 submitted 19 July, 2020; originally announced July 2020.

Comments: 20 pages

arXiv:2006.14592 [pdf, other]

Newton-type Methods for Minimax Optimization

Authors: Guojun Zhang, Kaiwen Wu, Pascal Poupart, Yaoliang Yu

Abstract: Differential games, in particular two-player sequential zero-sum games (a.k.a. minimax optimization), have been an important modeling tool in applied science and received renewed interest in machine learning due to many recent applications, such as adversarial training, generative models and reinforcement learning. However, existing theory mostly focuses on convex-concave functions with few exceptions. In this work, we propose two novel Newton-type algorithms for nonconvex-nonconcave minimax optimization. We prove their local convergence at strict local minimax points, which are surrogates of global solutions. We argue that our Newton-type algorithms nicely complement existing ones in that (a) they converge faster to strict local minimax points; (b) they are much more effective when the problem is ill-conditioned; (c) their computational complexity remains similar. We verify the effectiveness of our Newton-type algorithms through experiments on training GANs which are intrinsically nonconvex and ill-conditioned. Our code is available at https://github.com/watml/min-max-2nd-order.

Submitted 18 February, 2023; v1 submitted 25 June, 2020; originally announced June 2020.

Comments: code update

arXiv:2003.05681 [pdf]

doi 10.1007/s11071-020-05862-6

Generalized logistic growth modeling of the COVID-19 outbreak: comparing the dynamics in the 29 provinces in China and in the rest of the world

Authors: Ke Wu, Didier Darcet, Qian Wang, Didier Sornette

Abstract: Started in Wuhan, China, the COVID-19 has been spreading all over the world. We calibrate the logistic growth model, the generalized logistic growth model, the generalized Richards model and the generalized growth model to the reported number of infected cases for the whole of China, 29 provinces in China, and 33 countries and regions that have been or are undergoing major outbreaks. We dissect the development of the epidemics in China and the impact of the drastic control measures both at the aggregate level and within each province. We quantitatively document four phases of the outbreak in China with a detailed analysis on the heterogeneous situations across provinces. The extreme containment measures implemented by China were very effective with some instructive variations across provinces. Borrowing from the experience of China, we made scenario projections on the development of the outbreak in other countries. We identified that outbreaks in 14 countries (mostly in western Europe) have ended, while resurgences of cases have been identified in several among them. The modeling results clearly show longer after-peak trajectories in western countries, in contrast to most provinces in China where the after-peak trajectory is characterized by a much faster decay. We identified three groups of countries in different level of outbreak progress, and provide informative implications for the current global pandemic.
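The abstract above names the generalized logistic growth model without stating its equation. As a rough illustration, the sketch below integrates the form commonly used in the epidemic-modeling literature, dC/dt = r * C^p * (1 - C/K), with forward Euler. This form, the parameter values, and the function names are assumptions made here and are not taken from the paper; no calibration to data is attempted.

```python
def generalized_logistic_rhs(c, r, p, k):
    # Assumed form: growth rate r, deceleration exponent p in [0, 1],
    # final epidemic size k. Setting p = 1 recovers the classical logistic model.
    return r * (c ** p) * (1.0 - c / k)


def simulate(c0, r, p, k, t_end, dt=0.01):
    """Forward-Euler integration of cumulative cases C(t) (illustrative only)."""
    c, trajectory = c0, [c0]
    for _ in range(int(round(t_end / dt))):
        c += dt * generalized_logistic_rhs(c, r, p, k)
        trajectory.append(c)
    return trajectory


# Hypothetical parameters: one initial case, final size 1000, p = 1 (logistic).
trajectory = simulate(c0=1.0, r=0.5, p=1.0, k=1000.0, t_end=60.0)
```

Choosing p below 1 gives sub-exponential early growth, which is the usual motivation for the generalized variants over the classical logistic curve.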

Submitted 22 September, 2020; v1 submitted 12 March, 2020; originally announced March 2020.

Journal ref: Nonlinear Dynamics, 2020

arXiv:2003.03616 [pdf, other]

Diffusion State Distances: Multitemporal Analysis, Fast Algorithms, and Applications to Biological Networks

Authors: Lenore Cowen, Kapil Devkota, Xiaozhe Hu, James M. Murphy, Kaiyi Wu

Abstract: Data-dependent metrics are powerful tools for learning the underlying structure of high-dimensional data. This article develops and analyzes a data-dependent metric known as diffusion state distance (DSD), which compares points using a data-driven diffusion process. Unlike related diffusion methods, DSDs incorporate information across time scales, which allows for the intrinsic data structure to be inferred in a parameter-free manner. This article develops a theory for DSD based on the multitemporal emergence of mesoscopic equilibria in the underlying diffusion process. New algorithms for denoising and dimension reduction with DSD are also proposed and analyzed. These approaches are based on a weighted spectral decomposition of the underlying diffusion process, and experiments on synthetic datasets and real biological networks illustrate the efficacy of the proposed algorithms in terms of both speed and accuracy. Throughout, comparisons with related methods are made, in order to illustrate the distinct advantages of DSD for datasets exhibiting multiscale structure.

Submitted 7 March, 2020; originally announced March 2020.

Comments: 28 pages

arXiv:2003.02387 [pdf, other]

doi 10.1007/s10915-020-01324-8

Methods to Recover Unknown Processes in Partial Differential Equations Using Data

Authors: Zhen Chen, Kailiang Wu, Dongbin Xiu

Abstract: We study the problem of identifying unknown processes embedded in time-dependent partial differential equation (PDE) using observational data, with an application to advection-diffusion type PDE. We first conduct theoretical analysis and derive conditions to ensure the solvability of the problem. We then present a set of numerical approaches, including Galerkin type algorithm and collocation type algorithm. Analysis of the algorithms are presented, along with their implementation detail. The Galerkin algorithm is more suitable for practical situations, particularly those with noisy data, as it avoids using derivative/gradient data. Various numerical examples are then presented to demonstrate the performance and properties of the numerical methods.

Submitted 4 March, 2020; originally announced March 2020.

Comments: 21 pages, 11 figures

Journal ref: Journal of Scientific Computing 85, 23 (2020)

arXiv:2003.01436 [pdf, other]

Learning to Generate Time Series Conditioned Graphs with Generative Adversarial Nets

Authors: Shanchao Yang, **g Liu, Kai Wu, Mingming Li

Abstract: Deep learning based approaches have been utilized to model and generate graphs subjected to different distributions recently. However, they are typically unsupervised learning based and unconditioned generative models or simply conditioned on the graph-level contexts, which are not associated with rich semantic node-level contexts. Differently, in this paper, we are interested in a novel problem named Time Series Conditioned Graph Generation: given an input multivariate time series, we aim to infer a target relation graph modeling the underlying interrelationships between time series with each node corresponding to each time series. For example, we can study the interrelationships between genes in a gene regulatory network of a certain disease conditioned on their gene expression data recorded as time series. To achieve this, we propose a novel Time Series conditioned Graph Generation-Generative Adversarial Networks (TSGG-GAN) to handle challenges of rich node-level context structures conditioning and measuring similarities directly between graphs and time series. Extensive experiments on synthetic and real-world gene regulatory networks datasets demonstrate the effectiveness and generalizability of the proposed TSGG-GAN.

Submitted 26 August, 2023; v1 submitted 3 March, 2020; originally announced March 2020.

arXiv:2002.04658 [pdf, other]

A Non-Intrusive Correction Algorithm for Classification Problems with Corrupted Data

Authors: Jun Hou, Tong Qin, Kailiang Wu, Dongbin Xiu

Abstract: A novel correction algorithm is proposed for multi-class classification problems with corrupted training data. The algorithm is non-intrusive, in the sense that it post-processes a trained classification model by adding a correction procedure to the model prediction. The correction procedure can be coupled with any approximators, such as logistic regression, neural networks of various architectures, etc. When training dataset is sufficiently large, we prove that the corrected models deliver correct classification results as if there is no corruption in the training data. For datasets of finite size, the corrected models produce significantly better recovery results, compared to the models without the correction algorithm. All of the theoretical findings in the paper are verified by our numerical examples.

Submitted 11 February, 2020; originally announced February 2020.

arXiv:1910.06948 [pdf, ps, other]

doi 10.1016/j.jcp.2020.109307

Data-Driven Deep Learning of Partial Differential Equations in Modal Space

Authors: Kailiang Wu, Dongbin Xiu

Abstract: We present a framework for recovering/approximating unknown time-dependent partial differential equation (PDE) using its solution data. Instead of identifying the terms in the underlying PDE, we seek to approximate the evolution operator of the underlying PDE numerically. The evolution operator of the PDE, defined in infinite-dimensional space, maps the solution from a current time to a future time and completely characterizes the solution evolution of the underlying unknown PDE. Our recovery strategy relies on approximation of the evolution operator in a properly defined modal space, i.e., generalized Fourier space, in order to reduce the problem to finite dimensions. The finite dimensional approximation is then accomplished by training a deep neural network structure, which is based on residual network (ResNet), using the given data. Error analysis is provided to illustrate the predictive accuracy of the proposed method. A set of examples of different types of PDEs, including inviscid Burgers' equation that develops discontinuity in its solution, are presented to demonstrate the effectiveness of the proposed method.

Submitted 18 October, 2019; v1 submitted 15 October, 2019; originally announced October 2019.

Comments: Minor notational changes

Journal ref: Journal of Computational Physics, 408: 109307, 2020

arXiv:1907.11780 [pdf, other]

Understanding Adversarial Robustness: The Trade-off between Minimum and Average Margin

Authors: Kaiwen Wu, Yaoliang Yu

Abstract: Deep models, while being extremely versatile and accurate, are vulnerable to adversarial attacks: slight perturbations that are imperceptible to humans can completely flip the prediction of deep models. Many attack and defense mechanisms have been proposed, although a satisfying solution still largely remains elusive. In this work, we give strong evidence that during training, deep models maximize the minimum margin in order to achieve high accuracy, but at the same time decrease the \emph{average} margin hence hurting robustness. Our empirical results highlight an intrinsic trade-off between accuracy and robustness for current deep model training. To further address this issue, we propose a new regularizer to explicitly promote average margin, and we verify through extensive experiments that it does lead to better robustness. Our regularized objective remains Fisher-consistent, hence asymptotically can still recover the Bayes optimal classifier.

Submitted 26 July, 2019; originally announced July 2019.
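
The two quantities contrasted in this abstract, minimum margin and average margin, can be illustrated with a short sketch. It assumes the common multi-class definition of margin (the true-class score minus the largest competing score); the paper's exact definition and its regularizer are not stated in the listing, so the function names and toy scores below are hypothetical.

```python
def margins(scores, labels):
    """Per-example margin: true-class score minus the best competing score."""
    out = []
    for row, y in zip(scores, labels):
        best_other = max(s for j, s in enumerate(row) if j != y)
        out.append(row[y] - best_other)
    return out


def min_and_average_margin(scores, labels):
    """Return (minimum margin, average margin) over a batch of examples."""
    m = margins(scores, labels)
    return min(m), sum(m) / len(m)


# Toy classifier scores for three examples and three classes (made-up values).
scores = [[3.0, 1.0, 0.0], [0.5, 2.5, 2.0], [1.0, 0.0, 5.0]]
labels = [0, 1, 2]
min_margin, avg_margin = min_and_average_margin(scores, labels)
```

In this toy batch every example is classified correctly, yet the second example sits close to the decision boundary, so the minimum margin is much smaller than the average. A regularizer that promotes the average margin would act on the second quantity rather than the first.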

arXiv:1906.10304 [pdf, other]

Res-embedding for Deep Learning Based Click-Through Rate Prediction Modeling

Authors: Guorui Zhou, Kailun Wu, Weijie Bian, Zhao Yang, Xiaoqiang Zhu, Kun Gai

Abstract: Recently, click-through rate (CTR) prediction models have evolved from shallow methods to deep neural networks. Most deep CTR models follow an Embedding\&MLP paradigm, that is, first mapping discrete id features, e.g. user visited items, into low dimensional vectors with an embedding module, then learning a multi-layer perceptron (MLP) to fit the target. In this way, the embedding module performs representation learning and plays a key role in the model performance. However, in many real-world applications, deep CTR models often suffer from poor generalization performance, which is mostly due to the learning of embedding parameters. In this paper, we model user behavior using an interest delay model, study carefully the embedding mechanism, and obtain two important results: (i) We theoretically prove that a small aggregation radius of the embedding vectors of items that belong to the same user interest domain will result in good generalization performance of the deep CTR model. (ii) Following our theoretical analysis, we design a new embedding structure named res-embedding. In the res-embedding module, the embedding vector of each item is the sum of two components: (i) a central embedding vector calculated from an item-based interest graph and (ii) a residual embedding vector whose scale is relatively small. Empirical evaluation on several public datasets demonstrates the effectiveness of the proposed res-embedding structure, which brings significant improvement on the model performance.

Submitted 24 June, 2019; originally announced June 2019.
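
The two-component structure described in this abstract (central vector plus small residual) can be sketched as follows. The abstract does not say how the central vector is computed from the item-based interest graph, so the neighbor averaging used here is an assumption, as are the `scale` parameter, the function names, and the toy graph.

```python
def central_embedding(item, neighbors, base):
    """Average the base vectors of an item and its graph neighbors.

    This averaging rule is an assumed stand-in for the graph-based
    computation, which the abstract does not specify.
    """
    group = [item] + neighbors[item]
    dim = len(base[item])
    return [sum(base[g][d] for g in group) / len(group) for d in range(dim)]


def res_embedding(item, neighbors, base, residual, scale=0.1):
    """Final embedding = central vector + residual vector at a small scale."""
    center = central_embedding(item, neighbors, base)
    return [c + scale * r for c, r in zip(center, residual[item])]


# Hypothetical interest graph: items "a" and "b" share an interest, "c" is alone.
neighbors = {"a": ["b"], "b": ["a"], "c": []}
base = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [4.0, 4.0]}
residual = {"a": [1.0, -1.0], "b": [-1.0, 1.0], "c": [0.0, 0.0]}

emb_a = res_embedding("a", neighbors, base, residual)
emb_b = res_embedding("b", neighbors, base, residual)
```

Items "a" and "b" share one central vector and differ only by their scaled residuals, so their embeddings stay close together. This is the small aggregation radius that the abstract links to better generalization.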

arXiv:1905.10396 [pdf, other]

doi 10.1137/19M1264011

Structure-preserving Method for Reconstructing Unknown Hamiltonian Systems from Trajectory Data

Authors: Kailiang Wu, Tong Qin, Dongbin Xiu

Abstract: We present a numerical approach for approximating unknown Hamiltonian systems using observation data. A distinct feature of the proposed method is that it is structure-preserving, in the sense that it enforces conservation of the reconstructed Hamiltonian. This is achieved by directly approximating the underlying unknown Hamiltonian, rather than the right-hand-side of the governing equations. We present the technical details of the proposed algorithm and its error estimate in a special case, along with a practical de-noising procedure to cope with noisy data. A set of numerical examples are then presented to demonstrate the structure-preserving property and effectiveness of the algorithm.

Submitted 19 August, 2020; v1 submitted 24 May, 2019; originally announced May 2019.

Comments: 27 pages, 19 figures

Journal ref: SIAM Journal on Scientific Computing 42 (6), A3704--A3729, 2020

arXiv:1905.06125 [pdf, other]

Distributional Reinforcement Learning for Efficient Exploration

Authors: Borislav Mavrin, Shangtong Zhang, Hengshuai Yao, Linglong Kong, Kaiwen Wu, Yaoliang Yu

Abstract: In distributional reinforcement learning (RL), the estimated distribution of value function models both the parametric and intrinsic uncertainties. We propose a novel and efficient exploration method for deep RL that has two components. The first is a decaying schedule to suppress the intrinsic uncertainty. The second is an exploration bonus calculated from the upper quantiles of the learned distribution. In Atari 2600 games, our method outperforms QR-DQN in 12 out of 14 hard games (achieving 483\% average gain across 49 games in cumulative rewards over QR-DQN with a big win in Venture). We also compared our algorithm with QR-DQN in a challenging 3D driving simulator (CARLA). Results show that our algorithm achieves near-optimal safety rewards twice as fast as QR-DQN.

Submitted 13 May, 2019; originally announced May 2019.

Journal ref: ICML, 2019

arXiv:1902.02627 [pdf, other]

Fast Transient Simulation of High-Speed Channels Using Recurrent Neural Network

Authors: Thong Nguyen, Tianjian Lu, Ken Wu, Jose Schutt-Aine

Abstract: Generating eye diagrams by using a circuit simulator can be very computationally intensive, especially in the presence of nonlinearities. It often involves multiple Newton-like iterations at every time step when a SPICE-like circuit simulator handles a nonlinear system in the transient regime. In this paper, we leverage machine learning methods, specifically the recurrent neural network (RNN), to generate black-box macromodels and achieve significant reduction of computation time. Through the proposed approach, an RNN model is first trained and then validated on a relatively short sequence generated from a circuit simulator. Once the training completes, the RNN can be used to make predictions on the remaining sequence in order to generate an eye diagram. The training cost can also be amortized when the trained RNN starts making predictions. Besides, the proposed approach requires neither complex circuit simulations nor substantial domain knowledge. We use two high-speed link examples to demonstrate that the proposed approach provides adequate accuracy while the computation time can be dramatically reduced. In the high-speed link example with a PAM4 driver, the eye diagram generated by RNN models shows good agreement with that obtained from a commercial circuit simulator. This paper also investigates the impacts of various RNN topologies, training schemes, and tunable parameters on both the accuracy and the generalization capability of an RNN model. It is found that the long short-term memory (LSTM) network outperforms the vanilla RNN in terms of the accuracy in predicting transient waveforms.

Submitted 7 February, 2019; v1 submitted 25 January, 2019; originally announced February 2019.

arXiv:1812.09645 [pdf, other]

Mixed Membership Recurrent Neural Networks

Authors: Ghazal Fazelnia, Mark Ibrahim, Ceena Modarres, Kevin Wu, John Paisley

Abstract: Models for sequential data such as the recurrent neural network (RNN) often implicitly model a sequence as having a fixed time interval between observations and do not account for group-level effects when multiple sequences are observed. We propose a model for grouped sequential data based on the RNN that accounts for varying time intervals between observations in a sequence by learning a group-level base parameter to which each sequence can revert. Our approach is motivated by the mixed membership framework, and we show how it can be used for dynamic topic modeling in which the distribution on topics (not the topics themselves) is evolving in time. We demonstrate our approach on a dataset of 3.4 million online grocery shopping orders made by 206K customers.

Submitted 22 December, 2018; originally announced December 2018.

arXiv:1811.05537 [pdf, other]

doi 10.1016/j.jcp.2019.06.042

Data Driven Governing Equations Approximation Using Deep Neural Networks

Authors: Tong Qin, Kailiang Wu, Dongbin Xiu

Abstract: We present a numerical framework for approximating unknown governing equations using observation data and deep neural networks (DNN). In particular, we propose to use residual network (ResNet) as the basic building block for equation approximation. We demonstrate that the ResNet block can be considered as a one-step method that is exact in temporal integration. We then present two multi-step methods, recurrent ResNet (RT-ResNet) method and recursive ResNet (RS-ResNet) method. The RT-ResNet is a multi-step method on uniform time steps, whereas the RS-ResNet is an adaptive multi-step method using variable time steps. All three methods presented here are based on integral form of the underlying dynamical system. As a result, they do not require time derivative data for equation recovery and can cope with relatively coarsely distributed trajectory data. Several numerical examples are presented to demonstrate the performance of the methods.

Submitted 13 November, 2018; originally announced November 2018.
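
The idea of reading a residual block as a one-step time integrator, x(t + Δ) = x(t) + N(x(t)), can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the neural network N is replaced by a single scalar weight fitted by least squares, and the test system dx/dt = -x, the time lag `DT`, and all function names are hypothetical choices.

```python
import math

DT = 0.1  # time lag between the two snapshots in each data pair (assumed)


def fit_increment(pairs):
    """Least-squares fit of w in (x_next - x) ~ w * x.

    The scalar w stands in for the network N in x_next = x + N(x).
    """
    num = sum(x * (x_next - x) for x, x_next in pairs)
    den = sum(x * x for x, _ in pairs)
    return num / den


def predict(x0, w, steps):
    """Apply the residual update x <- x + N(x) repeatedly."""
    x = x0
    for _ in range(steps):
        x = x + w * x
    return x


# Training pairs (x(0), x(DT)) taken from the exact flow of dx/dt = -x.
# Only state snapshots are used; no time-derivative data is needed.
pairs = [(x0, x0 * math.exp(-DT)) for x0 in (-2.0, -1.0, 0.5, 1.0, 3.0)]
w = fit_increment(pairs)
x_at_1 = predict(1.0, w, steps=10)  # ten steps of size DT approximate x(1)
```

Because the fitted increment reproduces the flow map over one lag `DT` rather than a discretized derivative, repeated application carries no time-stepping error beyond the fitting error. This matches the sense in which the abstract calls the one-step method exact in temporal integration.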

arXiv:1811.00620 [pdf, other]

Efficient Online Hyperparameter Optimization for Kernel Ridge Regression with Applications to Traffic Time Series Prediction

Authors: Hongyuan Zhan, Gabriel Gomes, Xiaoye S. Li, Kamesh Madduri, Kesheng Wu

Abstract: Computational efficiency is an important consideration for deploying machine learning models for time series prediction in an online setting. Machine learning algorithms adjust model parameters automatically based on the data, but often require users to set additional parameters, known as hyperparameters. Hyperparameters can significantly impact prediction accuracy. Traffic measurements, typically collected online by sensors, are serially correlated. Moreover, the data distribution may change gradually. A typical adaptation strategy is periodically re-tuning the model hyperparameters, at the cost of computational burden. In this work, we present an efficient and principled online hyperparameter optimization algorithm for Kernel Ridge regression applied to traffic prediction problems. In tests with real traffic measurement data, our approach requires as little as one-seventh of the computation time of other tuning methods, while achieving better or similar prediction accuracy.

Submitted 1 November, 2018; originally announced November 2018.

Comments: An extended version of "Efficient Online Hyperparameter Learning for Traffic Flow Prediction" published in The 21st IEEE International Conference on Intelligent Transportation Systems (ITSC 2018)

Journal ref: H. Zhan, G. Gomes, X. S. Li, K. Madduri, and K. Wu. Efficient Online Hyperparameter Learning for Traffic Flow Prediction. In 2018 IEEE 21st International Conference on Intelligent Transportation Systems (ITSC), pages 1-6. IEEE, 2018

arXiv:1810.01483 [pdf, other]

doi 10.1016/j.ascom.2019.100307

DeepCMB: Lensing Reconstruction of the Cosmic Microwave Background with Deep Neural Networks

Authors: João Caldeira, W. L. Kimmy Wu, Brian Nord, Camille Avestruz, Shubhendu Trivedi, Kyle T. Story

Abstract: Next-generation cosmic microwave background (CMB) experiments will have lower noise and therefore increased sensitivity, enabling improved constraints on fundamental physics parameters such as the sum of neutrino masses and the tensor-to-scalar ratio r. Achieving competitive constraints on these parameters requires high signal-to-noise extraction of the projected gravitational potential from the CMB maps. Standard methods for reconstructing the lensing potential employ the quadratic estimator (QE). However, the QE performs suboptimally at the low noise levels expected in upcoming experiments. Other methods, like maximum likelihood estimators (MLE), are under active development. In this work, we demonstrate reconstruction of the CMB lensing potential with deep convolutional neural networks (CNN), i.e., a ResUNet. The network is trained and tested on simulated data, and otherwise has no physical parametrization related to the physical processes of the CMB and gravitational lensing. We show that, over a wide range of angular scales, ResUNets recover the input gravitational potential with a higher signal-to-noise ratio than the QE method, reaching levels comparable to analytic approximations of MLE methods. We demonstrate that the network outputs quantifiably different lensing maps when given input CMB maps generated with different cosmologies. We also show we can use the reconstructed lensing map for cosmological parameter estimation. This application of CNN provides a few innovations at the intersection of cosmology and machine learning. First, while training and regressing on images, we predict a continuous-variable field rather than discrete classes. Second, we are able to establish uncertainty measures for the network output that are analogous to standard methods. We expect this approach to excel in capturing hard-to-model non-Gaussian astrophysical foreground and noise contributions.

Submitted 12 June, 2020; v1 submitted 2 October, 2018; originally announced October 2018.

Comments: 19 pages; LaTeX; 12 figures; changes to match published version

Report number: FERMILAB-PUB-18-515-A-CD

Journal ref: Astronomy and Computing 28 100307 (2019)

arXiv:1809.09170 [pdf, other]

doi 10.1016/j.jcp.2019.01.030

Numerical Aspects for Approximating Governing Equations Using Data

Authors: Kailiang Wu, Dongbin Xiu

Abstract: We present effective numerical algorithms for locally recovering unknown governing differential equations from measurement data. We employ a set of standard basis functions, e.g., polynomials, to approximate the governing equation with high accuracy. Upon recasting the problem into a function approximation problem, we discuss several important aspects for accurate approximation. Most notably, we discuss the importance of using a large number of short bursts of trajectory data, rather than using data from a single long trajectory. Several options for the numerical algorithms to perform accurate approximation are then presented, along with an error estimate of the final equation approximation. We then present an extensive set of numerical examples of both linear and nonlinear systems to demonstrate the properties and effectiveness of our equation recovery algorithms.

Submitted 24 September, 2018; originally announced September 2018.

Comments: 26 pages, 17 figures

Journal ref: Journal of Computational Physics, 384, 200-221, 2019

arXiv:1703.10951 [pdf, other]

Comparison of multi-task convolutional neural network (MT-CNN) and a few other methods for toxicity prediction

Authors: Kedi Wu, Guo-Wei Wei

Abstract: Toxicity analysis and prediction are of paramount importance to human health and environmental protection. Existing computational methods are built from a wide variety of descriptors and regressors, which makes their performance analysis difficult. For example, deep neural network (DNN), a successful approach on many occasions, acts like a black box and offers little conceptual elegance or physical understanding. The present work constructs a common set of microscopic descriptors based on established physical models for charges, surface areas and free energies to assess the performance of multi-task convolutional neural network (MT-CNN) architectures and a few other approaches, including random forest (RF) and gradient boosting decision tree (GBDT), on an equal footing. Comparison is also given to convolutional neural network (CNN) and non-convolutional deep neural network (DNN) algorithms. Four benchmark toxicity data sets (i.e., endpoints) are used to evaluate various approaches. Extensive numerical studies indicate that the present MT-CNN architecture is able to outperform the state-of-the-art methods.

Submitted 31 March, 2017; originally announced March 2017.

arXiv:1607.07834 [pdf]

A W-test collapsing method for rare variant testing with applications to exome sequencing data of hypertensive disorder

Authors: Rui Sun, Haoyi Weng, Inchi Hu, Junfeng Guo, William K. K. Wu, Benny Chung-Ying Zee, Maggie Haitian Wang

Abstract: Advancement in sequencing technology enables the study of association between complex disorders and rare variants with low minor allele frequencies. One of the major challenges in rare variant testing is the lack of statistical power of traditional testing methods due to extremely low variances of single nucleotide polymorphisms. In this paper, we introduce a W-test collapsing method that evaluates the distributional differences in cases and controls using a combined log of odds ratio. The proposed method is compared with the Weighted-Sum Statistic and Sequence Kernel Association Test using simulation data sets and shows better performance and faster computing speed. In a study of a real next-generation sequencing data set of hypertensive disorder, we identified genes of interesting biological functions that are associated with metabolism disorder and inflammation, including MACROD1, NLRP7, AGK, PAK6 and APBB1. The W-test collapsing method offers a fast and effective alternative for rare variant association analysis.

Submitted 26 July, 2016; originally announced July 2016.

Comments: 18 pages, 1 figure, 4 tables. Genetic Epidemiology accepted

arXiv:1508.06700 [pdf, other]

doi 10.1016/j.jcp.2016.06.020

A surrogate accelerated multicanonical Monte Carlo method for uncertainty quantification

Authors: Keyi Wu, Jinglai Li

Abstract: In this work we consider a class of uncertainty quantification problems where the system performance or reliability is characterized by a scalar parameter $y$. The performance parameter $y$ is random due to the presence of various sources of uncertainty in the system, and our goal is to estimate the probability density function (PDF) of $y$. We propose to use the multicanonical Monte Carlo (MMC) method, a special type of adaptive importance sampling algorithm, to compute the PDF of interest. Moreover, we develop an adaptive algorithm to construct local Gaussian process surrogates to further accelerate the MMC iterations. With numerical examples we demonstrate that the proposed method can achieve several orders of magnitude of speedup over the standard Monte Carlo method.

Submitted 12 April, 2016; v1 submitted 26 August, 2015; originally announced August 2015.

MSC Class: 65C05; 65C50

Showing 1–46 of 46 results for author: Wu, K