-
Federated Q-Learning with Reference-Advantage Decomposition: Almost Optimal Regret and Logarithmic Communication Cost
Authors:
Zhong Zheng,
Haochen Zhang,
Lingzhou Xue
Abstract:
In this paper, we consider model-free federated reinforcement learning for tabular episodic Markov decision processes. Under the coordination of a central server, multiple agents collaboratively explore the environment and learn an optimal policy without sharing their raw data. Despite recent advances in federated Q-learning algorithms achieving near-linear regret speedup with low communication co…
▽ More
In this paper, we consider model-free federated reinforcement learning for tabular episodic Markov decision processes. Under the coordination of a central server, multiple agents collaboratively explore the environment and learn an optimal policy without sharing their raw data. Despite recent advances in federated Q-learning algorithms achieving near-linear regret speedup with low communication cost, existing algorithms only attain suboptimal regrets compared to the information bound. We propose a novel model-free federated Q-learning algorithm, termed FedQ-Advantage. Our algorithm leverages reference-advantage decomposition for variance reduction and operates under two distinct mechanisms: synchronization between the agents and the server, and policy update, both triggered by events. We prove that our algorithm not only requires a lower logarithmic communication cost but also achieves an almost optimal regret, reaching the information bound up to a logarithmic factor and near-linear regret speedup compared to its single-agent counterpart when the time horizon is sufficiently large.
△ Less
Submitted 29 May, 2024;
originally announced May 2024.
-
Quantifying Uncertainty in Classification Performance: ROC Confidence Bands Using Conformal Prediction
Authors:
Zheshi Zheng,
Bo Yang,
Peter Song
Abstract:
To evaluate a classification algorithm, it is common practice to plot the ROC curve using test data. However, the inherent randomness in the test data can undermine our confidence in the conclusions drawn from the ROC curve, necessitating uncertainty quantification. In this article, we propose an algorithm to construct confidence bands for the ROC curve, quantifying the uncertainty of classificati…
▽ More
To evaluate a classification algorithm, it is common practice to plot the ROC curve using test data. However, the inherent randomness in the test data can undermine our confidence in the conclusions drawn from the ROC curve, necessitating uncertainty quantification. In this article, we propose an algorithm to construct confidence bands for the ROC curve, quantifying the uncertainty of classification on the test data in terms of sensitivity and specificity. The algorithm is based on a procedure called conformal prediction, which constructs individualized confidence intervals for the test set and the confidence bands for the ROC curve can be obtained by combining the individualized intervals together. Furthermore, we address both scenarios where the test data are either iid or non-iid relative to the observed data set and propose distinct algorithms for each case with valid coverage probability. The proposed method is validated through both theoretical results and numerical experiments.
△ Less
Submitted 21 May, 2024;
originally announced May 2024.
-
Language Model Prompt Selection via Simulation Optimization
Authors:
Haoting Zhang,
**ghai He,
Rhonda Righter,
Zeyu Zheng
Abstract:
With the advancement in generative language models, the selection of prompts has gained significant attention in recent years. A prompt is an instruction or description provided by the user, serving as a guide for the generative language model in content generation. Despite existing methods for prompt selection that are based on human labor, we consider facilitating this selection through simulati…
▽ More
With the advancement in generative language models, the selection of prompts has gained significant attention in recent years. A prompt is an instruction or description provided by the user, serving as a guide for the generative language model in content generation. Despite existing methods for prompt selection that are based on human labor, we consider facilitating this selection through simulation optimization, aiming to maximize a pre-defined score for the selected prompt. Specifically, we propose a two-stage framework. In the first stage, we determine a feasible set of prompts in sufficient numbers, where each prompt is represented by a moderate-dimensional vector. In the subsequent stage for evaluation and selection, we construct a surrogate model of the score regarding the moderate-dimensional vectors that represent the prompts. We propose sequentially selecting the prompt for evaluation based on this constructed surrogate model. We prove the consistency of the sequential evaluation procedure in our framework. We also conduct numerical experiments to demonstrate the efficacy of our proposed framework, providing practical instructions for implementation.
△ Less
Submitted 19 May, 2024; v1 submitted 11 April, 2024;
originally announced April 2024.
-
Human Alignment of Large Language Models through Online Preference Optimisation
Authors:
Daniele Calandriello,
Daniel Guo,
Remi Munos,
Mark Rowland,
Yunhao Tang,
Bernardo Avila Pires,
Pierre Harvey Richemond,
Charline Le Lan,
Michal Valko,
Tianqi Liu,
Rishabh Joshi,
Zeyu Zheng,
Bilal Piot
Abstract:
Ensuring alignment of language models' outputs with human preferences is critical to guarantee a useful, safe, and pleasant user experience. Thus, human alignment has been extensively studied recently and several methods such as Reinforcement Learning from Human Feedback (RLHF), Direct Policy Optimisation (DPO) and Sequence Likelihood Calibration (SLiC) have emerged. In this paper, our contributio…
▽ More
Ensuring alignment of language models' outputs with human preferences is critical to guarantee a useful, safe, and pleasant user experience. Thus, human alignment has been extensively studied recently and several methods such as Reinforcement Learning from Human Feedback (RLHF), Direct Policy Optimisation (DPO) and Sequence Likelihood Calibration (SLiC) have emerged. In this paper, our contribution is two-fold. First, we show the equivalence between two recent alignment methods, namely Identity Policy Optimisation (IPO) and Nash Mirror Descent (Nash-MD). Second, we introduce a generalisation of IPO, named IPO-MD, that leverages the regularised sampling approach proposed by Nash-MD.
This equivalence may seem surprising at first sight, since IPO is an offline method whereas Nash-MD is an online method using a preference model. However, this equivalence can be proven when we consider the online version of IPO, that is when both generations are sampled by the online policy and annotated by a trained preference model. Optimising the IPO loss with such a stream of data becomes then equivalent to finding the Nash equilibrium of the preference model through self-play. Building on this equivalence, we introduce the IPO-MD algorithm that generates data with a mixture policy (between the online and reference policy) similarly as the general Nash-MD algorithm. We compare online-IPO and IPO-MD to different online versions of existing losses on preference data such as DPO and SLiC on a summarisation task.
△ Less
Submitted 13 March, 2024;
originally announced March 2024.
-
Multivariate Probabilistic Time Series Forecasting with Correlated Errors
Authors:
Vincent Zhihao Zheng,
Lijun Sun
Abstract:
Accurately modeling the correlation structure of errors is essential for reliable uncertainty quantification in probabilistic time series forecasting. Recent deep learning models for multivariate time series have developed efficient parameterizations for time-varying contemporaneous covariance, but they often assume temporal independence of errors for simplicity. However, real-world data frequentl…
▽ More
Accurately modeling the correlation structure of errors is essential for reliable uncertainty quantification in probabilistic time series forecasting. Recent deep learning models for multivariate time series have developed efficient parameterizations for time-varying contemporaneous covariance, but they often assume temporal independence of errors for simplicity. However, real-world data frequently exhibit significant error autocorrelation and cross-lag correlation due to factors such as missing covariates. In this paper, we present a plug-and-play method that learns the covariance structure of errors over multiple steps for autoregressive models with Gaussian-distributed errors. To achieve scalable inference and computational efficiency, we model the contemporaneous covariance using a low-rank-plus-diagonal parameterization and characterize cross-covariance through a group of independent latent temporal processes. The learned covariance matrix can be used to calibrate predictions based on observed residuals. We evaluate our method on probabilistic models built on RNN and Transformer architectures, and the results confirm the effectiveness of our approach in enhancing predictive accuracy and uncertainty quantification without significantly increasing the parameter size.
△ Less
Submitted 31 May, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
Weakly Supervised Learners for Correction of AI Errors with Provable Performance Guarantees
Authors:
Ivan Y. Tyukin,
Tatiana Tyukina,
Daniel van Helden,
Zedong Zheng,
Evgeny M. Mirkes,
Oliver J. Sutton,
Qinghua Zhou,
Alexander N. Gorban,
Penelope Allison
Abstract:
We present a new methodology for handling AI errors by introducing weakly supervised AI error correctors with a priori performance guarantees. These AI correctors are auxiliary maps whose role is to moderate the decisions of some previously constructed underlying classifier by either approving or rejecting its decisions. The rejection of a decision can be used as a signal to suggest abstaining fro…
▽ More
We present a new methodology for handling AI errors by introducing weakly supervised AI error correctors with a priori performance guarantees. These AI correctors are auxiliary maps whose role is to moderate the decisions of some previously constructed underlying classifier by either approving or rejecting its decisions. The rejection of a decision can be used as a signal to suggest abstaining from making a decision. A key technical focus of the work is in providing performance guarantees for these new AI correctors through bounds on the probabilities of incorrect decisions. These bounds are distribution agnostic and do not rely on assumptions on the data dimension. Our empirical example illustrates how the framework can be applied to improve the performance of an image classifier in a challenging real-world task where training data are scarce.
△ Less
Submitted 13 February, 2024; v1 submitted 31 January, 2024;
originally announced February 2024.
-
Federated Q-Learning: Linear Regret Speedup with Low Communication Cost
Authors:
Zhong Zheng,
Fengyu Gao,
Lingzhou Xue,
**g Yang
Abstract:
In this paper, we consider federated reinforcement learning for tabular episodic Markov Decision Processes (MDP) where, under the coordination of a central server, multiple agents collaboratively explore the environment and learn an optimal policy without sharing their raw data. While linear speedup in the number of agents has been achieved for some metrics, such as convergence rate and sample com…
▽ More
In this paper, we consider federated reinforcement learning for tabular episodic Markov Decision Processes (MDP) where, under the coordination of a central server, multiple agents collaboratively explore the environment and learn an optimal policy without sharing their raw data. While linear speedup in the number of agents has been achieved for some metrics, such as convergence rate and sample complexity, in similar settings, it is unclear whether it is possible to design a model-free algorithm to achieve linear regret speedup with low communication cost. We propose two federated Q-Learning algorithms termed as FedQ-Hoeffding and FedQ-Bernstein, respectively, and show that the corresponding total regrets achieve a linear speedup compared with their single-agent counterparts when the time horizon is sufficiently large, while the communication cost scales logarithmically in the total number of time steps $T$. Those results rely on an event-triggered synchronization mechanism between the agents and the server, a novel step size selection when the server aggregates the local estimates of the state-action values to form the global estimates, and a set of new concentration inequalities to bound the sum of non-martingale differences. This is the first work showing that linear regret speedup and logarithmic communication cost can be achieved by model-free algorithms in federated reinforcement learning.
△ Less
Submitted 7 May, 2024; v1 submitted 22 December, 2023;
originally announced December 2023.
-
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Authors:
Xiaoxia Wu,
Haojun Xia,
Stephen Youn,
Zhen Zheng,
Shiyang Chen,
Arash Bakhtiari,
Michael Wyatt,
Reza Yazdani Aminabadi,
Yuxiong He,
Olatunji Ruwase,
Leon Song,
Zhewei Yao
Abstract:
This study examines 4-bit quantization methods like GPTQ in large language models (LLMs), highlighting GPTQ's overfitting and limited enhancement in Zero-Shot tasks. While prior works merely focusing on zero-shot measurement, we extend task scope to more generative categories such as code generation and abstractive summarization, in which we found that INT4 quantization can significantly underperf…
▽ More
This study examines 4-bit quantization methods like GPTQ in large language models (LLMs), highlighting GPTQ's overfitting and limited enhancement in Zero-Shot tasks. While prior works merely focusing on zero-shot measurement, we extend task scope to more generative categories such as code generation and abstractive summarization, in which we found that INT4 quantization can significantly underperform. However, simply shifting to higher precision formats like FP6 has been particularly challenging, thus overlooked, due to poor performance caused by the lack of sophisticated integration and system acceleration strategies on current AI hardware. Our results show that FP6, even with a coarse-grain quantization scheme, performs robustly across various algorithms and tasks, demonstrating its superiority in accuracy and versatility. Notably, with the FP6 quantization, \codestar-15B model performs comparably to its FP16 counterpart in code generation, and for smaller models like the 406M it closely matches their baselines in summarization. Neither can be achieved by INT4. To better accommodate various AI hardware and achieve the best system performance, we propose a novel 4+2 design for FP6 to achieve similar latency to the state-of-the-art INT4 fine-grain quantization. With our design, FP6 can become a promising solution to the current 4-bit quantization methods used in LLMs.
△ Less
Submitted 18 December, 2023; v1 submitted 13 December, 2023;
originally announced December 2023.
-
Causal inference with Machine Learning-Based Covariate Representation
Authors:
Yuhang Wu,
**ghai He,
Zeyu Zheng
Abstract:
Utilizing covariate information has been a powerful approach to improve the efficiency and accuracy for causal inference, which support massive amount of randomized experiments run on data-driven enterprises. However, state-of-art approaches can become practically unreliable when the dimension of covariate increases to just 50, whereas experiments on large platforms can observe even higher dimensi…
▽ More
Utilizing covariate information has been a powerful approach to improve the efficiency and accuracy for causal inference, which support massive amount of randomized experiments run on data-driven enterprises. However, state-of-art approaches can become practically unreliable when the dimension of covariate increases to just 50, whereas experiments on large platforms can observe even higher dimension of covariate. We propose a machine-learning-assisted covariate representation approach that can effectively make use of historical experiment or observational data that are run on the same platform to understand which lower dimensions can effectively represent the higher-dimensional covariate. We then propose design and estimation methods with the covariate representation. We prove statistically reliability and performance guarantees for the proposed methods. The empirical performance is demonstrated using numerical experiments.
△ Less
Submitted 3 November, 2023;
originally announced November 2023.
-
SOFARI: High-Dimensional Manifold-Based Inference
Authors:
Zemin Zheng,
Xin Zhou,
Yingying Fan,
**chi Lv
Abstract:
Multi-task learning is a widely used technique for harnessing information from various tasks. Recently, the sparse orthogonal factor regression (SOFAR) framework, based on the sparse singular value decomposition (SVD) within the coefficient matrix, was introduced for interpretable multi-task learning, enabling the discovery of meaningful latent feature-response association networks across differen…
▽ More
Multi-task learning is a widely used technique for harnessing information from various tasks. Recently, the sparse orthogonal factor regression (SOFAR) framework, based on the sparse singular value decomposition (SVD) within the coefficient matrix, was introduced for interpretable multi-task learning, enabling the discovery of meaningful latent feature-response association networks across different layers. However, conducting precise inference on the latent factor matrices has remained challenging due to orthogonality constraints inherited from the sparse SVD constraint. In this paper, we suggest a novel approach called high-dimensional manifold-based SOFAR inference (SOFARI), drawing on the Neyman near-orthogonality inference while incorporating the Stiefel manifold structure imposed by the SVD constraints. By leveraging the underlying Stiefel manifold structure, SOFARI provides bias-corrected estimators for both latent left factor vectors and singular values, for which we show to enjoy the asymptotic mean-zero normal distributions with estimable variances. We introduce two SOFARI variants to handle strongly and weakly orthogonal latent factors, where the latter covers a broader range of applications. We illustrate the effectiveness of SOFARI and justify our theoretical results through simulation examples and a real data application in economic forecasting.
△ Less
Submitted 26 September, 2023;
originally announced September 2023.
-
Better Batch for Deep Probabilistic Time Series Forecasting
Authors:
Vincent Zhihao Zheng,
Seong** Choi,
Lijun Sun
Abstract:
Deep probabilistic time series forecasting has gained attention for its ability to provide nonlinear approximation and valuable uncertainty quantification for decision-making. However, existing models often oversimplify the problem by assuming a time-independent error process and overlooking serial correlation. To overcome this limitation, we propose an innovative training method that incorporates…
▽ More
Deep probabilistic time series forecasting has gained attention for its ability to provide nonlinear approximation and valuable uncertainty quantification for decision-making. However, existing models often oversimplify the problem by assuming a time-independent error process and overlooking serial correlation. To overcome this limitation, we propose an innovative training method that incorporates error autocorrelation to enhance probabilistic forecasting accuracy. Our method constructs a mini-batch as a collection of $D$ consecutive time series segments for model training. It explicitly learns a time-varying covariance matrix over each mini-batch, encoding error correlation among adjacent time steps. The learned covariance matrix can be used to improve prediction accuracy and enhance uncertainty quantification. We evaluate our method on two different neural forecasting models and multiple public datasets. Experimental results confirm the effectiveness of the proposed approach in improving the performance of both models across a range of datasets, resulting in notable improvements in predictive accuracy.
△ Less
Submitted 21 May, 2024; v1 submitted 26 May, 2023;
originally announced May 2023.
-
A New Inexact Proximal Linear Algorithm with Adaptive Stop** Criteria for Robust Phase Retrieval
Authors:
Zhong Zheng,
Shiqian Ma,
Lingzhou Xue
Abstract:
This paper considers the robust phase retrieval problem, which can be cast as a nonsmooth and nonconvex optimization problem. We propose a new inexact proximal linear algorithm with the subproblem being solved inexactly. Our contributions are two adaptive stop** criteria for the subproblem. The convergence behavior of the proposed methods is analyzed. Through experiments on both synthetic and re…
▽ More
This paper considers the robust phase retrieval problem, which can be cast as a nonsmooth and nonconvex optimization problem. We propose a new inexact proximal linear algorithm with the subproblem being solved inexactly. Our contributions are two adaptive stop** criteria for the subproblem. The convergence behavior of the proposed methods is analyzed. Through experiments on both synthetic and real datasets, we demonstrate that our methods are much more efficient than existing methods, such as the original proximal linear algorithm and the subgradient method.
△ Less
Submitted 8 February, 2024; v1 submitted 24 April, 2023;
originally announced April 2023.
-
Regret Distribution in Stochastic Bandits: Optimal Trade-off between Expectation and Tail Risk
Authors:
David Simchi-Levi,
Zeyu Zheng,
Feng Zhu
Abstract:
We study the trade-off between expectation and tail risk for regret distribution in the stochastic multi-armed bandit problem. We fully characterize the interplay among three desired properties for policy design: worst-case optimality, instance-dependent consistency, and light-tailed risk. We show how the order of expected regret exactly affects the decaying rate of the regret tail probability for…
▽ More
We study the trade-off between expectation and tail risk for regret distribution in the stochastic multi-armed bandit problem. We fully characterize the interplay among three desired properties for policy design: worst-case optimality, instance-dependent consistency, and light-tailed risk. We show how the order of expected regret exactly affects the decaying rate of the regret tail probability for both the worst-case and instance-dependent scenario. A novel policy is proposed to characterize the optimal regret tail probability for any regret threshold. Concretely, for any given $α\in[1/2, 1)$ and $β\in[0, α]$, our policy achieves a worst-case expected regret of $\tilde O(T^α)$ (we call it $α$-optimal) and an instance-dependent expected regret of $\tilde O(T^β)$ (we call it $β$-consistent), while enjoys a probability of incurring an $\tilde O(T^δ)$ regret ($δ\geqα$ in the worst-case scenario and $δ\geqβ$ in the instance-dependent scenario) that decays exponentially with a polynomial $T$ term. Such decaying rate is proved to be best achievable. Moreover, we discover an intrinsic gap of the optimal tail rate under the instance-dependent scenario between whether the time horizon $T$ is known a priori or not. Interestingly, when it comes to the worst-case scenario, this gap disappears. Finally, we extend our proposed policy design to (1) a stochastic multi-armed bandit setting with non-stationary baseline rewards, and (2) a stochastic linear bandit setting. Our results reveal insights on the trade-off between regret expectation and regret tail risk for both worst-case and instance-dependent scenarios, indicating that more sub-optimality and inconsistency leave space for more light-tailed risk of incurring a large regret, and that knowing the planning horizon in advance can make a difference on alleviating tail risks.
△ Less
Submitted 9 April, 2023;
originally announced April 2023.
-
Best Arm Identification with Fairness Constraints on Subpopulations
Authors:
Yuhang Wu,
Zeyu Zheng,
Tingyu Zhu
Abstract:
We formulate, analyze and solve the problem of best arm identification with fairness constraints on subpopulations (BAICS). Standard best arm identification problems aim at selecting an arm that has the largest expected reward where the expectation is taken over the entire population. The BAICS problem requires that an selected arm must be fair to all subpopulations (e.g., different ethnic groups,…
▽ More
We formulate, analyze and solve the problem of best arm identification with fairness constraints on subpopulations (BAICS). Standard best arm identification problems aim at selecting an arm that has the largest expected reward where the expectation is taken over the entire population. The BAICS problem requires that an selected arm must be fair to all subpopulations (e.g., different ethnic groups, age groups, or customer types) by satisfying constraints that the expected reward conditional on every subpopulation needs to be larger than some thresholds. The BAICS problem aims at correctly identify, with high confidence, the arm with the largest expected reward from all arms that satisfy subpopulation constraints. We analyze the complexity of the BAICS problem by proving a best achievable lower bound on the sample complexity with closed-form representation. We then design an algorithm and prove that the algorithm's sample complexity matches with the lower bound in terms of order. A brief account of numerical experiments are conducted to illustrate the theoretical findings.
△ Less
Submitted 8 April, 2023;
originally announced April 2023.
-
Enhancing Deep Traffic Forecasting Models with Dynamic Regression
Authors:
Vincent Zhihao Zheng,
Seong** Choi,
Lijun Sun
Abstract:
Deep learning models for traffic forecasting often assume the residual is independent and isotropic across time and space. This assumption simplifies loss functions such as mean absolute error, but real-world residual processes often exhibit significant autocorrelation and structured spatiotemporal correlation. This paper introduces a dynamic regression (DR) framework to enhance existing spatiotem…
▽ More
Deep learning models for traffic forecasting often assume the residual is independent and isotropic across time and space. This assumption simplifies loss functions such as mean absolute error, but real-world residual processes often exhibit significant autocorrelation and structured spatiotemporal correlation. This paper introduces a dynamic regression (DR) framework to enhance existing spatiotemporal traffic forecasting models by incorporating structured learning for the residual process. We assume the residual of the base model (i.e., a well-developed traffic forecasting model) follows a matrix-variate seasonal autoregressive (AR) model, which is seamlessly integrated into the training process through the redesign of the loss function. Importantly, the parameters of the DR framework are jointly optimized alongside the base model. We evaluate the effectiveness of the proposed framework on state-of-the-art (SOTA) deep traffic forecasting models using both speed and flow datasets, demonstrating improved performance and providing interpretable AR coefficients and spatiotemporal covariance matrices.
△ Less
Submitted 31 May, 2024; v1 submitted 16 January, 2023;
originally announced January 2023.
-
Anonymous Pattern Molecular Fingerprint and its Applications on Property Identification
Authors:
Xue Liu,
Qian Cheng,
Dan Sun,
Xing Li,
Wei Wei,
Zhiming Zheng
Abstract:
Molecular fingerprints are significant cheminformatics tools to map molecules into vectorial space according to their characteristics in diverse functional groups, atom sequences, and other topological structures. In this paper, we set out to investigate a novel molecular fingerprint \emph{Anonymous-FP} that possesses abundant perception about the underlying interactions shaped in small, medium, a…
▽ More
Molecular fingerprints are significant cheminformatics tools to map molecules into vectorial space according to their characteristics in diverse functional groups, atom sequences, and other topological structures. In this paper, we set out to investigate a novel molecular fingerprint \emph{Anonymous-FP} that possesses abundant perception about the underlying interactions shaped in small, medium, and large molecular scale links. In detail, the possible inherent atom chains are sampled from each molecule and are extended in a certain anonymous pattern. After that, the molecular fingerprint \emph{Anonymous-FP} is encoded in virtue of the Natural Language Processing technique \emph{PV-DBOW}. \emph{Anonymous-FP} is studied on molecular property identification and has shown valuable advantages such as rich information content, high experimental performance, and full structural significance. During the experimental verification, the scale of the atom chain or its anonymous manner matters significantly to the overall representation ability of \emph{Anonymous-FP}. Generally, the typical scale $r = 8$ enhances the performance on a series of real-world molecules, and specifically, the accuracy could level up to above $93\%$ on all NCI datasets.
△ Less
Submitted 4 January, 2023;
originally announced January 2023.
-
Efficient Targeted Learning of Heterogeneous Treatment Effects for Multiple Subgroups
Authors:
Waverly Wei,
Maya Petersen,
Mark J van der Laan,
Zeyu Zheng,
Chong Wu,
**gshen Wang
Abstract:
In biomedical science, analyzing treatment effect heterogeneity plays an essential role in assisting personalized medicine. The main goals of analyzing treatment effect heterogeneity include estimating treatment effects in clinically relevant subgroups and predicting whether a patient subpopulation might benefit from a particular treatment. Conventional approaches often evaluate the subgroup treat…
▽ More
In biomedical science, analyzing treatment effect heterogeneity plays an essential role in assisting personalized medicine. The main goals of analyzing treatment effect heterogeneity include estimating treatment effects in clinically relevant subgroups and predicting whether a patient subpopulation might benefit from a particular treatment. Conventional approaches often evaluate the subgroup treatment effects via parametric modeling and can thus be susceptible to model mis-specifications. In this manuscript, we take a model-free semiparametric perspective and aim to efficiently evaluate the heterogeneous treatment effects of multiple subgroups simultaneously under the one-step targeted maximum-likelihood estimation (TMLE) framework. When the number of subgroups is large, we further expand this path of research by looking at a variation of the one-step TMLE that is robust to the presence of small estimated propensity scores in finite samples. From our simulations, our method demonstrates substantial finite sample improvements compared to conventional methods. In a case study, our method unveils the potential treatment effect heterogeneity of rs12916-T allele (a proxy for statin usage) in decreasing Alzheimer's disease risk.
△ Less
Submitted 26 November, 2022;
originally announced November 2022.
-
Adaptive A/B Tests and Simultaneous Treatment Parameter Optimization
Authors:
Yuhang Wu,
Zeyu Zheng,
Guangyu Zhang,
Zuohua Zhang,
Chu Wang
Abstract:
Constructing asymptotically valid confidence intervals through a valid central limit theorem is crucial for A/B tests, where a classical goal is to statistically assert whether a treatment plan is significantly better than a control plan. In some emerging applications for online platforms, the treatment plan is not a single plan, but instead encompasses an infinite continuum of plans indexed by a…
▽ More
Constructing asymptotically valid confidence intervals through a valid central limit theorem is crucial for A/B tests, where a classical goal is to statistically assert whether a treatment plan is significantly better than a control plan. In some emerging applications for online platforms, the treatment plan is not a single plan, but instead encompasses an infinite continuum of plans indexed by a continuous treatment parameter. As such, the experimenter not only needs to provide valid statistical inference, but also desires to effectively and adaptively find the optimal choice of value for the treatment parameter to use for the treatment plan. However, we find that classical optimization algorithms, despite of their fast convergence rates under convexity assumptions, do not come with a central limit theorem that can be used to construct asymptotically valid confidence intervals. We fix this issue by providing a new optimization algorithm that on one hand maintains the same fast convergence rate and on the other hand permits the establishment of a valid central limit theorem. We discuss practical implementations of the proposed algorithm and conduct numerical experiments to illustrate the theoretical findings.
△ Less
Submitted 6 December, 2022; v1 submitted 13 October, 2022;
originally announced October 2022.
-
Application of Neural Network in the Prediction of NOx Emissions from Degrading Gas Turbine
Authors:
Zhenkun Zheng,
Alan Rezazadeh
Abstract:
This paper is aiming to apply neural network algorithm for predicting the process response (NOx emissions) from degrading natural gas turbines. Nine different process variables, or predictors, are considered in the predictive modelling. It is found out that the model trained by neural network algorithm should use part of recent data in the training and validation sets accounting for the impact of…
▽ More
This paper is aiming to apply neural network algorithm for predicting the process response (NOx emissions) from degrading natural gas turbines. Nine different process variables, or predictors, are considered in the predictive modelling. It is found out that the model trained by neural network algorithm should use part of recent data in the training and validation sets accounting for the impact of the system degradation. R-Square values of the training and validation sets demonstrate the validity of the model. The residue plot, without any clear pattern, shows the model is appropriate. The ranking of the importance of the process variables are demonstrated and the prediction profile confirms the significance of the process variables. The model trained by using neural network algorithm manifests the optimal settings of the process variables to reach the minimum value of NOx emissions from the degrading gas turbine system.
△ Less
Submitted 19 September, 2022;
originally announced September 2022.
-
Inference on the Best Policies with Many Covariates
Authors:
Waverly Wei,
Yuqing Zhou,
Zeyu Zheng,
**gshen Wang
Abstract:
Understanding the impact of the most effective policies or treatments on a response variable of interest is desirable in many empirical works in economics, statistics and other disciplines. Due to the widespread winner's curse phenomenon, conventional statistical inference assuming that the top policies are chosen independent of the random sample may lead to overly optimistic evaluations of the be…
▽ More
Understanding the impact of the most effective policies or treatments on a response variable of interest is desirable in many empirical works in economics, statistics and other disciplines. Due to the widespread winner's curse phenomenon, conventional statistical inference assuming that the top policies are chosen independent of the random sample may lead to overly optimistic evaluations of the best policies. In recent years, given the increased availability of large datasets, such an issue can be further complicated when researchers include many covariates to estimate the policy or treatment effects in an attempt to control for potential confounders. In this manuscript, to simultaneously address the above-mentioned issues, we propose a resampling-based procedure that not only lifts the winner's curse in evaluating the best policies observed in a random sample, but also is robust to the presence of many covariates. The proposed inference procedure yields accurate point estimates and valid frequentist confidence intervals that achieve the exact nominal level as the sample size goes to infinity for multiple best policy effect sizes. We illustrate the finite-sample performance of our approach through Monte Carlo experiments and two empirical studies, evaluating the most effective policies in charitable giving and the most beneficial group of workers in the National Supported Work program.
△ Less
Submitted 21 October, 2022; v1 submitted 23 June, 2022;
originally announced June 2022.
-
A Simple and Optimal Policy Design with Safety against Heavy-tailed Risk for Stochastic Bandits
Authors:
David Simchi-Levi,
Zeyu Zheng,
Feng Zhu
Abstract:
We study the stochastic multi-armed bandit problem and design new policies that enjoy both worst-case optimality for expected regret and light-tailed risk for regret distribution. Starting from the two-armed bandit setting with time horizon $T$, we propose a simple policy and prove that the policy (i) enjoys the worst-case optimality for the expected regret at order $O(\sqrt{T\ln T})$ and (ii) has…
▽ More
We study the stochastic multi-armed bandit problem and design new policies that enjoy both worst-case optimality for expected regret and light-tailed risk for regret distribution. Starting from the two-armed bandit setting with time horizon $T$, we propose a simple policy and prove that the policy (i) enjoys the worst-case optimality for the expected regret at order $O(\sqrt{T\ln T})$ and (ii) has the worst-case tail probability of incurring a linear regret decay at an exponential rate $\exp(-Ω(\sqrt{T}))$, a rate that we prove to be best achievable for all worst-case optimal policies. Briefly, our proposed policy achieves a delicate balance between doing more exploration at the beginning of the time horizon and doing more exploitation when approaching the end, compared to the standard Successive Elimination policy and Upper Confidence Bound policy. We then improve the policy design and analysis to work for the general $K$-armed bandit setting. Specifically, the worst-case probability of incurring a regret larger than any $x>0$ is upper bounded by $\exp(-Ω(x/\sqrt{KT}))$. We then enhance the policy design to accommodate the "any-time" setting where $T$ is not known a priori, and prove equivalently desired policy performances as compared to the "fixed-time" setting with known $T$. A brief account of numerical experiments is conducted to illustrate the theoretical findings. We conclude by extending our proposed policy design to the general stochastic linear bandit setting and proving that the policy leads to both worst-case optimality in terms of expected regret order and light-tailed risk on the regret distribution.
△ Less
Submitted 14 November, 2022; v1 submitted 6 June, 2022;
originally announced June 2022.
-
Higher Order Correlation Analysis for Multi-View Learning
Authors:
Jiawang Nie,
Li Wang,
Zequn Zheng
Abstract:
Multi-view learning is frequently used in data science. The pairwise correlation maximization is a classical approach for exploring the consensus of multiple views. Since the pairwise correlation is inherent for two views, the extensions to more views can be diversified and the intrinsic interconnections among views are generally lost. To address this issue, we propose to maximize higher order cor…
▽ More
Multi-view learning is frequently used in data science. The pairwise correlation maximization is a classical approach for exploring the consensus of multiple views. Since the pairwise correlation is inherent for two views, the extensions to more views can be diversified and the intrinsic interconnections among views are generally lost. To address this issue, we propose to maximize higher order correlations. This can be formulated as a low rank approximation problem with the higher order correlation tensor of multi-view data. We use the generating polynomial method to solve the low rank approximation problem. Numerical results on real multi-view data demonstrate that this method consistently outperforms prior existing methods.
△ Less
Submitted 28 January, 2022;
originally announced January 2022.
-
Selecting the Best Optimizing System
Authors:
Nian Si,
Zeyu Zheng
Abstract:
We formulate selecting the best optimizing system (SBOS) problems and provide solutions for those problems. In an SBOS problem, a finite number of systems are contenders. Inside each system, a continuous decision variable affects the system's expected performance. An SBOS problem compares different systems based on their expected performances under their own optimally chosen decision to select the…
▽ More
We formulate selecting the best optimizing system (SBOS) problems and provide solutions for those problems. In an SBOS problem, a finite number of systems are contenders. Inside each system, a continuous decision variable affects the system's expected performance. An SBOS problem compares different systems based on their expected performances under their own optimally chosen decision to select the best, without advance knowledge of expected performances of the systems nor the optimizing decision inside each system. We design easy-to-implement algorithms that adaptively chooses a system and a choice of decision to evaluate the noisy system performance, sequentially eliminates inferior systems, and eventually recommends a system as the best after spending a user-specified budget. The proposed algorithms integrate the stochastic gradient descent method and the sequential elimination method to simultaneously exploit the structure inside each system and make comparisons across systems. For the proposed algorithms, we prove exponential rates of convergence to zero for the probability of false selection, as the budget grows to infinity. We conduct three numerical examples that represent three practical cases of SBOS problems. Our proposed algorithms demonstrate consistent and stronger performances in terms of the probability of false selection over benchmark algorithms under a range of problem settings and sampling budgets.
△ Less
Submitted 9 January, 2022;
originally announced January 2022.
-
Offline Planning and Online Learning under Recovering Rewards
Authors:
David Simchi-Levi,
Zeyu Zheng,
Feng Zhu
Abstract:
Motivated by emerging applications such as live-streaming e-commerce, promotions and recommendations, we introduce and solve a general class of non-stationary multi-armed bandit problems that have the following two features: (i) the decision maker can pull and collect rewards from up to $K\,(\ge 1)$ out of $N$ different arms in each time period; (ii) the expected reward of an arm immediately drops…
▽ More
Motivated by emerging applications such as live-streaming e-commerce, promotions and recommendations, we introduce and solve a general class of non-stationary multi-armed bandit problems that have the following two features: (i) the decision maker can pull and collect rewards from up to $K\,(\ge 1)$ out of $N$ different arms in each time period; (ii) the expected reward of an arm immediately drops after it is pulled, and then non-parametrically recovers as the arm's idle time increases. With the objective of maximizing the expected cumulative reward over $T$ time periods, we design a class of ``Purely Periodic Policies'' that jointly set a period to pull each arm. For the proposed policies, we prove performance guarantees for both the offline problem and the online problems. For the offline problem when all model parameters are known, the proposed periodic policy obtains an approximation ratio that is at the order of $1-\mathcal O(1/\sqrt{K})$, which is asymptotically optimal when $K$ grows to infinity. For the online problem when the model parameters are unknown and need to be dynamically learned, we integrate the offline periodic policy with the upper confidence bound procedure to construct on online policy. The proposed online policy is proved to approximately have $\widetilde{\mathcal O}(N\sqrt{T})$ regret against the offline benchmark. Our framework and policy design may shed light on broader offline planning and online learning applications with non-stationary and recovering rewards.
△ Less
Submitted 21 December, 2021; v1 submitted 28 June, 2021;
originally announced June 2021.
-
Parallel integrative learning for large-scale multi-response regression with incomplete outcomes
Authors:
Ruipeng Dong,
Daoji Li,
Zemin Zheng
Abstract:
Multi-task learning is increasingly used to investigate the association structure between multiple responses and a single set of predictor variables in many applications. In the era of big data, the coexistence of incomplete outcomes, large number of responses, and high dimensionality in predictors poses unprecedented challenges in estimation, prediction, and computation. In this paper, we propose…
▽ More
Multi-task learning is increasingly used to investigate the association structure between multiple responses and a single set of predictor variables in many applications. In the era of big data, the coexistence of incomplete outcomes, large number of responses, and high dimensionality in predictors poses unprecedented challenges in estimation, prediction, and computation. In this paper, we propose a scalable and computationally efficient procedure, called PEER, for large-scale multi-response regression with incomplete outcomes, where both the numbers of responses and predictors can be high-dimensional. Motivated by sparse factor regression, we convert the multi-response regression into a set of univariate-response regressions, which can be efficiently implemented in parallel. Under some mild regularity conditions, we show that PEER enjoys nice sampling properties including consistency in estimation, prediction, and variable selection. Extensive simulation studies show that our proposal compares favorably with several existing methods in estimation accuracy, variable selection, and computation efficiency.
△ Less
Submitted 11 April, 2021;
originally announced April 2021.
-
Learning Energy-Based Model with Variational Auto-Encoder as Amortized Sampler
Authors:
Jianwen Xie,
Zilong Zheng,
** Li
Abstract:
Due to the intractable partition function, training energy-based models (EBMs) by maximum likelihood requires Markov chain Monte Carlo (MCMC) sampling to approximate the gradient of the Kullback-Leibler divergence between data and model distributions. However, it is non-trivial to sample from an EBM because of the difficulty of mixing between modes. In this paper, we propose to learn a variational…
▽ More
Due to the intractable partition function, training energy-based models (EBMs) by maximum likelihood requires Markov chain Monte Carlo (MCMC) sampling to approximate the gradient of the Kullback-Leibler divergence between data and model distributions. However, it is non-trivial to sample from an EBM because of the difficulty of mixing between modes. In this paper, we propose to learn a variational auto-encoder (VAE) to initialize the finite-step MCMC, such as Langevin dynamics that is derived from the energy function, for efficient amortized sampling of the EBM. With these amortized MCMC samples, the EBM can be trained by maximum likelihood, which follows an "analysis by synthesis" scheme; while the VAE learns from these MCMC samples via variational Bayes. We call this joint training algorithm the variational MCMC teaching, in which the VAE chases the EBM toward data distribution. We interpret the learning algorithm as a dynamic alternating projection in the context of information geometry. Our proposed models can generate samples comparable to GANs and EBMs. Additionally, we demonstrate that our model can learn effective probabilistic distribution toward supervised conditional learning tasks.
△ Less
Submitted 23 December, 2021; v1 submitted 29 December, 2020;
originally announced December 2020.
-
A Doubly Stochastic Simulator with Applications in Arrivals Modeling and Simulation
Authors:
Yufeng Zheng,
Zeyu Zheng,
Tingyu Zhu
Abstract:
We propose a framework that integrates classical Monte Carlo simulators and Wasserstein generative adversarial networks to model, estimate, and simulate a broad class of arrival processes with general non-stationary and multi-dimensional random arrival rates. Classical Monte Carlo simulators have advantages at capturing the interpretable "physics" of a stochastic object, whereas neural-network-bas…
▽ More
We propose a framework that integrates classical Monte Carlo simulators and Wasserstein generative adversarial networks to model, estimate, and simulate a broad class of arrival processes with general non-stationary and multi-dimensional random arrival rates. Classical Monte Carlo simulators have advantages at capturing the interpretable "physics" of a stochastic object, whereas neural-network-based simulators have advantages at capturing less-interpretable complicated dependence within a high-dimensional distribution. We propose a doubly stochastic simulator that integrates a stochastic generative neural network and a classical Monte Carlo Poisson simulator, to utilize both advantages. Such integration brings challenges to both theoretical reliability and computational tractability for the estimation of the simulator given real data, where the estimation is done through minimizing the Wasserstein distance between the distribution of the simulation output and the distribution of real data. Regarding theoretical properties, we prove consistency and convergence rate for the estimated simulator under a non-parametric smoothness assumption. Regarding computational efficiency and tractability for the estimation procedure, we address a challenge in gradient evaluation that arise from the discontinuity in the Monte Carlo Poisson simulator. Numerical experiments with synthetic and real data sets are implemented to illustrate the performance of the proposed framework.
△ Less
Submitted 9 June, 2023; v1 submitted 27 December, 2020;
originally announced December 2020.
-
Sequential scaled sparse factor regression
Authors:
Zemin Zheng,
Yang Li,
Jie Wu,
Yuchen Wang
Abstract:
Large-scale association analysis between multivariate responses and predictors is of great practical importance, as exemplified by modern business applications including social media marketing and crisis management. Despite the rapid methodological advances, how to obtain scalable estimators with free tuning of the regularization parameters remains unclear under general noise covariance structures…
▽ More
Large-scale association analysis between multivariate responses and predictors is of great practical importance, as exemplified by modern business applications including social media marketing and crisis management. Despite the rapid methodological advances, how to obtain scalable estimators with free tuning of the regularization parameters remains unclear under general noise covariance structures. In this paper, we develop a new methodology called sequential scaled sparse factor regression (SESS) based on a new viewpoint that the problem of recovering a jointly low-rank and sparse regression coefficient matrix can be decomposed into several univariate response sparse regressions through regular eigenvalue decomposition. It combines the strengths of sequential estimation and scaled sparse regression, thus sharing the scalability and the tuning free property for sparsity parameters inherited from the two approaches. The stepwise convex formulation, sequential factor regression framework, and tuning insensitiveness make SESS highly scalable for big data applications. Comprehensive theoretical justifications with new insights into high-dimensional multi-response regressions are also provided. We demonstrate the scalability and effectiveness of the proposed method by simulation studies and stock short interest data analysis.
△ Less
Submitted 17 November, 2020;
originally announced November 2020.
-
MixTwice: large-scale hypothesis testing for peptide arrays by variance mixing
Authors:
Zihao Zheng,
Aisha M. Mergaert,
Irene M. Ong,
Miriam A. Shelef,
Michael A. Newton
Abstract:
Peptide microarrays have emerged as a powerful technology in immunoproteomics as they provide a tool to measure the abundance of different antibodies in patient serum samples. The high dimensionality and small sample size of many experiments challenge conventional statistical approaches, including those aiming to control the false discovery rate (FDR). Motivated by limitations in reproducibility a…
▽ More
Peptide microarrays have emerged as a powerful technology in immunoproteomics as they provide a tool to measure the abundance of different antibodies in patient serum samples. The high dimensionality and small sample size of many experiments challenge conventional statistical approaches, including those aiming to control the false discovery rate (FDR). Motivated by limitations in reproducibility and power of current methods, we advance an empirical Bayesian tool that computes local false discovery rate statistics and local false sign rate statistics when provided with data on estimated effects and estimated standard errors from all the measured peptides. As the name suggests, the \verb+MixTwice+ tool involves the estimation of two mixing distributions, one on underlying effects and one on underlying variance parameters. Constrained optimization techniques provide for model fitting of mixing distributions under weak shape constraints (unimodality of the effect distribution). Numerical experiments show that \verb+MixTwice+ can accurately estimate generative parameters and powerfully identify non-null peptides. In a peptide array study of rheumatoid arthritis (RA), \verb+MixTwice+ recovers meaningful peptide markers in one case where the signal is weak, and has strong reproducibility properties in one case where the signal is strong. \verb+MixTwice+ is available as an R software package. \href{https://github.com/wiscstatman/MixTwice}{https://github.com/wiscstatman/MixTwice}
△ Less
Submitted 14 November, 2020;
originally announced November 2020.
-
Interpretable Neural Networks for Panel Data Analysis in Economics
Authors:
Yucheng Yang,
Zhong Zheng,
Weinan E
Abstract:
The lack of interpretability and transparency are preventing economists from using advanced tools like neural networks in their empirical research. In this paper, we propose a class of interpretable neural network models that can achieve both high prediction accuracy and interpretability. The model can be written as a simple function of a regularized number of interpretable features, which are out…
▽ More
The lack of interpretability and transparency are preventing economists from using advanced tools like neural networks in their empirical research. In this paper, we propose a class of interpretable neural network models that can achieve both high prediction accuracy and interpretability. The model can be written as a simple function of a regularized number of interpretable features, which are outcomes of interpretable functions encoded in the neural network. Researchers can design different forms of interpretable functions based on the nature of their tasks. In particular, we encode a class of interpretable functions named persistent change filters in the neural network to study time series cross-sectional data. We apply the model to predicting individual's monthly employment status using high-dimensional administrative data. We achieve an accuracy of 94.5% in the test set, which is comparable to the best performed conventional machine learning methods. Furthermore, the interpretability of the model allows us to understand the mechanism that underlies the prediction: an individual's employment status is closely related to whether she pays different types of insurances. Our work is a useful step towards overcoming the black-box problem of neural networks, and provide a new tool for economists to study administrative and proprietary big data.
△ Less
Submitted 29 November, 2020; v1 submitted 11 October, 2020;
originally announced October 2020.
-
Adversarial Attack on Large Scale Graph
Authors:
**tang Li,
Tao Xie,
Liang Chen,
Fenfang Xie,
Xiangnan He,
Zibin Zheng
Abstract:
Recent studies have shown that graph neural networks (GNNs) are vulnerable against perturbations due to lack of robustness and can therefore be easily fooled. Currently, most works on attacking GNNs are mainly using gradient information to guide the attack and achieve outstanding performance. However, the high complexity of time and space makes them unmanageable for large scale graphs and becomes…
▽ More
Recent studies have shown that graph neural networks (GNNs) are vulnerable against perturbations due to lack of robustness and can therefore be easily fooled. Currently, most works on attacking GNNs are mainly using gradient information to guide the attack and achieve outstanding performance. However, the high complexity of time and space makes them unmanageable for large scale graphs and becomes the major bottleneck that prevents the practical usage. We argue that the main reason is that they have to use the whole graph for attacks, resulting in the increasing time and space complexity as the data scale grows. In this work, we propose an efficient Simplified Gradient-based Attack (SGA) method to bridge this gap. SGA can cause the GNNs to misclassify specific target nodes through a multi-stage attack framework, which needs only a much smaller subgraph. In addition, we present a practical metric named Degree Assortativity Change (DAC) to measure the impacts of adversarial attacks on graph data. We evaluate our attack method on four real-world graph networks by attacking several commonly used GNNs. The experimental results demonstrate that SGA can achieve significant time and memory efficiency improvements while maintaining competitive attack performance compared to state-of-art attack techniques. Codes are available via: https://github.com/EdisonLeeeee/SGAttack.
△ Less
Submitted 6 May, 2021; v1 submitted 7 September, 2020;
originally announced September 2020.
-
A Deep Prediction Network for Understanding Advertiser Intent and Satisfaction
Authors:
Liyi Guo,
Rui Lu,
Haoqi Zhang,
Junqi **,
Zhenzhe Zheng,
Fan Wu,
** Li,
Haiyang Xu,
Han Li,
Wenkai Lu,
Jian Xu,
Kun Gai
Abstract:
For e-commerce platforms such as Taobao and Amazon, advertisers play an important role in the entire digital ecosystem: their behaviors explicitly influence users' browsing and shop** experience; more importantly, advertiser's expenditure on advertising constitutes a primary source of platform revenue. Therefore, providing better services for advertisers is essential for the long-term prosperity…
▽ More
For e-commerce platforms such as Taobao and Amazon, advertisers play an important role in the entire digital ecosystem: their behaviors explicitly influence users' browsing and shop** experience; more importantly, advertiser's expenditure on advertising constitutes a primary source of platform revenue. Therefore, providing better services for advertisers is essential for the long-term prosperity for e-commerce platforms. To achieve this goal, the ad platform needs to have an in-depth understanding of advertisers in terms of both their marketing intents and satisfaction over the advertising performance, based on which further optimization could be carried out to service the advertisers in the correct direction. In this paper, we propose a novel Deep Satisfaction Prediction Network (DSPN), which models advertiser intent and satisfaction simultaneously. It employs a two-stage network structure where advertiser intent vector and satisfaction are jointly learned by considering the features of advertiser's action information and advertising performance indicators. Experiments on an Alibaba advertisement dataset and online evaluations show that our proposed DSPN outperforms state-of-the-art baselines and has stable performance in terms of AUC in the online environment. Further analyses show that DSPN not only predicts advertisers' satisfaction accurately but also learns an explainable advertiser intent, revealing the opportunities to optimize the advertising performance further.
△ Less
Submitted 20 August, 2020;
originally announced August 2020.
-
Temporal Calibrated Regularization for Robust Noisy Label Learning
Authors:
Dongxian Wu,
Yisen Wang,
Zhuobin Zheng,
Shu-tao Xia
Abstract:
Deep neural networks (DNNs) exhibit great success on many tasks with the help of large-scale well annotated datasets. However, labeling large-scale data can be very costly and error-prone so that it is difficult to guarantee the annotation quality (i.e., having noisy labels). Training on these noisy labeled datasets may adversely deteriorate their generalization performance. Existing methods eithe…
▽ More
Deep neural networks (DNNs) exhibit great success on many tasks with the help of large-scale well annotated datasets. However, labeling large-scale data can be very costly and error-prone so that it is difficult to guarantee the annotation quality (i.e., having noisy labels). Training on these noisy labeled datasets may adversely deteriorate their generalization performance. Existing methods either rely on complex training stage division or bring too much computation for marginal performance improvement. In this paper, we propose a Temporal Calibrated Regularization (TCR), in which we utilize the original labels and the predictions in the previous epoch together to make DNN inherit the simple pattern it has learned with little overhead. We conduct extensive experiments on various neural network architectures and datasets, and find that it consistently enhances the robustness of DNNs to label noise.
△ Less
Submitted 1 July, 2020;
originally announced July 2020.
-
Dynamic Knapsack Optimization Towards Efficient Multi-Channel Sequential Advertising
Authors:
Xiaotian Hao,
Zhaoqing Peng,
Yi Ma,
Guan Wang,
Junqi **,
Jianye Hao,
Shan Chen,
Rongquan Bai,
Mingzhou Xie,
Miao Xu,
Zhenzhe Zheng,
Chuan Yu,
Han Li,
Jian Xu,
Kun Gai
Abstract:
In E-commerce, advertising is essential for merchants to reach their target users. The typical objective is to maximize the advertiser's cumulative revenue over a period of time under a budget constraint. In real applications, an advertisement (ad) usually needs to be exposed to the same user multiple times until the user finally contributes revenue (e.g., places an order). However, existing adver…
▽ More
In E-commerce, advertising is essential for merchants to reach their target users. The typical objective is to maximize the advertiser's cumulative revenue over a period of time under a budget constraint. In real applications, an advertisement (ad) usually needs to be exposed to the same user multiple times until the user finally contributes revenue (e.g., places an order). However, existing advertising systems mainly focus on the immediate revenue with single ad exposures, ignoring the contribution of each exposure to the final conversion, thus usually falls into suboptimal solutions. In this paper, we formulate the sequential advertising strategy optimization as a dynamic knapsack problem. We propose a theoretically guaranteed bilevel optimization framework, which significantly reduces the solution space of the original optimization space while ensuring the solution quality. To improve the exploration efficiency of reinforcement learning, we also devise an effective action space reduction approach. Extensive offline and online experiments show the superior performance of our approaches over state-of-the-art baselines in terms of cumulative revenue.
△ Less
Submitted 29 June, 2020;
originally announced June 2020.
-
On Projection Robust Optimal Transport: Sample Complexity and Model Misspecification
Authors:
Tianyi Lin,
Zeyu Zheng,
Elynn Y. Chen,
Marco Cuturi,
Michael I. Jordan
Abstract:
Optimal transport (OT) distances are increasingly used as loss functions for statistical inference, notably in the learning of generative models or supervised learning. Yet, the behavior of minimum Wasserstein estimators is poorly understood, notably in high-dimensional regimes or under model misspecification. In this work we adopt the viewpoint of projection robust (PR) OT, which seeks to maximiz…
▽ More
Optimal transport (OT) distances are increasingly used as loss functions for statistical inference, notably in the learning of generative models or supervised learning. Yet, the behavior of minimum Wasserstein estimators is poorly understood, notably in high-dimensional regimes or under model misspecification. In this work we adopt the viewpoint of projection robust (PR) OT, which seeks to maximize the OT cost between two measures by choosing a $k$-dimensional subspace onto which they can be projected. Our first contribution is to establish several fundamental statistical properties of PR Wasserstein distances, complementing and improving previous literature that has been restricted to one-dimensional and well-specified cases. Next, we propose the integral PR Wasserstein (IPRW) distance as an alternative to the PRW distance, by averaging rather than optimizing on subspaces. Our complexity bounds can help explain why both PRW and IPRW distances outperform Wasserstein distances empirically in high-dimensional inference tasks. Finally, we consider parametric inference using the PRW distance. We provide an asymptotic guarantee of two types of minimum PRW estimators and formulate a central limit theorem for max-sliced Wasserstein estimator under model misspecification. To enable our analysis on PRW with projection dimension larger than one, we devise a novel combination of variational analysis and statistical theory.
△ Less
Submitted 17 July, 2021; v1 submitted 22 June, 2020;
originally announced June 2020.
-
Scalable Interpretable Learning for Multi-Response Error-in-Variables Regression
Authors:
J. Wu,
Z. Zheng,
Y. Li,
Y. Zhang
Abstract:
Corrupted data sets containing noisy or missing observations are prevalent in various contemporary applications such as economics, finance and bioinformatics. Despite the recent methodological and algorithmic advances in high-dimensional multi-response regression, how to achieve scalable and interpretable estimation under contaminated covariates is unclear. In this paper, we develop a new methodol…
▽ More
Corrupted data sets containing noisy or missing observations are prevalent in various contemporary applications such as economics, finance and bioinformatics. Despite the recent methodological and algorithmic advances in high-dimensional multi-response regression, how to achieve scalable and interpretable estimation under contaminated covariates is unclear. In this paper, we develop a new methodology called convex conditioned sequential sparse learning (COSS) for error-in-variables multi-response regression under both additive measurement errors and random missing data. It combines the strengths of the recently developed sequential sparse factor regression and the nearest positive semi-definite matrix projection, thus enjoying stepwise convexity and scalability in large-scale association analyses. Comprehensive theoretical guarantees are provided and we demonstrate the effectiveness of the proposed methodology through numerical studies.
△ Less
Submitted 11 May, 2020;
originally announced May 2020.
-
An Uncoupled Training Architecture for Large Graph Learning
Authors:
Dalong Yang,
Chuan Chen,
Youhao Zheng,
Zibin Zheng,
Shih-wei Liao
Abstract:
Graph Convolutional Network (GCN) has been widely used in graph learning tasks. However, GCN-based models (GCNs) is an inherently coupled training framework repetitively conducting the complex neighboring aggregation, which leads to the limitation of flexibility in processing large-scale graph. With the depth of layers increases, the computational and memory cost of GCNs grow explosively due to th…
▽ More
Graph Convolutional Network (GCN) has been widely used in graph learning tasks. However, GCN-based models (GCNs) is an inherently coupled training framework repetitively conducting the complex neighboring aggregation, which leads to the limitation of flexibility in processing large-scale graph. With the depth of layers increases, the computational and memory cost of GCNs grow explosively due to the recursive neighborhood expansion. To tackle these issues, we present Node2Grids, a flexible uncoupled training framework that leverages the independent mapped data for obtaining the embedding. Instead of directly processing the coupled nodes as GCNs, Node2Grids supports a more efficacious method in practice, map** the coupled graph data into the independent grid-like data which can be fed into the efficient Convolutional Neural Network (CNN). This simple but valid strategy significantly saves memory and computational resource while achieving comparable results with the leading GCN-based models. Specifically, by ranking each node's influence through degree, Node2Grids selects the most influential first-order as well as second-order neighbors with central node fusion information to construct the grid-like data. For further improving the efficiency of downstream tasks, a simple CNN-based neural network is employed to capture the significant information from the mapped grid-like data. Moreover, the grid-level attention mechanism is implemented, which enables implicitly specifying the different weights for neighboring nodes with different influences. In addition to the typical transductive and inductive learning tasks, we also verify our framework on million-scale graphs to demonstrate the superiority of the proposed Node2Grids model against the state-of-the-art GCN-based approaches.
△ Less
Submitted 21 July, 2020; v1 submitted 21 March, 2020;
originally announced March 2020.
-
Statistically Guided Divide-and-Conquer for Sparse Factorization of Large Matrix
Authors:
Kun Chen,
Ruipeng Dong,
Wanwan Xu,
Zemin Zheng
Abstract:
The sparse factorization of a large matrix is fundamental in modern statistical learning. In particular, the sparse singular value decomposition and its variants have been utilized in multivariate regression, factor analysis, biclustering, vector time series modeling, among others. The appeal of this factorization is owing to its power in discovering a highly-interpretable latent association netwo…
▽ More
The sparse factorization of a large matrix is fundamental in modern statistical learning. In particular, the sparse singular value decomposition and its variants have been utilized in multivariate regression, factor analysis, biclustering, vector time series modeling, among others. The appeal of this factorization is owing to its power in discovering a highly-interpretable latent association network, either between samples and variables or between responses and predictors. However, many existing methods are either ad hoc without a general performance guarantee, or are computationally intensive, rendering them unsuitable for large-scale studies. We formulate the statistical problem as a sparse factor regression and tackle it with a divide-and-conquer approach. In the first stage of division, we consider both sequential and parallel approaches for simplifying the task into a set of co-sparse unit-rank estimation (CURE) problems, and establish the statistical underpinnings of these commonly-adopted and yet poorly understood deflation methods. In the second stage of division, we innovate a contended stagewise learning technique, consisting of a sequence of simple incremental updates, to efficiently trace out the whole solution paths of CURE. Our algorithm has a much lower computational complexity than alternating convex search, and the choice of the step size enables a flexible and principled tradeoff between statistical accuracy and computational efficiency. Our work is among the first to enable stagewise learning for non-convex problems, and the idea can be applicable in many multi-convex problems. Extensive simulation studies and an application in genetics demonstrate the effectiveness and scalability of our approach.
△ Less
Submitted 17 March, 2020;
originally announced March 2020.
-
A Survey of Adversarial Learning on Graphs
Authors:
Liang Chen,
**tang Li,
Jiaying Peng,
Tao Xie,
Zengxu Cao,
Kun Xu,
Xiangnan He,
Zibin Zheng,
Bingzhe Wu
Abstract:
Deep learning models on graphs have achieved remarkable performance in various graph analysis tasks, e.g., node classification, link prediction, and graph clustering. However, they expose uncertainty and unreliability against the well-designed inputs, i.e., adversarial examples. Accordingly, a line of studies has emerged for both attack and defense addressed in different graph analysis tasks, lead…
▽ More
Deep learning models on graphs have achieved remarkable performance in various graph analysis tasks, e.g., node classification, link prediction, and graph clustering. However, they expose uncertainty and unreliability against the well-designed inputs, i.e., adversarial examples. Accordingly, a line of studies has emerged for both attack and defense addressed in different graph analysis tasks, leading to the arms race in graph adversarial learning. Despite the booming works, there still lacks a unified problem definition and a comprehensive review. To bridge this gap, we investigate and summarize the existing works on graph adversarial learning tasks systemically. Specifically, we survey and unify the existing works w.r.t. attack and defense in graph analysis tasks, and give appropriate definitions and taxonomies at the same time. Besides, we emphasize the importance of related evaluation metrics, investigate and summarize them comprehensively. Hopefully, our works can provide a comprehensive overview and offer insights for the relevant researchers. Latest advances in graph adversarial learning are summarized in our GitHub repository https://github.com/EdisonLeeeee/Graph-Adversarial-Learning.
△ Less
Submitted 5 April, 2022; v1 submitted 10 March, 2020;
originally announced March 2020.
-
Distributed Optimization over Block-Cyclic Data
Authors:
Yucheng Ding,
Chaoyue Niu,
Yikai Yan,
Zhenzhe Zheng,
Fan Wu,
Guihai Chen,
Shaojie Tang,
Rongfei Jia
Abstract:
We consider practical data characteristics underlying federated learning, where unbalanced and non-i.i.d. data from clients have a block-cyclic structure: each cycle contains several blocks, and each client's training data follow block-specific and non-i.i.d. distributions. Such a data structure would introduce client and block biases during the collaborative training: the single global model woul…
▽ More
We consider practical data characteristics underlying federated learning, where unbalanced and non-i.i.d. data from clients have a block-cyclic structure: each cycle contains several blocks, and each client's training data follow block-specific and non-i.i.d. distributions. Such a data structure would introduce client and block biases during the collaborative training: the single global model would be biased towards the client or block specific data. To overcome the biases, we propose two new distributed optimization algorithms called multi-model parallel SGD (MM-PSGD) and multi-chain parallel SGD (MC-PSGD) with a convergence rate of $O(1/\sqrt{NT})$, achieving a linear speedup with respect to the total number of clients. In particular, MM-PSGD adopts the block-mixed training strategy, while MC-PSGD further adds the block-separate training strategy. Both algorithms create a specific predictor for each block by averaging and comparing the historical global models generated in this block from different cycles. We extensively evaluate our algorithms over the CIFAR-10 dataset. Evaluation results demonstrate that our algorithms significantly outperform the conventional federated averaging algorithm in terms of test accuracy, and also preserve robustness for the variance of critical parameters.
△ Less
Submitted 18 February, 2020;
originally announced February 2020.
-
Distributed Non-Convex Optimization with Sublinear Speedup under Intermittent Client Availability
Authors:
Yikai Yan,
Chaoyue Niu,
Yucheng Ding,
Zhenzhe Zheng,
Fan Wu,
Guihai Chen,
Shaojie Tang,
Zhihua Wu
Abstract:
Federated learning is a new distributed machine learning framework, where a bunch of heterogeneous clients collaboratively train a model without sharing training data. In this work, we consider a practical and ubiquitous issue when deploying federated learning in mobile environments: intermittent client availability, where the set of eligible clients may change during the training process. Such in…
▽ More
Federated learning is a new distributed machine learning framework, where a bunch of heterogeneous clients collaboratively train a model without sharing training data. In this work, we consider a practical and ubiquitous issue when deploying federated learning in mobile environments: intermittent client availability, where the set of eligible clients may change during the training process. Such intermittent client availability would seriously deteriorate the performance of the classical Federated Averaging algorithm (FedAvg for short). Thus, we propose a simple distributed non-convex optimization algorithm, called Federated Latest Averaging (FedLaAvg for short), which leverages the latest gradients of all clients, even when the clients are not available, to jointly update the global model in each iteration. Our theoretical analysis shows that FedLaAvg attains the convergence rate of $O(E^{1/2}/(N^{1/4} T^{1/2}))$, achieving a sublinear speedup with respect to the total number of clients. We implement FedLaAvg along with several baselines and evaluate them over the benchmarking MNIST and Sentiment140 datasets. The evaluation results demonstrate that FedLaAvg achieves more stable training than FedAvg in both convex and non-convex settings and indeed reaches a sublinear speedup.
△ Less
Submitted 21 December, 2020; v1 submitted 18 February, 2020;
originally announced February 2020.
-
Prophet: Proactive Candidate-Selection for Federated Learning by Predicting the Qualities of Training and Reporting Phases
Authors:
Huawei Huang,
Kangying Lin,
Song Guo,
Pan Zhou,
Zibin Zheng
Abstract:
Although the challenge of the device connection is much relieved in 5G networks, the training latency is still an obstacle preventing Federated Learning (FL) from being largely adopted. One of the most fundamental problems that lead to large latency is the bad candidate-selection for FL. In the dynamic environment, the mobile devices selected by the existing reactive candidate-selection algorithms…
▽ More
Although the challenge of the device connection is much relieved in 5G networks, the training latency is still an obstacle preventing Federated Learning (FL) from being largely adopted. One of the most fundamental problems that lead to large latency is the bad candidate-selection for FL. In the dynamic environment, the mobile devices selected by the existing reactive candidate-selection algorithms very possibly fail to complete the training and reporting phases of FL, because the FL parameter server only knows the currently-observed resources of all candidates. To this end, we study the proactive candidate-selection for FL in this paper. We first let each candidate device predict the qualities of both its training and reporting phases locally using LSTM. Then, the proposed candidateselection algorithm is implemented by the Deep Reinforcement Learning (DRL) framework. Finally, the real-world trace-driven experiments prove that the proposed approach outperforms the existing reactive algorithms
△ Less
Submitted 18 May, 2020; v1 submitted 3 February, 2020;
originally announced February 2020.
-
How Fast You Can Actually Fly: A Comparative Investigation of Flight Airborne Time in China and the U.S
Authors:
Ke Liu,
Zhe Zheng,
Bo Zou,
Mark Hansen
Abstract:
Actual airborne time (AAT) is the time between wheels-off and wheels-on of a flight. Understanding the behavior of AAT is increasingly important given the ever growing demand for air travel and flight delays becoming more rampant. As no research on AAT exists, this paper performs the first empirical analysis of AAT behavior, comparatively for the U.S. and China. The focus is on how AAT is affected…
▽ More
Actual airborne time (AAT) is the time between wheels-off and wheels-on of a flight. Understanding the behavior of AAT is increasingly important given the ever growing demand for air travel and flight delays becoming more rampant. As no research on AAT exists, this paper performs the first empirical analysis of AAT behavior, comparatively for the U.S. and China. The focus is on how AAT is affected by scheduled block time (SBT), origin-destination (OD) distance, and the possible pressure to reduce AAT from other parts of flight operations. Multiple econometric models are developed. The estimation results show that in both countries AAT is highly correlated with SBT and OD distance. Flights in the U.S. are faster than in China. On the other hand, facing ground delay prior to takeoff, a flight has limited capability to speed up. The pressure from short turnaround time after landing to reduce AAT is immaterial. Sensitivity analysis of AAT to flight length and aircraft utilization is further conducted. Given the more abundant airspace, flexible routing networks, and efficient ATFM procedures, a counterfactual that the AAT behavior in the U.S. were adopted in China is examined. We find that by doing so significant efficiency gains could be achieved in the Chinese air traffic system. On average, 11.8 minutes of AAT per flight would be saved, coming from both reduction in SBT and reduction in AAT relative to the new SBT. Systemwide fuel saving would amount to over 300 million gallons with direct airline operating cost saving of nearly $1.3 billion nationwide in 2016.
△ Less
Submitted 21 January, 2020;
originally announced January 2020.
-
Towards Better Forecasting by Fusing Near and Distant Future Visions
Authors:
Jiezhu Cheng,
Kaizhu Huang,
Zibin Zheng
Abstract:
Multivariate time series forecasting is an important yet challenging problem in machine learning. Most existing approaches only forecast the series value of one future moment, ignoring the interactions between predictions of future moments with different temporal distance. Such a deficiency probably prevents the model from getting enough information about the future, thus limiting the forecasting…
▽ More
Multivariate time series forecasting is an important yet challenging problem in machine learning. Most existing approaches only forecast the series value of one future moment, ignoring the interactions between predictions of future moments with different temporal distance. Such a deficiency probably prevents the model from getting enough information about the future, thus limiting the forecasting accuracy. To address this problem, we propose Multi-Level Construal Neural Network (MLCNN), a novel multi-task deep learning framework. Inspired by the Construal Level Theory of psychology, this model aims to improve the predictive performance by fusing forecasting information (i.e., future visions) of different future time. We first use the Convolution Neural Network to extract multi-level abstract representations of the raw data for near and distant future predictions. We then model the interplay between multiple predictive tasks and fuse their future visions through a modified Encoder-Decoder architecture. Finally, we combine traditional Autoregression model with the neural network to solve the scale insensitive problem. Experiments on three real-world datasets show that our method achieves statistically significant improvements compared to the most state-of-the-art baseline methods, with average 4.59% reduction on RMSE metric and average 6.87% reduction on MAE metric.
△ Less
Submitted 11 December, 2019;
originally announced December 2019.
-
Data Poisoning Attacks on Neighborhood-based Recommender Systems
Authors:
Liang Chen,
Yangjun Xu,
Fenfang Xie,
Min Huang,
Zibin Zheng
Abstract:
Nowadays, collaborative filtering recommender systems have been widely deployed in many commercial companies to make profit. Neighbourhood-based collaborative filtering is common and effective. To date, despite its effectiveness, there has been little effort to explore their robustness and the impact of data poisoning attacks on their performance. Can the neighbourhood-based recommender systems be…
▽ More
Nowadays, collaborative filtering recommender systems have been widely deployed in many commercial companies to make profit. Neighbourhood-based collaborative filtering is common and effective. To date, despite its effectiveness, there has been little effort to explore their robustness and the impact of data poisoning attacks on their performance. Can the neighbourhood-based recommender systems be easily fooled? To this end, we shed light on the robustness of neighbourhood-based recommender systems and propose a novel data poisoning attack framework encoding the purpose of attack and constraint against them. We firstly illustrate how to calculate the optimal data poisoning attack, namely UNAttack. We inject a few well-designed fake users into the recommender systems such that target items will be recommended to as many normal users as possible. Extensive experiments are conducted on three real-world datasets to validate the effectiveness and the transferability of our proposed method. Besides, some interesting phenomenons can be found. For example, 1) neighbourhood-based recommender systems with Euclidean Distance-based similarity have strong robustness. 2) the fake users can be transferred to attack the state-of-the-art collaborative filtering recommender systems such as Neural Collaborative Filtering and Bayesian Personalized Ranking Matrix Factorization.
△ Less
Submitted 1 December, 2019;
originally announced December 2019.
-
Robust DCD-Based Recursive Adaptive Algorithms
Authors:
Y. Yu,
L. Lu,
Z. Zheng,
W. Wang,
Y. Zakharov,
R. C. de Lamare
Abstract:
The dichotomous coordinate descent (DCD) algorithm has been successfully used for significant reduction in the complexity of recursive least squares (RLS) algorithms. In this work, we generalize the application of the DCD algorithm to RLS adaptive filtering in impulsive noise scenarios and derive a unified update formula. By employing different robust strategies against impulsive noise, we develop…
▽ More
The dichotomous coordinate descent (DCD) algorithm has been successfully used for significant reduction in the complexity of recursive least squares (RLS) algorithms. In this work, we generalize the application of the DCD algorithm to RLS adaptive filtering in impulsive noise scenarios and derive a unified update formula. By employing different robust strategies against impulsive noise, we develop novel computationally efficient DCD-based robust recursive algorithms. Furthermore, to equip the proposed algorithms with the ability to track abrupt changes in unknown systems, a simple variable forgetting factor mechanism is also developed. Simulation results for channel identification scenarios in impulsive noise demonstrate the effectiveness of the proposed algorithms.
△ Less
Submitted 17 August, 2019;
originally announced August 2019.
-
T-EDGE: Temporal WEighted MultiDiGraph Embedding for Ethereum Transaction Network Analysis
Authors:
Jia**g Wu,
Dan Lin,
Zibin Zheng,
Qi Yuan
Abstract:
Recently, graph embedding techniques have been widely used in the analysis of various networks, but most of the existing embedding methods omit the network dynamics and the multiplicity of edges, so it is difficult to accurately describe the detailed characteristics of the transaction networks. Ethereum is a blockchain-based platform supporting smart contracts. The open nature of blockchain makes…
▽ More
Recently, graph embedding techniques have been widely used in the analysis of various networks, but most of the existing embedding methods omit the network dynamics and the multiplicity of edges, so it is difficult to accurately describe the detailed characteristics of the transaction networks. Ethereum is a blockchain-based platform supporting smart contracts. The open nature of blockchain makes the transaction data on Ethereum completely public, and also brings unprecedented opportunities for the transaction network analysis. By taking the realistic rules and features of transaction networks into consideration, we first model the Ethereum transaction network as a Temporal Weighted Multidigraph (TWMDG), where each node is a unique Ethereum account and each edge represents a transaction weighted by amount and assigned with timestamp. Then we define the problem of Temporal Weighted Multidigraph Embedding (T-EDGE) by incorporating both temporal and weighted information of the edges, the purpose being to capture more comprehensive properties of dynamic transaction networks. To evaluate the effectiveness of the proposed embedding method, we conduct experiments of node classification on real-world transaction data collected from Ethereum. Experimental results demonstrate that T-EDGE outperforms baseline embedding methods, indicating that time-dependent walks and multiplicity characteristic of edges are informative and essential for time-sensitive transaction networks.
△ Less
Submitted 31 July, 2020; v1 submitted 13 May, 2019;
originally announced May 2019.
-
Cooperative Training of Fast Thinking Initializer and Slow Thinking Solver for Conditional Learning
Authors:
Jianwen Xie,
Zilong Zheng,
Xiaolin Fang,
Song-Chun Zhu,
Ying Nian Wu
Abstract:
This paper studies the problem of learning the conditional distribution of a high-dimensional output given an input, where the output and input may belong to two different domains, e.g., the output is a photo image and the input is a sketch image. We solve this problem by cooperative training of a fast thinking initializer and slow thinking solver. The initializer generates the output directly by…
▽ More
This paper studies the problem of learning the conditional distribution of a high-dimensional output given an input, where the output and input may belong to two different domains, e.g., the output is a photo image and the input is a sketch image. We solve this problem by cooperative training of a fast thinking initializer and slow thinking solver. The initializer generates the output directly by a non-linear transformation of the input as well as a noise vector that accounts for latent variability in the output. The slow thinking solver learns an objective function in the form of a conditional energy function, so that the output can be generated by optimizing the objective function, or more rigorously by sampling from the conditional energy-based model. We propose to learn the two models jointly, where the fast thinking initializer serves to initialize the sampling of the slow thinking solver, and the solver refines the initial output by an iterative algorithm. The solver learns from the difference between the refined output and the observed output, while the initializer learns from how the solver refines its initial output. We demonstrate the effectiveness of the proposed method on various conditional learning tasks, e.g., class-to-image generation, image-to-image translation, and image recovery. The advantage of our method over GAN-based methods is that our method is equipped with a slow thinking process that refines the solution guided by a learned objective function.
△ Less
Submitted 7 April, 2021; v1 submitted 7 February, 2019;
originally announced February 2019.
-
Learning Dynamic Generator Model by Alternating Back-Propagation Through Time
Authors:
Jianwen Xie,
Ruiqi Gao,
Zilong Zheng,
Song-Chun Zhu,
Ying Nian Wu
Abstract:
This paper studies the dynamic generator model for spatial-temporal processes such as dynamic textures and action sequences in video data. In this model, each time frame of the video sequence is generated by a generator model, which is a non-linear transformation of a latent state vector, where the non-linear transformation is parametrized by a top-down neural network. The sequence of latent state…
▽ More
This paper studies the dynamic generator model for spatial-temporal processes such as dynamic textures and action sequences in video data. In this model, each time frame of the video sequence is generated by a generator model, which is a non-linear transformation of a latent state vector, where the non-linear transformation is parametrized by a top-down neural network. The sequence of latent state vectors follows a non-linear auto-regressive model, where the state vector of the next frame is a non-linear transformation of the state vector of the current frame as well as an independent noise vector that provides randomness in the transition. The non-linear transformation of this transition model can be parametrized by a feedforward neural network. We show that this model can be learned by an alternating back-propagation through time algorithm that iteratively samples the noise vectors and updates the parameters in the transition model and the generator model. We show that our training method can learn realistic models for dynamic textures and action patterns.
△ Less
Submitted 26 December, 2018;
originally announced December 2018.
-
Collaborative Deep Learning Across Multiple Data Centers
Authors:
Kele Xu,
Haibo Mi,
Dawei Feng,
Huaimin Wang,
Chuan Chen,
Zibin Zheng,
Xu Lan
Abstract:
Valuable training data is often owned by independent organizations and located in multiple data centers. Most deep learning approaches require to centralize the multi-datacenter data for performance purpose. In practice, however, it is often infeasible to transfer all data to a centralized data center due to not only bandwidth limitation but also the constraints of privacy regulations. Model avera…
▽ More
Valuable training data is often owned by independent organizations and located in multiple data centers. Most deep learning approaches require to centralize the multi-datacenter data for performance purpose. In practice, however, it is often infeasible to transfer all data to a centralized data center due to not only bandwidth limitation but also the constraints of privacy regulations. Model averaging is a conventional choice for data parallelized training, but its ineffectiveness is claimed by previous studies as deep neural networks are often non-convex. In this paper, we argue that model averaging can be effective in the decentralized environment by using two strategies, namely, the cyclical learning rate and the increased number of epochs for local model training. With the two strategies, we show that model averaging can provide competitive performance in the decentralized mode compared to the data-centralized one. In a practical environment with multiple data centers, we conduct extensive experiments using state-of-the-art deep network architectures on different types of data. Results demonstrate the effectiveness and robustness of the proposed method.
△ Less
Submitted 16 October, 2018;
originally announced October 2018.