-
Convergence analysis of controlled particle systems arising in deep learning: from finite to infinite sample size
Authors:
Huafu Liao,
Alpár R. Mészáros,
Chenchen Mou,
Chao Zhou
Abstract:
This paper deals with a class of neural SDEs and studies the limiting behavior of the associated sampled optimal control problems as the sample size grows to infinity. The neural SDEs with N samples can be linked to the N-particle systems with centralized control. We analyze the Hamilton--Jacobi--Bellman equation corresponding to the N-particle system and establish regularity results which are uniform in N. The uniform regularity estimates are obtained by the stochastic maximum principle and the analysis of a backward stochastic Riccati equation. Using these uniform regularity results, we show the convergence of the minima of objective functionals and optimal parameters of the neural SDEs as the sample size N tends to infinity. The limiting objects can be identified with suitable functions defined on the Wasserstein space of Borel probability measures. Furthermore, quantitative algebraic convergence rates are also obtained.
Submitted 8 April, 2024;
originally announced April 2024.
-
Federated Multilinear Principal Component Analysis with Applications in Prognostics
Authors:
Chengyu Zhou,
Yuqi Su,
Tangbin Xia,
Xiaolei Fang
Abstract:
Multilinear Principal Component Analysis (MPCA) is a widely utilized method for the dimension reduction of tensor data. However, the integration of MPCA into federated learning remains unexplored in existing research. To tackle this gap, this article proposes a Federated Multilinear Principal Component Analysis (FMPCA) method, which enables multiple users to collaboratively reduce the dimension of their tensor data while keeping each user's data local and confidential. The proposed FMPCA method is guaranteed to have the same performance as traditional MPCA. An application of the proposed FMPCA in industrial prognostics is also demonstrated. Simulated data and a real-world data set are used to validate the performance of the proposed method.
Submitted 28 April, 2024; v1 submitted 10 December, 2023;
originally announced December 2023.
-
Efficiently analyzing large patient registries with Bayesian joint models for longitudinal and time-to-event data
Authors:
P. Miranda Afonso,
D. Rizopoulos,
A. K. Palipana,
G. C. Zhou,
C. Brokamp,
R. D. Szczesniak,
E-R. Andrinopoulou
Abstract:
The joint modeling of longitudinal and time-to-event outcomes has become a popular tool in follow-up studies. However, fitting Bayesian joint models to large datasets, such as patient registries, can require extended computing times. To speed up sampling, we divided a patient registry dataset into subsamples, analyzed them in parallel, and combined the resulting Markov chain Monte Carlo draws into a consensus distribution. We used a simulation study to investigate how different consensus strategies perform with joint models. In particular, we compared grouping all draws together with using equal- and precision-weighted averages. We considered scenarios reflecting different sample sizes, numbers of data splits, and processor characteristics. Parallelization of the sampling process substantially decreased the time required to run the model. We found that the weighted-average consensus distributions for large sample sizes were nearly identical to the target posterior distribution. The proposed algorithm has been made available in an R package for joint models, JMbayes2. This work was motivated by the clinical interest in investigating the association between ppFEV1, a commonly measured marker of lung function, and the risk of lung transplant or death, using data from the US Cystic Fibrosis Foundation Patient Registry (35,153 individuals with 372,366 years of cumulative follow-up). Splitting the registry into five subsamples resulted in an 85% decrease in computing time, from 9.22 to 1.39 hours. Splitting the data and finding a consensus distribution by precision-weighted averaging proved to be a computationally efficient and robust approach to handling large datasets under the joint modeling framework.
Submitted 5 October, 2023;
originally announced October 2023.
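The consensus step described in the abstract above (combining per-subsample MCMC draws by equal- or precision-weighted averaging) can be sketched as follows. This is a minimal illustration for a single scalar parameter, not the JMbayes2 implementation: the function name `consensus_draws` is hypothetical, and using the inverse sample variance of each subsample's draws as its precision weight is an assumption made for the sketch.

```python
import numpy as np


def consensus_draws(sub_draws, weighting="precision"):
    """Combine per-subsample MCMC draws of a scalar parameter.

    sub_draws: array-like of shape (S, T), i.e. S subsamples with T draws each.
    Returns T consensus draws, each a weighted average across subsamples.
    """
    sub_draws = np.asarray(sub_draws, dtype=float)
    if weighting == "equal":
        weights = np.ones(sub_draws.shape[0])
    elif weighting == "precision":
        # Assumption: weight = inverse sample variance of each subsample's draws.
        weights = 1.0 / sub_draws.var(axis=1, ddof=1)
    else:
        raise ValueError("weighting must be 'equal' or 'precision'")
    weights = weights / weights.sum()
    # Draw-by-draw weighted average across the S subsamples.
    return weights @ sub_draws


# Two subsamples, four draws each; the first has the smaller variance,
# so it receives the larger precision weight.
draws = np.array([[0.0, 2.0, 0.0, 2.0],
                  [8.0, 12.0, 8.0, 12.0]])
combined = consensus_draws(draws)
```

A vector-valued parameter would use precision matrices in place of scalar weights; the scalar case is kept here only to show the averaging itself.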
-
Graphical lasso for extremes
Authors:
Phyllis Wan,
Chen Zhou
Abstract:
In this paper we estimate the sparse dependence structure in the tail region of a multivariate random vector, potentially of high dimension. The tail dependence is modeled via a graphical model for extremes embedded in the Huesler-Reiss distribution (Engelke and Hitz, 2020). We propose the extreme graphical lasso procedure to estimate the sparsity in the tail dependence, similar to the Gaussian graphical lasso method in high dimensional statistics. We prove its consistency in identifying the graph structure and estimating model parameters. The efficiency and accuracy of the proposed method are illustrated in simulated and real examples.
Submitted 27 July, 2023;
originally announced July 2023.
-
Estimating probabilities of multivariate failure sets based on pairwise tail dependence coefficients
Authors:
Anna Kiriliouk,
Chen Zhou
Abstract:
An important problem in extreme-value theory is the estimation of the probability that a high-dimensional random vector falls into a given extreme failure set. This paper provides a parametric approach to this problem, based on a generalization of the tail pairwise dependence matrix (TPDM). The TPDM gives a partial summary of tail dependence for all pairs of components of the random vector. We propose an algorithm to obtain an approximate completely positive decomposition of the TPDM. The decomposition is easy to compute and applicable to moderate to high dimensions. Based on the decomposition, we obtain parameter estimates of a max-linear model whose TPDM is equal to that of the original random vector. We apply the proposed decomposition algorithm to industry portfolio returns and maximal wind speeds to illustrate its applicability.
Submitted 23 October, 2022;
originally announced October 2022.
-
A Supervised Tensor Dimension Reduction-Based Prognostics Model for Applications with Incomplete Imaging Data
Authors:
Chengyu Zhou,
Xiaolei Fang
Abstract:
This paper proposes a supervised dimension reduction methodology for tensor data, which has two advantages over most image-based prognostic models. First, the model does not require tensor data to be complete, which expands its application to incomplete data. Second, it utilizes time-to-failure (TTF) to supervise the extraction of low-dimensional features, which makes the extracted features more effective for the subsequent prognostics. In addition, an optimization algorithm is proposed for parameter estimation, and closed-form solutions are derived under certain distributions.
Submitted 4 June, 2023; v1 submitted 22 July, 2022;
originally announced July 2022.
-
Adapting the Hill estimator to distributed inference: dealing with the bias
Authors:
Liujun Chen,
Deyuan Li,
Chen Zhou
Abstract:
The distributed Hill estimator is a divide-and-conquer algorithm for estimating the extreme value index when data are stored in multiple machines. In applications, estimates based on the distributed Hill estimator can be sensitive to the choice of the number of the exceedance ratios used in each machine. Even when choosing the number at a low level, a high asymptotic bias may arise. We overcome this potential drawback by designing a bias correction procedure for the distributed Hill estimator, which adheres to the setup of distributed inference. The resulting asymptotically unbiased distributed estimator is, on the one hand, applicable to data stored in a distributed manner and, on the other hand, inherits all known advantages of bias correction methods in extreme value statistics.
Submitted 19 December, 2021;
originally announced December 2021.
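The divide-and-conquer idea in the abstract above can be sketched as follows, assuming the plain (uncorrected) distributed Hill estimator is the average of per-machine Hill estimates, each computed from the k largest observations on that machine. The function names `hill` and `distributed_hill` are hypothetical, and the bias correction procedure that the paper proposes is not shown.

```python
import numpy as np


def hill(x, k):
    """Hill estimate of the extreme value index from the k largest values of x.

    x must contain positive values and more than k observations.
    """
    xs = np.sort(np.asarray(x, dtype=float))
    if not 0 < k < xs.size:
        raise ValueError("k must satisfy 0 < k < len(x)")
    # Mean log-excess of the top k order statistics over the (k+1)-th largest.
    return float(np.mean(np.log(xs[-k:])) - np.log(xs[-k - 1]))


def distributed_hill(machines, k):
    """Average the per-machine Hill estimates; only the estimates are shared."""
    return float(np.mean([hill(x, k) for x in machines]))


# Synthetic Pareto-type data split across 10 machines (illustrative only).
rng = np.random.default_rng(0)
machines = [rng.pareto(2.0, size=1000) + 1.0 for _ in range(10)]
estimate = distributed_hill(machines, k=50)
```

Each machine only needs to communicate a single number, which is what makes the scheme compatible with distributed storage; the sensitivity to k mentioned in the abstract enters through the per-machine call to `hill`.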
-
Variational autoencoders in the presence of low-dimensional data: landscape and implicit bias
Authors:
Frederic Koehler,
Viraj Mehta,
Chenghui Zhou,
Andrej Risteski
Abstract:
Variational Autoencoders (VAEs) are one of the most commonly used generative models, particularly for image data. A prominent difficulty in training VAEs is data that is supported on a lower-dimensional manifold. Recent work by Dai and Wipf (2020) proposes a two-stage training algorithm for VAEs, based on a conjecture that in standard VAE training the generator will converge to a solution with 0 variance which is correctly supported on the ground truth manifold. They gave partial support for that conjecture by showing that some optima of the VAE loss do satisfy this property, but did not analyze the training dynamics. In this paper, we show that for linear encoders/decoders, the conjecture is true (that is, VAE training does recover a generator with support equal to the ground truth manifold) and that this is due to an implicit bias of gradient descent rather than merely the VAE loss itself. In the nonlinear case, we show that VAE training frequently learns a higher-dimensional manifold which is a superset of the ground truth manifold.
Submitted 17 May, 2022; v1 submitted 13 December, 2021;
originally announced December 2021.
-
RIO: Rotation-equivariance supervised learning of robust inertial odometry
Authors:
Caifa Zhou,
Xiya Cao,
Dandan Zeng,
Yongliang Wang
Abstract:
This paper introduces rotation-equivariance as a self-supervisor to train inertial odometry models. We demonstrate that the self-supervised scheme provides a powerful supervisory signal at the training phase as well as at the inference stage. It reduces the reliance on massive amounts of labeled data for training a robust model and makes it possible to update the model using various unlabeled data. Further, we propose adaptive Test-Time Training (TTT) based on uncertainty estimations in order to enhance the generalizability of the inertial odometry to various unseen data. We show in experiments that the Rotation-equivariance-supervised Inertial Odometry (RIO) trained with 30% of the data achieves performance on par with a model trained with the whole database. Adaptive TTT improves model performance in all cases and yields improvements of more than 25% under several scenarios.
Submitted 23 November, 2021;
originally announced November 2021.
-
Tail inverse regression for dimension reduction with extreme response
Authors:
Anass Aghbalou,
François Portier,
Anne Sabourin,
Chen Zhou
Abstract:
We consider the problem of supervised dimension reduction with a particular focus on extreme values of the target $Y\in\mathbb{R}$ to be explained by a covariate vector $X \in \mathbb{R}^p$. The general purpose is to define and estimate a projection on a lower dimensional subspace of the covariate space which is sufficient for predicting exceedances of the target above high thresholds. We propose an original definition of Tail Conditional Independence which matches this purpose. Inspired by Sliced Inverse Regression (SIR) methods, we develop a novel framework (TIREX, Tail Inverse Regression for EXtreme response) in order to estimate an extreme sufficient dimension reduction (SDR) space of potentially smaller dimension than that of a classical SDR space. We prove the weak convergence of tail empirical processes involved in the estimation procedure and we illustrate the relevance of the proposed approach on simulated and real world data.
Submitted 24 February, 2023; v1 submitted 30 July, 2021;
originally announced August 2021.
-
Distributed Inference for Tail Risk
Authors:
Liujun Chen,
Deyuan Li,
Chen Zhou
Abstract:
For measuring tail risk with scarce extreme events, extreme value analysis is often invoked as the statistical tool to extrapolate to the tail of a distribution. The presence of large datasets benefits tail risk analysis by providing more observations for conducting extreme value analysis. However, large datasets may be stored in a distributed manner, which prevents analyzing them directly. In this paper, we introduce a comprehensive set of tools for examining the asymptotic behavior of tail empirical and quantile processes in the setting where data are distributed across multiple sources, for instance, when data are stored on multiple machines. Utilizing these tools, one can establish the oracle property for most distributed estimators in extreme value statistics in a straightforward way. The main theoretical challenge arises when the number of machines diverges to infinity. The number of machines plays a role resembling that of dimensionality in high dimensional statistics. We provide various examples to demonstrate the practicality and value of our proposed toolkit.
Submitted 15 December, 2023; v1 submitted 3 August, 2021;
originally announced August 2021.
-
Active multi-fidelity Bayesian online changepoint detection
Authors:
Gregory W. Gundersen,
Diana Cai,
Chuteng Zhou,
Barbara E. Engelhardt,
Ryan P. Adams
Abstract:
Online algorithms for detecting changepoints, or abrupt shifts in the behavior of a time series, are often deployed with limited resources, e.g., to edge computing settings such as mobile phones or industrial sensors. In these scenarios it may be beneficial to trade the cost of collecting an environmental measurement against the quality or "fidelity" of this measurement and how the measurement affects changepoint estimation. For instance, one might decide between inertial measurements or GPS to determine changepoints for motion. A Bayesian approach to changepoint detection is particularly appealing because we can represent our posterior uncertainty about changepoints and make active, cost-sensitive decisions about data fidelity to reduce this posterior uncertainty. Moreover, the total cost could be dramatically lowered through active fidelity switching, while remaining robust to changes in data distribution. We propose a multi-fidelity approach that makes cost-sensitive decisions about which data fidelity to collect based on maximizing information gain with respect to changepoints. We evaluate this framework on synthetic, video, and audio data and show that this information-based approach results in accurate predictions while reducing total cost.
Submitted 25 July, 2021; v1 submitted 25 March, 2021;
originally announced March 2021.
-
Mining geometric constraints from crowd-sourced radio signals and its application to indoor positioning
Authors:
Caifa Zhou,
Zhi Li,
Dandan Zeng,
Yongliang Wang
Abstract:
Crowd-sourcing has become a promising way to build a feature-based indoor positioning system that has lower labour and time costs. It can make full use of the widely deployed infrastructure as well as built-in sensors on mobile devices. One of the key challenges is to generate the reference feature map (RFM), a database used for localization, by aligning crowd-sourced trajectories according to associations embodied in the data. In order to facilitate the data fusion using crowd-sourced inertial sensors and radio signals, this paper proposes an approach to adaptively mining geometric information. This is essential for generating spatial associations between trajectories when employing graph-based optimization methods. The core idea is to estimate the functional relationship that maps the similarity/dissimilarity between radio signals to the physical space, based on the relative positions obtained from inertial sensors and their associated radio signals. Namely, it is adaptable to different modalities of data and can be implemented in a self-supervised way. We verify the generality of the proposed approach through comprehensive experimental analysis: i) qualitatively comparing the estimation of geometric mapping models and the alignment of crowd-sourced trajectories; ii) quantitatively evaluating the positioning performance. 68% of the positioning errors are less than 4.7 $\mathrm{m}$ using the crowd-sourced RFM, which is on a par with the manually collected RFM, in a multi-storey shopping mall that covers more than 10,000 $\mathrm{m}^2$.
Submitted 20 March, 2021;
originally announced March 2021.
-
CogDL: A Comprehensive Library for Graph Deep Learning
Authors:
Yukuo Cen,
Zhenyu Hou,
Yan Wang,
Qibin Chen,
Yizhen Luo,
Zhongming Yu,
Hengrui Zhang,
Xingcheng Yao,
Aohan Zeng,
Shiguang Guo,
Yuxiao Dong,
Yang Yang,
Peng Zhang,
Guohao Dai,
Yu Wang,
Chang Zhou,
Hongxia Yang,
Jie Tang
Abstract:
Graph neural networks (GNNs) have attracted tremendous attention from the graph learning community in recent years. It has been widely adopted in various real-world applications from diverse domains, such as social networks and biological graphs. The research and applications of graph deep learning present new challenges, including the sparse nature of graph data, complicated training of GNNs, and non-standard evaluation of graph tasks. To tackle the issues, we present CogDL, a comprehensive library for graph deep learning that allows researchers and practitioners to conduct experiments, compare methods, and build applications with ease and efficiency. In CogDL, we propose a unified design for the training and evaluation of GNN models for various graph tasks, making it unique among existing graph learning libraries. By utilizing this unified trainer, CogDL can optimize the GNN training loop with several training techniques, such as mixed precision training. Moreover, we develop efficient sparse operators for CogDL, enabling it to become the most competitive graph library for efficiency. Another important CogDL feature is its focus on ease of use with the aim of facilitating open and reproducible research of graph learning. We leverage CogDL to report and maintain benchmark results on fundamental graph tasks, which can be reproduced and directly used by the community.
Submitted 17 April, 2023; v1 submitted 1 March, 2021;
originally announced March 2021.
-
Contrastive Learning for Debiased Candidate Generation in Large-Scale Recommender Systems
Authors:
Chang Zhou,
Jianxin Ma,
Jianwei Zhang,
Jingren Zhou,
Hongxia Yang
Abstract:
Deep candidate generation (DCG) that narrows down the collection of relevant items from billions to hundreds via representation learning has become prevalent in industrial recommender systems. Standard approaches approximate maximum likelihood estimation (MLE) through sampling for better scalability and address the problem of DCG in a way similar to language modeling. However, live recommender systems face severe exposure bias and have a vocabulary several orders of magnitude larger than that of natural language, implying that MLE will preserve and even exacerbate the exposure bias in the long run in order to faithfully fit the observed samples. In this paper, we theoretically prove that a popular choice of contrastive loss is equivalent to reducing the exposure bias via inverse propensity weighting, which provides a new perspective for understanding the effectiveness of contrastive learning. Based on the theoretical discovery, we design CLRec, a contrastive learning method to improve DCG in terms of fairness, effectiveness and efficiency in recommender systems with extremely large candidate size. We further improve upon CLRec and propose Multi-CLRec, for accurate multi-intention aware bias reduction. Our methods have been successfully deployed in Taobao, where at least four-month online A/B tests and offline analyses demonstrate its substantial improvements, including a dramatic reduction in the Matthew effect.
Submitted 4 June, 2021; v1 submitted 20 May, 2020;
originally announced May 2020.
-
Understanding Negative Sampling in Graph Representation Learning
Authors:
Zhen Yang,
Ming Ding,
Chang Zhou,
Hongxia Yang,
Jingren Zhou,
Jie Tang
Abstract:
Graph representation learning has been extensively studied in recent years. Despite its potential in generating continuous embeddings for various networks, inferring high-quality representations for a large corpus of nodes both effectively and efficiently remains challenging. Sampling is critical to achieving these performance goals. Prior work usually focuses on sampling positive node pairs, while the strategy for negative sampling is left insufficiently explored. To bridge the gap, we systematically analyze the role of negative sampling from the perspectives of both objective and risk, theoretically demonstrating that negative sampling is as important as positive sampling in determining the optimization objective and the resulting variance. To the best of our knowledge, we are the first to derive the theory and quantify that the negative sampling distribution should be positively but sub-linearly correlated to the positive sampling distribution. With the guidance of the theory, we propose MCNS, approximating the positive distribution with self-contrast approximation and accelerating negative sampling by Metropolis-Hastings. We evaluate our method on 5 datasets that cover extensive downstream graph learning tasks, including link prediction, node classification and personalized recommendation, on a total of 19 experimental settings. These relatively comprehensive experimental results demonstrate its robustness and superiority.
Submitted 25 June, 2020; v1 submitted 20 May, 2020;
originally announced May 2020.
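The statement in the abstract above, that the negative sampling distribution should be positively but sub-linearly correlated to the positive sampling distribution, can be illustrated with a power transform p_neg ∝ p_pos^alpha for 0 < alpha < 1. This is only a sketch of that relationship: the function name `negative_distribution` and the default alpha = 0.75 are assumptions, and the self-contrast approximation and Metropolis-Hastings sampler used by MCNS are not shown.

```python
import numpy as np


def negative_distribution(p_pos, alpha=0.75):
    """Return a negative sampling distribution proportional to p_pos ** alpha.

    With 0 < alpha < 1 the ordering of nodes is preserved (positive
    correlation) while the ratios between them are compressed (sub-linear).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    weights = np.asarray(p_pos, dtype=float) ** alpha
    return weights / weights.sum()


# Hypothetical positive sampling probabilities for four candidate nodes.
p_pos = np.array([0.5, 0.3, 0.15, 0.05])
p_neg = negative_distribution(p_pos)
```

Frequent positives remain the most likely negatives under this transform, but rare nodes are sampled more often than they would be under p_pos itself.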
-
Controllable Multi-Interest Framework for Recommendation
Authors:
Yukuo Cen,
Jianwei Zhang,
Xu Zou,
Chang Zhou,
Hongxia Yang,
Jie Tang
Abstract:
Recently, neural networks have been widely used in e-commerce recommender systems, owing to the rapid development of deep learning. We formalize the recommender system as a sequential recommendation problem, intending to predict the next items that the user might interact with. Recent works usually give an overall embedding from a user's behavior sequence. However, a unified user embedding cannot reflect the user's multiple interests during a period. In this paper, we propose a novel controllable multi-interest framework for sequential recommendation, called ComiRec. Our multi-interest module captures multiple interests from user behavior sequences, which can be exploited for retrieving candidate items from the large-scale item pool. These items are then fed into an aggregation module to obtain the overall recommendation. The aggregation module leverages a controllable factor to balance the recommendation accuracy and diversity. We conduct experiments for sequential recommendation on two real-world datasets, Amazon and Taobao. Experimental results demonstrate that our framework achieves significant improvements over state-of-the-art models. Our framework has also been successfully deployed on the offline Alibaba distributed cloud platform.
Submitted 2 August, 2020; v1 submitted 19 May, 2020;
originally announced May 2020.
-
Deep learning for smart fish farming: applications, opportunities and challenges
Authors:
Xinting Yang,
Song Zhang,
Jintao Liu,
Qinfeng Gao,
Shuanglin Dong,
Chao Zhou
Abstract:
With its rapid emergence, deep learning (DL) technology has been successfully used in various fields, including aquaculture. This change can create new opportunities and a series of challenges for information and data processing in smart fish farming. This paper focuses on the applications of DL in aquaculture, including live fish identification, species classification, behavioral analysis, feeding decision-making, size or biomass estimation, and water quality prediction. In addition, the technical details of DL methods applied to smart fish farming are also analyzed, including data, algorithms, computing power, and performance. The results of this review show that the most significant contribution of DL is the ability to automatically extract features. However, challenges still exist; DL is still in an era of weak artificial intelligence. A large amount of labeled data is needed for training, which has become a bottleneck restricting further DL applications in aquaculture. Nevertheless, DL still offers breakthroughs in the handling of complex data in aquaculture. In brief, our purpose is to provide researchers and practitioners with a better understanding of the current state of the art of DL in aquaculture, which can provide strong support for the implementation of smart fish farming.
Submitted 30 June, 2020; v1 submitted 6 April, 2020;
originally announced April 2020.
-
Detecting Suspected Epidemic Cases Using Trajectory Big Data
Authors:
Chuansai Zhou,
Wen Yuan,
Jun Wang,
Haiyong Xu,
Yong Jiang,
Xinmin Wang,
Qiuzi Han Wen,
Pingwen Zhang
Abstract:
Emerging infectious diseases are existential threats to human health and global stability. The recent outbreaks of the novel coronavirus COVID-19 have rapidly formed a global pandemic, causing hundreds of thousands of infections and huge economic loss. The WHO declares that more precise measures to track, detect and isolate infected people are among the most effective means to quickly contain the outbreak. Based on trajectories provided by big data and on mean field theory, we establish an aggregated risk mean field that contains information on all risk-spreading particles by proposing a spatio-temporal model named HiRES risk map. It has dynamic fine spatial resolution and high computational efficiency, enabling fast updates. We then propose an objective individual epidemic risk scoring model named HiRES-p based on HiRES risk maps, and use it to develop statistical inference and machine learning methods for detecting suspected epidemic-infected individuals. We conduct numerical experiments by applying the proposed methods to study the early outbreak of COVID-19 in China. Results show that the HiRES risk map has a strong ability to capture the global trend and local variability of the epidemic risk, and thus can be applied to monitor epidemic risk at country, province, city and community levels, as well as at specific high-risk locations such as hospitals and stations. The HiRES-p score seems to be an effective measurement of personal epidemic risk. The accuracy of both detection methods is above 90\% when the population infection rate is under 20\%, which indicates great application potential in epidemic risk prevention and control practice.
Submitted 15 April, 2020; v1 submitted 2 April, 2020;
originally announced April 2020.
-
Spatial dependence and space-time trend in extreme events
Authors:
John H. J. Einmahl,
Ana Ferreira,
Laurens de Haan,
Claudia Neves,
Chen Zhou
Abstract:
The statistical theory of extremes is extended to observations that are non-stationary and not independent. The non-stationarity over time and space is controlled via the scedasis (tail scale) in the marginal distributions. Spatial dependence stems from multivariate extreme value theory. We establish asymptotic theory for both the weighted sequential tail empirical process and the weighted tail quantile process based on all observations, taken over time and space. The results yield two statistical tests for homoscedasticity in the tail, one in space and one in time. Further, we show that the common extreme value index can be estimated via a pseudo-maximum likelihood procedure based on pooling all (non-stationary and dependent) observations. Our leading example and application is rainfall in Northern Germany.
Submitted 9 March, 2020;
originally announced March 2020.
-
Balance Between Efficient and Effective Learning: Dense2Sparse Reward Shaping for Robot Manipulation with Environment Uncertainty
Authors:
Yongle Luo,
Kun Dong,
Lili Zhao,
Zhiyong Sun,
Chao Zhou,
Bo Song
Abstract:
Efficient and effective learning is one of the ultimate goals of deep reinforcement learning (DRL), although a compromise between the two is made most of the time, especially in robot manipulation applications. Learning is always expensive for robot manipulation tasks, and the learning effectiveness can be affected by system uncertainty. To address these challenges, in this study we propose a simple but powerful reward shaping method, namely Dense2Sparse. It combines the fast convergence of dense reward with the noise isolation of sparse reward to achieve a balance between learning efficiency and effectiveness, which makes it suitable for robot manipulation tasks. We evaluated our Dense2Sparse method with a series of ablation experiments using the state representation model with system uncertainty. The experimental results show that the Dense2Sparse method obtained higher expected reward than methods using standalone dense reward or sparse reward, and it also has a superior tolerance of system uncertainty.
Submitted 5 March, 2020;
originally announced March 2020.
-
Unsupervised Program Synthesis for Images By Sampling Without Replacement
Authors:
Chenghui Zhou,
Chun-Liang Li,
Barnabas Poczos
Abstract:
Program synthesis has emerged as a successful approach to the image parsing task. Most prior works rely on a two-step scheme involving supervised pretraining of a Seq2Seq model with synthetic programs followed by reinforcement learning (RL) for fine-tuning with real reference images. Fully unsupervised approaches promise to train the model directly on the target images without requiring curated pretraining datasets. However, they struggle with the inherent sparsity of meaningful programs in the search space. In this paper, we present the first unsupervised algorithm capable of parsing constructive solid geometry (CSG) images into context-free grammar (CFG) without pretraining via non-differentiable renderer. To tackle the \emph{non-Markovian} sparse reward problem, we combine three key ingredients -- (i) a grammar-encoded tree LSTM ensuring program validity (ii) entropy regularization and (iii) sampling without replacement from the CFG syntax tree. Empirically, our algorithm recovers meaningful programs in large search spaces (up to $3.8 \times 10^{28}$). Further, even though our approach is fully unsupervised, it generalizes better than supervised methods on the synthetic 2D CSG dataset. On the 2D computer aided design (CAD) dataset, our approach significantly outperforms the supervised pretrained model and is competitive to the refined model.
Submitted 14 June, 2021; v1 submitted 27 January, 2020;
originally announced January 2020.
-
Noisy Machines: Understanding Noisy Neural Networks and Enhancing Robustness to Analog Hardware Errors Using Distillation
Authors:
Chuteng Zhou,
Prad Kadambi,
Matthew Mattina,
Paul N. Whatmough
Abstract:
The success of deep learning has brought forth a wave of interest in computer hardware design to better meet the high demands of neural network inference. In particular, analog computing hardware has been heavily motivated specifically for accelerating neural networks, based on either electronic, optical or photonic devices, which may well achieve lower power consumption than conventional digital electronics. However, these proposed analog accelerators suffer from the intrinsic noise generated by their physical components, which makes it challenging to achieve high accuracy on deep neural networks. Hence, for successful deployment on analog accelerators, it is essential to be able to train deep neural networks to be robust to random continuous noise in the network weights, which is a somewhat new challenge in machine learning. In this paper, we advance the understanding of noisy neural networks. We outline how a noisy neural network has reduced learning capacity as a result of loss of mutual information between its input and output. To combat this, we propose using knowledge distillation combined with noise injection during training to achieve more noise robust networks, which is demonstrated experimentally across different networks and datasets, including ImageNet. Our method achieves models with as much as two times greater noise tolerance compared with the previous best attempts, which is a significant step towards making analog hardware practical for deep learning.
Submitted 14 January, 2020;
originally announced January 2020.
-
Gasoline Pricing Policies for Transportation Safety
Authors:
Nima Safaei,
Chao Zhou
Abstract:
Economic factors can have substantial effects on transportation crash trends. This study makes a comprehensive examination of the relationship between the retail gasoline price (including state and federal fuel taxes) and transportation fatal crashes from 2007 to 2016 in the US. Data on motor vehicle, bicycle and pedestrian fatal crashes come from the Fatality Analysis Reporting System (FARS) provided by the National Highway Traffic Safety Administration (NHTSA), and the gasoline price data are from the U.S. Energy Information Administration (EIA). Random effect negative binomial regression models are used to estimate the impact of inflation-adjusted gasoline prices on trends of transportation fatal crashes. Initial results combined with results of previous studies showed that gender and type of transportation means (motorcycle, non-motorcycle, bicycle and pedestrian) play prominent roles in interpreting the final model, so by using random effect negative binomial regression, seven models are developed to evaluate the effects of gasoline price changes on total population, male, female, motorcyclists, non-motorcyclists, bicyclists and pedestrians separately. Our findings suggest that increasing gasoline prices will not significantly alter the number of total fatal crashes. However, by looking at different vehicle types, it is estimated that a one-dollar increase in adjusted gasoline price is associated with a 24.2% increase in the number of motorcycle fatal crashes, a 1.9% decrease in the number of non-motorcycle fatal crashes, and a 0.7% decrease in the number of pedestrian fatal crashes. Also, there is no noticeable difference between males and females in response to gasoline price changes.
Submitted 8 January, 2020;
originally announced January 2020.
-
Feature-wise change detection and robust indoor positioning using RANSAC-like approach
Authors:
Caifa Zhou
Abstract:
Fingerprinting-based positioning, one of the promising indoor positioning solutions, has been broadly explored owing to the pervasiveness of sensor-rich mobile devices, the prosperity of opportunistically measurable location-relevant signals and the progress of data-driven algorithms. One critical challenge is to control and improve the quality of the reference fingerprint map (RFM), which is built at the offline stage and applied for online positioning. The key concept concerning the quality control of the RFM is updating the RFM according to the newly measured data. Though various methods have been proposed for adapting the RFM, they approach the problem by introducing extra-positioning schemes (e.g. PDR or UGV) and directly adjust the RFM without distinguishing whether critical changes have occurred. This paper aims at proposing an extra-positioning-free solution by making full use of the redundancy of measurable features. Loosely inspired by random sampling consensus (RANSAC), arbitrarily sampled subsets of features from the online measurement are used for generating multi-resamples, which are used for estimating the intermediate locations. This resampling mitigates the impact of the changed features on positioning and enables accurate location estimates to be retrieved. The user's location is robustly computed by identifying the candidate locations from these intermediate ones using the modified Jaccard index (MJI), and the feature-wise change belief is calculated according to the world model of the RFM and the estimated variability of features. In order to validate our proposed approach, two levels of experimental analysis have been carried out. On the simulated dataset, the average change detection accuracy is about 90%. Meanwhile, the improvement of positioning accuracy within 2 m is about 20% when the features detected as changed are dropped during positioning, compared to using all measured features for location estimation. On the long-term collected dataset, the average change detection accuracy is about 85%.
Submitted 18 December, 2019;
originally announced December 2019.
-
ExperienceThinking: Constrained Hyperparameter Optimization based on Knowledge and Pruning
Authors:
Chunnan Wang,
Hongzhi Wang,
Chang Zhou,
Hanxiao Chen
Abstract:
Machine learning algorithms are very sensitive to their hyperparameters, and their evaluations are generally expensive. Users desperately need intelligent methods to quickly optimize hyperparameter settings according to known evaluation information, and thus reduce computational cost and promote optimization efficiency. Motivated by this, we propose the ExperienceThinking algorithm to quickly find the best possible hyperparameter configuration of machine learning algorithms within a few configuration evaluations. ExperienceThinking designs two novel methods, which intelligently infer optimal configurations from two aspects: search space pruning and knowledge utilization, respectively. The two methods complement each other and solve constrained hyperparameter optimization problems effectively. To demonstrate the benefit of ExperienceThinking, we compare it with 3 classical hyperparameter optimization algorithms with a small number of configuration evaluations. The experimental results show that our proposed algorithm provides superior results and achieves better performance.
Submitted 4 May, 2020; v1 submitted 2 December, 2019;
originally announced December 2019.
-
A Fast Sampling Gradient Tree Boosting Framework
Authors:
Daniel Chao Zhou,
Zhongming **,
Tong Zhang
Abstract:
As an adaptive, interpretable, robust, and accurate meta-algorithm for arbitrary differentiable loss functions, gradient tree boosting is one of the most popular machine learning techniques, though its computational cost severely limits its usage. Stochastic gradient boosting could be adopted to accelerate gradient boosting by uniformly sampling training instances, but its estimator could introduce a high variance. This situation motivates us to optimize gradient tree boosting. We combine gradient tree boosting with importance sampling, which achieves better performance by reducing the stochastic variance. Furthermore, we use a regularizer to improve the diagonal approximation in the Newton step of gradient boosting. The theoretical analysis supports that our strategies achieve a linear convergence rate on logistic loss. Empirical results show that our algorithm achieves a 2.5x--18x acceleration on two different gradient boosting algorithms (LogitBoost and LambdaMART) without appreciable performance loss.
Submitted 20 November, 2019;
originally announced November 2019.
-
Learning Disentangled Representations for Recommendation
Authors:
Jianxin Ma,
Chang Zhou,
Peng Cui,
Hongxia Yang,
Wenwu Zhu
Abstract:
User behavior data in recommender systems are driven by the complex interactions of many latent factors behind the users' decision making processes. The factors are highly entangled, and may range from high-level ones that govern user intentions, to low-level ones that characterize a user's preference when executing an intention. Learning representations that uncover and disentangle these latent factors can bring enhanced robustness, interpretability, and controllability. However, learning such disentangled representations from user behavior is challenging, and remains largely neglected by the existing literature. In this paper, we present the MACRo-mIcro Disentangled Variational Auto-Encoder (MacridVAE) for learning disentangled representations from user behavior. Our approach achieves macro disentanglement by inferring the high-level concepts associated with user intentions (e.g., to buy a shirt or a cellphone), while capturing the preference of a user regarding the different concepts separately. A micro-disentanglement regularizer, stemming from an information-theoretic interpretation of VAEs, then forces each dimension of the representations to independently reflect an isolated low-level factor (e.g., the size or the color of a shirt). Empirical results show that our approach can achieve substantial improvement over the state-of-the-art baselines. We further demonstrate that the learned representations are interpretable and controllable, which can potentially lead to a new paradigm for recommendation where users are given fine-grained control over targeted aspects of the recommendation lists.
Submitted 30 October, 2019;
originally announced October 2019.
-
Pushing the limits of RNN Compression
Authors:
Urmish Thakker,
Igor Fedorov,
Jesse Beu,
Dibakar Gope,
Chu Zhou,
Ganesh Dasika,
Matthew Mattina
Abstract:
Recurrent Neural Networks (RNN) can be difficult to deploy on resource constrained devices due to their size. As a result, there is a need for compression techniques that can significantly compress RNNs without negatively impacting task accuracy. This paper introduces a method to compress RNNs for resource constrained environments using Kronecker product (KP). KPs can compress RNN layers by 16-38x with minimal accuracy loss. We show that KP can beat the task accuracy achieved by other state-of-the-art compression techniques (pruning and low-rank matrix factorization) across 4 benchmarks spanning 3 different applications, while simultaneously improving inference run-time.
Submitted 9 October, 2019; v1 submitted 4 October, 2019;
originally announced October 2019.
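
The Kronecker-product idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the factor shapes (two 16 x 16 matrices), the helper name `kron_param_counts`, and the use of NumPy are assumptions made purely for illustration, and the resulting ratio is not meant to reproduce the compression factors reported in the paper.

```python
import numpy as np


def kron_param_counts(m, n, p, q):
    """Parameter counts for W = kron(A, B), with A of shape (m, n) and B of shape (p, q).

    Returns (dense, factored): entries in the full (m*p) x (n*q) matrix
    versus entries stored in the two factors.
    """
    dense = (m * p) * (n * q)
    factored = m * n + p * q
    return dense, factored


rng = np.random.default_rng(0)
A = rng.standard_normal((16, 16))  # hypothetical factor shapes
B = rng.standard_normal((16, 16))
x = rng.standard_normal(16 * 16)

# Reference: materialise the full 256 x 256 weight matrix.
W = np.kron(A, B)
y_dense = W @ x

# Same product without forming W, using the row-major identity
# (A kron B) vec(X) = vec(A X B^T), where X is x reshaped to (n, q).
y_fast = (A @ x.reshape(16, 16) @ B.T).reshape(-1)

dense, factored = kron_param_counts(16, 16, 16, 16)
print(dense, factored, np.allclose(y_dense, y_fast))
```

The second form stores only the two small factors and never builds the large matrix, which is where both the storage saving and a possible run-time saving would come from in this toy setting.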
-
Persistence B-Spline Grids: Stable Vector Representation of Persistence Diagrams Based on Data Fitting
Authors:
Zhetong Dong,
Hongwei Lin,
Chi Zhou
Abstract:
Many attempts have been made in recent decades to integrate machine learning (ML) and topological data analysis. A prominent problem in applying persistent homology to ML tasks is finding a vector representation of a persistence diagram (PD), which is a summary diagram for representing topological features. From the perspective of data fitting, a stable vector representation, namely, persistence B-spline grid (PBSG), is proposed based on the efficient technique of progressive-iterative approximation for least-squares B-spline function fitting. We theoretically prove that the PBSG method is stable with respect to the metric of 1-Wasserstein distance defined on the PD space. The proposed method was tested on a synthetic data set, data sets of randomly generated PDs, data of a dynamical system, and 3D CAD models, showing its effectiveness and efficiency.
Submitted 22 April, 2022; v1 submitted 17 September, 2019;
originally announced September 2019.
-
Semi-Supervised Graph Embedding for Multi-Label Graph Node Classification
Authors:
Kaisheng Gao,
**g Zhang,
Cangqi Zhou
Abstract:
The graph convolution network (GCN) is a widely used tool for graph-based semi-supervised learning, which usually integrates node features and graph topological information to build learning models. However, for multi-label learning tasks, the supervision part of GCN simply minimizes the cross-entropy loss between the last layer outputs and the ground-truth label distribution, which tends to lose useful information such as label correlations and thus prevents the model from obtaining high performance. In this paper, we propose a novel GCN-based semi-supervised learning approach for multi-label classification, namely ML-GCN. ML-GCN first uses a GCN to embed the node features and graph topological information. Then, it randomly generates a label matrix, where each row (i.e., label vector) represents a kind of label. The dimension of the label vector is the same as that of the node vector before the last convolution operation of GCN. That is, all labels and nodes are embedded in a uniform vector space. Finally, during ML-GCN model training, label vectors and node vectors are concatenated to serve as the inputs of the relaxed skip-gram model to detect the node-label correlation as well as the label-label correlation. Experimental results on several graph classification datasets show that the proposed ML-GCN outperforms four state-of-the-art methods.
Submitted 12 July, 2019;
originally announced July 2019.
-
Dimensional Reweighting Graph Convolutional Networks
Authors:
Xu Zou,
Qiuye Jia,
Jianwei Zhang,
Chang Zhou,
Hongxia Yang,
Jie Tang
Abstract:
Graph Convolution Networks (GCNs) are becoming more and more popular for learning node representations on graphs. Though there exist various developments on sampling and aggregation to accelerate the training process and improve the performances, limited works focus on dealing with the dimensional information imbalance of node representations. To bridge the gap, we propose a method named Dimensional reweighting Graph Convolution Network (DrGCN). We theoretically prove that our DrGCN can guarantee to improve the stability of GCNs via mean field theory. Our dimensional reweighting method is very flexible and can be easily combined with most sampling and aggregation techniques for GCNs. Experimental results demonstrate its superior performances on several challenging transductive and inductive node classification benchmark datasets. Our DrGCN also outperforms existing models on an industrial-sized Alibaba recommendation dataset.
Submitted 29 October, 2020; v1 submitted 4 July, 2019;
originally announced July 2019.
-
Cognitive Knowledge Graph Reasoning for One-shot Relational Learning
Authors:
Zhengxiao Du,
Chang Zhou,
Ming Ding,
Hongxia Yang,
Jie Tang
Abstract:
Inferring new facts from existing knowledge graphs (KG) with explainable reasoning processes is a significant problem and has received much attention recently. However, few studies have focused on relation types unseen in the original KG, given only one or a few instances for training. To bridge this gap, we propose CogKR for one-shot KG reasoning. The one-shot relational learning problem is tackled through two modules: the summary module summarizes the underlying relationship of the given instances, based on which the reasoning module infers the correct answers. Motivated by the dual process theory in cognitive science, in the reasoning module, a cognitive graph is built by iteratively coordinating retrieval (System 1, collecting relevant evidence intuitively) and reasoning (System 2, conducting relational reasoning over collected information). The structural information offered by the cognitive graph enables our model to aggregate pieces of evidence from multiple reasoning paths and explain the reasoning process graphically. Experiments show that CogKR substantially outperforms previous state-of-the-art models on one-shot KG reasoning benchmarks, with relative improvements of 24.3%-29.7% on MRR. The source code is available at https://github.com/THUDM/CogKR.
Submitted 13 June, 2019;
originally announced June 2019.
-
Compressing RNNs for IoT devices by 15-38x using Kronecker Products
Authors:
Urmish Thakker,
Jesse Beu,
Dibakar Gope,
Chu Zhou,
Igor Fedorov,
Ganesh Dasika,
Matthew Mattina
Abstract:
Recurrent Neural Networks (RNN) can be difficult to deploy on resource constrained devices due to their size. As a result, there is a need for compression techniques that can significantly compress RNNs without negatively impacting task accuracy. This paper introduces a method to compress RNNs for resource constrained environments using Kronecker product (KP). KPs can compress RNN layers by 15-38x with minimal accuracy loss. By quantizing the resulting models to 8-bits, we further push the compression factor to 50x. We show that KP can beat the task accuracy achieved by other state-of-the-art compression techniques across 5 benchmarks spanning 3 different applications, while simultaneously improving inference run-time. We show that the KP compression mechanism does introduce an accuracy loss, which can be mitigated by a proposed hybrid KP (HKP) approach. Our HKP algorithm provides fine-grained control over the compression ratio, enabling us to regain accuracy lost during compression by adding a small number of model parameters.
Submitted 31 January, 2020; v1 submitted 6 June, 2019;
originally announced June 2019.
-
An iterative scheme for feature based positioning using a weighted dissimilarity measure
Authors:
Caifa Zhou,
Andreas Wieser
Abstract:
We propose an iterative scheme for feature-based positioning using a new weighted dissimilarity measure with the goal of reducing the impact of large errors among the measured or modeled features. The weights are computed from the location-dependent standard deviations of the features and stored as part of the reference fingerprint map (RFM). Spatial filtering and kernel smoothing of the kinematically collected raw data allow efficiently estimating the standard deviations during RFM generation. In the positioning stage, the weights control the contribution of each feature to the dissimilarity measure, which in turn quantifies the difference between the set of online measured features and the fingerprints stored in the RFM. Features with little variability contribute more to the estimated position than features with high variability. Iterations are necessary because the variability depends on the location, and the location is initially unknown when estimating the position. Using real WiFi signal strength data from extended test measurements with ground truth in an office building, we show that the standard deviations of these features vary considerably within the region of interest and are neither simple functions of the signal strength nor of the distances from the corresponding access points. This is the motivation to include the empirical standard deviations in the RFM. We then analyze the deviations of the estimated positions with and without the location-dependent weighting. In the present example, the maximum radial positioning errors from ground truth are reduced by 40% compared to kNN without the weighted dissimilarity measure.
Submitted 30 May, 2019; v1 submitted 20 May, 2019;
originally announced May 2019.
-
GraphNAS: Graph Neural Architecture Search with Reinforcement Learning
Authors:
Yang Gao,
Hong Yang,
Peng Zhang,
Chuan Zhou,
Yue Hu
Abstract:
Graph Neural Networks (GNNs) have been widely used for analyzing non-Euclidean data such as social network data and biological data. Despite their success, the design of graph neural networks requires a lot of manual work and domain knowledge. In this paper, we propose a Graph Neural Architecture Search method (GraphNAS for short) that enables automatic search of the best graph neural architecture based on reinforcement learning. Specifically, GraphNAS first uses a recurrent network to generate variable-length strings that describe the architectures of graph neural networks, and then trains the recurrent network with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation data set. Extensive experimental results on node classification tasks in both transductive and inductive learning settings demonstrate that GraphNAS can achieve consistently better performance on the Cora, Citeseer, and Pubmed citation networks and on the protein-protein interaction network. On node classification tasks, GraphNAS can design a novel network architecture that rivals the best human-invented architecture in terms of test set accuracy.
Submitted 19 August, 2019; v1 submitted 22 April, 2019;
originally announced April 2019.
-
FixyNN: Efficient Hardware for Mobile Computer Vision via Transfer Learning
Authors:
Paul N. Whatmough,
Chuteng Zhou,
Patrick Hansen,
Shreyas Kolala Venkataramanaiah,
Jae-sun Seo,
Matthew Mattina
Abstract:
The computational demands of computer vision tasks based on state-of-the-art Convolutional Neural Network (CNN) image classification far exceed the energy budgets of mobile devices. This paper proposes FixyNN, which consists of a fixed-weight feature extractor that generates ubiquitous CNN features, and a conventional programmable CNN accelerator which processes a dataset-specific CNN. Image classification models for FixyNN are trained end-to-end via transfer learning, with the common feature extractor representing the transferred part, and the programmable part being learnt on the target dataset. Experimental results demonstrate that FixyNN hardware can achieve very high energy efficiencies of up to 26.6 TOPS/W ($4.81 \times$ better than an iso-area programmable accelerator). Over a suite of six datasets, we trained models via transfer learning with an accuracy loss of $<1\%$, resulting in up to 11.2 TOPS/W, nearly $2 \times$ more efficient than a conventional programmable CNN accelerator of the same area.
Submitted 26 February, 2019;
originally announced February 2019.
-
Modified Jaccard Index Analysis and Adaptive Feature Selection for Location Fingerprinting with Limited Computational Complexity
Authors:
Caifa Zhou,
Andreas Wieser
Abstract:
We propose an approach for fingerprinting-based positioning which reduces the data requirements and computational complexity of the online positioning stage. It is based on a segmentation of the entire region of interest into subregions, identification of candidate subregions during the online stage, and position estimation using a preselected subset of relevant features. The subregion selection uses a modified Jaccard index which quantifies the similarity between the features observed by the user and those available within the reference fingerprint map. The feature selection is achieved using an adaptive forward-backward greedy search which determines, for each subregion, a subset of features relevant with respect to a given fingerprinting-based positioning method. In an empirical study using signals of opportunity for fingerprinting, the proposed subregion and feature selection reduce the processing time during the online stage by a factor of about 10 while the positioning accuracy does not deteriorate significantly. In fact, in one of the two study cases the 90th percentile of the circular error increased by 7.5%, while in the other study case we even found a reduction of the corresponding circular error by 30%.
Submitted 10 January, 2019;
originally announced January 2019.
-
MAE: Mutual Posterior-Divergence Regularization for Variational AutoEncoders
Authors:
Xuezhe Ma,
Chunting Zhou,
Eduard Hovy
Abstract:
The Variational Autoencoder (VAE), a simple and effective deep generative model, has led to a number of impressive empirical successes and spawned many advanced variants and theoretical investigations. However, recent studies demonstrate that, when equipped with expressive generative distributions (a.k.a. decoders), VAE tends to learn uninformative latent representations, a phenomenon called KL vanishing, in which case VAE collapses into an unconditional generative model. In this work, we introduce mutual posterior-divergence regularization, a novel regularization that is able to control the geometry of the latent space to accomplish meaningful representation learning, while achieving comparable or superior capability of density estimation. Experiments on three image benchmark datasets demonstrate that, when equipped with powerful decoders, our model performs well both on density estimation and representation learning.
Submitted 5 January, 2019;
originally announced January 2019.
-
Energy Efficient Hardware for On-Device CNN Inference via Transfer Learning
Authors:
Paul Whatmough,
Chuteng Zhou,
Patrick Hansen,
Matthew Mattina
Abstract:
On-device CNN inference for real-time computer vision applications can result in computational demands that far exceed the energy budgets of mobile devices. This paper proposes FixyNN, a co-designed hardware accelerator platform which splits a CNN model into two parts: a set of layers that are fixed in the hardware platform as a front-end fixed-weight feature extractor, and the remaining layers which become a back-end classifier running on a conventional programmable CNN accelerator. The common front-end provides ubiquitous CNN features for all FixyNN models, while the back-end is programmable and specific to a given dataset. Image classification models for FixyNN are trained end-to-end via transfer learning, with front-end layers fixed for the shared feature extractor, and back-end layers fine-tuned for a specific task. Over a suite of six datasets, we trained models via transfer learning with an accuracy loss of <1%, resulting in a FixyNN hardware platform with nearly 2 times better energy efficiency than a conventional programmable CNN accelerator of the same silicon area (i.e. hardware cost).
Submitted 26 February, 2019; v1 submitted 4 December, 2018;
originally announced December 2018.
-
Anomaly Detection via Graphical Lasso
Authors:
Haitao Liu,
Randy C. Paffenroth,
Jian Zou,
Chong Zhou
Abstract:
Anomalies and outliers are common in real-world data, and they can arise from many sources, such as sensor faults. Accordingly, anomaly detection is important both for analyzing the anomalies themselves and for cleaning the data for further analysis of its ambient structure. Nonetheless, a precise definition of anomalies is important for automated detection, and herein we approach such problems from the perspective of detecting sparse latent effects embedded in large collections of noisy data. Standard Graphical Lasso-based techniques can identify the conditional dependency structure of a collection of random variables based on their sample covariance matrix. However, classic Graphical Lasso is sensitive to outliers in the sample covariance matrix. In particular, several outliers in a sample covariance matrix can destroy the sparsity of its inverse. Accordingly, we propose a novel optimization problem that is similar in spirit to Robust Principal Component Analysis (RPCA) and splits the sample covariance matrix $M$ into two parts, $M=F+S$, where $F$ is the cleaned sample covariance whose inverse is sparse and computable by Graphical Lasso, and $S$ contains the outliers in $M$. We accomplish this decomposition by adding an $\ell_1$ penalty to classic Graphical Lasso, and name the resulting method "Robust Graphical Lasso (Rglasso)". Moreover, we propose an Alternating Direction Method of Multipliers (ADMM) solution to the optimization problem which scales to large numbers of unknowns. We evaluate our algorithm on both real and synthetic datasets, obtaining interpretable results and outperforming the standard robust Minimum Covariance Determinant (MCD) method and RPCA regarding both accuracy and speed.
Submitted 10 November, 2018;
originally announced November 2018.
-
Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge
Authors:
Spyridon Bakas,
Mauricio Reyes,
Andras Jakab,
Stefan Bauer,
Markus Rempfler,
Alessandro Crimi,
Russell Takeshi Shinohara,
Christoph Berger,
Sung Min Ha,
Martin Rozycki,
Marcel Prastawa,
Esther Alberts,
Jana Lipkova,
John Freymann,
Justin Kirby,
Michel Bilello,
Hassan Fathallah-Shaykh,
Roland Wiest,
Jan Kirschke,
Benedikt Wiestler,
Rivka Colen,
Aikaterini Kotrotsou,
Pamela Lamontagne,
Daniel Marcus,
Mikhail Milchenko
, et al. (402 additional authors not shown)
Abstract:
Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multi-parametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumor is a factor also considered in longitudinal scans, when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses the state-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018. Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO criteria, and iii) predicting the overall survival from pre-operative mpMRI scans of patients that underwent gross total resection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that apart from being diverse on each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been a continuously evolving/growing dataset.
Submitted 23 April, 2019; v1 submitted 5 November, 2018;
originally announced November 2018.
-
High-dimensional Two-sample Precision Matrices Test: An Adaptive Approach through Multiplier Bootstrap
Authors:
Mingjuan Zhang,
Yong He,
Cheng Zhou,
Xinsheng Zhang
Abstract:
The precision matrix, which is the inverse of the covariance matrix, plays an important role in statistics, as it captures the partial correlation between variables. Testing the equality of two precision matrices in the high-dimensional setting is a very challenging but meaningful problem, especially in differential network modelling. To the best of our knowledge, existing tests are only powerful for sparse alternative patterns where the two precision matrices differ in a small number of elements. In this paper we propose a data-adaptive test which is powerful against either dense or sparse alternatives. A multiplier bootstrap approach is used to approximate the limiting distribution of the test statistic. Theoretical properties including the asymptotic size and power of the test are investigated. A simulation study verifies that the data-adaptive test performs well under various alternative scenarios. The practical usefulness of the test is illustrated by applying it to a gene expression data set associated with lung cancer.
Submitted 20 October, 2018;
originally announced October 2018.
-
Generative Adversarial Active Learning for Unsupervised Outlier Detection
Authors:
Yezheng Liu,
Zhe Li,
Chong Zhou,
Yuanchun Jiang,
Jianshan Sun,
Meng Wang,
Xiangnan He
Abstract:
Outlier detection is an important topic in machine learning and has been used in a wide range of applications. In this paper, we approach outlier detection as a binary classification problem by sampling potential outliers from a uniform reference distribution. However, due to the sparsity of data in high-dimensional space, a limited number of potential outliers may fail to provide sufficient information to assist the classifier in describing a boundary that can separate outliers from normal data effectively. To address this, we propose a novel Single-Objective Generative Adversarial Active Learning (SO-GAAL) method for outlier detection, which can directly generate informative potential outliers based on the mini-max game between a generator and a discriminator. Moreover, to prevent the generator from falling into the mode collapse problem, training should be stopped once SO-GAAL is able to provide sufficient information. Without any prior information, however, determining this stopping point is extremely difficult for SO-GAAL. Therefore, we expand the network structure of SO-GAAL from a single generator to multiple generators with different objectives (MO-GAAL), which can generate a reasonable reference distribution for the whole dataset. We empirically compare the proposed approach with several state-of-the-art outlier detection methods on both synthetic and real-world datasets. The results show that MO-GAAL outperforms its competitors in the majority of cases, especially for datasets with various cluster types or a high ratio of irrelevant variables.
Submitted 17 March, 2019; v1 submitted 27 September, 2018;
originally announced September 2018.
-
Deep Interest Evolution Network for Click-Through Rate Prediction
Authors:
Guorui Zhou,
Na Mou,
Ying Fan,
Qi Pi,
Weijie Bian,
Chang Zhou,
Xiaoqiang Zhu,
Kun Gai
Abstract:
Click-through rate (CTR) prediction, whose goal is to estimate the probability that a user clicks, has become one of the core tasks in advertising systems. For a CTR prediction model, it is necessary to capture the latent user interest behind the user behavior data. Besides, as the external environment and the user's internal cognition change, user interest evolves dynamically over time. There are several CTR prediction methods for interest modeling, but most of them regard the representation of behavior as the interest directly and lack dedicated modeling of the latent interest behind the concrete behavior. Moreover, few works consider the changing trend of interest. In this paper, we propose a novel model, named Deep Interest Evolution Network (DIEN), for CTR prediction. Specifically, we design an interest extractor layer to capture temporal interests from the history behavior sequence. At this layer, we introduce an auxiliary loss to supervise interest extraction at each step. As user interests are diverse, especially in e-commerce systems, we propose an interest evolving layer to capture the interest evolving process that is relative to the target item. At the interest evolving layer, an attention mechanism is embedded into the sequential structure in a novel way, and the effects of relative interests are strengthened during interest evolution. In experiments on both public and industrial datasets, DIEN significantly outperforms the state-of-the-art solutions. Notably, DIEN has been deployed in the display advertisement system of Taobao and obtained a 20.7% improvement in CTR.
Submitted 16 November, 2018; v1 submitted 10 September, 2018;
originally announced September 2018.
-
A horse racing between the block maxima method and the peak-over-threshold approach
Authors:
Axel Bücher,
Chen Zhou
Abstract:
Classical extreme value statistics consists of two fundamental approaches: the block maxima (BM) method and the peak-over-threshold (POT) approach. There seems to be a general consensus among researchers in the field that the POT method makes use of extreme observations more efficiently than the BM method. We shed light on this discussion from three different perspectives. First, based on recent theoretical results for the BM approach, we provide a theoretical comparison in i.i.d. scenarios. We argue that the data generating process may favour either one or the other approach. Second, if the underlying data possess serial dependence, we argue that the choice of a method should be primarily guided by the ultimate statistical interest: for instance, POT is preferable for quantile estimation, while BM is preferable for return level estimation. Finally, we discuss the two approaches for multivariate observations and identify various open ends for future research.
Submitted 1 July, 2018;
originally announced July 2018.
-
Pressure Predictions of Turbine Blades with Deep Learning
Authors:
Cheng'an Bai,
Chao Zhou
Abstract:
Deep learning has been used in many areas, such as feature detection in images and the game of Go. This paper presents a study that attempts to use deep learning to predict turbomachinery performance. Three different deep neural networks are built and trained to predict the pressure distributions of turbine airfoils. The performance of a library of turbine airfoils was first predicted using methods based on the Euler equations, and the results were then used to train and validate the deep neural networks. The results show that a network with four convolutional layers and two fully connected layers provides the best predictions. For the best neural network architecture, the pressure prediction error is below 3% at more than 99% of the locations and below 1% at 90% of the locations.
Submitted 11 June, 2018;
originally announced June 2018.
-
CDM: Compound dissimilarity measure and an application to fingerprinting-based positioning
Authors:
Caifa Zhou,
Andreas Wieser
Abstract:
A non-vector-based dissimilarity measure is proposed by combining vector-based distance metrics and set operations. The proposed compound dissimilarity measure (CDM) can quantify the similarity of collections of attribute/feature pairs where not all attributes are present in all collections. This is a typical challenge in the context of, e.g., fingerprinting-based positioning (FbP). Compared to vector-based distance metrics (e.g., Minkowski), the merits of the proposed CDM are i) the data do not need to be converted to vectors of equal dimension, ii) shared and unshared attributes can be weighted differently within the assessment, and iii) additional degrees of freedom within the measure allow its properties to be adapted to application needs in a data-driven way. We indicate the validity of the proposed CDM by demonstrating improvements in the positioning performance of fingerprinting-based WLAN indoor positioning using four different datasets, three of them publicly available. When processing these datasets using CDM instead of conventional distance metrics, the accuracy of identifying buildings and floors improves by about 5% on average. The 2D positioning errors in terms of root mean squared error (RMSE) are reduced by a factor of two, and the percentage of position solutions with less than 2 m error improves by over 10%.
Submitted 26 June, 2018; v1 submitted 16 May, 2018;
originally announced May 2018.
-
Jaccard analysis and LASSO-based feature selection for location fingerprinting with limited computational complexity
Authors:
Caifa Zhou,
Andreas Wieser
Abstract:
We propose an approach to reduce both computational complexity and data storage requirements for the online positioning stage of a fingerprinting-based indoor positioning system (FIPS) by introducing segmentation of the region of interest (RoI) into sub-regions, sub-region selection using a modified Jaccard index, and feature selection based on the randomized least absolute shrinkage and selection operator (LASSO). We integrate these steps into a Bayesian framework of position estimation using the maximum a posteriori (MAP) principle. An additional benefit of these steps is that the time for estimating the position and the required data storage are virtually independent of the size of the RoI and of the total number of available features within the RoI. Thus the proposed steps facilitate application of FIPS to large areas. Results of an experimental analysis using real data collected in an office building, with a Nexus 6P smartphone as user device and a total station providing position ground truth, corroborate the expected performance of the proposed approach. The positioning accuracy obtained by processing only 10 automatically identified features instead of all available ones, and limiting position estimation to 10 automatically identified sub-regions instead of the entire RoI, is equivalent to that obtained by processing all available data. In the chosen example, 50% of the errors are less than 1.8 m and 90% are less than 5 m. However, the computation time using the automatically identified subset of data is only about 1% of that required for processing the entire data set.
Submitted 21 November, 2017;
originally announced November 2017.
-
Unbiased Simulation for Optimizing Stochastic Function Compositions
Authors:
Jose Blanchet,
Donald Goldfarb,
Garud Iyengar,
Fengpei Li,
Chaoxu Zhou
Abstract:
In this paper, we introduce an unbiased gradient simulation algorithm for solving convex optimization problems with stochastic function compositions. We show that the unbiased gradient generated by the algorithm has finite variance and finite expected computation cost. We then combine the unbiased gradient simulation with two variance-reduced algorithms (namely SVRG and SCSG) and show that the proposed optimization algorithms based on unbiased gradient simulations exhibit satisfactory convergence properties. Specifically, in the SVRG case, the algorithm with simulated gradient can be shown to converge linearly to optima in expectation and almost surely under strong convexity. Finally, in the numerical experiments, we apply the algorithms to two important cases of stochastic function composition optimization: maximizing Cox's partial likelihood and training conditional random fields.
Submitted 20 November, 2017;
originally announced November 2017.