-
Measure This, Not That: Optimizing the Cost and Model-Based Information Content of Measurements
Authors:
Jialu Wang,
Zedong Peng,
Ryan Hughes,
Debangsu Bhattacharyya,
David E. Bernal Neira,
Alexander W. Dowling
Abstract:
Model-based design of experiments (MBDoE) is a powerful framework for selecting and calibrating science-based mathematical models from data. This work extends popular MBDoE workflows by proposing a convex mixed integer (non)linear programming (MINLP) problem to optimize the selection of measurements. The solver MindtPy is modified to support calculating the D-optimality objective and its gradient via an external package, SciPy, using the grey-box module in Pyomo. The new approach is demonstrated in two case studies: estimating highly correlated kinetics from a batch reactor and estimating transport parameters in a large-scale rotary packed bed for CO₂ capture. Both case studies show how examining the Pareto-optimal trade-offs between information content measured by A- and D-optimality versus measurement budget offers practical guidance for selecting measurements for scientific experiments.
Submitted 13 June, 2024;
originally announced June 2024.
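To make the A- and D-optimality criteria above concrete, here is a minimal sketch, not the paper's MindtPy/Pyomo grey-box formulation. It assumes a linearized model in which each candidate measurement i contributes a sensitivity row J_i and noise variance sigma_i², so the Fisher information matrix (FIM) is the sum of J_i^T J_i / sigma_i² over the selected measurements. It uses the classical definitions (D-optimality maximizes log det FIM; A-optimality minimizes the trace of its inverse) and brute-force enumeration under a budget in place of the convex MINLP. The helper names, sensitivities, and costs are hypothetical.

```python
import itertools
import numpy as np

def fim(J, sigma2, selected):
    """FIM of a linearized model: sum of J_i^T J_i / sigma_i^2 over selected measurements."""
    F = np.zeros((J.shape[1], J.shape[1]))
    for i in selected:
        F += np.outer(J[i], J[i]) / sigma2[i]
    return F

def d_criterion(F):
    """log det(F); -inf when F is singular (parameters not identifiable)."""
    sign, logdet = np.linalg.slogdet(F)
    return logdet if sign > 0 else -np.inf

def a_criterion(F):
    """trace(F^{-1}), the classical A-optimality objective (smaller is better)."""
    try:
        return float(np.trace(np.linalg.inv(F)))
    except np.linalg.LinAlgError:
        return np.inf

def best_selection(J, sigma2, cost, budget):
    """Exhaustively search subsets within the budget for the D-optimal one (tiny problems only)."""
    best, best_val = (), -np.inf
    for r in range(1, J.shape[0] + 1):
        for subset in itertools.combinations(range(J.shape[0]), r):
            if sum(cost[i] for i in subset) > budget:
                continue
            val = d_criterion(fim(J, sigma2, subset))
            if val > best_val:
                best, best_val = subset, val
    return best, best_val

# Hypothetical sensitivities of 4 candidate measurements w.r.t. 2 parameters.
J = np.array([[2.0, 0.0], [0.0, 2.0], [1.0, 1.0], [3.0, 3.0]])
sigma2 = np.ones(4)
cost = [1, 1, 1, 3]
for budget in (2, 4):
    print(budget, best_selection(J, sigma2, cost, budget))
```

Sweeping the budget and recording the best criterion value traces out a budget-versus-information trade-off curve. Because the number of subsets grows as 2^n, exhaustive search is only viable for toy problems, which is where an optimization formulation such as the paper's MINLP becomes necessary.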
-
Mostly Beneficial Clustering: Aggregating Data for Operational Decision Making
Authors:
Chengzhang Li,
Zhenkang Peng,
Ying Rong
Abstract:
With increasingly volatile market conditions and rapid product innovations, operational decision-making for large-scale systems entails solving thousands of problems with limited data. Data aggregation has been proposed to combine data across problems and thereby improve on the decisions obtained by solving each problem individually. We propose a novel cluster-based Shrunken-SAA approach that can exploit the cluster structure among problems when implementing data aggregation. We prove that, as the number of problems grows, leveraging the given cluster structure yields additional benefits over data aggregation approaches that neglect such structure. When the cluster structure is unknown, we show that unveiling it, even at the cost of a few data points, can be beneficial, especially when the distance between clusters of problems is substantial. Our approach can be extended to general cost functions under mild conditions. When the number of problems is large, the optimality gap of our approach decreases exponentially in the distance between the clusters. We explore the performance of the approach through numerical experiments on managing newsvendor systems. Using synthetic data, we investigate how the distance metric between problem instances affects the performance of the cluster-based Shrunken-SAA approach. We further validate the approach with real data and highlight the advantages of cluster-based data aggregation over existing approaches, especially in the small-data, large-scale regime.
Submitted 17 December, 2023; v1 submitted 28 November, 2023;
originally announced November 2023.
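As an illustration of cluster-based data aggregation for newsvendor problems, the sketch below shrinks each problem's empirical demand distribution toward an anchor (here, the pooled data of the problem's own cluster) and orders at the critical-fractile quantile. It assumes discrete demand on a common support, known cluster labels, and a fixed shrinkage weight alpha, with the shrinkage form p = (N p_hat + alpha p0) / (N + alpha) in the spirit of Shrunken-SAA. The paper's exact estimator and its choice of alpha may differ, and all function names and data are hypothetical.

```python
import numpy as np

def empirical_pmf(samples, support):
    """Empirical probability of each support point."""
    counts = np.array([np.sum(samples == s) for s in support], dtype=float)
    return counts / counts.sum()

def newsvendor_order(support, probs, underage, overage):
    """Smallest demand level whose CDF reaches the critical ratio b / (b + h)."""
    ratio = underage / (underage + overage)
    idx = np.searchsorted(np.cumsum(probs), ratio)
    return support[min(idx, len(support) - 1)]

def cluster_shrunken_orders(data, labels, support, alpha, underage, overage):
    """data: list of 1-D integer arrays (one per problem); labels: cluster id per problem."""
    support = np.asarray(support)
    orders = []
    for k, samples in enumerate(data):
        # Anchor: pooled empirical distribution of all problems in the same cluster.
        pooled = np.concatenate([d for d, lab in zip(data, labels) if lab == labels[k]])
        p0 = empirical_pmf(pooled, support)
        n_k = len(samples)
        p_shrunk = (n_k * empirical_pmf(samples, support) + alpha * p0) / (n_k + alpha)
        orders.append(newsvendor_order(support, p_shrunk, underage, overage))
    return orders

# Hypothetical data: two low-demand problems (cluster 0) and two high-demand ones (cluster 1).
data = [np.array([0, 1]), np.array([1, 1, 0]), np.array([4, 3]), np.array([3, 4, 4])]
labels = [0, 0, 1, 1]
print(cluster_shrunken_orders(data, labels, support=[0, 1, 2, 3, 4],
                              alpha=5.0, underage=1.0, overage=1.0))
```

Pooling within clusters rather than across all problems keeps the anchor from mixing the low- and high-demand groups, which is the intuition behind exploiting cluster structure.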
-
Expectile-based conditional tail moments with covariates
Authors:
Qian Xiong,
Zuoxiang Peng
Abstract:
The expectile, the minimizer of an asymmetric quadratic loss function, is a coherent risk measure that makes use of more information about the distribution of the risk under consideration. In this paper, we propose a new risk measure, called the expectile-based conditional tail moment, by replacing quantiles with expectiles, and focus on estimating it when the conditional survival function of the risk, given that the risk exceeds the expectile and given a value of the covariates, is heavy-tailed. Under some regularity conditions, the asymptotic properties of the new estimator are studied. Extrapolated estimation of the conditional tail moments is also investigated. These results are illustrated on both simulated data and real insurance data.
Submitted 29 October, 2023;
originally announced October 2023.
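The expectile definition in the abstract can be computed directly for a sample: the tau-expectile solves the first-order condition tau * sum (y - e)_+ = (1 - tau) * sum (e - y)_+, whose left-minus-right side is decreasing in e, so bisection finds it. This is a minimal unconditional sketch (no covariates and no extreme-value extrapolation, unlike the paper's estimator), and it assumes the tail moment is E[Y^a | Y > e_tau] by analogy with the quantile-based conditional tail moment. Function names are hypothetical.

```python
import numpy as np

def sample_expectile(y, tau, tol=1e-12, max_iter=200):
    """tau-expectile of a sample: minimizer over e of sum |tau - 1{y < e}| * (y - e)^2."""
    y = np.asarray(y, dtype=float)

    def score(e):
        # First-order condition of the asymmetric quadratic loss; decreasing in e.
        return tau * np.clip(y - e, 0, None).sum() - (1 - tau) * np.clip(e - y, 0, None).sum()

    lo, hi = y.min(), y.max()
    for _ in range(max_iter):          # bisection on the monotone score
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def expectile_tail_moment(y, tau, a=1.0):
    """Empirical E[Y^a | Y > e_tau] (assumed analogue of the quantile-based tail moment)."""
    y = np.asarray(y, dtype=float)
    e = sample_expectile(y, tau)
    return float(np.mean(y[y > e] ** a))

print(sample_expectile([1, 2, 3, 4], 0.5), sample_expectile([1, 2, 3, 4], 0.9))
```

At tau = 0.5 the loss is symmetric and the expectile reduces to the sample mean; larger tau pushes it toward the upper tail.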
-
Exploring differences in injury severity between occupant groups involved in fatal rear-end crashes: A correlated random parameter logit model with mean heterogeneity
Authors:
Renteng Yuan,
Xin Gu,
Zhipeng Peng,
Qiaojun Xiang
Abstract:
Rear-end crashes are one of the most common crash types, and passenger cars involved in them are frequently associated with severe outcomes. However, no study has investigated the differences in injury severity between occupant groups when cars are involved as the following and the leading vehicles in rear-end crashes. Therefore, this investigation compares the key factors affecting injury severity between the front- and rear-car occupant groups in rear-end crashes. First, data are extracted from the Fatality Analysis Reporting System (FARS) for 2017 to 2019 for two types of rear-end crashes: those with a passenger car as the rear-ending vehicle and those with a passenger car as the rear-ended vehicle. A likelihood ratio test reveals a significant difference in injury severity between the front- and rear-car occupant groups. The front- and rear-car occupant groups are then modelled with the correlated random parameter logit model with heterogeneity in means (CRPLHM) and the random parameter logit model with heterogeneity in means (RPLHM), respectively. The modelling identifies significant factors including occupant position, driver age, overturn, and vehicle type. For instance, the driver's seat and front-right passenger positions significantly increase the probability of severe injury when struck by another vehicle, and a large truck striking a car tends to cause more severe outcomes than a car striking a large truck. This study provides insight into the mechanisms of occupant injury severity in rear-end crashes and proposes several countermeasures to mitigate crash severity, such as implementing stricter seat belt laws, improving streetlight coverage, and strengthening car drivers' emergency response ability.
Submitted 5 July, 2023; v1 submitted 21 March, 2023;
originally announced March 2023.
-
Modelling multivariate extreme value distributions via Markov trees
Authors:
Shuang Hu,
Zuoxiang Peng,
Johan Segers
Abstract:
Multivariate extreme value distributions are a common choice for modelling multivariate extremes. In high dimensions, however, the construction of flexible and parsimonious models is challenging. We propose to combine bivariate extreme value distributions into a Markov random field with respect to a tree. Although in general not an extreme value distribution itself, this Markov tree is attracted by a multivariate extreme value distribution. The latter serves as a tree-based approximation to an unknown extreme value distribution with the given bivariate distributions as margins. Given data, we learn an appropriate tree structure by Prim's algorithm with estimated pairwise upper tail dependence coefficients or Kendall's tau values as edge weights. The distributions of pairs of connected variables can be fitted in various ways. The resulting tree-structured extreme value distribution allows for inference on rare event probabilities, as illustrated on river discharge data from the upper Danube basin.
Submitted 29 July, 2022;
originally announced August 2022.
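The tree-learning step described above can be sketched as follows: estimate pairwise Kendall's tau, use its absolute value as the edge weight, and grow a maximum-weight spanning tree with Prim's algorithm. This minimal sketch assumes continuous data without ties (so the simple tau-a estimator suffices) and uses synthetic chain-structured data; fitting the bivariate extreme value distributions on the edges is not shown, and the function names are hypothetical.

```python
import numpy as np

def kendall_tau(x, y):
    """Kendall's tau-a over all ordered pairs (O(n^2) memory; fine for small n)."""
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    n = len(x)
    return (dx * dy).sum() / (n * (n - 1))

def tau_matrix(X):
    """Symmetric matrix of |Kendall's tau| between all pairs of columns."""
    d = X.shape[1]
    W = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            W[i, j] = W[j, i] = abs(kendall_tau(X[:, i], X[:, j]))
    return W

def prim_max_spanning_tree(W):
    """Prim's algorithm for a maximum-weight spanning tree of a complete graph."""
    d = W.shape[0]
    in_tree = {0}
    edges = []
    while len(in_tree) < d:
        # Heaviest edge from the current tree to a node outside it.
        i, j = max(((i, j) for i in in_tree for j in range(d) if j not in in_tree),
                   key=lambda e: W[e])
        edges.append((i, j))
        in_tree.add(j)
    return edges

# Synthetic data with chain dependence x0 -> x1 -> x2 and an independent x3.
rng = np.random.default_rng(0)
n = 500
x0 = rng.standard_normal(n)
x1 = x0 + 0.5 * rng.standard_normal(n)
x2 = x1 + 0.5 * rng.standard_normal(n)
x3 = rng.standard_normal(n)
tree = prim_max_spanning_tree(tau_matrix(np.column_stack([x0, x1, x2, x3])))
print(tree)
```

On this chain, the learned tree links 0-1 and 1-2 rather than 0-2, since the direct neighbours share the strongest dependence.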
-
Dynamic Knapsack Optimization Towards Efficient Multi-Channel Sequential Advertising
Authors:
Xiaotian Hao,
Zhaoqing Peng,
Yi Ma,
Guan Wang,
Junqi **,
Jianye Hao,
Shan Chen,
Rongquan Bai,
Mingzhou Xie,
Miao Xu,
Zhenzhe Zheng,
Chuan Yu,
Han Li,
Jian Xu,
Kun Gai
Abstract:
In e-commerce, advertising is essential for merchants to reach their target users. The typical objective is to maximize the advertiser's cumulative revenue over a period of time under a budget constraint. In real applications, an advertisement (ad) usually needs to be exposed to the same user multiple times before the user finally contributes revenue (e.g., places an order). However, existing advertising systems mainly focus on the immediate revenue of single ad exposures, ignoring the contribution of each exposure to the final conversion, and thus usually fall into suboptimal solutions. In this paper, we formulate sequential advertising strategy optimization as a dynamic knapsack problem. We propose a theoretically guaranteed bilevel optimization framework that significantly reduces the solution space of the original optimization problem while ensuring solution quality. To improve the exploration efficiency of reinforcement learning, we also devise an effective action space reduction approach. Extensive offline and online experiments show the superior performance of our approaches over state-of-the-art baselines in terms of cumulative revenue.
Submitted 29 June, 2020;
originally announced June 2020.
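For readers unfamiliar with the knapsack framing, the classical static 0/1 knapsack chooses items (here, hypothetical ad exposures, each with an integer cost and an expected revenue) to maximize total value under a budget. The dynamic program below is only that textbook building block; it is not the paper's dynamic, bilevel, reinforcement-learning formulation, and the numbers are made up.

```python
def knapsack(costs, values, budget):
    """0/1 knapsack by dynamic programming over integer budgets; returns (best value, chosen items)."""
    n = len(costs)
    # best[i][b]: best value achievable with the first i items and budget b.
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        c, v = costs[i - 1], values[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]                       # skip item i-1
            if c <= b and best[i - 1][b - c] + v > best[i][b]:
                best[i][b] = best[i - 1][b - c] + v           # take item i-1
    # Backtrack: an item was taken wherever the table value changed.
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= costs[i - 1]
    return best[n][budget], sorted(chosen)

# Hypothetical exposures: integer costs and expected revenues.
print(knapsack(costs=[2, 3, 4, 5], values=[3, 4, 5, 6], budget=5))
```

The table has (n + 1) x (budget + 1) entries, so this exact approach scales with the budget resolution; the sequential, user-level decisions in the abstract are what make the real problem much harder.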
-
Non-local Policy Optimization via Diversity-regularized Collaborative Exploration
Authors:
Zhenghao Peng,
Hao Sun,
Bolei Zhou
Abstract:
Conventional reinforcement learning (RL) algorithms usually have a single agent learning to solve the task independently. As a result, the agent can only explore a limited part of the state-action space, and the learned behavior is highly correlated with the agent's previous experience, making training prone to local minima. In this work, we empower RL with the capability of teamwork and propose a novel non-local policy optimization framework called Diversity-regularized Collaborative Exploration (DiCE). DiCE uses a group of heterogeneous agents that explore the environment simultaneously and share the collected experiences. A regularization mechanism is further designed to maintain the diversity of the team and modulate the exploration. We implement the framework in both on-policy and off-policy settings, and the experimental results show that DiCE can achieve substantial improvement over the baselines in the MuJoCo locomotion tasks.
Submitted 13 June, 2020;
originally announced June 2020.
-
An empirical Bayes Approach to stochastic blockmodels and graphons: shrinkage estimation and model selection
Authors:
Zhanhao Peng,
Qing Zhou
Abstract:
The graphon (W-graph), which includes the stochastic block model as a special case, has been widely used in modeling and analyzing network data. This random graph model is well characterized by its graphon function, and estimation of the graphon function has attracted considerable recent research interest. Most existing work focuses on community detection in the latent space of the model, while adopting simple maximum likelihood or Bayesian estimates for the graphon or connectivity parameters given the identified communities. In this work, we propose a hierarchical Binomial model and develop a novel empirical Bayes estimate of the connectivity matrix of a stochastic block model to approximate the graphon function. Based on the likelihood of our hierarchical model, we further introduce a model selection criterion for choosing the number of communities. Numerical results on extensive simulations and two well-annotated social networks demonstrate the superiority of our approach in terms of estimation accuracy and model selection.
Submitted 5 September, 2021; v1 submitted 12 June, 2020;
originally announced June 2020.
-
Novel Policy Seeking with Constrained Optimization
Authors:
Hao Sun,
Zhenghao Peng,
Bo Dai,
Jian Guo,
Dahua Lin,
Bolei Zhou
Abstract:
In problem-solving, we humans can come up with multiple novel solutions to the same problem. However, reinforcement learning algorithms can only produce a set of monotonous policies that maximize the cumulative reward but lack diversity and novelty. In this work, we address the problem of generating novel policies in reinforcement learning tasks. Instead of following the multi-objective framework used in existing methods, we propose to rethink the problem from a novel perspective of constrained optimization. We first introduce a new metric to evaluate the difference between policies and then design two practical novel policy generation methods following this perspective. The two proposed methods, the Constrained Task Novel Bisector (CTNB) and the Interior Policy Differentiation (IPD), are derived from the feasible direction method and the interior point method, both well known in the constrained optimization literature. Experimental comparisons on the MuJoCo control suite show that our methods can achieve substantial improvement over previous novelty-seeking methods in terms of both the novelty of the policies and their performance in the primal task.
Submitted 29 October, 2022; v1 submitted 21 May, 2020;
originally announced May 2020.
-
An Inexact Manifold Augmented Lagrangian Method for Adaptive Sparse Canonical Correlation Analysis with Trace Lasso Regularization
Authors:
Kangkang Deng,
Zheng Peng
Abstract:
Canonical correlation analysis (CCA) describes the relationship between two sets of variables by finding linear combinations of these variables that maximize the correlation coefficient. However, in high-dimensional settings where the number of variables exceeds the sample size, or when the variables are highly correlated, traditional CCA is no longer appropriate. In this paper, an adaptive sparse version of CCA (ASCCA) is proposed using trace Lasso regularization. The proposed ASCCA reduces the instability of the estimator when the covariates are highly correlated, and thus improves its interpretability. ASCCA is further reformulated as an optimization problem on Riemannian manifolds, and an inexact manifold augmented Lagrangian method is then proposed for the resulting optimization problem. The performance of ASCCA is compared with other sparse CCA techniques in different simulation settings, which illustrates that ASCCA is feasible and efficient.
Submitted 20 March, 2020;
originally announced March 2020.
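For reference, the classical (non-sparse) CCA that ASCCA builds on has a closed form: whiten each block's covariance and take the singular values of the whitened cross-covariance, which are the canonical correlations. This minimal sketch assumes more samples than variables and well-conditioned covariance matrices, i.e. the regime where, per the abstract, classical CCA is still appropriate. It does not implement the trace Lasso or the manifold augmented Lagrangian method, and the function names are hypothetical.

```python
import numpy as np

def _inv_sqrt(C):
    """Inverse square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def cca(X, Y):
    """Classical CCA: canonical correlations and weight matrices for X and Y."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx, Cyy, Cxy = X.T @ X / (n - 1), Y.T @ Y / (n - 1), X.T @ Y / (n - 1)
    Kx, Ky = _inv_sqrt(Cxx), _inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    k = min(X.shape[1], Y.shape[1])
    return s[:k], Kx @ U[:, :k], Ky @ Vt.T[:, :k]

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 3))
Y = 0.5 * X[:, :1] + rng.standard_normal((300, 2))   # Y shares one direction with X
rho, A, B = cca(X, Y)
print(rho)
```

Inverting the covariance matrices is exactly what breaks down when variables outnumber samples or are highly collinear, which motivates the regularized, sparse variants discussed in the abstract.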
-
Self-Supervised Graph Representation Learning via Global Context Prediction
Authors:
Zhen Peng,
Yixiang Dong,
Minnan Luo,
Xiao-Ming Wu,
Qinghua Zheng
Abstract:
To take full advantage of fast-growing unlabeled networked data, this paper introduces a novel self-supervised strategy for graph representation learning by exploiting natural supervision provided by the data itself. Inspired by human social behavior, we assume that the global context of each node is composed of all nodes in the graph since two arbitrary entities in a connected network could interact with each other via paths of varying length. Based on this, we investigate whether the global context can be a source of free and effective supervisory signals for learning useful node representations. Specifically, we randomly select pairs of nodes in a graph and train a well-designed neural net to predict the contextual position of one node relative to the other. Our underlying hypothesis is that the representations learned from such within-graph context would capture the global topology of the graph and finely characterize the similarity and differentiation between nodes, which is conducive to various downstream learning tasks. Extensive benchmark experiments including node classification, clustering, and link prediction demonstrate that our approach outperforms many state-of-the-art unsupervised methods and sometimes even exceeds the performance of supervised counterparts.
Submitted 3 March, 2020;
originally announced March 2020.
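One natural reading of "the contextual position of one node relative to the other" is the shortest-path (hop) distance between the two nodes, which would serve as a free label for each sampled pair. The sketch below assumes exactly that, computing hop-count labels with breadth-first search on an unweighted, connected graph given as adjacency lists. It illustrates the supervisory signal only, not the paper's network or training procedure, and the function names are hypothetical.

```python
import random
from collections import deque

def hop_distances(adj, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def sample_pair_labels(adj, num_pairs, seed=0):
    """Randomly sample node pairs and label each with its hop distance (graph must be connected)."""
    rng = random.Random(seed)
    nodes = list(adj)
    pairs = []
    for _ in range(num_pairs):
        u, v = rng.sample(nodes, 2)
        pairs.append((u, v, hop_distances(adj, u)[v]))
    return pairs

# A path graph 0 - 1 - 2 - 3 - 4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(sample_pair_labels(adj, num_pairs=5))
```

A predictor trained on such (pair, label) examples must encode where nodes sit in the global topology, which matches the hypothesis stated in the abstract.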
-
Graph Representation Learning via Graphical Mutual Information Maximization
Authors:
Zhen Peng,
Wenbing Huang,
Minnan Luo,
Qinghua Zheng,
Yu Rong,
Tingyang Xu,
Junzhou Huang
Abstract:
The rich content of various information networks, such as social networks and communication networks, provides unprecedented potential for learning high-quality expressive representations without external supervision. This paper investigates how to preserve and extract the abundant information from graph-structured data into an embedding space in an unsupervised manner. To this end, we propose a novel concept, Graphical Mutual Information (GMI), to measure the correlation between input graphs and high-level hidden representations. GMI generalizes the idea of conventional mutual information computations from vector space to the graph domain, where measuring mutual information from the two aspects of node features and topological structure is indispensable. GMI exhibits several benefits. First, it is invariant to isomorphic transformations of input graphs, an inevitable constraint in many existing graph representation learning algorithms. Second, it can be efficiently estimated and maximized by current mutual information estimation methods such as MINE. Finally, our theoretical analysis confirms its correctness and rationality. With the aid of GMI, we develop an unsupervised learning model trained by maximizing GMI between the input and output of a graph neural encoder. Extensive experiments on transductive as well as inductive node classification and link prediction demonstrate that our method outperforms state-of-the-art unsupervised counterparts, and sometimes even exceeds the performance of supervised ones.
Submitted 4 February, 2020;
originally announced February 2020.
-
Imitation Learning of Neural Spatio-Temporal Point Processes
Authors:
Shixiang Zhu,
Shuang Li,
Zhigang Peng,
Yao Xie
Abstract:
We present a novel Neural Embedding Spatio-Temporal (NEST) point process model for spatio-temporal discrete event data and develop an efficient imitation learning (a type of reinforcement learning) based approach for model fitting. Despite the rapid development of one-dimensional temporal point processes for discrete event data, the study of the spatio-temporal aspects of such data is relatively scarce. Our model captures complex spatio-temporal dependence between discrete events through a carefully designed mixture of heterogeneous Gaussian diffusion kernels, whose parameters are parameterized by neural networks. This new kernel is what allows our model to capture intricate spatial dependence patterns while still yielding interpretable results, as we examine maps of the Gaussian diffusion kernel parameters. Imitation learning model fitting for NEST is more robust than maximum likelihood estimation. It directly measures the divergence between the empirical distributions of the training data and the model-generated data. Moreover, our imitation learning-based approach enjoys computational efficiency due to the explicit characterization of the reward function related to the likelihood function; furthermore, the likelihood function under our model has a tractable expression due to the Gaussian kernel parameterization. Experiments based on real data show our method's good performance relative to the state of the art and the good interpretability of NEST's results.
Submitted 22 January, 2021; v1 submitted 12 June, 2019;
originally announced June 2019.
-
SVRG for Policy Evaluation with Fewer Gradient Evaluations
Authors:
Zilun Peng,
Ahmed Touati,
Pascal Vincent,
Doina Precup
Abstract:
Stochastic variance-reduced gradient (SVRG) is an optimization method originally designed for tackling machine learning problems with a finite sum structure. SVRG was later shown to work for policy evaluation, a problem in reinforcement learning in which one aims to estimate the value function of a given policy. SVRG makes use of gradient estimates at two scales. At the slower scale, SVRG computes a full gradient over the whole dataset, which could lead to prohibitive computation costs. In this work, we show that two variants of SVRG for policy evaluation could significantly diminish the number of gradient calculations while preserving a linear convergence speed. More importantly, our theoretical result implies that one does not need to use the entire dataset in every epoch of SVRG when it is applied to policy evaluation with linear function approximation. Our experiments demonstrate large computational savings provided by the proposed methods.
Submitted 19 June, 2020; v1 submitted 9 June, 2019;
originally announced June 2019.
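The two-scale structure described above can be seen in plain SVRG on a finite-sum least-squares problem: an occasional full gradient at a snapshot (the slow scale) corrects cheap per-sample gradients (the fast scale). This is a minimal sketch of generic SVRG, not the paper's policy-evaluation variants, which, per the abstract, avoid using the entire dataset in every epoch. The step size, epoch count, and inner-loop length are arbitrary choices for this toy problem.

```python
import numpy as np

def svrg_least_squares(A, b, lr=0.02, epochs=50, inner_steps=None, seed=0):
    """Minimize (1/2n) * sum_i (a_i . w - b_i)^2 with SVRG."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    m = inner_steps or 2 * n
    w_snap = np.zeros(d)
    for _ in range(epochs):
        # Slow scale: full gradient over the whole dataset at the snapshot.
        full_grad = A.T @ (A @ w_snap - b) / n
        w = w_snap.copy()
        for _ in range(m):
            i = rng.integers(n)
            # Fast scale: per-sample gradient, corrected by its value at the snapshot.
            g = A[i] * (A[i] @ w - b[i]) - A[i] * (A[i] @ w_snap - b[i]) + full_grad
            w -= lr * g
        w_snap = w
    return w_snap

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5))
b = A @ np.arange(1.0, 6.0) + 0.1 * rng.standard_normal(200)
print(svrg_least_squares(A, b))
```

The full-gradient pass is the expensive part (n gradient evaluations per epoch), which is the cost the paper's variants aim to cut.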
-
Towards Understanding Regularization in Batch Normalization
Authors:
** Luo,
Xinjiang Wang,
Wenqi Shao,
Zhanglin Peng
Abstract:
Batch Normalization (BN) improves both convergence and generalization in training neural networks. This work aims to understand these phenomena theoretically. We analyze BN using a basic block of neural networks, consisting of a kernel layer, a BN layer, and a nonlinear activation function. This basic network helps us understand the impacts of BN in three aspects. First, by viewing BN as an implicit regularizer, BN can be decomposed into population normalization (PN) and gamma decay as an explicit regularization. Second, the learning dynamics of BN and the regularization show that training converges with large maximum and effective learning rates. Third, the generalization of BN is explored using statistical mechanics. Experiments demonstrate that BN in convolutional neural networks shares the same traits of regularization as the above analyses.
Submitted 24 April, 2019; v1 submitted 4 September, 2018;
originally announced September 2018.
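For concreteness, the BN layer in the basic block above standardizes each feature over the mini-batch, then scales by a learnable gamma and shifts by beta. This is a minimal NumPy sketch of the training-mode forward pass together with the kernel layer, BN, ReLU block; the epsilon value is a common default assumed here, and the paper's decomposition into population normalization and gamma decay is an analysis of this layer that is not shown.

```python
import numpy as np

def batch_norm(h, gamma, beta, eps=1e-5):
    """Training-mode BN: standardize each column over the batch, then scale and shift."""
    mean = h.mean(axis=0)
    var = h.var(axis=0)                      # biased (population) variance over the batch
    h_hat = (h - mean) / np.sqrt(var + eps)
    return gamma * h_hat + beta

def basic_block(x, W, gamma, beta):
    """Kernel (linear) layer -> BN -> ReLU."""
    return np.maximum(batch_norm(x @ W, gamma, beta), 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 10))            # mini-batch of 64 inputs
W = rng.standard_normal((10, 4))
gamma = np.array([1.0, 2.0, 0.5, 3.0])
beta = np.array([0.0, 1.0, -1.0, 0.5])
out = basic_block(x, W, gamma, beta)
```

One property relevant to the regularization view: rescaling the kernel W leaves the BN output essentially unchanged (up to the epsilon term), so the effective scale of each feature is carried by gamma.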
-
AXNet: ApproXimate computing using an end-to-end trainable neural network
Authors:
Zhenghao Peng,
Xuyang Chen,
Chengwen Xu,
Naifeng **g,
Xiaoyao Liang,
Cewu Lu,
Li Jiang
Abstract:
Neural network based approximate computing is a universal architecture that promises tremendous energy efficiency for many error-resilient applications. To guarantee approximation quality, existing works deploy two neural networks (NNs), i.e., an approximator and a predictor. The approximator provides the approximate results, while the predictor predicts whether the input data is safe to approximate under the given quality requirement. However, coordinating these two neural networks by training them separately is non-trivial and time-consuming, since they have different optimization objectives. This paper proposes a novel neural network structure, AXNet, that fuses the two NNs into a holistic end-to-end trainable NN. Leveraging the philosophy of multi-task learning, AXNet can tremendously improve the invocation (the proportion of safe-to-approximate samples) and reduce the approximation error. The training effort also decreases significantly. Experimental results show 50.7% more invocation and substantial cuts in training time compared to existing neural network based approximate computing frameworks.
Submitted 18 December, 2018; v1 submitted 27 July, 2018;
originally announced July 2018.
-
Active Learning for Segmentation by Optimizing Content Information for Maximal Entropy
Authors:
Firat Ozdemir,
Zixuan Peng,
Christine Tanner,
Philipp Fuernstahl,
Orcun Goksel
Abstract:
Segmentation is essential for medical image analysis tasks such as intervention planning, therapy guidance, diagnosis, and treatment decisions. Deep learning is becoming increasingly prominent for segmentation, where the lack of annotations, however, often becomes the main limitation. Due to privacy concerns and ethical considerations, most medical datasets are created, curated, and made accessible only locally. Furthermore, current deep learning methods are often suboptimal in translating anatomical knowledge between different medical imaging modalities. Active learning can be used to select an informed set of image samples to request for manual annotation, in order to make the best use of the limited annotation time of clinical experts; this is the focus of this work. Our contributions herein are twofold: (1) we enforce domain-representativeness of selected samples using a proposed penalization scheme to maximize information at the network abstraction layer, and (2) we propose a Borda-count based sample querying scheme for selecting samples for segmentation. Comparative experiments with baseline approaches show that the samples queried with our proposed method, where both of the above contributions are combined, result in significantly improved segmentation performance for this active learning task.
Submitted 18 July, 2018;
originally announced July 2018.
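Borda-count aggregation of several sample rankings can be sketched as follows: each criterion ranks the unlabeled samples, a sample earns n - 1 points for first place down to 0 for last, and the samples with the most total points are queried for annotation. The two criteria here (an "uncertainty" and a "representativeness" score) and their values are hypothetical placeholders, not the paper's exact measures, and ties are broken by sample index.

```python
import numpy as np

def borda_scores(criteria):
    """criteria: shape (n_criteria, n_samples); higher score = more desirable to query."""
    criteria = np.asarray(criteria, dtype=float)
    n = criteria.shape[1]
    total = np.zeros(n, dtype=int)
    for scores in criteria:
        order = np.argsort(-scores, kind="stable")   # best sample first
        points = np.empty(n, dtype=int)
        points[order] = np.arange(n - 1, -1, -1)     # n-1 points for first place, 0 for last
        total += points
    return total

def query_top_k(criteria, k):
    """Indices of the k samples with the highest Borda count (ties broken by index)."""
    return np.argsort(-borda_scores(criteria), kind="stable")[:k]

uncertainty = [0.9, 0.1, 0.5, 0.7]          # hypothetical per-sample scores
representativeness = [0.8, 0.2, 0.6, 0.4]
print(borda_scores([uncertainty, representativeness]))
print(query_top_k([uncertainty, representativeness], k=2))
```

Because Borda counts use only ranks, criteria on very different numeric scales can be combined without normalizing them first.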
-
Approximate Random Dropout
Authors:
Zhuoran Song,
Ru Wang,
Dongyu Ru,
Hongru Huang,
Zhenghao Peng,
**g Ke,
Xiaoyao Liang,
Li Jiang
Abstract:
The training phase of deep neural networks (DNNs) consumes enormous processing time and energy. Compression techniques that exploit the sparsity of DNNs can effectively accelerate the inference phase. However, they can hardly be used in the training phase, because training involves dense matrix multiplication on General Purpose Computation on Graphics Processors (GPGPU), which favor regular and structured data layouts. In this paper, we propose Approximate Random Dropout, which replaces the conventional random dropout of neurons and synapses with regular, predefined patterns to eliminate unnecessary computation and data access. To compensate for the potential performance loss, we develop an SGD-based Search Algorithm to produce the distribution of dropout patterns. We prove that our approach is statistically equivalent to the previous dropout method. Experimental results on MLP and LSTM using well-known benchmarks show that the proposed Approximate Random Dropout can reduce training time by 20% to 77% on MLP (19% to 60% on LSTM) when the dropout rate is 0.3 to 0.7, with a marginal accuracy drop.
Submitted 14 December, 2018; v1 submitted 22 May, 2018;
originally announced May 2018.
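To make the idea of a "regular, predefined" dropout pattern concrete, here is a minimal sketch under simplifying assumptions: a single periodic pattern that drops every `drop_every`-th hidden unit, with a uniformly random phase (`offset`) per forward pass. Because the phase is uniform, each unit is dropped with marginal probability `1/drop_every`, and only the kept columns of `W` take part in a smaller but still dense matrix product. The function names are hypothetical, and the paper's SGD-based search over a distribution of patterns is not reproduced here.

```python
import numpy as np


def kept_units(n_hidden, drop_every, offset):
    """Regular pattern: drop every unit j with (j - offset) % drop_every == 0."""
    return np.array([j for j in range(n_hidden) if (j - offset) % drop_every != 0],
                    dtype=int)


def pattern_dropout_layer(x, W, b, drop_every, rng):
    """ReLU layer with a periodic dropout pattern and a random phase."""
    n_hidden = W.shape[1]
    if drop_every < 2 or n_hidden < drop_every:
        raise ValueError("need 2 <= drop_every <= n_hidden")
    offset = int(rng.integers(drop_every))        # random phase of the pattern
    kept = kept_units(n_hidden, drop_every, offset)
    scale = n_hidden / len(kept)                  # inverted-dropout rescaling
    h = np.zeros((x.shape[0], n_hidden))
    # Only kept columns enter the product, which stays dense but is smaller.
    h[:, kept] = np.maximum(x @ W[:, kept] + b[kept], 0.0) * scale
    return h, kept


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))
W = rng.standard_normal((3, 8))
b = np.zeros(8)
h, kept = pattern_dropout_layer(x, W, b, drop_every=2, rng=rng)
print(kept)
```

Slicing whole columns keeps the computation in dense-matrix form, which is the property the abstract says GPGPU hardware rewards.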
-
Comparing Aggregators for Relational Probabilistic Models
Authors:
Seyed Mehran Kazemi,
Bahare Fatemi,
Alexandra Kim,
Zilun Peng,
Moumita Roy Tora,
Xing Zeng,
Matthew Dirks,
David Poole
Abstract:
Relational probabilistic models face the challenge of aggregation, where one variable depends on a population of other variables. Consider the problem of predicting gender from movie ratings; this is challenging because the number of movies per user and the number of users per movie can vary greatly. Surprisingly, aggregation is not well understood. In this paper, we show that existing relational models (implicitly or explicitly) either use simple numerical aggregators that lose a great deal of information, or correspond to naive Bayes, logistic regression, or noisy-OR, which suffer from overconfidence. We propose new simple aggregators and simple modifications of existing models that empirically outperform the existing ones. The intuition we provide on different (existing or new) models and their shortcomings, together with our empirical findings, promises to form the foundation for future representations.
Submitted 24 July, 2017;
originally announced July 2017.
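The overconfidence issue described above can be seen with a toy comparison. The sketch below (parameters chosen arbitrarily, not taken from the paper) feeds one child variable with `n` identical, weakly informative parents. Noisy-OR and logistic regression on the *sum* of parent features grow more certain as `n` grows. Logistic regression on the *mean*, a simple numerical aggregator, ignores `n` entirely and so discards the population size. None of these are the new aggregators the paper proposes.

```python
import math


def noisy_or(parent_probs, leak=0.0):
    """P(child = 1) under noisy-OR: each parent independently 'fires' with its prob."""
    prob_none = 1.0 - leak
    for p in parent_probs:
        prob_none *= 1.0 - p
    return 1.0 - prob_none


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_sum(features, w, b):
    """Logistic regression on the sum of parent features (grows with population size)."""
    return sigmoid(b + w * sum(features))


def logistic_mean(features, w, b):
    """Logistic regression on the mean of parent features (ignores population size)."""
    if not features:
        raise ValueError("need at least one parent")
    return sigmoid(b + w * sum(features) / len(features))


for n in (1, 10, 100):
    ratings = [1.0] * n                  # n identical, weakly informative parents
    print(n,
          round(noisy_or([0.1] * n), 3),
          round(logistic_sum(ratings, 0.2, 0.0), 3),
          round(logistic_mean(ratings, 0.2, 0.0), 3))
```

With 100 weak parents, both noisy-OR and the sum-based model become nearly certain, while the mean-based prediction is unchanged from the single-parent case.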
-
Multilayer Perceptron Algebra
Authors:
Zhao Peng
Abstract:
Artificial neural networks (ANNs) have been phenomenally successful on various pattern recognition tasks. However, the design of neural networks relies heavily on the experience and intuition of individual developers. In this article, the author introduces a mathematical structure called MLP algebra on the set of all multilayer perceptron neural networks (MLPs), which can serve as a guiding principle for building MLPs tailored to particular data sets and for building complex MLPs from simpler ones.
Submitted 18 January, 2017;
originally announced January 2017.
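As a hedged illustration of what building a complex MLP from simpler ones can mean, the sketch below shows one natural construction. It is not necessarily one of the operations defined in the paper. Two one-hidden-layer ReLU MLPs that share an input are merged into a single wider MLP: the hidden layers sit side by side and the output layer is block diagonal, so the merged network outputs both original outputs. All names are hypothetical.

```python
import numpy as np


def mlp(params, x):
    """One-hidden-layer ReLU MLP: relu(x W1 + b1) W2 + b2."""
    W1, b1, W2, b2 = params
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2


def parallel(p, q):
    """Merge two MLPs on the same input into one wider MLP.

    Hidden layers are stacked side by side; the output layer is block diagonal,
    so the merged network outputs [mlp(p, x), mlp(q, x)].
    """
    W1p, b1p, W2p, b2p = p
    W1q, b1q, W2q, b2q = q
    W1 = np.hstack([W1p, W1q])
    b1 = np.concatenate([b1p, b1q])
    W2 = np.block([
        [W2p, np.zeros((W2p.shape[0], W2q.shape[1]))],
        [np.zeros((W2q.shape[0], W2p.shape[1])), W2q],
    ])
    b2 = np.concatenate([b2p, b2q])
    return W1, b1, W2, b2


def random_mlp(rng, n_in, n_hidden, n_out):
    return (rng.standard_normal((n_in, n_hidden)), rng.standard_normal(n_hidden),
            rng.standard_normal((n_hidden, n_out)), rng.standard_normal(n_out))


rng = np.random.default_rng(0)
p, q = random_mlp(rng, 3, 4, 2), random_mlp(rng, 3, 6, 1)
x = rng.standard_normal((5, 3))
combined = mlp(parallel(p, q), x)
print(np.allclose(combined, np.hstack([mlp(p, x), mlp(q, x)])))  # True
```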
-
On the Convergence of Asynchronous Parallel Iteration with Unbounded Delays
Authors:
Zhimin Peng,
Yangyang Xu,
Ming Yan,
Wotao Yin
Abstract:
Recent years have witnessed a surge of asynchronous parallel (async-parallel) iterative algorithms, driven by problems involving very large-scale data and a large number of decision variables. Because of asynchrony, the iterates are computed with outdated information, and the age of the outdated information, which we call delay, is the number of times it has been updated since its creation. Almost all recent works prove convergence under the assumption of a finite maximum delay and set their stepsize parameters accordingly. However, the maximum delay is unknown in practice.
This paper presents a convergence analysis of an async-parallel method from a probabilistic viewpoint that allows for large unbounded delays. An explicit stepsize formula that guarantees convergence is given in terms of the delays' statistics. With $p+1$ identical processors, we empirically measured that delays closely follow the Poisson distribution with parameter $p$, matching our theoretical model, and thus the stepsize can be set accordingly. Simulations on both convex and nonconvex optimization problems demonstrate the validity of our analysis and also show that the existing stepsize induced by the maximum delay is too conservative, often slowing down the convergence of the algorithm.
Submitted 15 November, 2017; v1 submitted 13 December, 2016;
originally announced December 2016.
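The delay model described above can be simulated sequentially. The sketch below runs gradient descent on the quadratic $f(x) = \tfrac12 x^\top Q x$, but evaluates each gradient at a stale iterate $x_{k-d_k}$ with $d_k \sim \mathrm{Poisson}(p)$, mirroring the reported delay statistics for $p+1$ processors. The stepsize `eta` is picked by hand. This sketch does not implement the paper's stepsize formula, and it has no real parallelism.

```python
import numpy as np


def delayed_gradient_descent(Q, x0, eta, p, iters, seed=0):
    """Minimize f(x) = 0.5 * x^T Q x using gradients evaluated at stale iterates.

    At step k the gradient is taken at x_{k - d_k}, with d_k ~ Poisson(p)
    (capped at k, since no iterate older than x_0 exists).
    """
    rng = np.random.default_rng(seed)
    history = [np.array(x0, dtype=float)]
    delays = []
    for k in range(iters):
        d = min(int(rng.poisson(p)), k)
        delays.append(d)
        stale = history[k - d]                            # outdated information
        history.append(history[k] - eta * (Q @ stale))
    return history[-1], np.array(delays)


Q = np.diag([1.0, 2.0])
x_final, delays = delayed_gradient_descent(Q, [1.0, 1.0], eta=0.05, p=3, iters=500)
print(np.linalg.norm(x_final), delays.mean())
```

Rerunning with a larger `eta` or a larger `p` is a quick way to see the trade-off the abstract describes: longer typical delays call for more cautious steps.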
-
Coordinate Friendly Structures, Algorithms and Applications
Authors:
Zhimin Peng,
Tianyu Wu,
Yangyang Xu,
Ming Yan,
Wotao Yin
Abstract:
This paper focuses on coordinate update methods, which are useful for solving problems involving large or high-dimensional datasets. They decompose a problem into simple subproblems, each of which updates one variable, or a small block of variables, while fixing the others. These methods can deal with linear and nonlinear mappings, smooth and nonsmooth functions, as well as convex and nonconvex problems. In addition, they are easy to parallelize.
The strong performance of coordinate update methods depends on solving simple subproblems. To derive simple subproblems for several new classes of applications, this paper systematically studies coordinate-friendly operators that perform low-cost coordinate updates.
Based on the discovered coordinate-friendly operators, as well as operator splitting techniques, we obtain new coordinate update algorithms for a variety of problems in machine learning, image processing, and sub-areas of optimization. Several problems are treated with coordinate updates for the first time. The obtained algorithms scale to large instances through parallel and even asynchronous computing. We present numerical examples to illustrate how effective these algorithms are.
Submitted 14 August, 2016; v1 submitted 5 January, 2016;
originally announced January 2016.
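A standard small example of a low-cost coordinate update is least squares, $\min_x \tfrac12\|Ax-b\|^2$. The sketch below keeps the residual $r = b - Ax$ up to date, so minimizing exactly along coordinate $i$ costs $O(m)$ work instead of the $O(mn)$ needed to recompute $Ax$. This is a textbook illustration, assumed here for clarity, and not an algorithm taken from the paper.

```python
import numpy as np


def coordinate_descent_lstsq(A, b, sweeps=200):
    """Cyclic coordinate descent for min_x 0.5 * ||A x - b||^2.

    Keeping the residual r = b - A x current makes each coordinate update
    cost O(m) instead of recomputing A x in O(m n).
    """
    A = np.asarray(A, dtype=float)
    _, n = A.shape
    x = np.zeros(n)
    r = np.asarray(b, dtype=float).copy()      # residual for x = 0
    col_sq = (A ** 2).sum(axis=0)
    for _ in range(sweeps):
        for i in range(n):
            if col_sq[i] == 0.0:
                continue                        # zero column: coordinate has no effect
            delta = A[:, i] @ r / col_sq[i]     # exact minimizer along coordinate i
            x[i] += delta
            r -= delta * A[:, i]                # O(m) residual update
    return x


rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
x_cd = coordinate_descent_lstsq(A, b)
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.max(np.abs(x_cd - x_ref)))
```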
-
ARock: an Algorithmic Framework for Asynchronous Parallel Coordinate Updates
Authors:
Zhimin Peng,
Yangyang Xu,
Ming Yan,
Wotao Yin
Abstract:
Finding a fixed point of a nonexpansive operator, i.e., $x^*=Tx^*$, abstracts many problems in numerical linear algebra, optimization, and other areas of scientific computing. To solve fixed-point problems, we propose ARock, an algorithmic framework in which multiple agents (machines, processors, or cores) update $x$ in an asynchronous parallel fashion. Asynchrony is crucial to parallel computing since it reduces synchronization wait, relaxes communication bottlenecks, and thus speeds up computing significantly. At each step of ARock, an agent updates a randomly selected coordinate $x_i$ based on possibly out-of-date information on $x$. The agents share $x$ through either global memory or communication. If writing $x_i$ is atomic, the agents can read and write $x$ without memory locks.
Theoretically, we show that if the nonexpansive operator $T$ has a fixed point, then with probability one, ARock generates a sequence that converges to a fixed point of $T$. Our conditions on $T$ and the step sizes are weaker than those in comparable work. Linear convergence is also obtained.
We propose special cases of ARock for linear systems, convex optimization, and machine learning, as well as distributed and decentralized consensus problems. Numerical experiments on solving sparse logistic regression problems are presented.
Submitted 26 May, 2016; v1 submitted 8 June, 2015;
originally announced June 2015.
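The coordinate step described above, $x_i \leftarrow x_i - \eta\,(x - Tx)_i$ for a randomly chosen $i$, can be tried in a single-threaded sketch. Here $T(x) = x - \gamma(Qx - c)$ is a gradient-step operator for a strongly convex quadratic. With $\gamma < 2/\lambda_{\max}(Q)$ it is nonexpansive, and its fixed point solves $Qx = c$. This deliberately omits asynchrony and stale reads, which are the heart of ARock. It only shows the randomized coordinate fixed-point update, and the operator and parameters are illustrative choices.

```python
import numpy as np


def random_coordinate_km(T, x0, eta, iters, seed=0):
    """Sequential sketch of a randomized coordinate fixed-point update.

    Each step picks coordinate i uniformly at random and sets
    x_i <- x_i - eta * (x - T(x))_i. There is no asynchrony or delay here.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        i = rng.integers(x.size)
        # A coordinate-friendly T would compute only component i; we compute all for brevity.
        x[i] -= eta * (x - T(x))[i]
    return x


Q = np.array([[3.0, 1.0], [1.0, 2.0]])
c = np.array([1.0, 1.0])
gamma = 0.25                      # below 2 / lambda_max(Q) (about 0.55), so T is nonexpansive


def T(x):
    """Gradient-step operator; its fixed point solves Q x = c."""
    return x - gamma * (Q @ x - c)


x_star = random_coordinate_km(T, np.zeros(2), eta=1.0, iters=2000)
print(x_star)                     # close to the solution of Q x = c, i.e. [0.2, 0.4]
```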
-
Dynamic Bivariate Normal Copula
Authors:
Xin Liao,
Liang Peng,
Zuoxiang Peng,
Yanting Zheng
Abstract:
A normal copula with a correlation coefficient strictly between $-1$ and $1$ is tail independent, so it severely underestimates extreme probabilities. By letting the correlation coefficient in a normal copula depend on the sample size, Hüsler and Reiss (1989) showed that the tail can become asymptotically dependent. In this paper, we extend this result by deriving the limit of the normalized maximum of $n$ independent observations, where the $i$-th observation follows a normal copula whose correlation coefficient is either a parametric or a nonparametric function of $i/n$. Furthermore, both parametric and nonparametric inference for this unknown function are studied, which can be employed to test the condition in Hüsler and Reiss (1989). A simulation study and real data analysis are also presented.
Submitted 14 May, 2015;
originally announced May 2015.
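The setting above can be simulated directly. In the sketch below, the $i$-th pair is bivariate Gaussian with correlation $\rho_i = 1 - \lambda(i/n)^2/\log n$. This particular form is an assumption, chosen to mimic the Hüsler-Reiss scaling $(1-\rho_n)\log n \to \lambda^2$ while letting the correlation vary with $i/n$. Both componentwise maxima are normalized with the classical constants for Gaussian maxima, $a_n = 1/\sqrt{2\log n}$ and $b_n = \sqrt{2\log n} - (\log\log n + \log 4\pi)/(2\sqrt{2\log n})$. The function `lam` is a hypothetical example.

```python
import numpy as np


def gaussian_norming(n):
    """Classical constants with (max of n iid N(0,1) - b_n) / a_n -> Gumbel."""
    s = np.sqrt(2.0 * np.log(n))
    b_n = s - (np.log(np.log(n)) + np.log(4.0 * np.pi)) / (2.0 * s)
    return 1.0 / s, b_n


def normalized_maxima(n, lam, rng):
    """Componentwise normalized maxima of n independent Gaussian pairs.

    The i-th pair has correlation rho_i = 1 - lam(i/n)**2 / log(n) (assumed form).
    """
    t = np.arange(1, n + 1) / n
    rho = 1.0 - lam(t) ** 2 / np.log(n)
    if np.any(np.abs(rho) >= 1.0):
        raise ValueError("lam(t)**2 / log(n) must lie strictly in (0, 2)")
    z1 = rng.standard_normal(n)
    z2 = rng.standard_normal(n)
    x = z1
    y = rho * z1 + np.sqrt(1.0 - rho ** 2) * z2   # unit variance, correlation rho_i
    a_n, b_n = gaussian_norming(n)
    return (x.max() - b_n) / a_n, (y.max() - b_n) / a_n


def lam(t):
    """Hypothetical smooth function of i/n."""
    return 0.5 + t


rng = np.random.default_rng(1)
pairs = np.array([normalized_maxima(10_000, lam, rng) for _ in range(200)])
print(np.corrcoef(pairs.T)[0, 1])                 # dependence between the two maxima
```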
-
Asymptotics and statistical inferences on independent and non-identically distributed bivariate Gaussian triangular arrays
Authors:
Xin Liao,
Zuoxiang Peng
Abstract:
In this paper, we establish the first- and second-order asymptotics of the distributions of normalized maxima of independent and non-identically distributed bivariate Gaussian triangular arrays, where the $i$th vector of the $n$th row follows a bivariate Gaussian distribution whose correlation coefficient is a monotone continuous function of $i/n$. Furthermore, parametric inference for this unknown function is studied. A simulation study and a real data analysis are also presented.
Submitted 26 April, 2016; v1 submitted 13 May, 2015;
originally announced May 2015.
-
Tail Asymptotic Expansions for L-Statistics
Authors:
E. Hashorva,
C. Ling,
Z. Peng
Abstract:
In this paper, we derive higher-order expansions of $L$-statistics of independent risks $X_1, \ldots, X_n$ under conditions on the underlying distribution function $F$. The new results are applied to derive asymptotic expansions of ratios of two kinds of risk measures, namely the stop-loss premium and the excess return on capital.
Submitted 25 February, 2014;
originally announced February 2014.
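For readers new to the objects in the abstract above, the sketch below computes an empirical $L$-statistic, a weighted sum of order statistics $\sum_i c_i X_{(i)}$, and the empirical stop-loss premium $E[(X-d)^+]$ for a retention level $d$. It only illustrates these definitions on a toy sample. The paper's expansions and the excess return on capital are not implemented, and the weights and data are arbitrary.

```python
import numpy as np


def l_statistic(sample, weights):
    """L-statistic: a weighted sum of order statistics, sum_i c_i X_(i)."""
    xs = np.sort(np.asarray(sample, dtype=float))
    c = np.asarray(weights, dtype=float)
    if xs.shape != c.shape:
        raise ValueError("need one weight per observation")
    return float(c @ xs)


def stop_loss_premium(sample, retention):
    """Empirical stop-loss premium E[(X - d)^+] with retention level d."""
    x = np.asarray(sample, dtype=float)
    return float(np.maximum(x - retention, 0.0).mean())


losses = [3.0, 1.0, 2.0]
print(l_statistic(losses, [0, 0, 1]))        # 3.0, the sample maximum
print(l_statistic(losses, [1/3, 1/3, 1/3]))  # the sample mean, 2.0 up to rounding
print(stop_loss_premium(losses, 2.0))        # 0.3333333333333333
```

Choosing the weights $c_i$ recovers familiar statistics: all weight on the largest order statistic gives the maximum, and equal weights give the mean.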
-
Higher-order expansions of distributions of maxima in a Hüsler-Reiss model
Authors:
E. Hashorva,
Z. Peng,
Z. Weng
Abstract:
The max-stable Hüsler-Reiss distribution, which arises as the limit distribution of maxima of bivariate Gaussian triangular arrays, has been shown to be useful in various extreme value models. For such triangular arrays, this paper establishes higher-order asymptotic expansions of the joint distribution of maxima under refined Hüsler-Reiss conditions. In particular, the rate of convergence of normalized maxima to the Hüsler-Reiss distribution is calculated explicitly.
Submitted 23 February, 2014;
originally announced February 2014.
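For reference, under the Hüsler-Reiss condition $(1-\rho_n)\log n \to \lambda^2$ with $\lambda > 0$, the limit distribution has Gumbel margins and is usually written as $H_\lambda(x,y) = \exp\{-\Phi(\lambda + \tfrac{x-y}{2\lambda})e^{-y} - \Phi(\lambda + \tfrac{y-x}{2\lambda})e^{-x}\}$, where $\Phi$ is the standard normal CDF. The sketch below evaluates it with the standard library only. As $\lambda \to \infty$ it tends to the independence case $\exp(-e^{-x}-e^{-y})$, and as $\lambda \to 0$ it tends to complete dependence, $\exp(-e^{-\min(x,y)})$.

```python
import math


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def husler_reiss_cdf(x, y, lam):
    """Bivariate Hüsler-Reiss distribution H_lambda(x, y) with Gumbel margins (lam > 0)."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    term_y = std_normal_cdf(lam + (x - y) / (2.0 * lam)) * math.exp(-y)
    term_x = std_normal_cdf(lam + (y - x) / (2.0 * lam)) * math.exp(-x)
    return math.exp(-term_y - term_x)


x, y = 0.3, -0.2
print(husler_reiss_cdf(x, y, 50.0), math.exp(-math.exp(-x) - math.exp(-y)))  # near independence
print(husler_reiss_cdf(x, y, 1e-3), math.exp(-math.exp(-min(x, y))))        # near full dependence
```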
-
Rates of convergence of extremes from skew normal samples
Authors:
Xin Liao,
Zuoxiang Peng,
Saralees Nadarajah,
Xiaoqian Wang
Abstract:
For a skew normal random sequence, convergence rates of the distribution of its partial maximum to the Gumbel extreme value distribution are derived. The asymptotic expansion of the distribution of the normalized maximum is given under an optimal choice of norming constants. We find that the optimal convergence rate of the normalized maximum to the Gumbel extreme value distribution is proportional to $1/\log n$.
Submitted 5 December, 2012;
originally announced December 2012.
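To experiment with the partial maxima studied above, skew normal variates can be generated with the well-known stochastic representation $Z = \delta|U_0| + \sqrt{1-\delta^2}\,U_1$, where $U_0, U_1$ are independent standard normals and $\delta = \alpha/\sqrt{1+\alpha^2}$ for shape parameter $\alpha$. The sketch below only draws samples and tracks the running maximum. The paper's optimal norming constants are not reproduced, and the shape value is arbitrary.

```python
import numpy as np


def skew_normal_sample(alpha, size, rng):
    """Draw SN(alpha) variates via Z = delta*|U0| + sqrt(1 - delta^2)*U1."""
    delta = alpha / np.sqrt(1.0 + alpha ** 2)
    u0 = rng.standard_normal(size)
    u1 = rng.standard_normal(size)
    return delta * np.abs(u0) + np.sqrt(1.0 - delta ** 2) * u1


rng = np.random.default_rng(2)
z = skew_normal_sample(alpha=3.0, size=100_000, rng=rng)
running_max = np.maximum.accumulate(z)          # partial maxima M_1, ..., M_n
print(running_max[[9, 99, 999, 9_999, 99_999]])
```

The running maximum grows very slowly with $n$. This is consistent in spirit with the slow, $1/\log n$-order convergence reported in the abstract, although this simulation does not measure that rate.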