Search | arXiv e-print repository

Measure This, Not That: Optimizing the Cost and Model-Based Information Content of Measurements

Authors: Jialu Wang, Zedong Peng, Ryan Hughes, Debangsu Bhattacharyya, David E. Bernal Neira, Alexander W. Dowling

Abstract: Model-based design of experiments (MBDoE) is a powerful framework for selecting and calibrating science-based mathematical models from data. This work extends popular MBDoE workflows by proposing a convex mixed integer (non)linear programming (MINLP) problem to optimize the selection of measurements. The solver MindtPy is modified to support calculating the D-optimality objective and its gradient via an external package, SciPy, using the grey-box module in Pyomo. The new approach is demonstrated in two case studies: estimating highly correlated kinetics from a batch reactor and estimating transport parameters in a large-scale rotary packed bed for CO$_2$ capture. Both case studies show how examining the Pareto-optimal trade-offs between information content measured by A- and D-optimality versus measurement budget offers practical guidance for selecting measurements for scientific experiments.

Submitted 13 June, 2024; originally announced June 2024.

MSC Class: 90C25; 90C11; 90C30; 90C90; 62K05

arXiv:2311.17326 [pdf, other]

Mostly Beneficial Clustering: Aggregating Data for Operational Decision Making

Authors: Chengzhang Li, Zhenkang Peng, Ying Rong

Abstract: With increasingly volatile market conditions and rapid product innovations, operational decision-making for large-scale systems entails solving thousands of problems with limited data. Data aggregation is proposed to combine the data across problems to improve the decisions obtained by solving those problems individually. We propose a novel cluster-based Shrunken-SAA approach that can exploit the cluster structure among problems when implementing the data aggregation approaches. We prove that, as the number of problems grows, leveraging the given cluster structure among problems yields additional benefits over the data aggregation approaches that neglect such structure. When the cluster structure is unknown, we show that unveiling the cluster structure, even at the cost of a few data points, can be beneficial, especially when the distance between clusters of problems is substantial. Our proposed approach can be extended to general cost functions under mild conditions. When the number of problems gets large, the optimality gap of our proposed approach decreases exponentially in the distance between the clusters. We explore the performance of the proposed approach through the application of managing newsvendor systems via numerical experiments. We investigate the impacts of distance metrics between problem instances on the performance of the cluster-based Shrunken-SAA approach with synthetic data. We further validate our proposed approach with real data and highlight the advantages of cluster-based data aggregation, especially in the small-data large-scale regime, compared to the existing approaches.

Submitted 17 December, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

arXiv:2310.18963 [pdf, other]

Expectile-based conditional tail moments with covariates

Authors: Qian Xiong, Zuoxiang Peng

Abstract: Expectile, as the minimizer of an asymmetric quadratic loss function, is a coherent risk measure and is helpful to use more information about the distribution of the considered risk. In this paper, we propose a new risk measure by replacing quantiles by expectiles, called expectile-based conditional tail moment, and focus on the estimation of this new risk measure as the conditional survival function of the risk, given the risk exceeding the expectile and given a value of the covariates, is heavy-tailed. Under some regular conditions, asymptotic properties of this new estimator are considered. The extrapolated estimation of the conditional tail moments is also investigated. These results are illustrated both on simulated data and on real insurance data.

Submitted 29 October, 2023; originally announced October 2023.

Comments: 17 pages, 7 figures

MSC Class: 60G70

arXiv:2303.12159 [pdf]

doi 10.1080/19427867.2023.2292859

Exploring differences in injury severity between occupant groups involved in fatal rear-end crashes: A correlated random parameter logit model with mean heterogeneity

Authors: Renteng Yuan, Xin Gu, Zhipeng Peng, Qiaojun Xiang

Abstract: Rear-end crashes are one of the most common crash types. Passenger cars involved in rear-end crashes frequently produce severe outcomes. However, no study investigated the differences in the injury severity of occupant groups when cars are involved as following and leading vehicles in rear-end crashes. Therefore, the focus of this investigation is to compare the key factors affecting the injury severity between the front- and rear-car occupant groups in rear-end crashes. First, data is extracted from the Fatality Analysis Reporting System (FARS) for two types of rear-end crashes from 2017 to 2019, including passenger cars as rear-end and rear-ended vehicles. Significant injury severity difference between front- and rear-car occupant groups is found by conducting a likelihood ratio test. Moreover, the front- and rear-car occupant groups are modelled by the correlated random parameter logit model with heterogeneity in means (CRPLHM) and the random parameter logit model with heterogeneity in means (RPLHM), respectively. From the modeling, the significant factors are occupant positions, driver age, overturn, vehicle type, etc. For instance, the driving and front-right positions significantly increase the probability of severe injury when struck by another vehicle. Large truck-strike-car tends to cause severe outcomes compared to car-strike-large truck. This study provides insight into the mechanism of occupant injury severity in rear-end crashes and proposes some effective countermeasures to mitigate crash severity, such as implementing stricter seat belt laws, improving the coverage of streetlights, and strengthening car drivers' emergency response ability.

Submitted 5 July, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

Report number: 1 - 13

Journal ref: Transportation Letters: The International Journal of Transportation Research, 2023

arXiv:2208.02627 [pdf, ps, other]

Modelling multivariate extreme value distributions via Markov trees

Authors: Shuang Hu, Zuoxiang Peng, Johan Segers

Abstract: Multivariate extreme value distributions are a common choice for modelling multivariate extremes. In high dimensions, however, the construction of flexible and parsimonious models is challenging. We propose to combine bivariate extreme value distributions into a Markov random field with respect to a tree. Although in general not an extreme value distribution itself, this Markov tree is attracted by a multivariate extreme value distribution. The latter serves as a tree-based approximation to an unknown extreme value distribution with the given bivariate distributions as margins. Given data, we learn an appropriate tree structure by Prim's algorithm with estimated pairwise upper tail dependence coefficients or Kendall's tau values as edge weights. The distributions of pairs of connected variables can be fitted in various ways. The resulting tree-structured extreme value distribution allows for inference on rare event probabilities, as illustrated on river discharge data from the upper Danube basin.

Submitted 29 July, 2022; originally announced August 2022.

Comments: 37 pages, 10 figures, 7 tables

MSC Class: 62G32; 62H22

arXiv:2006.16312 [pdf, other]

Dynamic Knapsack Optimization Towards Efficient Multi-Channel Sequential Advertising

Authors: Xiaotian Hao, Zhaoqing Peng, Yi Ma, Guan Wang, Junqi Jin, Jianye Hao, Shan Chen, Rongquan Bai, Mingzhou Xie, Miao Xu, Zhenzhe Zheng, Chuan Yu, Han Li, Jian Xu, Kun Gai

Abstract: In E-commerce, advertising is essential for merchants to reach their target users. The typical objective is to maximize the advertiser's cumulative revenue over a period of time under a budget constraint. In real applications, an advertisement (ad) usually needs to be exposed to the same user multiple times until the user finally contributes revenue (e.g., places an order). However, existing advertising systems mainly focus on the immediate revenue with single ad exposures, ignoring the contribution of each exposure to the final conversion, and thus usually fall into suboptimal solutions. In this paper, we formulate the sequential advertising strategy optimization as a dynamic knapsack problem. We propose a theoretically guaranteed bilevel optimization framework, which significantly reduces the solution space of the original optimization space while ensuring the solution quality. To improve the exploration efficiency of reinforcement learning, we also devise an effective action space reduction approach. Extensive offline and online experiments show the superior performance of our approaches over state-of-the-art baselines in terms of cumulative revenue.

Submitted 29 June, 2020; originally announced June 2020.

Comments: accepted by ICML 2020

arXiv:2006.07781 [pdf, other]

Non-local Policy Optimization via Diversity-regularized Collaborative Exploration

Authors: Zhenghao Peng, Hao Sun, Bolei Zhou

Abstract: Conventional Reinforcement Learning (RL) algorithms usually have one single agent learning to solve the task independently. As a result, the agent can only explore a limited part of the state-action space while the learned behavior is highly correlated to the agent's previous experience, making the training prone to a local minimum. In this work, we empower RL with the capability of teamwork and propose a novel non-local policy optimization framework called Diversity-regularized Collaborative Exploration (DiCE). DiCE utilizes a group of heterogeneous agents to explore the environment simultaneously and share the collected experiences. A regularization mechanism is further designed to maintain the diversity of the team and modulate the exploration. We implement the framework in both on-policy and off-policy settings and the experimental results show that DiCE can achieve substantial improvement over the baselines in the MuJoCo locomotion tasks.

Submitted 13 June, 2020; originally announced June 2020.

Comments: https://decisionforce.github.io/DiCE/

arXiv:2006.07435 [pdf, other]

An empirical Bayes Approach to stochastic blockmodels and graphons: shrinkage estimation and model selection

Authors: Zhanhao Peng, Qing Zhou

Abstract: The graphon (W-graph), including the stochastic block model as a special case, has been widely used in modeling and analyzing network data. This random graph model is well-characterized by its graphon function, and estimation of the graphon function has gained a lot of recent research interest. Most existing works focus on community detection in the latent space of the model, while adopting simple maximum likelihood or Bayesian estimates for the graphon or connectivity parameters given the identified communities. In this work, we propose a hierarchical Binomial model and develop a novel empirical Bayes estimate of the connectivity matrix of a stochastic block model to approximate the graphon function. Based on the likelihood of our hierarchical model, we further introduce a model selection criterion for choosing the number of communities. Numerical results on extensive simulations and two well-annotated social networks demonstrate the superiority of our approach in terms of estimation accuracy and model selection.

Submitted 5 September, 2021; v1 submitted 12 June, 2020; originally announced June 2020.

arXiv:2005.10696 [pdf, other]

Novel Policy Seeking with Constrained Optimization

Authors: Hao Sun, Zhenghao Peng, Bo Dai, Jian Guo, Dahua Lin, Bolei Zhou

Abstract: In problem-solving, we humans can come up with multiple novel solutions to the same problem. However, reinforcement learning algorithms can only produce a set of monotonous policies that maximize the cumulative reward but lack diversity and novelty. In this work, we address the problem of generating novel policies in reinforcement learning tasks. Instead of following the multi-objective framework used in existing methods, we propose to rethink the problem under a novel perspective of constrained optimization. We first introduce a new metric to evaluate the difference between policies and then design two practical novel policy generation methods following the new perspective. The two proposed methods, namely the Constrained Task Novel Bisector (CTNB) and the Interior Policy Differentiation (IPD), are derived from the feasible direction method and the interior point method commonly known in the constrained optimization literature. Experimental comparisons on the MuJoCo control suite show our methods can achieve substantial improvement over previous novelty-seeking methods in terms of both the novelty of policies and their performances in the primal task.

Submitted 29 October, 2022; v1 submitted 21 May, 2020; originally announced May 2020.

arXiv:2003.09195 [pdf, ps, other]

An Inexact Manifold Augmented Lagrangian Method for Adaptive Sparse Canonical Correlation Analysis with Trace Lasso Regularization

Authors: Kangkang Deng, Zheng Peng

Abstract: Canonical correlation analysis (CCA for short) describes the relationship between two sets of variables by finding some linear combinations of these variables that maximize the correlation coefficient. However, in high-dimensional settings where the number of variables exceeds the sample size, or in the case that the variables are highly correlated, the traditional CCA is no longer appropriate. In this paper, an adaptive sparse version of CCA (ASCCA for short) is proposed by using the trace Lasso regularization. The proposed ASCCA reduces the instability of the estimator when the covariates are highly correlated, and thus improves its interpretation. The ASCCA is further reformulated as an optimization problem on Riemannian manifolds, and a manifold inexact augmented Lagrangian method is then proposed for the resulting optimization problem. The performance of the ASCCA is compared with the other sparse CCA techniques in different simulation settings, which illustrates that the ASCCA is feasible and efficient.

Submitted 20 March, 2020; originally announced March 2020.

Comments: 21 pages

MSC Class: 90C26; 90C30

arXiv:2003.01604 [pdf, other]

Self-Supervised Graph Representation Learning via Global Context Prediction

Authors: Zhen Peng, Yixiang Dong, Minnan Luo, Xiao-Ming Wu, Qinghua Zheng

Abstract: To take full advantage of fast-growing unlabeled networked data, this paper introduces a novel self-supervised strategy for graph representation learning by exploiting natural supervision provided by the data itself. Inspired by human social behavior, we assume that the global context of each node is composed of all nodes in the graph since two arbitrary entities in a connected network could interact with each other via paths of varying length. Based on this, we investigate whether the global context can be a source of free and effective supervisory signals for learning useful node representations. Specifically, we randomly select pairs of nodes in a graph and train a well-designed neural net to predict the contextual position of one node relative to the other. Our underlying hypothesis is that the representations learned from such within-graph context would capture the global topology of the graph and finely characterize the similarity and differentiation between nodes, which is conducive to various downstream learning tasks. Extensive benchmark experiments including node classification, clustering, and link prediction demonstrate that our approach outperforms many state-of-the-art unsupervised methods and sometimes even exceeds the performance of supervised counterparts.

Submitted 3 March, 2020; originally announced March 2020.

arXiv:2002.01169 [pdf, other]

Graph Representation Learning via Graphical Mutual Information Maximization

Authors: Zhen Peng, Wenbing Huang, Minnan Luo, Qinghua Zheng, Yu Rong, Tingyang Xu, Junzhou Huang

Abstract: The richness in the content of various information networks such as social networks and communication networks provides the unprecedented potential for learning high-quality expressive representations without external supervision. This paper investigates how to preserve and extract the abundant information from graph-structured data into embedding space in an unsupervised manner. To this end, we propose a novel concept, Graphical Mutual Information (GMI), to measure the correlation between input graphs and high-level hidden representations. GMI generalizes the idea of conventional mutual information computations from vector space to the graph domain where measuring mutual information from two aspects of node features and topological structure is indispensable. GMI exhibits several benefits: First, it is invariant to the isomorphic transformation of input graphs---an inevitable constraint in many existing graph representation learning algorithms; Besides, it can be efficiently estimated and maximized by current mutual information estimation methods such as MINE; Finally, our theoretical analysis confirms its correctness and rationality. With the aid of GMI, we develop an unsupervised learning model trained by maximizing GMI between the input and output of a graph neural encoder. Considerable experiments on transductive as well as inductive node classification and link prediction demonstrate that our method outperforms state-of-the-art unsupervised counterparts, and even sometimes exceeds the performance of supervised ones.

Submitted 4 February, 2020; originally announced February 2020.

arXiv:1906.05467 [pdf, other]

Imitation Learning of Neural Spatio-Temporal Point Processes

Authors: Shixiang Zhu, Shuang Li, Zhigang Peng, Yao Xie

Abstract: We present a novel Neural Embedding Spatio-Temporal (NEST) point process model for spatio-temporal discrete event data and develop an efficient imitation learning (a type of reinforcement learning) based approach for model fitting. Despite the rapid development of one-dimensional temporal point processes for discrete event data, the study of spatial-temporal aspects of such data is relatively scarce. Our model captures complex spatio-temporal dependence between discrete events by carefully designing a mixture of heterogeneous Gaussian diffusion kernels, whose parameters are parameterized by neural networks. This new kernel is the key that lets our model capture intricate spatial dependence patterns and yet still lead to interpretable results as we examine maps of Gaussian diffusion kernel parameters. The imitation learning model fitting for the NEST is more robust than the maximum likelihood estimate. It directly measures the divergence between the empirical distributions of the training data and the model-generated data. Moreover, our imitation learning-based approach enjoys computational efficiency due to the explicit characterization of the reward function related to the likelihood function; furthermore, the likelihood function under our model enjoys tractable expression due to Gaussian kernel parameterization. Experiments based on real data show our method's good performance relative to the state-of-the-art and the good interpretability of NEST's result.

Submitted 22 January, 2021; v1 submitted 12 June, 2019; originally announced June 2019.

arXiv:1906.03704 [pdf, other]

SVRG for Policy Evaluation with Fewer Gradient Evaluations

Authors: Zilun Peng, Ahmed Touati, Pascal Vincent, Doina Precup

Abstract: Stochastic variance-reduced gradient (SVRG) is an optimization method originally designed for tackling machine learning problems with a finite sum structure. SVRG was later shown to work for policy evaluation, a problem in reinforcement learning in which one aims to estimate the value function of a given policy. SVRG makes use of gradient estimates at two scales. At the slower scale, SVRG computes a full gradient over the whole dataset, which could lead to prohibitive computation costs. In this work, we show that two variants of SVRG for policy evaluation could significantly diminish the number of gradient calculations while preserving a linear convergence speed. More importantly, our theoretical result implies that one does not need to use the entire dataset in every epoch of SVRG when it is applied to policy evaluation with linear function approximation. Our experiments demonstrate large computational savings provided by the proposed methods.

Submitted 19 June, 2020; v1 submitted 9 June, 2019; originally announced June 2019.

Comments: Short version of the paper is published in the proceedings of the 29th International Joint Conference on Artificial Intelligence and the 17th Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI2020)

arXiv:1809.00846 [pdf, other]

Towards Understanding Regularization in Batch Normalization

Authors: Ping Luo, Xinjiang Wang, Wenqi Shao, Zhanglin Peng

Abstract: Batch Normalization (BN) improves both convergence and generalization in training neural networks. This work understands these phenomena theoretically. We analyze BN by using a basic block of neural networks, consisting of a kernel layer, a BN layer, and a nonlinear activation function. This basic network helps us understand the impacts of BN in three aspects. First, by viewing BN as an implicit regularizer, BN can be decomposed into population normalization (PN) and gamma decay as an explicit regularization. Second, learning dynamics of BN and the regularization show that training converged with large maximum and effective learning rate. Third, generalization of BN is explored by using statistical mechanics. Experiments demonstrate that BN in convolutional neural networks shares the same traits of regularization as the above analyses.

Submitted 24 April, 2019; v1 submitted 4 September, 2018; originally announced September 2018.

Comments: International Conference on Learning Representations (ICLR)

arXiv:1807.10458 [pdf, other]

AXNet: ApproXimate computing using an end-to-end trainable neural network

Authors: Zhenghao Peng, Xuyang Chen, Chengwen Xu, Naifeng Jing, Xiaoyao Liang, Cewu Lu, Li Jiang

Abstract: Neural network based approximate computing is a universal architecture promising to gain tremendous energy-efficiency for many error resilient applications. To guarantee the approximation quality, existing works deploy two neural networks (NNs), e.g., an approximator and a predictor. The approximator provides the approximate results, while the predictor predicts whether the input data is safe to approximate with the given quality requirement. However, it is non-trivial and time-consuming to make these two neural networks coordinate---they have different optimization objectives---by training them separately. This paper proposes a novel neural network structure---AXNet---to fuse two NNs into a holistic end-to-end trainable NN. Leveraging the philosophy of multi-task learning, AXNet can tremendously improve the invocation (proportion of safe-to-approximate samples) and reduce the approximation error. The training effort also decreases significantly. Experiment results show 50.7% more invocation and substantial cuts of training time when compared to existing neural network based approximate computing frameworks.

Submitted 18 December, 2018; v1 submitted 27 July, 2018; originally announced July 2018.

Comments: Accepted by ICCAD 2018

arXiv:1807.06962 [pdf, other]

doi 10.1007/978-3-030-00889-5_21

Active Learning for Segmentation by Optimizing Content Information for Maximal Entropy

Authors: Firat Ozdemir, Zixuan Peng, Christine Tanner, Philipp Fuernstahl, Orcun Goksel

Abstract: Segmentation is essential for medical image analysis tasks such as intervention planning, therapy guidance, diagnosis, and treatment decisions. Deep learning is becoming increasingly prominent for segmentation, where the lack of annotations, however, often becomes the main limitation. Due to privacy concerns and ethical considerations, most medical datasets are created, curated, and allow access only locally. Furthermore, current deep learning methods are often suboptimal in translating anatomical knowledge between different medical imaging modalities. Active learning can be used to select an informed set of image samples to request for manual annotation, in order to best utilize the limited annotation time of clinical experts for optimal outcomes, which we focus on in this work. Our contributions herein are twofold: (1) we enforce domain-representativeness of selected samples using a proposed penalization scheme to maximize information at the network abstraction layer, and (2) we propose a Borda-count based sample querying scheme for selecting samples for segmentation. Comparative experiments with baseline approaches show that the samples queried with our proposed method, where both above contributions are combined, result in significantly improved segmentation performance for this active learning task.

Submitted 18 July, 2018; originally announced July 2018.

Comments: 8 pages, 4 figures, Accepted to MICCAI 2018 Workshop: Deep Learning in Medical Image Analysis (DLMIA)

arXiv:1805.08939 [pdf, other]

Approximate Random Dropout

Authors: Zhuoran Song, Ru Wang, Dongyu Ru, Hongru Huang, Zhenghao Peng, Jing Ke, Xiaoyao Liang, Li Jiang

Abstract: The training phase of deep neural networks (DNNs) consumes enormous processing time and energy. Compression techniques utilizing the sparsity of DNNs can effectively accelerate the inference phase of DNNs. However, they can hardly be used in the training phase because the training phase involves dense matrix multiplication using General Purpose Computation on Graphics Processors (GPGPU), which favors regular and structured data layouts. In this paper, we propose Approximate Random Dropout, which replaces the conventional random dropout of neurons and synapses with regular and predefined patterns to eliminate unnecessary computation and data access. To compensate for the potential performance loss, we develop an SGD-based search algorithm to produce the distribution of dropout patterns. We prove our approach is statistically equivalent to the previous dropout method. Experimental results on MLP and LSTM using well-known benchmarks show that the proposed Approximate Random Dropout can reduce the training time by $20\%$-$77\%$ ($19\%$-$60\%$) when the dropout rate is $0.3$-$0.7$ on MLP (LSTM) with a marginal accuracy drop.

Submitted 14 December, 2018; v1 submitted 22 May, 2018; originally announced May 2018.

Comments: 7 pages, 6 figures, conference

arXiv:1707.07785 [pdf, ps, other]

Comparing Aggregators for Relational Probabilistic Models

Authors: Seyed Mehran Kazemi, Bahare Fatemi, Alexandra Kim, Zilun Peng, Moumita Roy Tora, Xing Zeng, Matthew Dirks, David Poole

Abstract: Relational probabilistic models have the challenge of aggregation, where one variable depends on a population of other variables. Consider the problem of predicting gender from movie ratings; this is challenging because the number of movies per user and users per movie can vary greatly. Surprisingly, aggregation is not well understood. In this paper, we show that existing relational models (implicitly or explicitly) either use simple numerical aggregators that lose great amounts of information, or correspond to naive Bayes, logistic regression, or noisy-OR that suffer from overconfidence. We propose new simple aggregators and simple modifications of existing models that empirically outperform the existing ones. The intuition we provide on different (existing or new) models and their shortcomings plus our empirical findings promise to form the foundation for future representations.

Submitted 24 July, 2017; originally announced July 2017.

Comments: 8 pages, Accepted at Statistical Relational AI (StarAI) workshop 2017

arXiv:1701.04968 [pdf, ps, other]

Multilayer Perceptron Algebra

Authors: Zhao Peng

Abstract: Artificial neural networks (ANNs) have been phenomenally successful on various pattern recognition tasks. However, the design of neural networks relies heavily on the experience and intuitions of individual developers. In this article, the author introduces a mathematical structure called MLP algebra on the set of all multilayer perceptron neural networks (MLPs), which can serve as a guiding principle for building MLPs suited to particular data sets, and for building complex MLPs from simpler ones.

Submitted 18 January, 2017; originally announced January 2017.

arXiv:1612.04425 [pdf, other]

doi 10.1007/s40305-017-0183-1

On the Convergence of Asynchronous Parallel Iteration with Unbounded Delays

Authors: Zhimin Peng, Yangyang Xu, Ming Yan, Wotao Yin

Abstract: Recent years have witnessed the surge of asynchronous parallel (async-parallel) iterative algorithms due to problems involving very large-scale data and a large number of decision variables. Because of asynchrony, the iterates are computed with outdated information, and the age of the outdated information, which we call delay, is the number of times it has been updated since its creation. Almost all recent works prove convergence under the assumption of a finite maximum delay and set their stepsize parameters accordingly. However, the maximum delay is practically unknown. This paper presents convergence analysis of an async-parallel method from a probabilistic viewpoint, and it allows for large unbounded delays. An explicit formula of stepsize that guarantees convergence is given depending on delays' statistics. With $p+1$ identical processors, we empirically measured that delays closely follow the Poisson distribution with parameter $p$, matching our theoretical model, and thus the stepsize can be set accordingly. Simulations on both convex and nonconvex optimization problems demonstrate the validness of our analysis and also show that the existing maximum-delay induced stepsize is too conservative, often slowing down the convergence of the algorithm.

Submitted 15 November, 2017; v1 submitted 13 December, 2016; originally announced December 2016.

Comments: accepted to JORSC

Journal ref: Journal of the Operations Research Society of China, 7 (2019), 5-42

arXiv:1601.00863 [pdf, other]

doi 10.4310/AMSA.2016.v1.n1.a2

Coordinate Friendly Structures, Algorithms and Applications

Authors: Zhimin Peng, Tianyu Wu, Yangyang Xu, Ming Yan, Wotao Yin

Abstract: This paper focuses on coordinate update methods, which are useful for solving problems involving large or high-dimensional datasets. They decompose a problem into simple subproblems, where each updates one, or a small block of, variables while fixing others. These methods can deal with linear and nonlinear mappings, smooth and nonsmooth functions, as well as convex and nonconvex problems. In addition, they are easy to parallelize. The great performance of coordinate update methods depends on solving simple sub-problems. To derive simple subproblems for several new classes of applications, this paper systematically studies coordinate-friendly operators that perform low-cost coordinate updates. Based on the discovered coordinate friendly operators, as well as operator splitting techniques, we obtain new coordinate update algorithms for a variety of problems in machine learning, image processing, as well as sub-areas of optimization. Several problems are treated with coordinate update for the first time in history. The obtained algorithms are scalable to large instances through parallel and even asynchronous computing. We present numerical examples to illustrate how effective these algorithms are.

Submitted 14 August, 2016; v1 submitted 5 January, 2016; originally announced January 2016.

Report number: UCLA CAM Report 16-13

Journal ref: Annals of Mathematical Sciences and Applications, 1 (2016), 57-119

arXiv:1506.02396 [pdf, other]

doi 10.1137/15M1024950

ARock: an Algorithmic Framework for Asynchronous Parallel Coordinate Updates

Authors: Zhimin Peng, Yangyang Xu, Ming Yan, Wotao Yin

Abstract: Finding a fixed point of a nonexpansive operator, i.e., $x^*=Tx^*$, abstracts many problems in numerical linear algebra, optimization, and other areas of scientific computing. To solve fixed-point problems, we propose ARock, an algorithmic framework in which multiple agents (machines, processors, or cores) update $x$ in an asynchronous parallel fashion. Asynchrony is crucial to parallel computing since it reduces synchronization wait, relaxes communication bottlenecks, and thus speeds up computing significantly. At each step of ARock, an agent updates a randomly selected coordinate $x_i$ based on possibly out-of-date information on $x$. The agents share $x$ through either global memory or communication. If writing $x_i$ is atomic, the agents can read and write $x$ without memory locks. Theoretically, we show that if the nonexpansive operator $T$ has a fixed point, then with probability one, ARock generates a sequence that converges to a fixed point of $T$. Our conditions on $T$ and step sizes are weaker than those in comparable work. Linear convergence is also obtained. We propose special cases of ARock for linear systems, convex optimization, machine learning, as well as distributed and decentralized consensus problems. Numerical experiments on solving sparse logistic regression problems are presented.

Submitted 26 May, 2016; v1 submitted 8 June, 2015; originally announced June 2015.

Comments: updated the linear convergence proofs

Journal ref: SIAM Journal on Scientific Computing, 38 (2016), A2851-A2879

arXiv:1505.03762 [pdf, other]

doi 10.1007/s11425-015-5114-1

Dynamic Bivariate Normal Copula

Authors: Xin Liao, Liang Peng, Zuoxiang Peng, Yanting Zheng

Abstract: Normal copula with a correlation coefficient between $-1$ and $1$ is tail independent and so it severely underestimates extreme probabilities. By letting the correlation coefficient in a normal copula depend on the sample size, Hüsler and Reiss (1989) showed that the tail can become asymptotically dependent. In this paper, we extend this result by deriving the limit of the normalized maximum of $n$ independent observations, where the $i$-th observation follows from a normal copula with its correlation coefficient being either a parametric or a nonparametric function of $i/n$. Furthermore, both parametric and nonparametric inference for this unknown function are studied, which can be employed to test the condition in Hüsler and Reiss (1989). A simulation study and real data analysis are presented too.

Submitted 14 May, 2015; originally announced May 2015.

Comments: 22 pages, 4 figures

MSC Class: Primary 62F12; 62G30; Secondary 60G70; 62G32

arXiv:1505.03431 [pdf, ps, other]

Asymptotics and statistical inferences on independent and non-identically distributed bivariate Gaussian triangular arrays

Authors: Xin Liao, Zuoxiang Peng

Abstract: In this paper, we establish the first- and second-order asymptotics of distributions of normalized maxima of independent and non-identically distributed bivariate Gaussian triangular arrays, where each vector of the $n$th row follows a bivariate Gaussian distribution with correlation coefficient being a monotone continuous function of $i/n$. Furthermore, parametric inference for this unknown function is studied. A simulation study and an analysis of real data sets are also presented.

Submitted 26 April, 2016; v1 submitted 13 May, 2015; originally announced May 2015.

Comments: 19 pages, 8 figures

MSC Class: 62E20; 60G70; 60F15; 60F05

arXiv:1402.6302 [pdf, ps, other]

doi 10.1007/s11425-014-4841-z

Tail Asymptotic Expansions for L-Statistics

Authors: E. Hashorva, C. Ling, Z. Peng

Abstract: In this paper, we derive higher-order expansions of $L$-statistics of independent risks $X_1, \ldots, X_n$ under conditions on the underlying distribution function $F$. The new results are applied to derive the asymptotic expansions of ratios of two kinds of risk measures, stop-loss premium and excess return on capital, respectively.

Submitted 25 February, 2014; originally announced February 2014.

Journal ref: Science China Mathematics, 57(10), 1993-2012

arXiv:1402.5608 [pdf, ps, other]

Higher-order expansions of distributions of maxima in a Hüsler-Reiss model

Authors: E. Hashorva, Z. Peng, Z. Weng

Abstract: The max-stable Hüsler-Reiss distribution which arises as the limit distribution of maxima of bivariate Gaussian triangular arrays has been shown to be useful in various extreme value models. For such triangular arrays, this paper establishes higher-order asymptotic expansions of the joint distribution of maxima under refined Hüsler-Reiss conditions. In particular, the rate of convergence of normalized maxima to the Hüsler-Reiss distribution is explicitly calculated.

Submitted 23 February, 2014; originally announced February 2014.

Comments: 11 pages

arXiv:1212.1004 [pdf, ps, other]

Rates of convergence of extremes from skew normal samples

Authors: Xin Liao, Zuoxiang Peng, Saralees Nadarajah, Xiaoqian Wang

Abstract: For a skew normal random sequence, convergence rates of the distribution of its partial maximum to the Gumbel extreme value distribution are derived. The asymptotic expansion of the distribution of the normalized maximum is given under an optimal choice of norming constants. We find that the optimal convergence rate of the normalized maximum to the Gumbel extreme value distribution is proportional to $1/\log n$.

Submitted 5 December, 2012; originally announced December 2012.

Showing 1–28 of 28 results for author: Peng, Z