
Conformal Load Prediction with Transductive Graph Autoencoders

Abstract: Predicting edge weights on graphs has various applications, from transportation systems to social networks. This paper describes a Graph Neural Network (GNN) approach for edge weight prediction with guaranteed coverage. We leverage conformal prediction to calibrate the GNN outputs and produce valid prediction intervals. We handle data heteroscedasticity through error reweighting and Conformalized Quantile Regression (CQR). We compare the performance of our method against baseline techniques on real-world transportation datasets. Our approach has better coverage and efficiency than all baselines and showcases robustness and adaptability.

Submitted 12 June, 2024; originally announced June 2024.
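
The calibration step mentioned in the abstract above can be illustrated with plain split conformal prediction. This is only a minimal sketch under stated assumptions: it uses absolute residuals as nonconformity scores and does not reproduce the paper's GNN, error reweighting, or CQR. The function names and the toy numbers are hypothetical.

```python
import math

def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points to certify this alpha: the interval is unbounded.
        return math.inf
    return sorted(scores)[rank - 1]

def conformal_interval(prediction, cal_predictions, cal_targets, alpha=0.1):
    """Symmetric interval around a point prediction (assumed score: |y - y_hat|)."""
    scores = [abs(y - p) for p, y in zip(cal_predictions, cal_targets)]
    q = conformal_quantile(scores, alpha)
    return prediction - q, prediction + q

# Hypothetical held-out calibration edges: predicted vs. observed weights.
cal_pred = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
cal_true = [1.0, -2.0, 3.0, -4.0, 5.0, -6.0, 7.0]
low, high = conformal_interval(10.0, cal_pred, cal_true, alpha=0.25)
```

If the calibration and test points are exchangeable, an interval built this way covers the true value with probability at least 1 - alpha.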

arXiv:2404.03828 [pdf, other]

Outlier-Efficient Hopfield Layers for Large Transformer-Based Models

Authors: Jerry Yao-Chieh Hu, Pei-Hsuan Chang, Robin Luo, Hong-Yu Chen, Weijian Li, Wei-Po Wang, Han Liu

Abstract: We introduce an Outlier-Efficient Modern Hopfield Model (termed $\mathrm{OutEffHop}$) and use it to address the outlier inefficiency problem of training gigantic transformer-based models. Our main contribution is a novel associative memory model facilitating \textit{outlier-efficient} associative memory retrievals. Interestingly, this memory model manifests a model-based interpretation of an outlier-efficient attention mechanism (${\rm Softmax}_1$): it is an approximation of the memory retrieval process of $\mathrm{OutEffHop}$. Methodologically, this allows us to introduce novel outlier-efficient Hopfield layers as powerful alternatives to traditional attention mechanisms, with superior post-quantization performance. Theoretically, the Outlier-Efficient Modern Hopfield Model retains and improves the desirable properties of standard modern Hopfield models, including fixed point convergence and exponential storage capacity. Empirically, we demonstrate the efficacy of the proposed model across large-scale transformer-based and Hopfield-based models (including BERT, OPT, ViT, and STanHop-Net), benchmarking against state-of-the-art methods like $\mathtt{Clipped\_Softmax}$ and $\mathtt{Gated\_Attention}$. Notably, $\mathrm{OutEffHop}$ achieves an average reduction of 22+\% in average kurtosis and 26+\% in the maximum infinity norm of model outputs across four models. Code is available at \href{https://github.com/MAGICS-LAB/OutEffHop}{GitHub}; models are on \href{https://huggingface.co/collections/magicslabnu/outeffhop-6610fcede8d2cda23009a98f}{Hugging Face Hub}; future updates are on \href{https://arxiv.longhoe.net/abs/2404.03828}{arXiv}.

Submitted 26 June, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

Comments: Accepted at ICML 2024; v2 updated to camera-ready version; Code available at https://github.com/MAGICS-LAB/OutEffHop; Models are on Hugging Face: https://huggingface.co/collections/magicslabnu/outeffhop-6610fcede8d2cda23009a98f
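
For readers unfamiliar with the ${\rm Softmax}_1$ mechanism named in the abstract above, the sketch below assumes it denotes the commonly cited variant that adds one to the softmax denominator, i.e. exp(x_i) / (1 + sum_j exp(x_j)). That reading is an assumption, not something the listing confirms, and the function name is hypothetical.

```python
import math

def softmax_1(logits):
    """Softmax with an implicit extra zero logit: exp(x_i) / (1 + sum_j exp(x_j))."""
    # Shift by the largest logit, counting the implicit 0, for numerical stability.
    m = max(max(logits), 0.0)
    exps = [math.exp(x - m) for x in logits]
    denom = math.exp(-m) + sum(exps)
    return [e / denom for e in exps]

weights = softmax_1([0.0, 0.0])           # each weight is 1/3, so the sum is below 1
near_zero = softmax_1([-50.0, -50.0])     # all logits very negative: weights near 0
```

Unlike the standard softmax, the outputs need not sum to one, so a head can assign almost no weight to every position when all logits are strongly negative.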

arXiv:2308.06769 [pdf, other]

Fréchet Statistics Based Change Point Detection in Multivariate Hawkes Process

Authors: Rui Luo, Vikram Krishnamurthy

Abstract: This paper proposes a new approach for change point detection in causal networks of multivariate Hawkes processes using Fréchet statistics. Our method splits the point process into overlapping windows, estimates kernel matrices in each window, and reconstructs the signed Laplacians by treating the kernel matrices as the adjacency matrices of the causal network. We demonstrate the effectiveness of our method through experiments on both simulated and real-world cryptocurrency datasets. Our results show that our method is capable of accurately detecting and characterizing changes in the causal structure of multivariate Hawkes processes, and may have potential applications in fields such as finance and neuroscience. The proposed method is an extension of previous work on Fréchet statistics in point process settings and represents an important contribution to the field of change point detection in multivariate point processes.

Submitted 15 August, 2023; v1 submitted 13 August, 2023; originally announced August 2023.

arXiv:2203.16666 [pdf, other]

Hawkes Process Modeling of Block Arrivals in Bitcoin Blockchain

Authors: Rui Luo, Vikram Krishnamurthy, Erik Blasch

Abstract: The paper constructs a multivariate Hawkes process model of Bitcoin block arrivals and price jumps. Hawkes processes are self-exciting point processes that can capture the self- and cross-excitation effects of block mining and Bitcoin price volatility. We use publicly available blockchain datasets to estimate the model parameters via maximum likelihood estimation. The results show that Bitcoin price volatility boosts the block mining rate and that Bitcoin investment return demonstrates mean reversion. Quantile-Quantile plots show that the proposed Hawkes process model is a better fit to the blockchain datasets than a Poisson process model.

Submitted 30 March, 2022; originally announced March 2022.
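
To make the self-excitation idea in the abstract above concrete, here is a minimal sketch of a univariate Hawkes conditional intensity. It assumes an exponential kernel, which the abstract does not specify, and it ignores the cross-excitation terms of the paper's multivariate model. The parameter names `mu`, `alpha`, and `beta` are illustrative.

```python
import math

def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Intensity at time t: mu + sum over past events of alpha * exp(-beta * (t - t_i))."""
    # Only events strictly before t contribute to the excitation.
    excitation = sum(alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t)
    return mu + excitation

baseline = hawkes_intensity(1.0, [], mu=0.5, alpha=0.8, beta=1.0)
excited = hawkes_intensity(1.0, [0.0], mu=0.5, alpha=0.8, beta=1.0)
```

Each past event raises the intensity by `alpha`, and that boost decays at rate `beta`. With `alpha` set to zero the model reduces to a homogeneous Poisson process with rate `mu`.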

arXiv:2109.12727 [pdf, other]

Anomalous Edge Detection in Edge Exchangeable Social Network Models

Authors: Rui Luo, Buddhika Nettasinghe, Vikram Krishnamurthy

Abstract: This paper studies the detection of anomalous edges in directed graphs that model social networks. We exploit edge exchangeability as a criterion for distinguishing anomalous edges from normal edges. Then we present an anomaly detector based on conformal prediction theory; this detector has a guaranteed upper bound on the false positive rate. In numerical experiments, we show that the proposed algorithm achieves superior performance to baseline methods.

Submitted 21 August, 2023; v1 submitted 26 September, 2021; originally announced September 2021.
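
The false-positive guarantee described in the abstract above is typical of conformal p-values. The sketch below shows the generic construction under the assumption that each edge has already been given a nonconformity score; how the paper computes those scores is not stated in this listing, and the function names are hypothetical.

```python
def conformal_p_value(test_score, calibration_scores):
    """Fraction of scores (calibration plus the test score) at least as large as the test score."""
    n = len(calibration_scores)
    at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least + 1) / (n + 1)

def is_anomalous(test_score, calibration_scores, epsilon=0.1):
    """Flag the edge when its conformal p-value is at most epsilon."""
    return conformal_p_value(test_score, calibration_scores) <= epsilon

# Hypothetical nonconformity scores of 19 edges assumed to be normal.
calibration = [float(i) for i in range(1, 20)]
flag_high = is_anomalous(100.0, calibration)   # p-value is 1/20
flag_low = is_anomalous(0.0, calibration)      # p-value is 20/20
```

If the test edge is exchangeable with the calibration edges, the probability of flagging it is at most `epsilon`, which is the kind of false positive bound the abstract refers to.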

arXiv:2008.09643 [pdf, ps, other]

Privacy Preserving Recalibration under Domain Shift

Authors: Rachel Luo, Shengjia Zhao, Jiaming Song, Jonathan Kuck, Stefano Ermon, Silvio Savarese

Abstract: Classifiers deployed in high-stakes real-world applications must output calibrated confidence scores, i.e. their predicted probabilities should reflect empirical frequencies. Recalibration algorithms can greatly improve a model's probability estimates; however, existing algorithms are not applicable in real-world situations where the test data follows a different distribution from the training data, and privacy preservation is paramount (e.g. protecting patient records). We introduce a framework that abstracts out the properties of recalibration problems under differential privacy constraints. This framework allows us to adapt existing recalibration algorithms to satisfy differential privacy while remaining effective for domain-shift situations. Guided by our framework, we also design a novel recalibration algorithm, accuracy temperature scaling, that outperforms prior work on private datasets. In an extensive empirical study, we find that our algorithm improves calibration on domain-shift benchmarks under the constraints of differential privacy. On the 15 highest severity perturbations of the ImageNet-C dataset, our method achieves a median ECE of 0.029, over 2x better than the next best recalibration method and almost 5x better than without recalibration.

Submitted 21 August, 2020; originally announced August 2020.
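
The ECE figure quoted in the abstract above refers to expected calibration error. A minimal sketch of the usual binned estimator follows. The equal-width binning and the default of 10 bins are assumptions, since the listing does not say which variant the paper uses, and the function name is hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |accuracy - mean confidence| over equal-width confidence bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        ece += (len(members) / total) * abs(accuracy - avg_conf)
    return ece

calibrated = expected_calibration_error([0.75] * 4, [True, True, True, False])
overconfident = expected_calibration_error([1.0, 1.0], [True, False])
```

A perfectly calibrated classifier has an ECE of zero: among predictions made with 75% confidence, 75% are correct.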

arXiv:2007.04785 [pdf, other]

Accuracy Prediction with Non-neural Model for Neural Architecture Search

Authors: Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Enhong Chen, Tie-Yan Liu

Abstract: Neural architecture search (NAS) with an accuracy predictor that predicts the accuracy of candidate architectures has drawn increasing attention due to its simplicity and effectiveness. Previous works usually employ neural network-based predictors, which require more delicate design and are easy to overfit. Considering that most architectures are represented as sequences of discrete symbols, which are more like tabular data and preferred by non-neural predictors, in this paper we study an alternative approach which uses a non-neural model for accuracy prediction. Specifically, as decision tree based models can better handle tabular data, we leverage gradient boosting decision tree (GBDT) as the predictor for NAS. We demonstrate that the GBDT predictor can achieve prediction accuracy comparable to (if not better than) that of neural network based predictors. Moreover, considering that a compact search space can ease the search process, we propose to prune the search space gradually according to important features derived from GBDT. In this way, NAS can be performed by first pruning the search space and then searching a neural architecture, which is more efficient and effective. Experiments on NASBench-101 and ImageNet demonstrate the effectiveness of using GBDT as the predictor for NAS: (1) On NASBench-101, it is 22x, 8x, and 6x more sample efficient than random search, regularized evolution, and Monte Carlo Tree Search (MCTS) in finding the global optimum; (2) It achieves 24.2% top-1 error rate on ImageNet, and further achieves 23.4% top-1 error rate on ImageNet when enhanced with search space pruning. Code is provided at https://github.com/renqianluo/GBDT-NAS.

Submitted 19 July, 2021; v1 submitted 9 July, 2020; originally announced July 2020.

Comments: Code is available at https://github.com/renqianluo/GBDT-NAS

arXiv:2007.00295 [pdf, ps, other]

Belief Propagation Neural Networks

Authors: Jonathan Kuck, Shuvam Chakraborty, Hao Tang, Rachel Luo, Jiaming Song, Ashish Sabharwal, Stefano Ermon

Abstract: Learned neural solvers have successfully been used to solve combinatorial optimization and decision problems. More general counting variants of these problems, however, are still largely solved with hand-crafted solvers. To bridge this gap, we introduce belief propagation neural networks (BPNNs), a class of parameterized operators that operate on factor graphs and generalize Belief Propagation (BP). In its strictest form, a BPNN layer (BPNN-D) is a learned iterative operator that provably maintains many of the desirable properties of BP for any choice of the parameters. Empirically, we show that, by training, BPNN-D learns to perform the task better than the original BP: it converges 1.7x faster on Ising models while providing tighter bounds. On challenging model counting problems, BPNNs compute estimates hundreds of times faster than state-of-the-art handcrafted methods, while returning an estimate of comparable quality.

Submitted 1 July, 2020; originally announced July 2020.

arXiv:2006.05620 [pdf, other]

Exploring the Vulnerability of Deep Neural Networks: A Study of Parameter Corruption

Authors: Xu Sun, Zhiyuan Zhang, Xuancheng Ren, Ruixuan Luo, Liangyou Li

Abstract: We argue that the vulnerability of model parameters is of crucial value to the study of model robustness and generalization, but little research has been devoted to understanding this matter. In this work, we propose an indicator to measure the robustness of neural network parameters by exploiting their vulnerability via parameter corruption. The proposed indicator describes the maximum loss variation in the non-trivial worst-case scenario under parameter corruption. For practical purposes, we give a gradient-based estimation, which is far more effective than random corruption trials that can hardly induce the worst accuracy degradation. Equipped with theoretical support and empirical validation, we are able to systematically investigate the robustness of different model parameters and reveal vulnerabilities of deep neural networks that have rarely received attention before. Moreover, we can enhance the models accordingly with the proposed adversarial corruption-resistant training, which not only improves the parameter robustness but also translates into accuracy elevation.

Submitted 10 December, 2020; v1 submitted 9 June, 2020; originally announced June 2020.

Comments: Accepted by AAAI 2021

arXiv:2002.10389 [pdf, other]

Semi-Supervised Neural Architecture Search

Authors: Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Enhong Chen, Tie-Yan Liu

Abstract: Neural architecture search (NAS) relies on a good controller to generate better architectures or predict the accuracy of given architectures. However, training the controller requires both abundant and high-quality pairs of architectures and their accuracy, while it is costly to evaluate an architecture and obtain its accuracy. In this paper, we propose SemiNAS, a semi-supervised NAS approach that leverages numerous unlabeled architectures (without evaluation and thus nearly no cost). Specifically, SemiNAS 1) trains an initial accuracy predictor with a small set of architecture-accuracy data pairs; 2) uses the trained accuracy predictor to predict the accuracy of a large number of architectures (without evaluation); and 3) adds the generated data pairs to the original data to further improve the predictor. The trained accuracy predictor can be applied to various NAS algorithms by predicting the accuracy of candidate architectures for them. SemiNAS has two advantages: 1) It reduces the computational cost under the same accuracy guarantee. On the NASBench-101 benchmark dataset, it achieves comparable accuracy with a gradient-based method while using only 1/7 of the architecture-accuracy pairs. 2) It achieves higher accuracy under the same computational cost. It achieves 94.02% test accuracy on NASBench-101, outperforming all the baselines when using the same number of architectures. On ImageNet, it achieves 23.5% top-1 error rate (under 600M FLOPS constraint) using 4 GPU-days for search. We further apply it to the LJSpeech text-to-speech task, where it achieves 97% intelligibility rate in the low-resource setting and 15% test error rate in the robustness setting, with 9% and 7% improvements over the baseline respectively.

Submitted 3 November, 2020; v1 submitted 24 February, 2020; originally announced February 2020.

Comments: NeurIPS 2020

arXiv:1911.06191 [pdf, other]

Microsoft Research Asia's Systems for WMT19

Authors: Yingce Xia, Xu Tan, Fei Tian, Fei Gao, Weicong Chen, Yang Fan, Linyuan Gong, Yichong Leng, Renqian Luo, Yiren Wang, Lijun Wu, Jinhua Zhu, Tao Qin, Tie-Yan Liu

Abstract: We, Microsoft Research Asia, made submissions to 11 language directions in the WMT19 news translation tasks. We won first place for 8 of the 11 directions and second place for the other three. Our basic systems are built on Transformer, back translation and knowledge distillation. We integrate several of our recent techniques to enhance the baseline systems: multi-agent dual learning (MADL), masked sequence-to-sequence pre-training (MASS), neural architecture optimization (NAO), and soft contextual data augmentation (SCA).

Submitted 6 November, 2019; originally announced November 2019.

Comments: Accepted to "Fourth Conference on Machine Translation (WMT19)"

arXiv:1910.12249 [pdf, other]

An Adaptive and Momental Bound Method for Stochastic Learning

Authors: Jianbang Ding, Xuancheng Ren, Ruixuan Luo, Xu Sun

Abstract: Training deep neural networks requires intricate initialization and careful selection of learning rates. The emergence of stochastic gradient optimization methods that use adaptive learning rates based on squared past gradients, e.g., AdaGrad, AdaDelta, and Adam, eases the job slightly. However, recent studies have shown that such methods have pitfalls of their own, including non-convergence issues. Alternative variants have been proposed for enhancement, such as AMSGrad, AdaShift and AdaBound. In this work, we identify a new problem of adaptive learning rate methods that appears at the beginning of learning, where Adam produces extremely large learning rates that inhibit the start of learning. We propose the Adaptive and Momental Bound (AdaMod) method to restrict the adaptive learning rates with adaptive and momental upper bounds. The dynamic learning rate bounds are based on the exponential moving averages of the adaptive learning rates themselves, which smooth out unexpected large learning rates and stabilize the training of deep neural networks. Our experiments verify that AdaMod eliminates the extremely large learning rates throughout the training and brings significant improvements especially on complex networks such as DenseNet and Transformer, compared to Adam. Our implementation is available at: https://github.com/lancopku/AdaMod

Submitted 27 October, 2019; originally announced October 2019.
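
The bounding rule described in the abstract above can be sketched for a single scalar parameter. This is an illustration reconstructed from the abstract's description, not the authors' implementation (see the linked repository for that). The hyperparameter defaults, the zero initialization of the moving average `s`, and the function name are all assumptions.

```python
import math

def adamod_minimize(grad, theta, steps=200, alpha=0.1,
                    beta1=0.9, beta2=0.999, beta3=0.999, eps=1e-8):
    """Adam-style updates whose step size is capped by an EMA of past step sizes."""
    m = v = s = 0.0
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g          # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
        m_hat = m / (1 - beta1 ** t)             # bias corrections, as in Adam
        v_hat = v / (1 - beta2 ** t)
        eta = alpha / (math.sqrt(v_hat) + eps)   # Adam's adaptive learning rate
        s = beta3 * s + (1 - beta3) * eta        # moving average of the learning rates
        eta = min(eta, s)                        # momental upper bound
        theta -= eta * m_hat
    return theta

# Toy problem: minimize (x - 3)^2 starting from x = 0.
result = adamod_minimize(lambda x: 2.0 * (x - 3.0), theta=0.0)
```

Because `s` starts at zero in this sketch, the earliest steps are very small, which is the damping of large initial learning rates that the abstract describes. A smaller `beta3` loosens the bound more quickly.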

arXiv:1909.10815 [pdf, other]

Balanced One-shot Neural Architecture Optimization

Authors: Renqian Luo, Tao Qin, Enhong Chen

Abstract: The ability to rank candidate architectures is the key to the performance of neural architecture search (NAS). One-shot NAS is proposed to reduce the expense but shows inferior performance against conventional NAS and is not adequately stable. We investigate this and find that the ranking correlation between architectures under one-shot training and the ones under stand-alone full training is poor, which misleads the algorithm in its search for better architectures. Further, we show that the training of architectures of different sizes under the current one-shot method is imbalanced, which causes the evaluated performances of the architectures to be less predictive of their ground-truth performances and affects the ranking correlation heavily. Consequently, we propose Balanced NAO, where we introduce balanced training of the supernet during the search procedure to encourage more updates for large architectures than small architectures by sampling architectures in proportion to their model sizes. Comprehensive experiments verify that our proposed method is effective and robust, which leads to a more stable search. The final discovered architecture shows significant improvements against baselines with a test error rate of 2.60% on CIFAR-10 and top-1 accuracy of 74.4% on ImageNet under the mobile setting. Code and model checkpoints are publicly available at github.com/renqianluo/NAO_pytorch.

Submitted 31 March, 2020; v1 submitted 24 September, 2019; originally announced September 2019.

Comments: Code and model checkpoints are publicly available at https://github.com/renqianluo/NAO_pytorch

arXiv:1908.03595 [pdf, other]

Adaptive Ensemble of Classifiers with Regularization for Imbalanced Data Classification

Authors: Chen Wang, Chengyuan Deng, Zhoulu Yu, Dafeng Hui, Xiaofeng Gong, Ruisen Luo

Abstract: The dynamic ensemble selection of classifiers is an effective approach for processing label-imbalanced data classifications. However, such a technique is prone to overfitting, owing to the lack of regularization methods and the dependence of the aforementioned technique on local geometry. In this study, focusing on binary imbalanced data classification, a novel dynamic ensemble method, namely adaptive ensemble of classifiers with regularization (AER), is proposed, to overcome the stated limitations. The method solves the overfitting problem through implicit regularization. Specifically, it leverages the properties of stochastic gradient descent to obtain the solution with the minimum norm, thereby achieving regularization; furthermore, it interpolates the ensemble weights by exploiting the global geometry of data to further prevent overfitting. According to our theoretical proofs, the seemingly complicated AER paradigm, in addition to its regularization capabilities, can actually reduce the asymptotic time and memory complexities of several other algorithms. We evaluate the proposed AER method on seven benchmark imbalanced datasets from the UCI machine learning repository and one artificially generated GMM-based dataset with five variations. The results show that the proposed algorithm outperforms the major existing algorithms based on multiple metrics in most cases, and two hypothesis tests (McNemar's and Wilcoxon tests) verify the statistical significance further. In addition, the proposed method has other preferred properties such as special advantages in dealing with highly imbalanced data, and it pioneers the research on the regularization for dynamic ensemble methods.

Submitted 5 November, 2020; v1 submitted 9 August, 2019; originally announced August 2019.

Comments: Major revision; Change of authors due to contributions

arXiv:1907.13196 [pdf, other]

Wasserstein Robust Reinforcement Learning

Authors: Mohammed Amin Abdullah, Hang Ren, Haitham Bou Ammar, Vladimir Milenkovic, Rui Luo, Mingtian Zhang, Jun Wang

Abstract: Reinforcement learning algorithms, though successful, tend to over-fit to training environments, hampering their application to the real world. This paper proposes $\text{W}\text{R}^{2}\text{L}$ -- a robust reinforcement learning algorithm with significant robust performance on low and high-dimensional control tasks. Our method formalises robust reinforcement learning as a novel min-max game with a Wasserstein constraint for a correct and convergent solver. Apart from the formulation, we also propose an efficient and scalable solver following a novel zero-order optimisation method that we believe can be useful to numerical optimisation in general. We empirically demonstrate significant gains compared to standard and robust state-of-the-art algorithms on high-dimensional MuJoCo environments.

Submitted 16 September, 2019; v1 submitted 30 July, 2019; originally announced July 2019.

arXiv:1907.04536 [pdf]

Multi-layer Attention Mechanism for Speech Keyword Recognition

Authors: Ruisen Luo, Tianran Sun, Chen Wang, Miao Du, Zuodong Tang, Kai Zhou, Xiaofeng Gong, Xiaomei Yang

Abstract: As an important part of speech recognition technology, automatic speech keyword recognition has been intensively studied in recent years. Such technology becomes especially pivotal in situations with limited infrastructure and computational resources, such as voice command recognition in vehicles and robot interaction. At present, the mainstream methods in automatic speech keyword recognition are based on long short-term memory (LSTM) networks with attention mechanism. However, due to inevitable information losses for the LSTM layer caused during feature extraction, the calculated attention weights are biased. In this paper, a novel approach, namely Multi-layer Attention Mechanism, is proposed to handle the inaccurate attention weights problem. The key idea is that, in addition to the conventional attention mechanism, information from layers prior to feature extraction and LSTM is introduced into attention weight calculations. Therefore, the attention weights are more accurate because the overall model can have more precise and focused areas. We conduct a comprehensive comparison and analysis of the keyword spotting performance of a convolutional neural network, a bi-directional LSTM recurrent neural network, and a recurrent neural network with the proposed attention mechanism on the Google Speech Commands V2 dataset. Experimental results are favorable for the proposed method and demonstrate its validity. The proposed multi-layer attention methods can be useful for other research related to object spotting.

Submitted 10 July, 2019; originally announced July 2019.

arXiv:1905.12569 [pdf, other]

Replica-exchange Nosé-Hoover dynamics for Bayesian learning on large datasets

Authors: Rui Luo, Qiang Zhang, Yaodong Yang, Jun Wang

Abstract: In this paper, we present a new practical method for Bayesian learning that can rapidly draw representative samples from complex posterior distributions with multiple isolated modes in the presence of mini-batch noise. This is achieved by simulating a collection of replicas in parallel with different temperatures and periodically swapping them. When evolving the replicas' states, the Nosé-Hoover dynamics is applied, which adaptively neutralizes the mini-batch noise. To perform proper exchanges, a new protocol is developed with a noise-aware test of acceptance, by which the detailed balance is preserved in an asymptotic way. While its efficacy on complex multimodal posteriors has been illustrated by testing over synthetic distributions, experiments with deep Bayesian neural networks on large-scale datasets have shown its significant improvements over strong baselines.

Submitted 21 February, 2021; v1 submitted 29 May, 2019; originally announced May 2019.

Comments: NeurIPS 2020

arXiv:1904.00204 [pdf, other]

Combining Smoothing Spline with Conditional Gaussian Graphical Model for Density and Graph Estimation

Authors: Runfei Luo, Anna Liu, Yuedong Wang

Abstract: Multivariate density estimation and graphical models play important roles in statistical learning. The estimated density can be used to construct a graphical model that reveals conditional relationships, whereas a graphical structure can be used to build models for density estimation. Our goal is to construct a consolidated framework that can perform both density and graph estimation. Let $\bm{Z}$ denote the random vector of interest with density function $f(\bm{z})$. We split $\bm{Z}$ into two parts, $\bm{Z}=(\bm{X}^T,\bm{Y}^T)^T$, and write $f(\bm{z})=f(\bm{x})f(\bm{y}|\bm{x})$, where $f(\bm{x})$ is the density function of $\bm{X}$ and $f(\bm{y}|\bm{x})$ is the conditional density of $\bm{Y}|\bm{X}=\bm{x}$. We propose a semiparametric framework that models $f(\bm{x})$ nonparametrically using a smoothing spline ANOVA (SS ANOVA) model and $f(\bm{y}|\bm{x})$ parametrically using a conditional Gaussian graphical model (cGGM). Combining flexibility of the SS ANOVA model with succinctness of the cGGM, this framework allows us to deal with high-dimensional data without assuming a joint Gaussian distribution. We propose a backfitting estimation procedure for the cGGM with a computationally efficient approach for selection of tuning parameters. We also develop a geometric inference approach for edge selection. We establish asymptotic convergence properties for both the parameter and density estimation. The performance of the proposed method is evaluated through extensive simulation studies and two real data applications.

Submitted 30 March, 2019; originally announced April 2019.

arXiv:1901.09207 [pdf, other]

Probabilistic Recursive Reasoning for Multi-Agent Reinforcement Learning

Authors: Ying Wen, Yaodong Yang, Rui Luo, Jun Wang, Wei Pan

Abstract: Humans are capable of attributing latent mental contents such as beliefs or intentions to others. The social skill is critical in daily life for reasoning about the potential consequences of others' behaviors so as to plan ahead. It is known that humans use such reasoning ability recursively by considering what others believe about their own beliefs. In this paper, we start from level-$1$ recursion and introduce a probabilistic recursive reasoning (PR2) framework for multi-agent reinforcement learning. Our hypothesis is that it is beneficial for each agent to account for how the opponents would react to its future behaviors. Under the PR2 framework, we adopt variational Bayes methods to approximate the opponents' conditional policies, to which each agent finds the best response and then improve their own policies. We develop decentralized-training-decentralized-execution algorithms, namely PR2-Q and PR2-Actor-Critic, that are proved to converge in the self-play scenarios when there exists one Nash equilibrium. Our methods are tested on both the matrix game and the differential game, which have a non-trivial equilibrium where common gradient-based methods fail to converge. Our experiments show that it is critical to reason about how the opponents believe about what the agent believes. We expect our work to contribute a new idea of modeling the opponents to the multi-agent reinforcement learning community.

Submitted 1 March, 2019; v1 submitted 26 January, 2019; originally announced January 2019.

Comments: ICLR 2019

arXiv:1812.01181 [pdf, other]

Parallel-tempered Stochastic Gradient Hamiltonian Monte Carlo for Approximate Multimodal Posterior Sampling

Authors: Rui Luo, Qiang Zhang, Yuanyuan Liu

Abstract: We propose a new sampler that integrates the protocol of parallel tempering with the Nosé-Hoover (NH) dynamics. The proposed method can efficiently draw representative samples from complex posterior distributions with multiple isolated modes in the presence of noise arising from stochastic gradient. It potentially facilitates deep Bayesian learning on large datasets where complex multimodal posteriors and mini-batch gradient are encountered.

Submitted 7 December, 2018; v1 submitted 3 December, 2018; originally announced December 2018.

arXiv:1811.03711 [pdf, other]

Benchmarking Deep Sequential Models on Volatility Predictions for Financial Time Series

Authors: Qiang Zhang, Rui Luo, Yaodong Yang, Yuanyuan Liu

Abstract: Volatility is a quantity of measurement for the price movements of stocks or options which indicates the uncertainty within financial markets. As an indicator of the level of risk or the degree of variation, volatility is important to analyse the financial market, and it is taken into consideration in various decision-making processes in financial activities. On the other hand, recent advancement in deep learning techniques has shown strong capabilities in modelling sequential data, such as speech and natural language. In this paper, we empirically study the applicability of the latest deep structures with respect to the volatility modelling problem, through which we aim to provide an empirical guidance for the theoretical analysis of the marriage between deep learning techniques and financial applications in the future. We examine both the traditional approaches and the deep sequential models on the task of volatility prediction, including the most recent variants of convolutional and recurrent networks, such as the dilated architecture. Accordingly, experiments with real-world stock price datasets are performed on a set of 1314 daily stock series for 2018 days of transaction. The evaluation and comparison are based on the negative log likelihood (NLL) of real-world stock price time series. The result shows that the dilated neural models, including dilated CNN and Dilated RNN, produce most accurate estimation and prediction, outperforming various widely-used deterministic models in the GARCH family and several recently proposed stochastic models. In addition, the high flexibility and rich expressive power are validated in this study.

Submitted 8 November, 2018; originally announced November 2018.

Comments: NIPS 2018, Workshop on Challenges and Opportunities for AI in Financial Services

arXiv:1808.07233 [pdf, other]

Neural Architecture Optimization

Authors: Renqian Luo, Fei Tian, Tao Qin, Enhong Chen, Tie-Yan Liu

Abstract: Automatic neural architecture design has shown its potential in discovering powerful neural network architectures. Existing methods, no matter based on reinforcement learning or evolutionary algorithms (EA), conduct architecture search in a discrete space, which is highly inefficient. In this paper, we propose a simple and efficient method to automatic neural architecture design based on continuous optimization. We call this new approach neural architecture optimization (NAO). There are three key components in our proposed approach: (1) An encoder embeds/maps neural network architectures into a continuous space. (2) A predictor takes the continuous representation of a network as input and predicts its accuracy. (3) A decoder maps a continuous representation of a network back to its architecture. The performance predictor and the encoder enable us to perform gradient based optimization in the continuous space to find the embedding of a new architecture with potentially better accuracy. Such a better embedding is then decoded to a network by the decoder. Experiments show that the architecture discovered by our method is very competitive for image classification task on CIFAR-10 and language modeling task on PTB, outperforming or on par with the best results of previous architecture search methods with a significant reduction of computational resources. Specifically we obtain 1.93% test set error rate for CIFAR-10 image classification task and 56.0 test set perplexity of PTB language modeling task. Furthermore, combined with the recently proposed weight sharing mechanism, we discover powerful architecture on CIFAR-10 (with error rate 2.93%) and on PTB (with test set perplexity 56.6), with very limited computational resources (less than 10 GPU hours) for both tasks.

Submitted 4 September, 2019; v1 submitted 22 August, 2018; originally announced August 2018.

Comments: NeurIPS 2018. Code available at: https://github.com/renqianluo/NAO

arXiv:1808.03679 [pdf]

Machine Learning Promoting Extreme Simplification of Spectroscopy Equipment

Authors: Jianchao Lee, Qiannan Duan, Sifan Bi, Ruen Luo, Yachao Lian, Hanqiang Liu, Ruixing Tian, Jiayuan Chen, Guodong Ma, **hong Gao, Zhaoyi Xu

Abstract: The spectroscopy measurement is one of main pathways for exploring and understanding the nature. Today, it seems that racing artificial intelligence will remould its styles. The algorithms contained in huge neural networks are capable of substituting many of expensive and complex components of spectrum instruments. In this work, we presented a smart machine learning strategy on the measurement of absorbance curves, and also initially verified that an exceedingly-simplified equipment is sufficient to meet the needs for this strategy. Further, with its simplicity, the setup is expected to infiltrate into many scientific areas in versatile forms.

Submitted 13 September, 2019; v1 submitted 5 August, 2018; originally announced August 2018.

Comments: This is the second version. On pages 7 through 8, we have added a new case about the spectral properties of mixtures. Specifically, paragraph 1 on page 8 and Fig.7 is added

arXiv:1803.00204 [pdf, other]

doi 10.1109/TPAMI.2019.2952096

Scalar Quantization as Sparse Least Square Optimization

Authors: Chen Wang, Xiaomei Yang, Shaomin Fei, Kai Zhou, Xiaofeng Gong, Miao Du, Ruisen Luo

Abstract: Quantization can be used to form new vectors/matrices with shared values close to the original. In recent years, the popularity of scalar quantization for value-sharing applications has been soaring as it has been found huge utilities in reducing the complexity of neural networks. Existing clustering-based quantization techniques, while being well-developed, have multiple drawbacks including the dependency of the random seed, empty or out-of-the-range clusters, and high time complexity for a large number of clusters. To overcome these problems, in this paper, the problem of scalar quantization is examined from a new perspective, namely sparse least square optimization. Specifically, inspired by the property of sparse least square regression, several quantization algorithms based on $l_1$ least square are proposed. In addition, similar schemes with $l_1 + l_2$ and $l_0$ regularization are proposed. Furthermore, to compute quantization results with a given amount of values/clusters, this paper designed an iterative method and a clustering-based method, and both of them are built on sparse least square. The paper shows that the latter method is mathematically equivalent to an improved version of k-means clustering-based quantization algorithm, although the two algorithms originated from different intuitions. The algorithms proposed were tested with three types of data and their computational performances, including information loss, time consumption, and the distribution of the values of the sparse vectors, were compared and analyzed. The paper offers a new perspective to probe the area of quantization, and the algorithms proposed can outperform existing methods especially under some bit-width reduction scenarios, when the required post-quantization resolution (number of values) is not significantly lower than the original number.

Submitted 5 November, 2019; v1 submitted 28 February, 2018; originally announced March 2018.

Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019

arXiv:1712.00504 [pdf, other]

A Neural Stochastic Volatility Model

Authors: Rui Luo, Weinan Zhang, Xiaojun Xu, Jun Wang

Abstract: In this paper, we show that the recent integration of statistical models with deep recurrent neural networks provides a new way of formulating volatility (the degree of variation of time series) models that have been widely used in time series analysis and prediction in finance. The model comprises a pair of complementary stochastic recurrent neural networks: the generative network models the joint distribution of the stochastic volatility process; the inference network approximates the conditional distribution of the latent variables given the observables. Our focus here is on the formulation of temporal dynamics of volatility over time under a stochastic recurrent neural network framework. Experiments on real-world stock price datasets demonstrate that the proposed model generates a better volatility estimation and prediction that outperforms mainstream methods, e.g., deterministic models such as GARCH and its variants, and stochastic models namely the MCMC-based model \emph{stochvol} as well as the Gaussian process volatility model \emph{GPVol}, on average negative log-likelihood.

Submitted 4 December, 2018; v1 submitted 30 November, 2017; originally announced December 2017.

arXiv:1711.11511 [pdf, other]

Thermostat-assisted continuously-tempered Hamiltonian Monte Carlo for Bayesian learning

Authors: Rui Luo, Jianhong Wang, Yaodong Yang, Zhanxing Zhu, Jun Wang

Abstract: We propose a new sampling method, the thermostat-assisted continuously-tempered Hamiltonian Monte Carlo, for Bayesian learning on large datasets and multimodal distributions. It simulates the Nosé-Hoover dynamics of a continuously-tempered Hamiltonian system built on the distribution of interest. A significant advantage of this method is that it is not only able to efficiently draw representative i.i.d. samples when the distribution contains multiple isolated modes, but capable of adaptively neutralising the noise arising from mini-batches and maintaining accurate sampling. While the properties of this method have been studied using synthetic distributions, experiments on three real datasets also demonstrated the gain of performance over several strong baselines with various types of neural networks plunged in.

Submitted 28 January, 2019; v1 submitted 30 November, 2017; originally announced November 2017.

arXiv:1706.05446 [pdf, other]

Adversarial Variational Bayes Methods for Tweedie Compound Poisson Mixed Models

Authors: Yaodong Yang, Rui Luo, Yuanyuan Liu

Abstract: The Tweedie Compound Poisson-Gamma model is routinely used for modeling non-negative continuous data with a discrete probability mass at zero. Mixed models with random effects account for the covariance structure related to the grouping hierarchy in the data. An important application of Tweedie mixed models is pricing the insurance policies, e.g. car insurance. However, the intractable likelihood function, the unknown variance function, and the hierarchical structure of mixed effects have presented considerable challenges for drawing inferences on Tweedie. In this study, we tackle the Bayesian Tweedie mixed-effects models via variational inference approaches. In particular, we empower the posterior approximation by implicit models trained in an adversarial setting. To reduce the variance of gradients, we reparameterize random effects, and integrate out one local latent variable of Tweedie. We also employ a flexible hyper prior to ensure the richness of the approximation. Our method is evaluated on both simulated and real-world data. Results show that the proposed method has smaller estimation bias on the random effects compared to traditional inference methods including MCMC; it also achieves a state-of-the-art predictive performance, meanwhile offering a richer estimation of the variance function.

Submitted 3 February, 2019; v1 submitted 16 June, 2017; originally announced June 2017.

Comments: ICASSP 2019

arXiv:1508.01113 [pdf, ps, other]

Sparse Fisher's discriminant analysis with thresholded linear constraints

Authors: Ruiyan Luo, Xin Qi

Abstract: Various regularized linear discriminant analysis (LDA) methods have been proposed to address the problems of the classic methods in high-dimensional settings. Asymptotic optimality has been established for some of these methods in high dimension when there are only two classes. A major difficulty in proving asymptotic optimality for multiclass classification is that the classification boundary is typically complicated and no explicit formula for classification error generally exists when the number of classes is greater than two. For the Fisher's LDA, one additional difficulty is that the covariance matrix is also involved in the linear constraints. The main purpose of this paper is to establish asymptotic consistency and asymptotic optimality for our sparse Fisher's LDA with thresholded linear constraints in the high-dimensional settings for arbitrary number of classes. To address the first difficulty above, we provide asymptotic optimality and the corresponding convergence rates in high-dimensional settings for a large family of linear classification rules with arbitrary number of classes, and apply them to our method. To overcome the second difficulty, we propose a thresholding approach to avoid the estimate of the covariance matrix. We apply the method to the classification problems for multivariate functional data through the wavelet transformations.

Submitted 5 August, 2015; originally announced August 2015.

arXiv:1508.01105 [pdf, other]

Signal extraction approach for sparse multivariate response regression

Authors: Ruiyan Luo, Xin Qi

Abstract: In this paper, we consider multivariate response regression models with high dimensional predictor variables. One way to model the correlation among the response variables is through the low rank decomposition of the coefficient matrix, which has been considered by several papers for the high dimensional predictors. However, all these papers focus on the singular value decomposition of the coefficient matrix. Our target is the decomposition of the coefficient matrix which leads to the best lower rank approximation to the regression function, the signal part in the response. Given any rank, this decomposition has nearly the smallest expected prediction error among all approximations to the coefficient matrix with the same rank. To estimate the decomposition, we formulate a penalized generalized eigenvalue problem to obtain the first matrix in the decomposition and then obtain the second one by a least squares method. In the high-dimensional setting, we establish the oracle inequalities for the estimates. Compared to the existing theoretical results, we have fewer restrictions on the distribution of the noise vector in each observation and allow correlations among its coordinates. Our theoretical results do not depend on the dimension of the multivariate response. Therefore, the dimension is arbitrary and can be larger than the sample size and the dimension of the predictor. Simulation studies and application to real data show that the proposed method has good prediction performance and is efficient in dimension reduction for various reduced rank models.

Submitted 5 August, 2015; originally announced August 2015.

Comments: 28 pages, 4 figures

arXiv:1108.0793 [pdf, ps, other]

doi 10.1214/10-AOAS425

Bayesian hierarchical modeling for signaling pathway inference from single cell interventional data

Authors: Ruiyan Luo, Hongyu Zhao

Abstract: Recent technological advances have made it possible to simultaneously measure multiple protein activities at the single cell level. With such data collected under different stimulatory or inhibitory conditions, it is possible to infer the causal relationships among proteins from single cell interventional data. In this article we propose a Bayesian hierarchical modeling framework to infer the signaling pathway based on the posterior distributions of parameters in the model. Under this framework, we consider network sparsity and model the existence of an association between two proteins both at the overall level across all experiments and at each individual experimental level. This allows us to infer the pairs of proteins that are associated with each other and their causal relationships. We also explicitly consider both intrinsic noise and measurement error. Markov chain Monte Carlo is implemented for statistical inference. We demonstrate that this hierarchical modeling can effectively pool information from different interventional experiments through simulation studies and real data analysis.

Submitted 3 August, 2011; originally announced August 2011.

Comments: Published at http://dx.doi.org/10.1214/10-AOAS425 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS425

Journal ref: Annals of Applied Statistics 2011, Vol. 5, No. 2A, 725-745

arXiv:0906.1094 [pdf, ps, other]

doi 10.1214/08-AOAS212

Modeling substitution and indel processes for AFLP marker evolution and phylogenetic inference

Authors: Ruiyan Luo, Bret Larget

Abstract: The amplified fragment length polymorphism (AFLP) method produces anonymous genetic markers from throughout a genome. We extend the nucleotide substitution model of AFLP evolution to additionally include insertion and deletion processes. The new Sub-ID model relaxes the common assumption that markers are independent and homologous. We build a Markov chain Monte Carlo methodology tailored for the Sub-ID model to implement a Bayesian approach to infer AFLP marker evolution. The method allows us to infer both the phylogenies and the subset of markers that are possibly homologous. In addition, we can infer the genome-wide relative rate of indels versus substitutions. In a case study with AFLP markers from sedges, a grass-like plant common in North America, we find that accounting for insertion and deletion makes a difference in phylogenetic inference. The inference of topologies is not sensitive to the prior settings and the Jukes--Cantor assumption for nucleotide substitution. The model for insertion and deletion we introduce has potential value in other phylogenetic applications.

Submitted 5 June, 2009; originally announced June 2009.

Comments: Published at http://dx.doi.org/10.1214/08-AOAS212 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS212

Journal ref: Annals of Applied Statistics 2009, Vol. 3, No. 1, 222-248

Showing 1–31 of 31 results for author: Luo, R