-
Estimation and Inference for CP Tensor Factor Models
Authors:
Bin Chen,
Yuefeng Han,
Qiyang Yu
Abstract:
High-dimensional tensor-valued data have recently gained attention from researchers in economics and finance. We consider the estimation and inference of high-dimensional tensor factor models, where each dimension of the tensor diverges. Our focus is on a factor model that admits CP-type tensor decomposition, which allows for non-orthogonal loading vectors. Based on the contemporary covariance mat…
▽ More
High-dimensional tensor-valued data have recently gained attention from researchers in economics and finance. We consider the estimation and inference of high-dimensional tensor factor models, where each dimension of the tensor diverges. Our focus is on a factor model that admits CP-type tensor decomposition, which allows for non-orthogonal loading vectors. Based on the contemporary covariance matrix, we propose an iterative simultaneous projection estimation method. Our estimator is robust to weak dependence among factors and weak correlation across different dimensions in the idiosyncratic shocks. We establish an inferential theory, demonstrating both consistency and asymptotic normality under relaxed assumptions. Within a unified framework, we consider two eigenvalue ratio-based estimators for the number of factors in a tensor factor model and justify their consistency. Through a simulation study and two empirical applications featuring sorted portfolios and international trade flows, we illustrate the advantages of our proposed estimator over existing methodologies in the literature.
△ Less
Submitted 25 June, 2024;
originally announced June 2024.
-
Advancing Graph Generation through Beta Diffusion
Authors:
Yilin He,
Xinyang Liu,
Bo Chen,
Mingyuan Zhou
Abstract:
Diffusion models have demonstrated effectiveness in generating natural images and have been extended to generate diverse data types, including graphs. This new generation of diffusion-based graph generative models has demonstrated significant performance improvements over methods that rely on variational autoencoders or generative adversarial networks. It's important to recognize, however, that mo…
▽ More
Diffusion models have demonstrated effectiveness in generating natural images and have been extended to generate diverse data types, including graphs. This new generation of diffusion-based graph generative models has demonstrated significant performance improvements over methods that rely on variational autoencoders or generative adversarial networks. It's important to recognize, however, that most of these models employ Gaussian or categorical diffusion processes, which can struggle with sparse and long-tailed data distributions. In our work, we introduce Graph Beta Diffusion (GBD), a diffusion-based generative model particularly adept at capturing diverse graph structures. GBD utilizes a beta diffusion process, tailored for the sparse and range-bounded characteristics of graph adjacency matrices. Furthermore, we have developed a modulation technique that enhances the realism of the generated graphs by stabilizing the generation of critical graph structures, while preserving flexibility elsewhere. The outstanding performance of GBD across three general graph benchmarks and two biochemical graph benchmarks highlights its capability to effectively capture the complexities of real-world graph data. The code will be made available at https://github.com/YH-UtMSB/Graph_Beta_Diffusion
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers
Authors:
Brian K Chen,
Tianyang Hu,
Hui **,
Hwee Kuan Lee,
Kenji Kawaguchi
Abstract:
In-Context Learning (ICL) has been a powerful emergent property of large language models that has attracted increasing attention in recent years. In contrast to regular gradient-based learning, ICL is highly interpretable and does not require parameter updates. In this paper, we show that, for linearized transformer networks, ICL can be made explicit and permanent through the inclusion of bias ter…
▽ More
In-Context Learning (ICL) has been a powerful emergent property of large language models that has attracted increasing attention in recent years. In contrast to regular gradient-based learning, ICL is highly interpretable and does not require parameter updates. In this paper, we show that, for linearized transformer networks, ICL can be made explicit and permanent through the inclusion of bias terms. We mathematically demonstrate the equivalence between a model with ICL demonstration prompts and the same model with the additional bias terms. Our algorithm (ICLCA) allows for exact conversion in an inexpensive manner. Existing methods are not exact and require expensive parameter updates. We demonstrate the efficacy of our approach through experiments that show the exact incorporation of ICL tokens into a linear transformer. We further suggest how our method can be adapted to achieve cheap approximate conversion of ICL tokens, even in regular transformer networks that are not linearized. Our experiments on GPT-2 show that, even though the conversion is only approximate, the model still gains valuable context from the included bias terms.
△ Less
Submitted 6 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
Augmented Risk Prediction for the Onset of Alzheimer's Disease from Electronic Health Records with Large Language Models
Authors:
Jiankun Wang,
Sumyeong Ahn,
Taykhoom Dalal,
Xiaodan Zhang,
Weishen Pan,
Qiannan Zhang,
Bin Chen,
Hiroko H. Dodge,
Fei Wang,
Jiayu Zhou
Abstract:
Alzheimer's disease (AD) is the fifth-leading cause of death among Americans aged 65 and older. Screening and early detection of AD and related dementias (ADRD) are critical for timely intervention and for identifying clinical trial participants. The widespread adoption of electronic health records (EHRs) offers an important resource for develo** ADRD screening tools such as machine learning bas…
▽ More
Alzheimer's disease (AD) is the fifth-leading cause of death among Americans aged 65 and older. Screening and early detection of AD and related dementias (ADRD) are critical for timely intervention and for identifying clinical trial participants. The widespread adoption of electronic health records (EHRs) offers an important resource for develo** ADRD screening tools such as machine learning based predictive models. Recent advancements in large language models (LLMs) demonstrate their unprecedented capability of encoding knowledge and performing reasoning, which offers them strong potential for enhancing risk prediction. This paper proposes a novel pipeline that augments risk prediction by leveraging the few-shot inference power of LLMs to make predictions on cases where traditional supervised learning methods (SLs) may not excel. Specifically, we develop a collaborative pipeline that combines SLs and LLMs via a confidence-driven decision-making mechanism, leveraging the strengths of SLs in clear-cut cases and LLMs in more complex scenarios. We evaluate this pipeline using a real-world EHR data warehouse from Oregon Health \& Science University (OHSU) Hospital, encompassing EHRs from over 2.5 million patients and more than 20 million patient encounters. Our results show that our proposed approach effectively combines the power of SLs and LLMs, offering significant improvements in predictive performance. This advancement holds promise for revolutionizing ADRD screening and early detection practices, with potential implications for better strategies of patient management and thus improving healthcare.
△ Less
Submitted 25 May, 2024;
originally announced May 2024.
-
Nonparametric Modern Hopfield Models
Authors:
Jerry Yao-Chieh Hu,
Bo-Yu Chen,
Dennis Wu,
Feng Ruan,
Han Liu
Abstract:
We present a nonparametric construction for deep learning compatible modern Hopfield models and utilize this framework to debut an efficient variant. Our key contribution stems from interpreting the memory storage and retrieval processes in modern Hopfield models as a nonparametric regression problem subject to a set of query-memory pairs. Crucially, our framework not only recovers the known resul…
▽ More
We present a nonparametric construction for deep learning compatible modern Hopfield models and utilize this framework to debut an efficient variant. Our key contribution stems from interpreting the memory storage and retrieval processes in modern Hopfield models as a nonparametric regression problem subject to a set of query-memory pairs. Crucially, our framework not only recovers the known results from the original dense modern Hopfield model but also fills the void in the literature regarding efficient modern Hopfield models, by introducing \textit{sparse-structured} modern Hopfield models with sub-quadratic complexity. We establish that this sparse model inherits the appealing theoretical properties of its dense analogue -- connection with transformer attention, fixed point convergence and exponential memory capacity -- even without knowing details of the Hopfield energy function. Additionally, we showcase the versatility of our framework by constructing a family of modern Hopfield models as extensions, including linear, random masked, top-$K$ and positive random feature modern Hopfield models. Empirically, we validate the efficacy of our framework in both synthetic and realistic settings.
△ Less
Submitted 5 April, 2024;
originally announced April 2024.
-
Time-Varying Matrix Factor Models
Authors:
Bin Chen,
Elynn Y. Chen,
Stevenson Bolivar,
Rong Chen
Abstract:
Matrix-variate data of high dimensions are frequently observed in finance and economics, spanning extended time periods, such as the long-term data on international trade flows among numerous countries. To address potential structural shifts and explore the matrix structure's informational context, we propose a time-varying matrix factor model. This model accommodates changing factor loadings over…
▽ More
Matrix-variate data of high dimensions are frequently observed in finance and economics, spanning extended time periods, such as the long-term data on international trade flows among numerous countries. To address potential structural shifts and explore the matrix structure's informational context, we propose a time-varying matrix factor model. This model accommodates changing factor loadings over time, revealing the underlying dynamic structure through nonparametric principal component analysis and facilitating dimension reduction. We establish the consistency and asymptotic normality of our estimators under general conditions that allow for weak correlations across time, rows, or columns of the noise. A novel approach is introduced to overcome rotational ambiguity in the estimators, enhancing the clarity and interpretability of the estimated loading matrices. Our simulation study highlights the merits of the proposed estimators and the effective of the smoothing operation. In an application to international trade flow, we investigate the trading hubs, centrality, patterns, and trends in the trading network.
△ Less
Submitted 1 April, 2024;
originally announced April 2024.
-
STanHop: Sparse Tandem Hopfield Model for Memory-Enhanced Time Series Prediction
Authors:
Dennis Wu,
Jerry Yao-Chieh Hu,
Weijian Li,
Bo-Yu Chen,
Han Liu
Abstract:
We present STanHop-Net (Sparse Tandem Hopfield Network) for multivariate time series prediction with memory-enhanced capabilities. At the heart of our approach is STanHop, a novel Hopfield-based neural network block, which sparsely learns and stores both temporal and cross-series representations in a data-dependent fashion. In essence, STanHop sequentially learn temporal representation and cross-s…
▽ More
We present STanHop-Net (Sparse Tandem Hopfield Network) for multivariate time series prediction with memory-enhanced capabilities. At the heart of our approach is STanHop, a novel Hopfield-based neural network block, which sparsely learns and stores both temporal and cross-series representations in a data-dependent fashion. In essence, STanHop sequentially learn temporal representation and cross-series representation using two tandem sparse Hopfield layers. In addition, StanHop incorporates two additional external memory modules: a Plug-and-Play module and a Tune-and-Play module for train-less and task-aware memory-enhancements, respectively. They allow StanHop-Net to swiftly respond to certain sudden events. Methodologically, we construct the StanHop-Net by stacking STanHop blocks in a hierarchical fashion, enabling multi-resolution feature extraction with resolution-specific sparsity. Theoretically, we introduce a sparse extension of the modern Hopfield model (Generalized Sparse Modern Hopfield Model) and show that it endows a tighter memory retrieval error compared to the dense counterpart without sacrificing memory capacity. Empirically, we validate the efficacy of our framework on both synthetic and real-world settings.
△ Less
Submitted 28 December, 2023;
originally announced December 2023.
-
A latent linear model for nonlinear coupled oscillators on graphs
Authors:
Agam Goyal,
Zhaoxing Wu,
Richard P. Yim,
Binhao Chen,
Zihong Xu,
Hanbaek Lyu
Abstract:
A system of coupled oscillators on an arbitrary graph is locally driven by the tendency to mutual synchronization between nearby oscillators, but can and often exhibit nonlinear behavior on the whole graph. Understanding such nonlinear behavior has been a key challenge in predicting whether all oscillators in such a system will eventually synchronize. In this paper, we demonstrate that, surprising…
▽ More
A system of coupled oscillators on an arbitrary graph is locally driven by the tendency to mutual synchronization between nearby oscillators, but can and often exhibit nonlinear behavior on the whole graph. Understanding such nonlinear behavior has been a key challenge in predicting whether all oscillators in such a system will eventually synchronize. In this paper, we demonstrate that, surprisingly, such nonlinear behavior of coupled oscillators can be effectively linearized in certain latent dynamic spaces. The key insight is that there is a small number of `latent dynamics filters', each with a specific association with synchronizing and non-synchronizing dynamics on subgraphs so that any observed dynamics on subgraphs can be approximated by a suitable linear combination of such elementary dynamic patterns. Taking an ensemble of subgraph-level predictions provides an interpretable predictor for whether the system on the whole graph reaches global synchronization. We propose algorithms based on supervised matrix factorization to learn such latent dynamics filters. We demonstrate that our method performs competitively in synchronization prediction tasks against baselines and black-box classification algorithms, despite its simple and interpretable architecture.
△ Less
Submitted 24 November, 2023;
originally announced November 2023.
-
Compositional Deep Probabilistic Models of DNA Encoded Libraries
Authors:
Benson Chen,
Mohammad M. Sultan,
Theofanis Karaletsos
Abstract:
DNA-Encoded Library (DEL) has proven to be a powerful tool that utilizes combinatorially constructed small molecules to facilitate highly-efficient screening assays. These selection experiments, involving multiple stages of washing, elution, and identification of potent binders via unique DNA barcodes, often generate complex data. This complexity can potentially mask the underlying signals, necess…
▽ More
DNA-Encoded Library (DEL) has proven to be a powerful tool that utilizes combinatorially constructed small molecules to facilitate highly-efficient screening assays. These selection experiments, involving multiple stages of washing, elution, and identification of potent binders via unique DNA barcodes, often generate complex data. This complexity can potentially mask the underlying signals, necessitating the application of computational tools such as machine learning to uncover valuable insights. We introduce a compositional deep probabilistic model of DEL data, DEL-Compose, which decomposes molecular representations into their mono-synthon, di-synthon, and tri-synthon building blocks and capitalizes on the inherent hierarchical structure of these molecules by modeling latent reactions between embedded synthons. Additionally, we investigate methods to improve the observation models for DEL count data such as integrating covariate factors to more effectively account for data noise. Across two popular public benchmark datasets (CA-IX and HRP), our model demonstrates strong performance compared to count baselines, enriches the correct pharmacophores, and offers valuable insights via its intrinsic interpretable structure, thereby providing a robust tool for the analysis of DEL data.
△ Less
Submitted 13 February, 2024; v1 submitted 20 October, 2023;
originally announced October 2023.
-
On Sparse Modern Hopfield Model
Authors:
Jerry Yao-Chieh Hu,
Donglin Yang,
Dennis Wu,
Chenwei Xu,
Bo-Yu Chen,
Han Liu
Abstract:
We introduce the sparse modern Hopfield model as a sparse extension of the modern Hopfield model. Like its dense counterpart, the sparse modern Hopfield model equips a memory-retrieval dynamics whose one-step approximation corresponds to the sparse attention mechanism. Theoretically, our key contribution is a principled derivation of a closed-form sparse Hopfield energy using the convex conjugate…
▽ More
We introduce the sparse modern Hopfield model as a sparse extension of the modern Hopfield model. Like its dense counterpart, the sparse modern Hopfield model equips a memory-retrieval dynamics whose one-step approximation corresponds to the sparse attention mechanism. Theoretically, our key contribution is a principled derivation of a closed-form sparse Hopfield energy using the convex conjugate of the sparse entropic regularizer. Building upon this, we derive the sparse memory retrieval dynamics from the sparse energy function and show its one-step approximation is equivalent to the sparse-structured attention. Importantly, we provide a sparsity-dependent memory retrieval error bound which is provably tighter than its dense analog. The conditions for the benefits of sparsity to arise are therefore identified and discussed. In addition, we show that the sparse modern Hopfield model maintains the robust theoretical properties of its dense counterpart, including rapid fixed point convergence and exponential memory capacity. Empirically, we use both synthetic and real-world datasets to demonstrate that the sparse Hopfield model outperforms its dense counterpart in many situations.
△ Less
Submitted 29 November, 2023; v1 submitted 22 September, 2023;
originally announced September 2023.
-
Modify Training Directions in Function Space to Reduce Generalization Error
Authors:
Yi Yu,
Wenlian Lu,
Boyu Chen
Abstract:
We propose theoretical analyses of a modified natural gradient descent method in the neural network function space based on the eigendecompositions of neural tangent kernel and Fisher information matrix. We firstly present analytical expression for the function learned by this modified natural gradient under the assumptions of Gaussian distribution and infinite width limit. Thus, we explicitly der…
▽ More
We propose theoretical analyses of a modified natural gradient descent method in the neural network function space based on the eigendecompositions of neural tangent kernel and Fisher information matrix. We firstly present analytical expression for the function learned by this modified natural gradient under the assumptions of Gaussian distribution and infinite width limit. Thus, we explicitly derive the generalization error of the learned neural network function using theoretical methods from eigendecomposition and statistics theory. By decomposing of the total generalization error attributed to different eigenspace of the kernel in function space, we propose a criterion for balancing the errors stemming from training set and the distribution discrepancy between the training set and the true data. Through this approach, we establish that modifying the training direction of the neural network in function space leads to a reduction in the total generalization error. Furthermore, We demonstrate that this theoretical framework is capable to explain many existing results of generalization enhancing methods. These theoretical results are also illustrated by numerical examples on synthetic data.
△ Less
Submitted 25 July, 2023;
originally announced July 2023.
-
Mathematical Challenges in Deep Learning
Authors:
Vahid Partovi Nia,
Guojun Zhang,
Ivan Kobyzev,
Michael R. Metel,
Xinlin Li,
Ke Sun,
Sobhan Hemati,
Masoud Asgharian,
Linglong Kong,
Wulong Liu,
Boxing Chen
Abstract:
Deep models are dominating the artificial intelligence (AI) industry since the ImageNet challenge in 2012. The size of deep models is increasing ever since, which brings new challenges to this field with applications in cell phones, personal computers, autonomous cars, and wireless base stations. Here we list a set of problems, ranging from training, inference, generalization bound, and optimizati…
▽ More
Deep models are dominating the artificial intelligence (AI) industry since the ImageNet challenge in 2012. The size of deep models is increasing ever since, which brings new challenges to this field with applications in cell phones, personal computers, autonomous cars, and wireless base stations. Here we list a set of problems, ranging from training, inference, generalization bound, and optimization with some formalism to communicate these challenges with mathematicians, statisticians, and theoretical computer scientists. This is a subjective view of the research questions in deep learning that benefits the tech industry in long run.
△ Less
Submitted 24 March, 2023;
originally announced March 2023.
-
Intrinsic minimum average variance estimation for sufficient dimension reduction with symmetric positive definite matrices and beyond
Authors:
B. Chen,
S. Dai,
Z. Yu
Abstract:
In this paper, we target the problem of sufficient dimension reduction with symmetric positive definite matrices valued responses. We propose the intrinsic minimum average variance estimation method and the intrinsic outer product gradient method which fully exploit the geometric structure of the Riemannian manifold where responses lie. We present the algorithms for our newly developed methods und…
▽ More
In this paper, we target the problem of sufficient dimension reduction with symmetric positive definite matrices valued responses. We propose the intrinsic minimum average variance estimation method and the intrinsic outer product gradient method which fully exploit the geometric structure of the Riemannian manifold where responses lie. We present the algorithms for our newly developed methods under the log-Euclidean metric and the log-Cholesky metric. Each of the two metrics is linked to an abelian Lie group structure that transforms our model defined on a manifold into a Euclidean one. The proposed methods are then further extended to general Riemannian manifolds. We establish rigourous asymptotic results for the proposed estimators, including the rate of convergence and the asymptotic normality. We also develop a cross validation algorithm for the estimation of the structural dimension with theoretical guarantee Comprehensive simulation studies and an application to the New York taxi network data are performed to show the superiority of the proposed methods.
△ Less
Submitted 25 February, 2023;
originally announced February 2023.
-
Adaptive sparseness for correntropy-based robust regression via automatic relevance determination
Authors:
Yuanhao Li,
Badong Chen,
Okito Yamashita,
Natsue Yoshimura,
Yasuharu Koike
Abstract:
Sparseness and robustness are two important properties for many machine learning scenarios. In the present study, regarding the maximum correntropy criterion (MCC) based robust regression algorithm, we investigate to integrate the MCC method with the automatic relevance determination (ARD) technique in a Bayesian framework, so that MCC-based robust regression could be implemented with adaptive spa…
▽ More
Sparseness and robustness are two important properties for many machine learning scenarios. In the present study, regarding the maximum correntropy criterion (MCC) based robust regression algorithm, we investigate to integrate the MCC method with the automatic relevance determination (ARD) technique in a Bayesian framework, so that MCC-based robust regression could be implemented with adaptive sparseness. To be specific, we use an inherent noise assumption from the MCC to derive an explicit likelihood function, and realize the maximum a posteriori (MAP) estimation with the ARD prior by variational Bayesian inference. Compared to the existing robust and sparse L1-regularized MCC regression, the proposed MCC-ARD regression can eradicate the troublesome tuning for the regularization hyper-parameter which controls the regularization strength. Further, MCC-ARD achieves superior prediction performance and feature selection capability than L1-regularized MCC, as demonstrated by a noisy and high-dimensional simulation study.
△ Less
Submitted 31 January, 2023;
originally announced February 2023.
-
CI-GNN: A Granger Causality-Inspired Graph Neural Network for Interpretable Brain Network-Based Psychiatric Diagnosis
Authors:
Kaizhong Zheng,
Shujian Yu,
Badong Chen
Abstract:
There is a recent trend to leverage the power of graph neural networks (GNNs) for brain-network based psychiatric diagnosis, which,in turn, also motivates an urgent need for psychiatrists to fully understand the decision behavior of the used GNNs. However, most of the existing GNN explainers are either post-hoc in which another interpretive model needs to be created to explain a well-trained GNN,…
▽ More
There is a recent trend to leverage the power of graph neural networks (GNNs) for brain-network based psychiatric diagnosis, which,in turn, also motivates an urgent need for psychiatrists to fully understand the decision behavior of the used GNNs. However, most of the existing GNN explainers are either post-hoc in which another interpretive model needs to be created to explain a well-trained GNN, or do not consider the causal relationship between the extracted explanation and the decision, such that the explanation itself contains spurious correlations and suffers from weak faithfulness. In this work, we propose a granger causality-inspired graph neural network (CI-GNN), a built-in interpretable model that is able to identify the most influential subgraph (i.e., functional connectivity within brain regions) that is causally related to the decision (e.g., major depressive disorder patients or healthy controls), without the training of an auxillary interpretive network. CI-GNN learns disentangled subgraph-level representations α and \b{eta} that encode, respectively, the causal and noncausal aspects of original graph under a graph variational autoencoder framework, regularized by a conditional mutual information (CMI) constraint. We theoretically justify the validity of the CMI regulation in capturing the causal relationship. We also empirically evaluate the performance of CI-GNN against three baseline GNNs and four state-of-the-art GNN explainers on synthetic data and three large-scale brain disease datasets. We observe that CI-GNN achieves the best performance in a wide range of metrics and provides more reliable and concise explanations which have clinical evidence.The source code and implementation details of CI-GNN are freely available at GitHub repository (https://github.com/ZKZ-Brain/CI-GNN/).
△ Less
Submitted 28 January, 2024; v1 submitted 4 January, 2023;
originally announced January 2023.
-
Assessing Spatial Stationarity and Segmenting Spatial Processes into Stationary Components
Authors:
ShengLi Tzeng,
Bo-Yu Chen,
Hsin-Cheng Huang
Abstract:
In this research, we propose a novel technique for visualizing nonstationarity in geostatistics, particularly when confronted with a single realization of data at irregularly spaced locations. Our method hinges on formulating a statistic that tracks a stable microergodic parameter of the exponential covariance function, allowing us to address the intricate challenges of nonstationary processes tha…
▽ More
In this research, we propose a novel technique for visualizing nonstationarity in geostatistics, particularly when confronted with a single realization of data at irregularly spaced locations. Our method hinges on formulating a statistic that tracks a stable microergodic parameter of the exponential covariance function, allowing us to address the intricate challenges of nonstationary processes that lack repeated measurements. We implement the fused lasso technique to elucidate nonstationary patterns at various resolutions. For prediction purposes, we segment the spatial domain into stationary sub-regions via Voronoi tessellations. Additionally, we devise a robust test for stationarity based on contrasting the sample means of our proposed statistics between two selected Voronoi subregions. The effectiveness of our method is demonstrated through simulation studies and its application to a precipitation dataset in Colorado.
△ Less
Submitted 28 August, 2023; v1 submitted 15 October, 2022;
originally announced October 2022.
-
Geometric Learning of Hidden Markov Models via a Method of Moments Algorithm
Authors:
Berlin Chen,
Cyrus Mostajeran,
Salem Said
Abstract:
We present a novel algorithm for learning the parameters of hidden Markov models (HMMs) in a geometric setting where the observations take values in Riemannian manifolds. In particular, we elevate a recent second-order method of moments algorithm that incorporates non-consecutive correlations to a more general setting where observations take place in a Riemannian symmetric space of non-positive cu…
▽ More
We present a novel algorithm for learning the parameters of hidden Markov models (HMMs) in a geometric setting where the observations take values in Riemannian manifolds. In particular, we elevate a recent second-order method of moments algorithm that incorporates non-consecutive correlations to a more general setting where observations take place in a Riemannian symmetric space of non-positive curvature and the observation likelihoods are Riemannian Gaussians. The resulting algorithm decouples into a Riemannian Gaussian mixture model estimation algorithm followed by a sequence of convex optimization procedures. We demonstrate through examples that the learner can result in significantly improved speed and numerical accuracy compared to existing learners.
△ Less
Submitted 2 July, 2022;
originally announced July 2022.
-
Representing Mixtures of Word Embeddings with Mixtures of Topic Embeddings
Authors:
Dongsheng Wang,
Dandan Guo,
He Zhao,
Huangjie Zheng,
Korawat Tanwisuth,
Bo Chen,
Mingyuan Zhou
Abstract:
A topic model is often formulated as a generative model that explains how each word of a document is generated given a set of topics and document-specific topic proportions. It is focused on capturing the word co-occurrences in a document and hence often suffers from poor performance in analyzing short documents. In addition, its parameter estimation often relies on approximate posterior inference…
▽ More
A topic model is often formulated as a generative model that explains how each word of a document is generated given a set of topics and document-specific topic proportions. It is focused on capturing the word co-occurrences in a document and hence often suffers from poor performance in analyzing short documents. In addition, its parameter estimation often relies on approximate posterior inference that is either not scalable or suffers from large approximation error. This paper introduces a new topic-modeling framework where each document is viewed as a set of word embedding vectors and each topic is modeled as an embedding vector in the same embedding space. Embedding the words and topics in the same vector space, we define a method to measure the semantic difference between the embedding vectors of the words of a document and these of the topics, and optimize the topic embeddings to minimize the expected difference over all documents. Experiments on text analysis demonstrate that the proposed method, which is amenable to mini-batch stochastic gradient descent based optimization and hence scalable to big corpora, provides competitive performance in discovering more coherent and diverse topics and extracting better document representations.
△ Less
Submitted 14 March, 2022; v1 submitted 3 March, 2022;
originally announced March 2022.
-
baker: An R package for Nested Partially-Latent Class Models
Authors:
Irena B Chen,
Qiyuan Shi,
Scott L Zeger,
Zhenke Wu
Abstract:
This paper describes and illustrates the functionality of the baker R package. The package estimates a suite of nested partially-latent class models (NPLCM) for multivariate binary responses that are observed under a case-control design. The baker package allows researchers to flexibly estimate population-level class prevalences and posterior probabilities of class membership for individual cases.…
▽ More
This paper describes and illustrates the functionality of the baker R package. The package estimates a suite of nested partially-latent class models (NPLCM) for multivariate binary responses that are observed under a case-control design. The baker package allows researchers to flexibly estimate population-level class prevalences and posterior probabilities of class membership for individual cases. Estimation is accomplished by calling a cross-platform automatic Bayesian inference software JAGS through a wrapper R function that parses model specifications and data inputs. The baker package provides many useful features, including data ingestion, exploratory data analyses, model diagnostics, extensive plotting and visualization options, catalyzing communications between practitioners and domain scientists. Package features and workflows are illustrated using simulated and real data sets. Package URL: https://github.com/zhenkewu/baker
△ Less
Submitted 23 February, 2022;
originally announced February 2022.
-
A Variational Edge Partition Model for Supervised Graph Representation Learning
Authors:
Yilin He,
Chaojie Wang,
Hao Zhang,
Bo Chen,
Mingyuan Zhou
Abstract:
Graph neural networks (GNNs), which propagate the node features through the edges and learn how to transform the aggregated features under label supervision, have achieved great success in supervised feature extraction for both node-level and graph-level classification tasks. However, GNNs typically treat the graph structure as given and ignore how the edges are formed. This paper introduces a gra…
▽ More
Graph neural networks (GNNs), which propagate the node features through the edges and learn how to transform the aggregated features under label supervision, have achieved great success in supervised feature extraction for both node-level and graph-level classification tasks. However, GNNs typically treat the graph structure as given and ignore how the edges are formed. This paper introduces a graph generative process to model how the observed edges are generated by aggregating the node interactions over a set of overlap** node communities, each of which contributes to the edges via a logical OR mechanism. Based on this generative model, we partition each edge into the summation of multiple community-specific weighted edges and use them to define community-specific GNNs. A variational inference framework is proposed to jointly learn a GNN-based inference network that partitions the edges into different communities, these community-specific GNNs, and a GNN-based predictor that combines community-specific GNNs for the end classification task. Extensive evaluations on real-world graph datasets have verified the effectiveness of the proposed method in learning discriminative representations for both node-level and graph-level classification tasks.
△ Less
Submitted 31 October, 2022; v1 submitted 7 February, 2022;
originally announced February 2022.
-
A Prototype-Oriented Framework for Unsupervised Domain Adaptation
Authors:
Korawat Tanwisuth,
Xinjie Fan,
Huangjie Zheng,
Shujian Zhang,
Hao Zhang,
Bo Chen,
Mingyuan Zhou
Abstract:
Existing methods for unsupervised domain adaptation often rely on minimizing some statistical distance between the source and target samples in the latent space. To avoid the sampling variability, class imbalance, and data-privacy concerns that often plague these methods, we instead provide a memory and computation-efficient probabilistic framework to extract class prototypes and align the target…
▽ More
Existing methods for unsupervised domain adaptation often rely on minimizing some statistical distance between the source and target samples in the latent space. To avoid the sampling variability, class imbalance, and data-privacy concerns that often plague these methods, we instead provide a memory and computation-efficient probabilistic framework to extract class prototypes and align the target features with them. We demonstrate the general applicability of our method on a wide range of scenarios, including single-source, multi-source, class-imbalance, and source-private domain adaptation. Requiring no additional model parameters and having a moderate increase in computation over the source model alone, the proposed method achieves competitive performance with state-of-the-art methods.
△ Less
Submitted 22 October, 2021;
originally announced October 2021.
-
Bayesian Attention Belief Networks
Authors:
Shujian Zhang,
Xinjie Fan,
Bo Chen,
Mingyuan Zhou
Abstract:
Attention-based neural networks have achieved state-of-the-art results on a wide range of tasks. Most such models use deterministic attention while stochastic attention is less explored due to the optimization difficulties or complicated model design. This paper introduces Bayesian attention belief networks, which construct a decoder network by modeling unnormalized attention weights with a hierar…
▽ More
Attention-based neural networks have achieved state-of-the-art results on a wide range of tasks. Most such models use deterministic attention while stochastic attention is less explored due to the optimization difficulties or complicated model design. This paper introduces Bayesian attention belief networks, which construct a decoder network by modeling unnormalized attention weights with a hierarchy of gamma distributions, and an encoder network by stacking Weibull distributions with a deterministic-upward-stochastic-downward structure to approximate the posterior. The resulting auto-encoding networks can be optimized in a differentiable way with a variational lower bound. It is simple to convert any models with deterministic attention, including pretrained ones, to the proposed Bayesian attention belief networks. On a variety of language understanding tasks, we show that our method outperforms deterministic attention and state-of-the-art stochastic attention in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks. We further demonstrate the general applicability of our method on neural machine translation and visual question answering, showing great potential of incorporating our method into various attention-related tasks.
△ Less
Submitted 9 June, 2021;
originally announced June 2021.
-
Deconvolutional Density Network: Modeling Free-Form Conditional Distributions
Authors:
Bing Chen,
Mazharul Islam,
Jisuo Gao,
Lin Wang
Abstract:
Conditional density estimation (CDE) is the task of estimating the probability of an event conditioned on some inputs. A neural network (NN) can also be used to compute the output distribution for continuous-domain, which can be viewed as an extension of regression task. Nevertheless, it is difficult to explicitly approximate a distribution without knowing the information of its general form a pri…
▽ More
Conditional density estimation (CDE) is the task of estimating the probability of an event conditioned on some inputs. A neural network (NN) can also be used to compute the output distribution for continuous-domain, which can be viewed as an extension of regression task. Nevertheless, it is difficult to explicitly approximate a distribution without knowing the information of its general form a priori. In order to fit an arbitrary conditional distribution, discretizing the continuous domain into bins is an effective strategy, as long as we have sufficiently narrow bins and very large data. However, collecting enough data is often hard to reach and falls far short of that ideal in many circumstances, especially in multivariate CDE for the curse of dimensionality. In this paper, we demonstrate the benefits of modeling free-form conditional distributions using a deconvolution-based neural net framework, co** with data deficiency problems in discretization. It has the advantage of being flexible but also takes advantage of the hierarchical smoothness offered by the deconvolution layers. We compare our method to a number of other density-estimation approaches and show that our Deconvolutional Density Network (DDN) outperforms the competing methods on many univariate and multivariate tasks. The code of DDN is available at https://github.com/NBICLAB/DDN.
△ Less
Submitted 28 December, 2021; v1 submitted 29 May, 2021;
originally announced May 2021.
-
Maximum profile binomial likelihood estimation for the semiparametric Box--Cox power transformation model
Authors:
Pengfei Li,
Tao Yu,
Baojiang Chen,
**g Qin
Abstract:
The Box--Cox transformation model has been widely applied for many years. The parametric version of this model assumes that the random error follows a parametric distribution, say the normal distribution, and estimates the model parameters using the maximum likelihood method. The semiparametric version assumes that the distribution of the random error is completely unknown; existing methods either…
▽ More
The Box--Cox transformation model has been widely applied for many years. The parametric version of this model assumes that the random error follows a parametric distribution, say the normal distribution, and estimates the model parameters using the maximum likelihood method. The semiparametric version assumes that the distribution of the random error is completely unknown; existing methods either need strong assumptions, or are less effective when the distribution of the random error significantly deviates from the normal distribution. We adopt the semiparametric assumption and propose a maximum profile binomial likelihood method. We theoretically establish the joint distribution of the estimators of the model parameters. Through extensive numerical studies, we demonstrate that our method has an advantage over existing methods, especially when the distribution of the random error deviates from the normal distribution. Furthermore, we compare the performance of our method and existing methods on an HIV data set.
△ Less
Submitted 18 May, 2021; v1 submitted 18 May, 2021;
originally announced May 2021.
-
Matching Visual Features to Hierarchical Semantic Topics for Image Paragraph Captioning
Authors:
Dandan Guo,
Ruiying Lu,
Bo Chen,
Zequn Zeng,
Mingyuan Zhou
Abstract:
Observing a set of images and their corresponding paragraph-captions, a challenging task is to learn how to produce a semantically coherent paragraph to describe the visual content of an image. Inspired by recent successes in integrating semantic topics into this task, this paper develops a plug-and-play hierarchical-topic-guided image paragraph generation framework, which couples a visual extract…
▽ More
Observing a set of images and their corresponding paragraph-captions, a challenging task is to learn how to produce a semantically coherent paragraph to describe the visual content of an image. Inspired by recent successes in integrating semantic topics into this task, this paper develops a plug-and-play hierarchical-topic-guided image paragraph generation framework, which couples a visual extractor with a deep topic model to guide the learning of a language model. To capture the correlations between the image and text at multiple levels of abstraction and learn the semantic topics from images, we design a variational inference network to build the map** from image features to textual captions. To guide the paragraph generation, the learned hierarchical topics and visual features are integrated into the language model, including Long Short-Term Memory (LSTM) and Transformer, and jointly optimized. Experiments on public datasets demonstrate that the proposed models, which are competitive with many state-of-the-art approaches in terms of standard evaluation metrics, can be used to both distill interpretable multi-layer semantic topics and generate diverse and coherent captions. We release our code at https://github.com/DandanGuo1993/VTCM-based-image-paragraph-caption.git
△ Less
Submitted 25 July, 2022; v1 submitted 10 May, 2021;
originally announced May 2021.
-
Learning Robust Variational Information Bottleneck with Reference
Authors:
Weizhu Qian,
Bowei Chen,
Xiaowei Huang
Abstract:
We propose a new approach to train a variational information bottleneck (VIB) that improves its robustness to adversarial perturbations. Unlike the traditional methods where the hard labels are usually used for the classification task, we refine the categorical class information in the training phase with soft labels which are obtained from a pre-trained reference neural network and can reflect th…
▽ More
We propose a new approach to train a variational information bottleneck (VIB) that improves its robustness to adversarial perturbations. Unlike the traditional methods where the hard labels are usually used for the classification task, we refine the categorical class information in the training phase with soft labels which are obtained from a pre-trained reference neural network and can reflect the likelihood of the original class labels. We also relax the Gaussian posterior assumption in the VIB implementation by using the mutual information neural estimation. Extensive experiments have been performed with the MNIST and CIFAR-10 datasets, and the results show that our proposed approach significantly outperforms the benchmarked models.
△ Less
Submitted 29 April, 2021;
originally announced April 2021.
-
Maximum pairwise-rank-likelihood-based inference for the semiparametric transformation model
Authors:
Tao Yu,
Pengfei Li,
Baojiang Chen,
Ao Yuan,
**g Qin
Abstract:
In this paper, we study the linear transformation model in the most general setup. This model includes many important and popular models in statistics and econometrics as special cases. Although it has been studied for many years, the methods in the literature are based on kernel-smoothing techniques or make use of only the ranks of the responses in the estimation of the parametric components. The…
▽ More
In this paper, we study the linear transformation model in the most general setup. This model includes many important and popular models in statistics and econometrics as special cases. Although it has been studied for many years, the methods in the literature are based on kernel-smoothing techniques or make use of only the ranks of the responses in the estimation of the parametric components. The former approach needs a tuning parameter, which is not easily optimally specified in practice; and the latter is computationally expensive and may not make full use of the information in the data. In this paper, we propose two methods: a pairwise rank likelihood method and a score-function-based method based on this pairwise rank likelihood. We also explore the theoretical properties of the proposed estimators. Via extensive numerical studies, we demonstrate that our methods are appealing in that the estimators are not only robust to the distribution of the random errors but also lead to mean square errors that are in many cases comparable to or smaller than those of existing methods.
△ Less
Submitted 24 March, 2021;
originally announced March 2021.
-
An Investigation of how Label Smoothing Affects Generalization
Authors:
Blair Chen,
Liu Ziyin,
Zihao Wang,
Paul Pu Liang
Abstract:
It has been hypothesized that label smoothing can reduce overfitting and improve generalization, and current empirical evidence seems to corroborate these effects. However, there is a lack of mathematical understanding of when and why such empirical improvements occur. In this paper, as a step towards understanding why label smoothing is effective, we propose a theoretical framework to show how la…
▽ More
It has been hypothesized that label smoothing can reduce overfitting and improve generalization, and current empirical evidence seems to corroborate these effects. However, there is a lack of mathematical understanding of when and why such empirical improvements occur. In this paper, as a step towards understanding why label smoothing is effective, we propose a theoretical framework to show how label smoothing provides in controlling the generalization loss. In particular, we show that this benefit can be precisely formulated and identified in the label noise setting, where the training is partially mislabeled. Our theory also predicts the existence of an optimal label smoothing point, a single value for the label smoothing hyperparameter that minimizes generalization loss. Extensive experiments are done to confirm the predictions of our theory. We believe that our findings will help both theoreticians and practitioners understand label smoothing, and better apply them to real-world datasets.
△ Less
Submitted 23 October, 2020;
originally announced October 2020.
-
Bayesian Attention Modules
Authors:
Xinjie Fan,
Shujian Zhang,
Bo Chen,
Mingyuan Zhou
Abstract:
Attention modules, as simple and effective tools, have not only enabled deep neural networks to achieve state-of-the-art results in many domains, but also enhanced their interpretability. Most current models use deterministic attention modules due to their simplicity and ease of optimization. Stochastic counterparts, on the other hand, are less popular despite their potential benefits. The main re…
▽ More
Attention modules, as simple and effective tools, have not only enabled deep neural networks to achieve state-of-the-art results in many domains, but also enhanced their interpretability. Most current models use deterministic attention modules due to their simplicity and ease of optimization. Stochastic counterparts, on the other hand, are less popular despite their potential benefits. The main reason is that stochastic attention often introduces optimization issues or requires significant model changes. In this paper, we propose a scalable stochastic version of attention that is easy to implement and optimize. We construct simplex-constrained attention distributions by normalizing reparameterizable distributions, making the training process differentiable. We learn their parameters in a Bayesian framework where a data-dependent prior is introduced for regularization. We apply the proposed stochastic attention modules to various attention-based models, with applications to graph node classification, visual question answering, image captioning, machine translation, and language understanding. Our experiments show the proposed method brings consistent improvements over the corresponding baselines.
△ Less
Submitted 20 October, 2020;
originally announced October 2020.
-
Variational Temporal Deep Generative Model for Radar HRRP Target Recognition
Authors:
Dandan Guo,
Bo Chen,
Wenchao Chen,
Chaojie Wang,
Hongwei Liu,
Mingyuan Zhou
Abstract:
We develop a recurrent gamma belief network (rGBN) for radar automatic target recognition (RATR) based on high-resolution range profile (HRRP), which characterizes the temporal dependence across the range cells of HRRP. The proposed rGBN adopts a hierarchy of gamma distributions to build its temporal deep generative model. For scalable training and fast out-of-sample prediction, we propose the hyb…
▽ More
We develop a recurrent gamma belief network (rGBN) for radar automatic target recognition (RATR) based on high-resolution range profile (HRRP), which characterizes the temporal dependence across the range cells of HRRP. The proposed rGBN adopts a hierarchy of gamma distributions to build its temporal deep generative model. For scalable training and fast out-of-sample prediction, we propose the hybrid of a stochastic-gradient Markov chain Monte Carlo (MCMC) and a recurrent variational inference model to perform posterior inference. To utilize the label information to extract more discriminative latent representations, we further propose supervised rGBN to jointly model the HRRP samples and their corresponding labels. Experimental results on synthetic and measured HRRP data show that the proposed models are efficient in computation, have good classification accuracy and generalization ability, and provide highly interpretable multi-stochastic-layer latent structure.
△ Less
Submitted 27 September, 2020;
originally announced September 2020.
-
Multimodal Safety-Critical Scenarios Generation for Decision-Making Algorithms Evaluation
Authors:
Wenhao Ding,
Baiming Chen,
Bo Li,
Kim Ji Eun,
Ding Zhao
Abstract:
Existing neural network-based autonomous systems are shown to be vulnerable against adversarial attacks, therefore sophisticated evaluation on their robustness is of great importance. However, evaluating the robustness only under the worst-case scenarios based on known attacks is not comprehensive, not to mention that some of them even rarely occur in the real world. In addition, the distribution…
▽ More
Existing neural network-based autonomous systems are shown to be vulnerable against adversarial attacks, therefore sophisticated evaluation on their robustness is of great importance. However, evaluating the robustness only under the worst-case scenarios based on known attacks is not comprehensive, not to mention that some of them even rarely occur in the real world. In addition, the distribution of safety-critical data is usually multimodal, while most traditional attacks and evaluation methods focus on a single modality. To solve the above challenges, we propose a flow-based multimodal safety-critical scenario generator for evaluating decisionmaking algorithms. The proposed generative model is optimized with weighted likelihood maximization and a gradient-based sampling procedure is integrated to improve the sampling efficiency. The safety-critical scenarios are generated by querying the task algorithms and the log-likelihood of the generated scenarios is in proportion to the risk level. Experiments on a self-driving task demonstrate our advantages in terms of testing efficiency and multimodal modeling capability. We evaluate six Reinforcement Learning algorithms with our generated traffic scenarios and provide empirical conclusions about their robustness.
△ Less
Submitted 26 December, 2020; v1 submitted 16 September, 2020;
originally announced September 2020.
-
Implicit Multidimensional Projection of Local Subspaces
Authors:
Rongzheng Bian,
Yumeng Xue,
Liang Zhou,
Jian Zhang,
Baoquan Chen,
Daniel Weiskopf,
Yunhai Wang
Abstract:
We propose a visualization method to understand the effect of multidimensional projection on local subspaces, using implicit function differentiation. Here, we understand the local subspace as the multidimensional local neighborhood of data points. Existing methods focus on the projection of multidimensional data points, and the neighborhood information is ignored. Our method is able to analyze th…
▽ More
We propose a visualization method to understand the effect of multidimensional projection on local subspaces, using implicit function differentiation. Here, we understand the local subspace as the multidimensional local neighborhood of data points. Existing methods focus on the projection of multidimensional data points, and the neighborhood information is ignored. Our method is able to analyze the shape and directional information of the local subspace to gain more insights into the global structure of the data through the perception of local structures. Local subspaces are fitted by multidimensional ellipses that are spanned by basis vectors. An accurate and efficient vector transformation method is proposed based on analytical differentiation of multidimensional projections formulated as implicit functions. The results are visualized as glyphs and analyzed using a full set of specifically-designed interactions supported in our efficient web-based visualization tool. The usefulness of our method is demonstrated using various multi- and high-dimensional benchmark datasets. Our implicit differentiation vector transformation is evaluated through numerical comparisons; the overall method is evaluated through exploration examples and use cases.
△ Less
Submitted 20 July, 2023; v1 submitted 7 September, 2020;
originally announced September 2020.
-
SOLAR: Sparse Orthogonal Learned and Random Embeddings
Authors:
Tharun Medini,
Beidi Chen,
Anshumali Shrivastava
Abstract:
Dense embedding models are commonly deployed in commercial search engines, wherein all the document vectors are pre-computed, and near-neighbor search (NNS) is performed with the query vector to find relevant documents. However, the bottleneck of indexing a large number of dense vectors and performing an NNS hurts the query time and accuracy of these models. In this paper, we argue that high-dimen…
▽ More
Dense embedding models are commonly deployed in commercial search engines, wherein all the document vectors are pre-computed, and near-neighbor search (NNS) is performed with the query vector to find relevant documents. However, the bottleneck of indexing a large number of dense vectors and performing an NNS hurts the query time and accuracy of these models. In this paper, we argue that high-dimensional and ultra-sparse embedding is a significantly superior alternative to dense low-dimensional embedding for both query efficiency and accuracy. Extreme sparsity eliminates the need for NNS by replacing them with simple lookups, while its high dimensionality ensures that the embeddings are informative even when sparse. However, learning extremely high dimensional embeddings leads to blow up in the model size. To make the training feasible, we propose a partitioning algorithm that learns such high dimensional embeddings across multiple GPUs without any communication. This is facilitated by our novel asymmetric mixture of Sparse, Orthogonal, Learned and Random (SOLAR) Embeddings. The label vectors are random, sparse, and near-orthogonal by design, while the query vectors are learned and sparse. We theoretically prove that our way of one-sided learning is equivalent to learning both query and label embeddings. With these unique properties, we can successfully train 500K dimensional SOLAR embeddings for the tasks of searching through 1.6M books and multi-label classification on the three largest public datasets. We achieve superior precision and recall compared to the respective state-of-the-art baselines for each of the tasks with up to 10 times faster speed.
△ Less
Submitted 30 August, 2020;
originally announced August 2020.
-
Causal mediation analysis decomposition of between-hospital variance
Authors:
Bo Chen,
Keith A. Lawson,
Antonio Finelli,
Olli Saarela
Abstract:
Causal variance decompositions for a given disease-specific quality indicator can be used to quantify differences in performance between hospitals or health care providers. While variance decompositions can demonstrate variation in quality of care, causal mediation analysis can be used to study care pathways leading to the differences in performance between the institutions. This raises the questi…
▽ More
Causal variance decompositions for a given disease-specific quality indicator can be used to quantify differences in performance between hospitals or health care providers. While variance decompositions can demonstrate variation in quality of care, causal mediation analysis can be used to study care pathways leading to the differences in performance between the institutions. This raises the question of whether the two approaches can be combined to decompose between-hospital variation in an outcome type indicator to that mediated through a given process (indirect effect) and remaining variation due to all other pathways (direct effect). For this purpose, we derive a causal mediation analysis decomposition of between-hospital variance, discuss its interpretation, and propose an estimation approach based on generalized linear mixed models for the outcome and the mediator. We study the performance of the estimators in a simulation study and demonstrate its use in administrative data on kidney cancer care in Ontario.
△ Less
Submitted 24 January, 2023; v1 submitted 28 August, 2020;
originally announced August 2020.
-
Can weight sharing outperform random architecture search? An investigation with TuNAS
Authors:
Gabriel Bender,
Hanxiao Liu,
Bo Chen,
Grace Chu,
Shuyang Cheng,
Pieter-Jan Kindermans,
Quoc Le
Abstract:
Efficient Neural Architecture Search methods based on weight sharing have shown good promise in democratizing Neural Architecture Search for computer vision models. There is, however, an ongoing debate whether these efficient methods are significantly better than random search. Here we perform a thorough comparison between efficient and random search methods on a family of progressively larger and…
▽ More
Efficient Neural Architecture Search methods based on weight sharing have shown good promise in democratizing Neural Architecture Search for computer vision models. There is, however, an ongoing debate whether these efficient methods are significantly better than random search. Here we perform a thorough comparison between efficient and random search methods on a family of progressively larger and more challenging search spaces for image classification and detection on ImageNet and COCO. While the efficacies of both methods are problem-dependent, our experiments demonstrate that there are large, realistic tasks where efficient search methods can provide substantial gains over random search. In addition, we propose and evaluate techniques which improve the quality of searched architectures and reduce the need for manual hyper-parameter tuning.
Source code and experiment data are available at https://github.com/google-research/google-research/tree/master/tunas
△ Less
Submitted 13 August, 2020;
originally announced August 2020.
-
Multi-Task Variational Information Bottleneck
Authors:
Weizhu Qian,
Bowei Chen,
Yichao Zhang,
Guanghui Wen,
Franck Gechter
Abstract:
Multi-task learning (MTL) is an important subject in machine learning and artificial intelligence. Its applications to computer vision, signal processing, and speech recognition are ubiquitous. Although this subject has attracted considerable attention recently, the performance and robustness of the existing models to different tasks have not been well balanced. This article proposes an MTL model…
▽ More
Multi-task learning (MTL) is an important subject in machine learning and artificial intelligence. Its applications to computer vision, signal processing, and speech recognition are ubiquitous. Although this subject has attracted considerable attention recently, the performance and robustness of the existing models to different tasks have not been well balanced. This article proposes an MTL model based on the architecture of the variational information bottleneck (VIB), which can provide a more effective latent representation of the input features for the downstream tasks. Extensive observations on three public data sets under adversarial attacks show that the proposed model is competitive to the state-of-the-art algorithms concerning the prediction accuracy. Experimental results suggest that combining the VIB and the task-dependent uncertainties is a very effective way to abstract valid information from the input features for accomplishing multiple tasks.
△ Less
Submitted 1 March, 2021; v1 submitted 1 July, 2020;
originally announced July 2020.
-
Retro*: Learning Retrosynthetic Planning with Neural Guided A* Search
Authors:
Binghong Chen,
Chengtao Li,
Hanjun Dai,
Le Song
Abstract:
Retrosynthetic planning is a critical task in organic chemistry which identifies a series of reactions that can lead to the synthesis of a target product. The vast number of possible chemical transformations makes the size of the search space very big, and retrosynthetic planning is challenging even for experienced chemists. However, existing methods either require expensive return estimation by r…
▽ More
Retrosynthetic planning is a critical task in organic chemistry which identifies a series of reactions that can lead to the synthesis of a target product. The vast number of possible chemical transformations makes the size of the search space very big, and retrosynthetic planning is challenging even for experienced chemists. However, existing methods either require expensive return estimation by rollout with high variance, or optimize for search speed rather than the quality. In this paper, we propose Retro*, a neural-based A*-like algorithm that finds high-quality synthetic routes efficiently. It maintains the search as an AND-OR tree, and learns a neural search bias with off-policy data. Then guided by this neural network, it performs best-first search efficiently during new planning episodes. Experiments on benchmark USPTO datasets show that, our proposed method outperforms existing state-of-the-art with respect to both the success rate and solution quality, while being more efficient at the same time.
△ Less
Submitted 29 June, 2020;
originally announced June 2020.
-
Task-Agnostic Online Reinforcement Learning with an Infinite Mixture of Gaussian Processes
Authors:
Mengdi Xu,
Wenhao Ding,
Jiacheng Zhu,
Zuxin Liu,
Baiming Chen,
Ding Zhao
Abstract:
Continuously learning to solve unseen tasks with limited experience has been extensively pursued in meta-learning and continual learning, but with restricted assumptions such as accessible task distributions, independently and identically distributed tasks, and clear task delineations. However, real-world physical tasks frequently violate these assumptions, resulting in performance degradation. Th…
▽ More
Continuously learning to solve unseen tasks with limited experience has been extensively pursued in meta-learning and continual learning, but with restricted assumptions such as accessible task distributions, independently and identically distributed tasks, and clear task delineations. However, real-world physical tasks frequently violate these assumptions, resulting in performance degradation. This paper proposes a continual online model-based reinforcement learning approach that does not require pre-training to solve task-agnostic problems with unknown task boundaries. We maintain a mixture of experts to handle nonstationarity, and represent each different type of dynamics with a Gaussian Process to efficiently leverage collected data and expressively model uncertainty. We propose a transition prior to account for the temporal dependencies in streaming data and update the mixture online via sequential variational inference. Our approach reliably handles the task distribution shift by generating new models for never-before-seen dynamics and reusing old models for previously seen dynamics. In experiments, our approach outperforms alternative methods in non-stationary tasks, including classic control with changing dynamics and decision making in different driving scenarios.
△ Less
Submitted 30 November, 2020; v1 submitted 19 June, 2020;
originally announced June 2020.
-
Deep Autoencoding Topic Model with Scalable Hybrid Bayesian Inference
Authors:
Hao Zhang,
Bo Chen,
Yulai Cong,
Dandan Guo,
Hongwei Liu,
Mingyuan Zhou
Abstract:
To build a flexible and interpretable model for document analysis, we develop deep autoencoding topic model (DATM) that uses a hierarchy of gamma distributions to construct its multi-stochastic-layer generative network. In order to provide scalable posterior inference for the parameters of the generative network, we develop topic-layer-adaptive stochastic gradient Riemannian MCMC that jointly lear…
▽ More
To build a flexible and interpretable model for document analysis, we develop deep autoencoding topic model (DATM) that uses a hierarchy of gamma distributions to construct its multi-stochastic-layer generative network. In order to provide scalable posterior inference for the parameters of the generative network, we develop topic-layer-adaptive stochastic gradient Riemannian MCMC that jointly learns simplex-constrained global parameters across all layers and topics, with topic and layer specific learning rates. Given a posterior sample of the global parameters, in order to efficiently infer the local latent representations of a document under DATM across all stochastic layers, we propose a Weibull upward-downward variational encoder that deterministically propagates information upward via a deep neural network, followed by a Weibull distribution based stochastic downward generative model. To jointly model documents and their associated labels, we further propose supervised DATM that enhances the discriminative power of its latent representations. The efficacy and scalability of our models are demonstrated on both unsupervised and supervised learning tasks on big corpora.
△ Less
Submitted 15 June, 2020;
originally announced June 2020.
-
Optimal Transport Graph Neural Networks
Authors:
Benson Chen,
Gary Bécigneul,
Octavian-Eugen Ganea,
Regina Barzilay,
Tommi Jaakkola
Abstract:
Current graph neural network (GNN) architectures naively average or sum node embeddings into an aggregated graph representation -- potentially losing structural or semantic information. We here introduce OT-GNN, a model that computes graph embeddings using parametric prototypes that highlight key facets of different graph aspects. Towards this goal, we successfully combine optimal transport (OT) w…
▽ More
Current graph neural network (GNN) architectures naively average or sum node embeddings into an aggregated graph representation -- potentially losing structural or semantic information. We here introduce OT-GNN, a model that computes graph embeddings using parametric prototypes that highlight key facets of different graph aspects. Towards this goal, we successfully combine optimal transport (OT) with parametric graph models. Graph representations are obtained from Wasserstein distances between the set of GNN node embeddings and ``prototype'' point clouds as free parameters. We theoretically prove that, unlike traditional sum aggregation, our function class on point clouds satisfies a fundamental universal approximation theorem. Empirically, we address an inherent collapse optimization issue by proposing a noise contrastive regularizer to steer the model towards truly exploiting the OT geometry. Finally, we outperform popular methods on several molecular property prediction tasks, while exhibiting smoother graph representations.
△ Less
Submitted 8 October, 2021; v1 submitted 8 June, 2020;
originally announced June 2020.
-
Hierarchical causal variance decomposition for institution and provider comparisons in healthcare
Authors:
Bo Chen,
Olli Saarela
Abstract:
Disease-specific quality indicators (QIs) are used to compare institutions and health care providers in terms processes or outcomes relevant to treatment of a particular condition. In the context of surgical cancer treatments, the performance variations can be due to hospital and/or surgeon level differences, creating a hierarchical clustering. We consider how the observed variation in care receiv…
▽ More
Disease-specific quality indicators (QIs) are used to compare institutions and health care providers in terms processes or outcomes relevant to treatment of a particular condition. In the context of surgical cancer treatments, the performance variations can be due to hospital and/or surgeon level differences, creating a hierarchical clustering. We consider how the observed variation in care received at patient level can be decomposed into that causally explained by the hospital performance, surgeon performance within hospital, patient case-mix, and unexplained (residual) variation. For this purpose, we derive a four-way variance decomposition, with particular attention to the causal interpretation of the components. For estimation, we use inputs from a mixed-effect model with nested random hospital/surgeon-specific effects, and a multinomial logistic model for the hospital/surgeon-specific patient populations. We investigate the performance of our methods in a simulation study.
△ Less
Submitted 21 May, 2020; v1 submitted 14 May, 2020;
originally announced May 2020.
-
Delay-Aware Multi-Agent Reinforcement Learning for Cooperative and Competitive Environments
Authors:
Baiming Chen,
Mengdi Xu,
Zuxin Liu,
Liang Li,
Ding Zhao
Abstract:
Action and observation delays exist prevalently in the real-world cyber-physical systems which may pose challenges in reinforcement learning design. It is particularly an arduous task when handling multi-agent systems where the delay of one agent could spread to other agents. To resolve this problem, this paper proposes a novel framework to deal with delays as well as the non-stationary training i…
▽ More
Action and observation delays exist prevalently in the real-world cyber-physical systems which may pose challenges in reinforcement learning design. It is particularly an arduous task when handling multi-agent systems where the delay of one agent could spread to other agents. To resolve this problem, this paper proposes a novel framework to deal with delays as well as the non-stationary training issue of multi-agent tasks with model-free deep reinforcement learning. We formally define the Delay-Aware Markov Game that incorporates the delays of all agents in the environment. To solve Delay-Aware Markov Games, we apply centralized training and decentralized execution that allows agents to use extra information to ease the non-stationarity issue of the multi-agent systems during training, without the need of a centralized controller during execution. Experiments are conducted in multi-agent particle environments including cooperative communication, cooperative navigation, and competitive experiments. We also test the proposed algorithm in traffic scenarios that require coordination of all autonomous vehicles to show the practical value of delay-awareness. Results show that the proposed delay-aware multi-agent reinforcement learning algorithm greatly alleviates the performance degradation introduced by delay. Codes and demo videos are available at: https://github.com/baimingc/delay-aware-MARL.
△ Less
Submitted 28 August, 2020; v1 submitted 11 May, 2020;
originally announced May 2020.
-
Delay-Aware Model-Based Reinforcement Learning for Continuous Control
Authors:
Baiming Chen,
Mengdi Xu,
Liang Li,
Ding Zhao
Abstract:
Action delays degrade the performance of reinforcement learning in many real-world systems. This paper proposes a formal definition of delay-aware Markov Decision Process and proves it can be transformed into standard MDP with augmented states using the Markov reward process. We develop a delay-aware model-based reinforcement learning framework that can incorporate the multi-step delay into the le…
▽ More
Action delays degrade the performance of reinforcement learning in many real-world systems. This paper proposes a formal definition of delay-aware Markov Decision Process and proves it can be transformed into standard MDP with augmented states using the Markov reward process. We develop a delay-aware model-based reinforcement learning framework that can incorporate the multi-step delay into the learned system models without learning effort. Experiments with the Gym and MuJoCo platforms show that the proposed delay-aware model-based algorithm is more efficient in training and transferable between systems with various durations of delay compared with off-policy model-free reinforcement learning methods. Codes available at: https://github.com/baimingc/dambrl.
△ Less
Submitted 11 May, 2020;
originally announced May 2020.
-
Comparison and Benchmark of Graph Clustering Algorithms
Authors:
Lizhen Shi,
Bo Chen
Abstract:
Graph clustering is widely used in analysis of biological networks, social networks and etc. For over a decade many graph clustering algorithms have been published, however a comprehensive and consistent performance comparison is not available. In this paper we benchmarked more than 70 graph clustering programs to evaluate their runtime and quality performance for both weighted and unweighted grap…
▽ More
Graph clustering is widely used in analysis of biological networks, social networks and etc. For over a decade many graph clustering algorithms have been published, however a comprehensive and consistent performance comparison is not available. In this paper we benchmarked more than 70 graph clustering programs to evaluate their runtime and quality performance for both weighted and unweighted graphs. We also analyzed the characteristics of ground truth that affects the performance. Our work is capable to not only supply a start point for engineers to select clustering algorithms but also could provide a viewpoint for researchers to design new algorithms.
△ Less
Submitted 10 May, 2020;
originally announced May 2020.
-
IROS 2019 Lifelong Robotic Vision Challenge -- Lifelong Object Recognition Report
Authors:
Qi She,
Fan Feng,
Qi Liu,
Rosa H. M. Chan,
Xinyue Hao,
Chuanlin Lan,
Qihan Yang,
Vincenzo Lomonaco,
German I. Parisi,
Heechul Bae,
Eoin Brophy,
Baoquan Chen,
Gabriele Graffieti,
Vidit Goel,
Hyonyoung Han,
Sathursan Kanagarajah,
Somesh Kumar,
Siew-Kei Lam,
Tin Lun Lam,
Liang Ma,
Davide Maltoni,
Lorenzo Pellegrini,
Duvindu Piyasena,
Shiliang Pu,
Debdoot Sheet
, et al. (11 additional authors not shown)
Abstract:
This report summarizes IROS 2019-Lifelong Robotic Vision Competition (Lifelong Object Recognition Challenge) with methods and results from the top $8$ finalists (out of over~$150$ teams). The competition dataset (L)ifel(O)ng (R)obotic V(IS)ion (OpenLORIS) - Object Recognition (OpenLORIS-object) is designed for driving lifelong/continual learning research and application in robotic vision domain, w…
▽ More
This report summarizes IROS 2019-Lifelong Robotic Vision Competition (Lifelong Object Recognition Challenge) with methods and results from the top $8$ finalists (out of over~$150$ teams). The competition dataset (L)ifel(O)ng (R)obotic V(IS)ion (OpenLORIS) - Object Recognition (OpenLORIS-object) is designed for driving lifelong/continual learning research and application in robotic vision domain, with everyday objects in home, office, campus, and mall scenarios. The dataset explicitly quantifies the variants of illumination, object occlusion, object size, camera-object distance/angles, and clutter information. Rules are designed to quantify the learning capability of the robotic vision system when faced with the objects appearing in the dynamic environments in the contest. Individual reports, dataset information, rules, and released source code can be found at the project homepage: "https://lifelong-robotic-vision.github.io/competition/".
△ Less
Submitted 26 April, 2020;
originally announced April 2020.
-
Adversarial Evaluation of Autonomous Vehicles in Lane-Change Scenarios
Authors:
Baiming Chen,
Xiang Chen,
Wu Qiong,
Liang Li
Abstract:
Autonomous vehicles must be comprehensively evaluated before deployed in cities and highways. However, most existing evaluation approaches for autonomous vehicles are static and lack adaptability, so they are usually inefficient in generating challenging scenarios for tested vehicles. In this paper, we propose an adaptive evaluation framework to efficiently evaluate autonomous vehicles in adversar…
▽ More
Autonomous vehicles must be comprehensively evaluated before deployed in cities and highways. However, most existing evaluation approaches for autonomous vehicles are static and lack adaptability, so they are usually inefficient in generating challenging scenarios for tested vehicles. In this paper, we propose an adaptive evaluation framework to efficiently evaluate autonomous vehicles in adversarial environments generated by deep reinforcement learning. Considering the multimodal nature of dangerous scenarios, we use ensemble models to represent different local optimums for diversity. We then utilize a nonparametric Bayesian method to cluster the adversarial policies. The proposed method is validated in a typical lane-change scenario that involves frequent interactions between the ego vehicle and the surrounding vehicles. Results show that the adversarial scenarios generated by our method significantly degrade the performance of the tested vehicles. We also illustrate different patterns of generated adversarial environments, which can be used to infer the weaknesses of the tested vehicles.
△ Less
Submitted 23 November, 2020; v1 submitted 14 April, 2020;
originally announced April 2020.
-
Time-Frequency Analysis based Blind Modulation Classification for Multiple-Antenna Systems
Authors:
Weiheng Jiang,
Xiaogang Wu,
Bolin Chen,
Wenjiang Feng,
Yi **
Abstract:
Blind modulation classification is an important step to implement cognitive radio networks. The multiple-input multiple-output (MIMO) technique is widely used in military and civil communication systems. Due to the lack of prior information about channel parameters and the overlap** of signals in the MIMO systems, the traditional likelihood-based and feature-based approaches cannot be applied in…
▽ More
Blind modulation classification is an important step to implement cognitive radio networks. The multiple-input multiple-output (MIMO) technique is widely used in military and civil communication systems. Due to the lack of prior information about channel parameters and the overlap** of signals in the MIMO systems, the traditional likelihood-based and feature-based approaches cannot be applied in these scenarios directly. Hence, in this paper, to resolve the problem of blind modulation classification in MIMO systems, the time-frequency analysis method based on the windowed short-time Fourier transform is used to analyse the time-frequency characteristics of time-domain modulated signals. Then the extracted time-frequency characteristics are converted into RGB spectrogram images, and the convolutional neural network based on transfer learning is applied to classify the modulation types according to the RGB spectrogram images. Finally, a decision fusion module is used to fuse the classification results of all the receive antennas. Through simulations, we analyse the classification performance at different signal-to-noise ratios (SNRs), the results indicate that, for the single-input single-output (SISO) network, our proposed scheme can achieve 92.37% and 99.12% average classification accuracy at SNRs of -4 dB and 10 dB, respectively. For the MIMO network, our scheme achieves 80.42% and 87.92% average classification accuracy at -4 dB and 10 dB, respectively. This outperforms the existing classification methods based on baseband signals.
△ Less
Submitted 1 April, 2020;
originally announced April 2020.
-
Automatic Hyper-Parameter Optimization Based on Map** Discovery from Data to Hyper-Parameters
Authors:
Bozhou Chen,
Kaixin Zhang,
Longshen Ou,
Chenmin Ba,
Hongzhi Wang,
Chunnan Wang
Abstract:
Machine learning algorithms have made remarkable achievements in the field of artificial intelligence. However, most machine learning algorithms are sensitive to the hyper-parameters. Manually optimizing the hyper-parameters is a common method of hyper-parameter tuning. However, it is costly and empirically dependent. Automatic hyper-parameter optimization (autoHPO) is favored due to its effective…
▽ More
Machine learning algorithms have made remarkable achievements in the field of artificial intelligence. However, most machine learning algorithms are sensitive to the hyper-parameters. Manually optimizing the hyper-parameters is a common method of hyper-parameter tuning. However, it is costly and empirically dependent. Automatic hyper-parameter optimization (autoHPO) is favored due to its effectiveness. However, current autoHPO methods are usually only effective for a certain type of problems, and the time cost is high. In this paper, we propose an efficient automatic parameter optimization approach, which is based on the map** from data to the corresponding hyper-parameters. To describe such map**, we propose a sophisticated network structure. To obtain such map**, we develop effective network constrution algorithms. We also design strategy to optimize the result futher during the application of the map**. Extensive experimental results demonstrate that the proposed approaches outperform the state-of-the-art apporaches significantly.
△ Less
Submitted 3 March, 2020;
originally announced March 2020.
-
Learning Not to Learn in the Presence of Noisy Labels
Authors:
Liu Ziyin,
Blair Chen,
Ru Wang,
Paul Pu Liang,
Ruslan Salakhutdinov,
Louis-Philippe Morency,
Masahito Ueda
Abstract:
Learning in the presence of label noise is a challenging yet important task: it is crucial to design models that are robust in the presence of mislabeled datasets. In this paper, we discover that a new class of loss functions called the gambler's loss provides strong robustness to label noise across various levels of corruption. We show that training with this loss function encourages the model to…
▽ More
Learning in the presence of label noise is a challenging yet important task: it is crucial to design models that are robust in the presence of mislabeled datasets. In this paper, we discover that a new class of loss functions called the gambler's loss provides strong robustness to label noise across various levels of corruption. We show that training with this loss function encourages the model to "abstain" from learning on the data points with noisy labels, resulting in a simple and effective method to improve robustness and generalization. In addition, we propose two practical extensions of the method: 1) an analytical early stop** criterion to approximately stop training before the memorization of noisy labels, as well as 2) a heuristic for setting hyperparameters which do not require knowledge of the noise corruption rate. We demonstrate the effectiveness of our method by achieving strong results across three image and text classification tasks as compared to existing baselines.
△ Less
Submitted 16 February, 2020;
originally announced February 2020.
-
Harvesting Ambient RF for Presence Detection Through Deep Learning
Authors:
Yang Liu,
Tiexing Wang,
Yuexin Jiang,
Biao Chen
Abstract:
This paper explores the use of ambient radio frequency (RF) signals for human presence detection through deep learning. Using WiFi signal as an example, we demonstrate that the channel state information (CSI) obtained at the receiver contains rich information about the propagation environment. Through judicious pre-processing of the estimated CSI followed by deep learning, reliable presence detect…
▽ More
This paper explores the use of ambient radio frequency (RF) signals for human presence detection through deep learning. Using WiFi signal as an example, we demonstrate that the channel state information (CSI) obtained at the receiver contains rich information about the propagation environment. Through judicious pre-processing of the estimated CSI followed by deep learning, reliable presence detection can be achieved. Several challenges in passive RF sensing are addressed. With presence detection, how to collect training data with human presence can have a significant impact on the performance. This is in contrast to activity detection when a specific motion pattern is of interest. A second challenge is that RF signals are complex-valued. Handling complex-valued input in deep learning requires careful data representation and network architecture design. Finally, human presence affects CSI variation along multiple dimensions; such variation, however, is often masked by system impediments such as timing or frequency offset. Addressing these challenges, the proposed learning system uses pre-processing to preserve human motion induced channel variation while insulating against other impairments. A convolutional neural network (CNN) properly trained with both magnitude and phase information is then designed to achieve reliable presence detection. Extensive experiments are conducted. Using off-the-shelf WiFi devices, the proposed deep learning based RF sensing achieves near perfect presence detection during multiple extended periods of test and exhibits superior performance compared with leading edge passive infrared sensors. Comparison with existing RF based human presence detection also demonstrates its robustness in performance, especially when deployed in a completely new environment.
△ Less
Submitted 9 December, 2020; v1 submitted 13 February, 2020;
originally announced February 2020.