Search | arXiv e-print repository

Latent Energy-Based Odyssey: Black-Box Optimization via Expanded Exploration in the Energy-Based Latent Space

Authors: Peiyu Yu, Dinghuai Zhang, Hengzhi He, Xiaojian Ma, Ruiyao Miao, Yifan Lu, Yasi Zhang, Deqian Kong, Ruiqi Gao, Jianwen Xie, Guang Cheng, Ying Nian Wu

Abstract: Offline Black-Box Optimization (BBO) aims at optimizing a black-box function using the knowledge from a pre-collected offline dataset of function values and corresponding input designs. However, the high-dimensional and highly multimodal input design space of the black-box function poses inherent challenges for most existing methods that model and operate directly upon input designs. These issues include but are not limited to high sample complexity, which relates to inaccurate approximation of the black-box function; and insufficient coverage and exploration of input design modes, which leads to suboptimal proposal of new input designs. In this work, we consider finding a latent space that serves as a compressed yet accurate representation of the design-value joint space, enabling effective latent exploration of high-value input design modes. To this end, we formulate a learnable energy-based latent space, and propose a Noise-intensified Telescoping density-Ratio Estimation (NTRE) scheme for variational learning of an accurate latent space model without costly Markov Chain Monte Carlo. The optimization process is then an exploration of high-value designs guided by the learned energy-based model in the latent space, formulated as gradient-based sampling from a latent-variable-parameterized inverse model. We show that our particular parameterization encourages expanded exploration around high-value design modes, motivated by inversion thinking of a fundamental result of conditional covariance matrix typically used for variance reduction. We observe that our method, backed by an accurately learned informative latent space and an expanding-exploration model design, yields significant improvements over strong previous methods on both synthetic and real-world datasets such as the design-bench suite.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2310.12667 [pdf, other]

STANLEY: Stochastic Gradient Anisotropic Langevin Dynamics for Learning Energy-Based Models

Authors: Belhal Karimi, Jianwen Xie, Ping Li

Abstract: In this paper, we propose STANLEY, a STochastic gradient ANisotropic LangEvin dYnamics, for sampling high-dimensional data. With the growing efficacy and potential of Energy-Based modeling, also known as non-normalized probabilistic modeling, for modeling a generative process of different natures of high-dimensional data observations, we present an end-to-end learning algorithm for Energy-Based models (EBM) with the purpose of improving the quality of the resulting sampled data points. While the unknown normalizing constant of EBMs makes the training procedure intractable, resorting to Markov Chain Monte Carlo (MCMC) is in general a viable option. Realizing what MCMC entails for the EBM training, we propose a novel high-dimensional sampling method, based on an anisotropic stepsize and a gradient-informed covariance matrix, embedded into a discretized Langevin diffusion. We motivate the necessity for an anisotropic update of the negative samples in the Markov Chain by the nonlinearity of the backbone of the EBM, here a Convolutional Neural Network. Our resulting method, namely STANLEY, is an optimization algorithm for training Energy-Based models via our newly introduced MCMC method. We provide a theoretical understanding of our sampling scheme by proving that the sampler leads to a geometrically uniformly ergodic Markov Chain. Several image generation experiments are provided in our paper to show the effectiveness of our method.

Submitted 19 October, 2023; originally announced October 2023.

Comments: arXiv admin note: text overlap with arXiv:1207.5938 by other authors

arXiv:2310.03253 [pdf, other]

Molecule Design by Latent Prompt Transformer

Authors: Deqian Kong, Yuhao Huang, Jianwen Xie, Ying Nian Wu

Abstract: This paper proposes a latent prompt Transformer model for solving challenging optimization problems such as molecule design, where the goal is to find molecules with optimal values of a target chemical or biological property that can be computed by existing software. Our proposed model consists of three components. (1) A latent vector whose prior distribution is modeled by a Unet transformation of a Gaussian white noise vector. (2) A molecule generation model that generates the string-based representation of a molecule conditional on the latent vector in (1). We adopt the causal Transformer model that takes the latent vector in (1) as prompt. (3) A property prediction model that predicts the value of the target property of a molecule based on a non-linear regression on the latent vector in (1). We call the proposed model the latent prompt Transformer model. After initial training of the model on existing molecules and their property values, we then gradually shift the model distribution towards the region that supports desired values of the target property for the purpose of molecule design. Our experiments show that our proposed model achieves state-of-the-art performance on several benchmark molecule design tasks.

Submitted 5 February, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

arXiv:2309.05153 [pdf, other]

Learning Energy-Based Models by Cooperative Diffusion Recovery Likelihood

Authors: Yaxuan Zhu, Jianwen Xie, Yingnian Wu, Ruiqi Gao

Abstract: Training energy-based models (EBMs) on high-dimensional data can be both challenging and time-consuming, and there exists a noticeable gap in sample quality between EBMs and other generative frameworks like GANs and diffusion models. To close this gap, inspired by the recent efforts of learning EBMs by maximizing diffusion recovery likelihood (DRL), we propose cooperative diffusion recovery likelihood (CDRL), an effective approach to tractably learn and sample from a series of EBMs defined on increasingly noisy versions of a dataset, paired with an initializer model for each EBM. At each noise level, the two models are jointly estimated within a cooperative training framework: samples from the initializer serve as starting points that are refined by a few MCMC sampling steps from the EBM. The EBM is then optimized by maximizing recovery likelihood, while the initializer model is optimized by learning from the difference between the refined samples and the initial samples. In addition, we made several practical designs for EBM training to further improve the sample quality. Combining these advances, our approach significantly boosts the generation performance compared to existing EBM methods on CIFAR-10 and ImageNet datasets. We also demonstrate the effectiveness of our models for several downstream tasks, including classifier-free guided generation, compositional generation, image inpainting and out-of-distribution detection.

Submitted 18 April, 2024; v1 submitted 10 September, 2023; originally announced September 2023.

arXiv:2304.07918 [pdf, other]

Likelihood-Based Generative Radiance Field with Latent Space Energy-Based Model for 3D-Aware Disentangled Image Representation

Authors: Yaxuan Zhu, Jianwen Xie, Ping Li

Abstract: We propose the NeRF-LEBM, a likelihood-based top-down 3D-aware 2D image generative model that incorporates 3D representation via Neural Radiance Fields (NeRF) and 2D imaging process via differentiable volume rendering. The model represents an image as a rendering process from 3D object to 2D image and is conditioned on some latent variables that account for object characteristics and are assumed to follow informative trainable energy-based prior models. We propose two likelihood-based learning frameworks to train the NeRF-LEBM: (i) maximum likelihood estimation with Markov chain Monte Carlo-based inference and (ii) variational inference with the reparameterization trick. We study our models in the scenarios with both known and unknown camera poses. Experiments on several benchmark datasets demonstrate that the NeRF-LEBM can infer 3D object structures from 2D images, generate 2D images with novel views and objects, learn from incomplete 2D images, and learn from 2D images with known or unknown camera poses.

Submitted 16 April, 2023; originally announced April 2023.

arXiv:2303.05389 [pdf]

doi 10.1080/07421222.2024.2340822

Depression Detection Using Digital Traces on Social Media: A Knowledge-aware Deep Learning Approach

Authors: Wenli Zhang, Jiaheng Xie, Zhu Zhang, Xiang Liu

Abstract: Depression is a common disease worldwide. It is difficult to diagnose and continues to be underdiagnosed. Because depressed patients constantly share their symptoms, major life events, and treatments on social media, researchers are turning to user-generated digital traces on social media for depression detection. Such methods have distinct advantages in combating depression because they can facilitate innovative approaches to fight depression and alleviate its social and economic burden. However, most existing studies lack effective means to incorporate established medical domain knowledge in depression detection or suffer from feature extraction difficulties that impede greater performance. Following the design science research paradigm, we propose a Deep Knowledge-aware Depression Detection (DKDD) framework to accurately detect social media users at risk of depression and explain the critical factors that contribute to such detection. Extensive empirical studies with real-world data demonstrate that, by incorporating domain knowledge, our method outperforms existing state-of-the-art methods. Our work has significant implications for IS research in knowledge-aware machine learning, digital traces utilization, and NLP research in IS. Practically, by providing early detection and explaining the critical factors, DKDD can supplement clinical depression screening and enable large-scale evaluations of a population's mental health status.

Submitted 1 August, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

Comments: Presented at INFORMS 2022 Data Science Workshop

MSC Class: H.4.m ACM Class: K.5

arXiv:2303.03532 [pdf, ps, other]

Extreme eigenvalues of sample covariance matrices under generalized elliptical models with applications

Authors: Xiucai Ding, Jiahui Xie, Long Yu, Wang Zhou

Abstract: We consider the extreme eigenvalues of the sample covariance matrix $Q=YY^*$ under the generalized elliptical model that $Y=\Sigma^{1/2}XD.$ Here $\Sigma$ is a bounded $p \times p$ positive definite deterministic matrix representing the population covariance structure, $X$ is a $p \times n$ random matrix containing either independent columns sampled from the unit sphere in $\mathbb{R}^p$ or i.i.d. centered entries with variance $n^{-1},$ and $D$ is a diagonal random matrix containing i.i.d. entries and independent of $X.$ Such a model finds important applications in statistics and machine learning. In this paper, assuming that $p$ and $n$ are comparably large, we prove that the extreme edge eigenvalues of $Q$ can have several types of distributions depending on $\Sigma$ and $D$ asymptotically. These distributions include: Gumbel, Fréchet, Weibull, Tracy-Widom, Gaussian and their mixtures. On the one hand, when the random variables in $D$ have unbounded support, the edge eigenvalues of $Q$ can have either Gumbel or Fréchet distribution depending on the tail decay property of $D.$ On the other hand, when the random variables in $D$ have bounded support, under some mild regularity assumptions on $\Sigma,$ the edge eigenvalues of $Q$ can exhibit Weibull, Tracy-Widom, Gaussian or their mixtures. Based on our theoretical results, we consider two important applications. First, we propose some statistics and a procedure to detect and estimate the possible spikes for elliptically distributed data. Second, in the context of a factor model, by using the multiplier bootstrap procedure via selecting the weights in $D,$ we propose a new algorithm to infer and estimate the number of factors in the factor model. Numerical simulations also confirm the accuracy and power of our proposed methods and illustrate better performance compared to some existing methods in the literature.

Submitted 19 April, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

Comments: 90 pages, 6 figures, some typos are corrected

arXiv:2302.02457

Scalable inference in functional linear regression with streaming data

Authors: Jinhan Xie, Enze Shi, Peijun Sang, Zuofeng Shang, Bei Jiang, Linglong Kong

Abstract: Traditional static functional data analysis is facing new challenges due to streaming data, where data constantly flow in. A major challenge is that storing such an ever-increasing amount of data in memory is nearly impossible. In addition, existing inferential tools in online learning are mainly developed for finite-dimensional problems, while inference methods for functional data are focused on the batch learning setting. In this paper, we tackle these issues by developing functional stochastic gradient descent algorithms and proposing an online bootstrap resampling procedure to systematically study the inference problem for functional linear regression. In particular, the proposed estimation and inference procedures use only one pass over the data; thus they are easy to implement and suitable to the situation where data arrive in a streaming manner. Furthermore, we establish the convergence rate as well as the asymptotic distribution of the proposed estimator. Meanwhile, the proposed perturbed estimator from the bootstrap procedure is shown to enjoy the same theoretical properties, which provide the theoretical justification for our online inference tool. As far as we know, this is the first inference result on the functional linear regression model with streaming data. Simulation studies are conducted to investigate the finite-sample performance of the proposed procedure. An application is illustrated with the Beijing multi-site air-quality data.

Submitted 10 October, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

Comments: Due to the request of one of the co-authors, we tentatively withdrew the manuscript

arXiv:2301.09300 [pdf, other]

A Tale of Two Latent Flows: Learning Latent Space Normalizing Flow with Short-run Langevin Flow for Approximate Inference

Authors: Jianwen Xie, Yaxuan Zhu, Yifei Xu, Dingcheng Li, Ping Li

Abstract: We study a normalizing flow in the latent space of a top-down generator model, in which the normalizing flow model plays the role of the informative prior model of the generator. We propose to jointly learn the latent space normalizing flow prior model and the top-down generator model by a Markov chain Monte Carlo (MCMC)-based maximum likelihood algorithm, where a short-run Langevin sampling from the intractable posterior distribution is performed to infer the latent variables for each observed example, so that the parameters of the normalizing flow prior and the generator can be updated with the inferred latent variables. We show that, under the scenario of non-convergent short-run MCMC, the finite step Langevin dynamics is a flow-like approximate inference model and the learning objective actually follows the perturbation of the maximum likelihood estimation (MLE). We further point out that the learning framework seeks to (i) match the latent space normalizing flow and the aggregated posterior produced by the short-run Langevin flow, and (ii) bias the model from MLE such that the short-run Langevin flow inference is close to the true posterior. Empirical results of extensive experiments validate the effectiveness of the proposed latent space normalizing flow model in the tasks of image generation, image reconstruction, anomaly detection, supervised image inpainting and unsupervised image recovery.

Submitted 23 January, 2023; originally announced January 2023.

Comments: The Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI) 2023

arXiv:2211.13748 [pdf, other]

How We Express Ourselves Freely: Censorship, Self-censorship, and Anti-censorship on a Chinese Social Media

Authors: Xiang Chen, Jiamu Xie, Zixin Wang, Bohui Shen, Zhixuan Zhou

Abstract: Censorship, anti-censorship, and self-censorship in an authoritarian regime have been extensively studied, yet the relationship between these intertwined factors is not well understood. In this paper, we report results of a large-scale survey study (N = 526) with Sina Weibo users toward bridging this research gap. Through descriptive statistics, correlation analysis, and regression analysis, we uncover how users are being censored, how and why they conduct self-censorship on different topics and in different scenarios (i.e., post, repost, and comment), and their various anti-censorship strategies. We further identify the metrics of censorship and self-censorship, find the influence factors, and construct a mediation model to measure their relationship. Based on these findings, we discuss implications for democratic social media design and future censorship research.

Submitted 24 November, 2022; originally announced November 2022.

Comments: Accepted at iConference 2023

arXiv:2210.01342 [pdf, other]

Estimating heterogeneous treatment effects versus building individualized treatment rules: Connection and disconnection

Authors: Zhongyuan Chen, Jun Xie

Abstract: Estimating heterogeneous treatment effects is a well-studied topic in the statistics literature. More recently, it has regained attention due to an increasing need for precision medicine as well as the increased use of state-of-the-art machine learning methods in the estimation. Furthermore, estimating heterogeneous treatment effects is directly related to building an individualized treatment rule, which is a decision rule of treatment according to patient characteristics. This paper examines the connection and disconnection between these two research problems. Notably, a better estimation of the heterogeneous treatment effects may or may not lead to a better individualized treatment rule. We provide theoretical frameworks to explain the connection and disconnection and demonstrate two different scenarios through simulations. Our conclusion sheds light on a practical guide that under certain circumstances, there is no need to enhance estimation of the treatment effects, as it does not alter the treatment decision.

Submitted 3 October, 2022; originally announced October 2022.

arXiv:2210.00937 [pdf, ps, other]

Inference on High-dimensional Single-index Models with Streaming Data

Authors: Dongxiao Han, Jinhan Xie, Jin Liu, Liuquan Sun, Jian Huang, Bei Jian, Linglong Kong

Abstract: Traditional statistical methods are faced with new challenges due to streaming data. The major challenge is the rapidly growing volume and velocity of data, which makes storing such huge datasets in memory impossible. The paper presents an online inference framework for regression parameters in high-dimensional semiparametric single-index models with unknown link functions. The proposed online procedure updates only the current data batch and summary statistics of historical data instead of re-accessing the entire raw data set. At the same time, we do not need to estimate the unknown link function, which is a highly challenging task. In addition, a generalized convex loss function is used in the proposed inference procedure. To illustrate the proposed method, we use the Huber loss function and the logistic regression model's negative log-likelihood. In this study, the asymptotic normality of the proposed online debiased Lasso estimators and the bounds of the proposed online Lasso estimators are investigated. To evaluate the performance of the proposed method, extensive simulation studies have been conducted. We provide applications to Nasdaq stock prices and financial distress datasets.

Submitted 3 October, 2022; originally announced October 2022.

Comments: 38 pages, 2 figures

arXiv:2205.06924 [pdf, other]

A Tale of Two Flows: Cooperative Learning of Langevin Flow and Normalizing Flow Toward Energy-Based Model

Authors: Jianwen Xie, Yaxuan Zhu, Jun Li, Ping Li

Abstract: This paper studies the cooperative learning of two generative flow models, in which the two models are iteratively updated based on the jointly synthesized examples. The first flow model is a normalizing flow that transforms an initial simple density to a target density by applying a sequence of invertible transformations. The second flow model is a Langevin flow that runs finite steps of gradient-based MCMC toward an energy-based model. We start by proposing a generative framework that trains an energy-based model with a normalizing flow as an amortized sampler to initialize the MCMC chains of the energy-based model. In each learning iteration, we generate synthesized examples by using a normalizing flow initialization followed by a short-run Langevin flow revision toward the current energy-based model. Then we treat the synthesized examples as fair samples from the energy-based model and update the model parameters with the maximum likelihood learning gradient, while the normalizing flow directly learns from the synthesized examples by maximizing the tractable likelihood. Under the short-run non-mixing MCMC scenario, the estimation of the energy-based model is shown to follow the perturbation of maximum likelihood, and the short-run Langevin flow and the normalizing flow form a two-flow generator that we call CoopFlow. We provide an understanding of the CoopFlow algorithm by information geometry and show that it is a valid generator as it converges to a moment matching estimator. We demonstrate that the trained CoopFlow is capable of synthesizing realistic images, reconstructing images, and interpolating between images.

Submitted 13 May, 2022; originally announced May 2022.

Comments: 23 pages

Journal ref: ICLR 2022

arXiv:2204.04567 [pdf, other]

Joint Distribution Matters: Deep Brownian Distance Covariance for Few-Shot Classification

Authors: Jiangtao Xie, Fei Long, Jiaming Lv, Qilong Wang, Peihua Li

Abstract: Few-shot classification is a challenging problem as only very few training examples are given for each new task. One of the effective research lines to address this challenge focuses on learning deep representations driven by a similarity measure between a query image and few support images of some class. Statistically, this amounts to measuring the dependency of image features, viewed as random vectors in a high-dimensional embedding space. Previous methods either only use marginal distributions without considering joint distributions, suffering from limited representation capability, or are computationally expensive though harnessing joint distributions. In this paper, we propose a deep Brownian Distance Covariance (DeepBDC) method for few-shot classification. The central idea of DeepBDC is to learn image representations by measuring the discrepancy between joint characteristic functions of embedded features and product of the marginals. As the BDC metric is decoupled, we formulate it as a highly modular and efficient layer. Furthermore, we instantiate DeepBDC in two different few-shot classification frameworks. We conduct experiments on six standard few-shot image benchmarks, covering general object recognition, fine-grained categorization and cross-domain classification. Extensive evaluations show our DeepBDC significantly outperforms the counterparts, while establishing new state-of-the-art results. The source code is available at http://www.peihuali.org/DeepBDC

Submitted 9 April, 2022; originally announced April 2022.

Comments: Accepted to CVPR 2022 as an oral presentation. Equal contribution from first two authors

arXiv:2110.01873 [pdf, other]

A New Multivariate Predictive Model for Stock Returns

Authors: Jianying Xie

Abstract: One of the most important studies in finance is to find out whether stock returns could be predicted. This research aims to create a new multivariate model, which includes dividend yield, earnings-to-price ratio, book-to-market ratio as well as consumption-wealth ratio as explanatory variables, for future stock returns predictions. The new multivariate model will be assessed for its forecasting performance using empirical analysis. The empirical analysis is performed on S&P500 quarterly data from Quarter 1, 1952 to Quarter 4, 2019 as well as S&P500 monthly data from Month 12, 1920 to Month 12, 2019. Results have shown this new multivariate model has predictability for future stock returns. When compared to other benchmark models, the new multivariate model performs the best in terms of the Root Mean Squared Error (RMSE) most of the time.

Submitted 5 October, 2021; originally announced October 2021.

arXiv:2108.04364 [pdf, other]

Data-guided Treatment Recommendation with Feature Scores

Authors: Zhongyuan Chen, Ziyi Wang, Qifan Song, Jun Xie

Abstract: Despite the availability of large amounts of genomics data, medical treatment recommendations have not successfully used them. In this paper, we consider the utility of high dimensional genomic-clinical data and nonparametric methods for making cancer treatment recommendations. This builds upon the framework of the individualized treatment rule [Qian and Murphy 2011] but we aim to overcome their method's limitations, specifically in the instances when the method encounters a large number of covariates and an issue of model misspecification. We tackle this problem using a dimension reduction method, namely Sliced Inverse Regression (SIR, [Li 1991]), with a rich class of models for the treatment response. Notably, SIR defines a feature space for high-dimensional data, offering an advantage similar to those found in the popular neural network models. With the features obtained from SIR, a simple visualization is used to compare different treatment options and present the recommended treatment. Additionally, we derive the consistency and the convergence rate of the proposed recommendation approach through a value function. The effectiveness of the proposed approach is demonstrated through simulation studies and the promising results from a real-data example of the treatment of multiple myeloma.

Submitted 19 February, 2022; v1 submitted 9 August, 2021; originally announced August 2021.

arXiv:2103.11085 [pdf, other]

Distance Assisted Recursive Testing

Authors: Xuechan Li, Anthony Sung, Jichun Xie

Abstract: In many applications, a large number of features are collected with the goal to identify a few important ones. Sometimes, these features lie in a metric space with a known distance matrix, which partially reflects their co-importance pattern. Proper use of the distance matrix will boost the power of identifying important features. Hence, we develop a new multiple testing framework named the Distance Assisted Recursive Testing (DART). DART has two stages. In stage 1, we transform the distance matrix into an aggregation tree, where each node represents a set of features. In stage 2, based on the aggregation tree, we set up dynamic node hypotheses and perform multiple testing on the tree. All rejections are mapped back to the features. Under mild assumptions, the false discovery proportion of DART converges to the desired level in high probability converging to one. We illustrate by theory and simulations that DART has superior performance under various models compared to the existing methods. We applied DART to a clinical trial in the allogeneic stem cell transplantation study to identify the gut microbiota whose abundance will be impacted by the after-transplant care.

Submitted 24 September, 2021; v1 submitted 19 March, 2021; originally announced March 2021.

arXiv:2012.14936 [pdf, other]

Learning Energy-Based Model with Variational Auto-Encoder as Amortized Sampler

Authors: Jianwen Xie, Zilong Zheng, Ping Li

Abstract: Due to the intractable partition function, training energy-based models (EBMs) by maximum likelihood requires Markov chain Monte Carlo (MCMC) sampling to approximate the gradient of the Kullback-Leibler divergence between data and model distributions. However, it is non-trivial to sample from an EBM because of the difficulty of mixing between modes. In this paper, we propose to learn a variational auto-encoder (VAE) to initialize the finite-step MCMC, such as Langevin dynamics that is derived from the energy function, for efficient amortized sampling of the EBM. With these amortized MCMC samples, the EBM can be trained by maximum likelihood, which follows an "analysis by synthesis" scheme; while the VAE learns from these MCMC samples via variational Bayes. We call this joint training algorithm the variational MCMC teaching, in which the VAE chases the EBM toward data distribution. We interpret the learning algorithm as a dynamic alternating projection in the context of information geometry. Our proposed models can generate samples comparable to GANs and EBMs. Additionally, we demonstrate that our model can learn effective probabilistic distribution toward supervised conditional learning tasks.

Submitted 23 December, 2021; v1 submitted 29 December, 2020; originally announced December 2020.

Journal ref: Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI), pp10441--10451, 2021

arXiv:2011.08595 [pdf, other]

doi 10.1109/TIP.2021.3123555

DS-UI: Dual-Supervised Mixture of Gaussian Mixture Models for Uncertainty Inference

Authors: Jiyang Xie, Zhanyu Ma, Jing-Hao Xue, Guoqiang Zhang, Jun Guo

Abstract: This paper proposes a dual-supervised uncertainty inference (DS-UI) framework for improving Bayesian estimation-based uncertainty inference (UI) in deep neural network (DNN)-based image recognition. In the DS-UI, we combine the classifier of a DNN, i.e., the last fully-connected (FC) layer, with a mixture of Gaussian mixture models (MoGMM) to obtain an MoGMM-FC layer. Unlike existing UI methods for DNNs, which only calculate the means or modes of the DNN outputs' distributions, the proposed MoGMM-FC layer acts as a probabilistic interpreter for the features that are inputs of the classifier to directly calculate the probability density of them for the DS-UI. In addition, we propose a dual-supervised stochastic gradient-based variational Bayes (DS-SGVB) algorithm for the MoGMM-FC layer optimization. Unlike conventional SGVB and optimization algorithms in other UI methods, the DS-SGVB not only models the samples in the specific class for each Gaussian mixture model (GMM) in the MoGMM, but also considers the negative samples from other classes for the GMM to reduce the intra-class distances and enlarge the inter-class margins simultaneously for enhancing the learning ability of the MoGMM-FC layer in the DS-UI. Experimental results show the DS-UI outperforms the state-of-the-art UI methods in misclassification detection. We further evaluate the DS-UI in open-set out-of-domain/-distribution detection and find statistically significant improvements. Visualizations of the feature spaces demonstrate the superiority of the DS-UI.

Submitted 17 November, 2020; originally announced November 2020.

arXiv:2010.05244 [pdf, other]

doi 10.1109/TPAMI.2021.3083089

Advanced Dropout: A Model-free Methodology for Bayesian Dropout Optimization

Authors: Jiyang Xie, Zhanyu Ma, Jianjun Lei, Guoqiang Zhang, Jing-Hao Xue, Zheng-Hua Tan, Jun Guo

Abstract: Due to lack of data, overfitting ubiquitously exists in real-world applications of deep neural networks (DNNs). We propose advanced dropout, a model-free methodology, to mitigate overfitting and improve the performance of DNNs. The advanced dropout technique applies a model-free and easily implemented distribution with parametric prior, and adaptively adjusts dropout rate. Specifically, the distribution parameters are optimized by stochastic gradient variational Bayes in order to carry out an end-to-end training. We evaluate the effectiveness of the advanced dropout against nine dropout techniques on seven computer vision datasets (five small-scale datasets and two large-scale datasets) with various base models. The advanced dropout outperforms all the referred techniques on all the datasets. We further compare the effectiveness ratios and find that advanced dropout achieves the highest one in most cases. Next, we conduct a set of analyses of dropout rate characteristics, including convergence of the adaptive dropout rate, the learned distributions of dropout masks, and a comparison with dropout rate generation without an explicit distribution. In addition, the ability of overfitting prevention is evaluated and confirmed. Finally, we extend the application of the advanced dropout to uncertainty inference, network pruning, text classification, and regression. The proposed advanced dropout is also superior to the corresponding referred methods. Codes are available at https://github.com/PRIS-CV/AdvancedDropout.

Submitted 10 August, 2021; v1 submitted 11 October, 2020; originally announced October 2020.

Comments: Accepted by IEEE TPAMI, 2021

arXiv:2008.09763 [pdf, other]

Variational Autoencoder for Anti-Cancer Drug Response Prediction

Authors: Hongyuan Dong, Jiaqing Xie, Zhi Jing, Dexin Ren

Abstract: Cancer is a primary cause of human death, but discovering drugs and tailoring cancer therapies are expensive and time-consuming. We seek to facilitate the discovery of new drugs and treatment strategies for cancer using variational autoencoders (VAEs) and multi-layer perceptrons (MLPs) to predict anti-cancer drug responses. Our model takes as input gene expression data of cancer cell lines and anti-cancer drug molecular data and encodes these data with our GeneVae model, which is an ordinary VAE model, and a rectified junction tree variational autoencoder (JTVae) model, respectively. A multi-layer perceptron processes these encoded features to produce a final prediction. Our tests show our system attains a high average coefficient of determination ($R^{2} = 0.83$) in predicting drug responses for breast cancer cell lines and an average $R^{2} = 0.845$ for pan-cancer cell lines. Additionally, we show that our model can generate effective drug compounds not previously used for specific cancer cell lines.

Submitted 15 April, 2021; v1 submitted 22 August, 2020; originally announced August 2020.

arXiv:2007.00426 [pdf]

Dynamic Bidding Strategies with Multivariate Feedback Control for Multiple Goals in Display Advertising

Authors: Michael Tashman, Jiayi Xie, John Hoffman, Lee Winikor, Rouzbeh Gerami

Abstract: Real-Time Bidding (RTB) display advertising is a method for purchasing display advertising inventory in auctions that occur within milliseconds. The performance of RTB campaigns is generally measured with a series of Key Performance Indicators (KPIs) - measurements used to ensure that the campaign is cost-effective and that it is purchasing valuable inventory. While an RTB campaign should ideally meet all KPIs, simultaneous improvement tends to be very challenging, as an improvement to any one KPI risks a detrimental effect toward the others. Here we present an approach to simultaneously controlling multiple KPIs with a PID-based feedback-control system. This method generates a control score for each KPI, based on both the output of a PID controller module and a metric that quantifies the importance of each KPI for internal business needs. On regular intervals, this algorithm - Sequential Control - will choose the KPI with the greatest overall need for improvement. In this way, our algorithm is able to continually seek the greatest marginal improvements to its current state. Multiple methods of control can be associated with each KPI, and can be triggered either simultaneously or chosen stochastically, in order to avoid local optima. In both offline ad bidding simulations and testing on live traffic, our methods proved to be effective in simultaneously controlling multiple KPIs, and bringing them toward their respective goals.

Submitted 1 June, 2020; originally announced July 2020.

arXiv:2006.10259 [pdf, other]

On Path Integration of Grid Cells: Group Representation and Isotropic Scaling

Authors: Ruiqi Gao, Jianwen Xie, Xue-Xin Wei, Song-Chun Zhu, Ying Nian Wu

Abstract: Understanding how grid cells perform path integration calculations remains a fundamental problem. In this paper, we conduct theoretical analysis of a general representation model of path integration by grid cells, where the 2D self-position is encoded as a higher dimensional vector, and the 2D self-motion is represented by a general transformation of the vector. We identify two conditions on the transformation. One is a group representation condition that is necessary for path integration. The other is an isotropic scaling condition that ensures locally conformal embedding, so that the error in the vector representation translates conformally to the error in the 2D self-position. Then we investigate the simplest transformation, i.e., the linear transformation, uncover its explicit algebraic and geometric structure as matrix Lie group of rotation, and explore the connection between the isotropic scaling condition and a special class of hexagon grid patterns. Finally, with our optimization-based approach, we manage to learn hexagon grid patterns that share similar properties of the grid cells in the rodent brain. The learned model is capable of accurate long distance path integration. Code is available at https://github.com/ruiqigao/grid-cell-path.

Submitted 3 November, 2021; v1 submitted 17 June, 2020; originally announced June 2020.

arXiv:2005.11442 [pdf, other]

Active Learning for Skewed Data Sets

Authors: Abbas Kazerouni, Qi Zhao, Jing Xie, Sandeep Tata, Marc Najork

Abstract: Consider a sequential active learning problem where, at each round, an agent selects a batch of unlabeled data points, queries their labels and updates a binary classifier. While there exists a rich body of work on active learning in this general form, in this paper, we focus on problems with two distinguishing characteristics: severe class imbalance (skew) and small amounts of initial training data. Both of these problems occur with surprising frequency in many web applications. For instance, detecting offensive or sensitive content in online communities (pornography, violence, and hate-speech) is receiving enormous attention from industry as well as research communities. Such problems have both the characteristics we describe -- a vast majority of content is not offensive, so the number of positive examples for such content is orders of magnitude smaller than the negative examples. Furthermore, there is usually only a small amount of initial training data available when building machine-learned models to solve such problems. To address both these issues, we propose a hybrid active learning algorithm (HAL) that balances exploiting the knowledge available through the currently labeled training examples with exploring the large amount of unlabeled data available. Through simulation results, we show that HAL makes significantly better choices for what points to label when compared to strong baselines like margin-sampling. Classifiers trained on the examples selected for labeling by HAL easily out-perform the baselines on target metrics (like area under the precision-recall curve) given the same budget for labeling examples. We believe HAL offers a simple, intuitive, and computationally tractable way to structure active learning for a wide range of machine learning applications.

Submitted 22 May, 2020; originally announced May 2020.

arXiv:2003.04575 [pdf, other]

GPCA: A Probabilistic Framework for Gaussian Process Embedded Channel Attention

Authors: Jiyang Xie, Dongliang Chang, Zhanyu Ma, Guoqiang Zhang, Jun Guo

Abstract: Channel attention mechanisms have been commonly applied in many visual tasks for effective performance improvement. It is able to reinforce the informative channels as well as to suppress the useless channels. Recently, different channel attention modules have been proposed and implemented in various ways. Generally speaking, they are mainly based on convolution and pooling operations. In this paper, we propose Gaussian process embedded channel attention (GPCA) module and further interpret the channel attention schemes in a probabilistic way. The GPCA module intends to model the correlations among the channels, which are assumed to be captured by beta distributed variables. As the beta distribution cannot be integrated into the end-to-end training of convolutional neural networks (CNNs) with a mathematically tractable solution, we utilize an approximation of the beta distribution to solve this problem. To specify, we adapt a Sigmoid-Gaussian approximation, in which the Gaussian distributed variables are transferred into the interval [0,1]. The Gaussian process is then utilized to model the correlations among different channels. In this case, a mathematically tractable solution is derived. The GPCA module can be efficiently implemented and integrated into the end-to-end training of the CNNs. Experimental results demonstrate the promising performance of the proposed GPCA module. Codes are available at https://github.com/PRIS-CV/GPCA.

Submitted 10 August, 2021; v1 submitted 10 March, 2020; originally announced March 2020.

Comments: Accepted by IEEE TPAMI, 2021

arXiv:2001.06587 [pdf, ps, other]

Scalable Bid Landscape Forecasting in Real-time Bidding

Authors: Aritra Ghosh, Saayan Mitra, Somdeb Sarkhel, Jason Xie, Gang Wu, Viswanathan Swaminathan

Abstract: In programmatic advertising, ad slots are usually sold using second-price (SP) auctions in real-time. The highest bidding advertiser wins but pays only the second-highest bid (known as the winning price). In SP, for a single item, the dominant strategy of each bidder is to bid the true value from the bidder's perspective. However, in a practical setting, with budget constraints, bidding the true value is a sub-optimal strategy. Hence, to devise an optimal bidding strategy, it is of utmost importance to learn the winning price distribution accurately. Moreover, a demand-side platform (DSP), which bids on behalf of advertisers, observes the winning price if it wins the auction. For losing auctions, DSPs can only treat its bidding price as the lower bound for the unknown winning price. In literature, typically censored regression is used to model such partially observed data. A common assumption in censored regression is that the winning price is drawn from a fixed variance (homoscedastic) uni-modal distribution (most often Gaussian). However, in reality, these assumptions are often violated. We relax these assumptions and propose a heteroscedastic fully parametric censored regression approach, as well as a mixture density censored network. Our approach not only generalizes censored regression but also provides flexibility to model arbitrarily distributed real-world data. Experimental evaluation on the publicly available dataset for winning price estimation demonstrates the effectiveness of our method. Furthermore, we evaluate our algorithm on one of the largest demand-side platforms and significant improvement has been achieved in comparison with the baseline solutions.

Submitted 17 January, 2020; originally announced January 2020.

Comments: Appeared in ECML-PKDD 2019

arXiv:1911.11374 [pdf, other]

Representation Learning: A Statistical Perspective

Authors: Jianwen Xie, Ruiqi Gao, Erik Nijkamp, Song-Chun Zhu, Ying Nian Wu

Abstract: Learning representations of data is an important problem in statistics and machine learning. While the origin of learning representations can be traced back to factor analysis and multidimensional scaling in statistics, it has become a central theme in deep learning with important applications in computer vision and computational neuroscience. In this article, we review recent advances in learning representations from a statistical perspective. In particular, we review the following two themes: (a) unsupervised learning of vector representations and (b) learning of both vector and matrix representations.

Submitted 26 November, 2019; originally announced November 2019.

Journal ref: Annual Review of Statistics and Its Application 2020

arXiv:1910.09396 [pdf, ps, other]

Efficient Projection-Free Online Methods with Stochastic Recursive Gradient

Authors: Jiahao Xie, Zebang Shen, Chao Zhang, Boyu Wang, Hui Qian

Abstract: This paper focuses on projection-free methods for solving smooth Online Convex Optimization (OCO) problems. Existing projection-free methods either achieve suboptimal regret bounds or have high per-iteration computational costs. To fill this gap, two efficient projection-free online methods called ORGFW and MORGFW are proposed for solving stochastic and adversarial OCO problems, respectively. By employing a recursive gradient estimator, our methods achieve optimal regret bounds (up to a logarithmic factor) while possessing low per-iteration computational costs. Experimental results demonstrate the efficiency of the proposed methods compared to state-of-the-arts.

Submitted 23 October, 2019; v1 submitted 21 October, 2019; originally announced October 2019.

Comments: 15 pages, 3 figures

arXiv:1910.09223 [pdf, ps, other]

Aggregated Gradient Langevin Dynamics

Authors: Chao Zhang, Jiahao Xie, Zebang Shen, Peilin Zhao, Tengfei Zhou, Hui Qian

Abstract: In this paper, we explore a general Aggregated Gradient Langevin Dynamics framework (AGLD) for the Markov Chain Monte Carlo (MCMC) sampling. We investigate the nonasymptotic convergence of AGLD with a unified analysis for different data accessing (e.g. random access, cyclic access and random reshuffle) and snapshot updating strategies, under convex and nonconvex settings respectively. It is the first time that bounds for I/O friendly strategies such as cyclic access and random reshuffle have been established in the MCMC literature. The theoretic results also indicate that methods in AGLD possess the merits of both the low per-iteration computational complexity and the short mixture time. Empirical studies demonstrate that our framework allows to derive novel schemes to generate high-quality samples for large-scale Bayesian posterior learning tasks.

Submitted 21 October, 2019; originally announced October 2019.

arXiv:1910.02660 [pdf, other]

Deep Kernel Learning via Random Fourier Features

Authors: Jiaxuan Xie, Fanghui Liu, Kaijie Wang, Xiaolin Huang

Abstract: Kernel learning methods are among the most effective learning methods and have been vigorously studied in the past decades. However, when tackling complicated tasks, classical kernel methods are not flexible or "rich" enough to describe the data and hence could not yield satisfactory performance. In this paper, via Random Fourier Features (RFF), we successfully incorporate the deep architecture into kernel learning, which significantly boosts the flexibility and richness of kernel machines while keeping kernels' advantage of pairwise handling small data. With RFF, we could establish a deep structure and make every kernel in the RFF layers trainable end-to-end. Since RFF with different distributions could represent different kernels, our model has the capability of finding suitable kernels for each layer, which is much more flexible than traditional kernel-based methods where the kernel is pre-selected. This fact also helps yield a more sophisticated kernel cascade connection in the architecture. On small datasets (less than 1000 samples), for which deep learning is generally not suitable due to overfitting, our method achieves superior performance compared to advanced kernel methods. On large-scale datasets, including non-image and image classification tasks, our method also has competitive performance.

Submitted 7 October, 2019; originally announced October 2019.

arXiv:1907.04433 [pdf, other]

GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing

Authors: Jian Guo, He He, Tong He, Leonard Lausen, Mu Li, Haibin Lin, Xingjian Shi, Chenguang Wang, Junyuan Xie, Sheng Zha, Aston Zhang, Hang Zhang, Zhi Zhang, Zhongyue Zhang, Shuai Zheng, Yi Zhu

Abstract: We present GluonCV and GluonNLP, the deep learning toolkits for computer vision and natural language processing based on Apache MXNet (incubating). These toolkits provide state-of-the-art pre-trained models, training scripts, and training logs, to facilitate rapid prototyping and promote reproducible research. We also provide modular APIs with flexible building blocks to enable efficient customization. Leveraging the MXNet ecosystem, the deep learning models in GluonCV and GluonNLP can be deployed onto a variety of platforms with different programming languages. The Apache 2.0 license has been adopted by GluonCV and GluonNLP to allow for software distribution, modification, and usage.

Submitted 12 February, 2020; v1 submitted 9 July, 2019; originally announced July 2019.

Journal ref: Journal of Machine Learning Research 21 (2020) 1-7

arXiv:1906.07757 [pdf]

TEAM: A Multiple Testing Algorithm on the Aggregation Tree for Flow Cytometry Analysis

Authors: John Pura, Xuechan Li, Cliburn Chan, Jichun Xie

Abstract: In immunology studies, flow cytometry is a commonly used multivariate single-cell assay. One key goal in flow cytometry analysis is to pinpoint the immune cells responsive to certain stimuli. Statistically, this problem can be translated into comparing two protein expression probability density functions (PDFs) before and after the stimulus; the goal is to pinpoint the regions where these two PDFs differ. In this paper, we model this comparison as a multiple testing problem. First, we partition the sample space into small bins. In each bin we form a hypothesis to test the existence of differential PDFs. Second, we develop a novel multiple testing method, called TEAM (Testing on the Aggregation tree Method), to identify those bins that harbor differential PDFs while controlling the false discovery rate (FDR) under the desired level. TEAM embeds the testing procedure into an aggregation tree to test from fine- to coarse-resolution. The procedure achieves the statistical goal of pinpointing differential PDFs to the smallest possible regions. TEAM is computationally efficient, capable of analyzing large flow cytometry data sets in much shorter time compared with competing methods. We applied TEAM and competing methods on a flow cytometry data set to identify T cells responsive to the cytomegalovirus (CMV)-pp65 antigen stimulation. TEAM successfully identified the monofunctional, bifunctional, and polyfunctional T cells while the competing methods either did not finish in a reasonable time frame or provided less interpretable results. Numerical simulations and theoretical justifications demonstrate that TEAM has asymptotically valid, powerful, and robust performance. Overall, TEAM is a computationally efficient and statistically powerful algorithm that can yield meaningful biological insights in flow cytometry studies.

Submitted 26 March, 2021; v1 submitted 18 June, 2019; originally announced June 2019.

arXiv:1904.12590 [pdf, other]

doi 10.1080/16000870.2019.1697166

Assimilation of semi-qualitative sea ice thickness data with the EnKF-SQ

Authors: Abhishek Shah, Laurent Bertino, Francois Counillon, Mohamad El Gharamti,

Abstract: A newly introduced stochastic data assimilation method, the Ensemble Kalman Filter Semi-Qualitative (EnKF-SQ) is applied to a realistic coupled ice-ocean model of the Arctic, the TOPAZ4 configuration, in a twin experiment framework. The method is shown to add value to range-limited thin ice thickness measurements, as obtained from passive microwave remote sensing, with respect to more trivial solutions like neglecting the out-of-range values or assimilating climatology instead. Some known properties inherent to the EnKF-SQ are evaluated: the tendency to draw the solution closer to the thickness threshold, the skewness of the resulting analysis ensemble and the potential appearance of outliers. The experiments show that none of these properties prove deleterious in light of the other sub-optimal characters of the sea ice data assimilation system used here (non-linearities, non-Gaussian variables, lack of strong coupling). The EnKF-SQ has a single tuning parameter that is adjusted for best performance of the system at hand. The sensitivity tests reveal that the results do not depend critically on the choice of this tuning parameter. The EnKF-SQ makes overall a valid approach for assimilating semi-qualitative observations into high-dimensional nonlinear systems.

Submitted 29 April, 2019; originally announced April 2019.

Comments: 24 pages, 11 figures, research article

arXiv:1904.06836 [pdf, ps, other]

doi 10.1109/TPAMI.2020.2974833

Deep CNNs Meet Global Covariance Pooling: Better Representation and Generalization

Authors: Qilong Wang, Jiangtao Xie, Wangmeng Zuo, Lei Zhang, Peihua Li

Abstract: Compared with global average pooling in existing deep convolutional neural networks (CNNs), global covariance pooling can capture richer statistics of deep features, having potential for improving representation and generalization abilities of deep CNNs. However, integration of global covariance pooling into deep CNNs brings two challenges: (1) robust covariance estimation given deep features of high dimension and small sample size; (2) appropriate usage of geometry of covariances. To address these challenges, we propose a global Matrix Power Normalized COVariance (MPN-COV) Pooling. Our MPN-COV conforms to a robust covariance estimator, very suitable for scenario of high dimension and small sample size. It can also be regarded as Power-Euclidean metric between covariances, effectively exploiting their geometry. Furthermore, a global Gaussian embedding network is proposed to incorporate first-order statistics into MPN-COV. For fast training of MPN-COV networks, we implement an iterative matrix square root normalization, avoiding GPU unfriendly eigen-decomposition inherent in MPN-COV. Additionally, progressive 1x1 convolutions and group convolution are introduced to compress covariance representations. The proposed methods are highly modular, readily plugged into existing deep CNNs. Extensive experiments are conducted on large-scale object classification, scene categorization, fine-grained visual recognition and texture classification, showing our methods outperform the counterparts and obtain state-of-the-art performance.

Submitted 10 August, 2020; v1 submitted 15 April, 2019; originally announced April 2019.

Comments: Accepted to IEEE TPAMI. Code is at http://peihuali.org/MPN-COV/

arXiv:1904.05453 [pdf, other]

Energy-Based Continuous Inverse Optimal Control

Authors: Yifei Xu, Jianwen Xie, Tianyang Zhao, Chris Baker, Yibiao Zhao, Ying Nian Wu

Abstract: The problem of continuous inverse optimal control (over finite time horizon) is to learn the unknown cost function over the sequence of continuous control variables from expert demonstrations. In this article, we study this fundamental problem in the framework of energy-based model, where the observed expert trajectories are assumed to be random samples from a probability density function defined as the exponential of the negative cost function up to a normalizing constant. The parameters of the cost function are learned by maximum likelihood via an "analysis by synthesis" scheme, which iterates (1) synthesis step: sample the synthesized trajectories from the current probability density using the Langevin dynamics via back-propagation through time, and (2) analysis step: update the model parameters based on the statistical difference between the synthesized trajectories and the observed trajectories. Given the fact that an efficient optimization algorithm is usually available for an optimal control problem, we also consider a convenient approximation of the above learning method, where we replace the sampling in the synthesis step by optimization. Moreover, to make the sampling or optimization more efficient, we propose to train the energy-based model simultaneously with a top-down trajectory generator via cooperative learning, where the trajectory generator is used to fast initialize the synthesis step of the energy-based model. We demonstrate the proposed methods on autonomous driving tasks, and show that they can learn suitable cost functions for optimal control.

Submitted 18 April, 2022; v1 submitted 10 April, 2019; originally announced April 2019.

arXiv:1902.03871 [pdf, other]

Learning V1 Simple Cells with Vector Representation of Local Content and Matrix Representation of Local Motion

Authors: Ruiqi Gao, Jianwen Xie, Siyuan Huang, Yufan Ren, Song-Chun Zhu, Ying Nian Wu

Abstract: This paper proposes a representational model for image pairs such as consecutive video frames that are related by local pixel displacements, in the hope that the model may shed light on motion perception in primary visual cortex (V1). The model couples the following two components: (1) the vector representations of local contents of images and (2) the matrix representations of local pixel displacements caused by the relative motions between the agent and the objects in the 3D scene. When the image frame undergoes changes due to local pixel displacements, the vectors are multiplied by the matrices that represent the local displacements. Thus the vector representation is equivariant as it varies according to the local displacements. Our experiments show that our model can learn Gabor-like filter pairs of quadrature phases. The profiles of the learned filters match those of simple cells in Macaque V1. Moreover, we demonstrate that the model can learn to infer local motions in either a supervised or unsupervised manner. With such a simple model, we achieve competitive results on optical flow estimation.

Submitted 5 April, 2022; v1 submitted 24 January, 2019; originally announced February 2019.

arXiv:1902.02812 [pdf, other]

Cooperative Training of Fast Thinking Initializer and Slow Thinking Solver for Conditional Learning

Authors: Jianwen Xie, Zilong Zheng, Xiaolin Fang, Song-Chun Zhu, Ying Nian Wu

Abstract: This paper studies the problem of learning the conditional distribution of a high-dimensional output given an input, where the output and input may belong to two different domains, e.g., the output is a photo image and the input is a sketch image. We solve this problem by cooperative training of a fast thinking initializer and slow thinking solver. The initializer generates the output directly by a non-linear transformation of the input as well as a noise vector that accounts for latent variability in the output. The slow thinking solver learns an objective function in the form of a conditional energy function, so that the output can be generated by optimizing the objective function, or more rigorously by sampling from the conditional energy-based model. We propose to learn the two models jointly, where the fast thinking initializer serves to initialize the sampling of the slow thinking solver, and the solver refines the initial output by an iterative algorithm. The solver learns from the difference between the refined output and the observed output, while the initializer learns from how the solver refines its initial output. We demonstrate the effectiveness of the proposed method on various conditional learning tasks, e.g., class-to-image generation, image-to-image translation, and image recovery. The advantage of our method over GAN-based methods is that our method is equipped with a slow thinking process that refines the solution guided by a learned objective function.

Submitted 7 April, 2021; v1 submitted 7 February, 2019; originally announced February 2019.

Comments: 16 pages

Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2021

arXiv:1812.10587 [pdf, other]

Learning Dynamic Generator Model by Alternating Back-Propagation Through Time

Authors: Jianwen Xie, Ruiqi Gao, Zilong Zheng, Song-Chun Zhu, Ying Nian Wu

Abstract: This paper studies the dynamic generator model for spatial-temporal processes such as dynamic textures and action sequences in video data. In this model, each time frame of the video sequence is generated by a generator model, which is a non-linear transformation of a latent state vector, where the non-linear transformation is parametrized by a top-down neural network. The sequence of latent state vectors follows a non-linear auto-regressive model, where the state vector of the next frame is a non-linear transformation of the state vector of the current frame as well as an independent noise vector that provides randomness in the transition. The non-linear transformation of this transition model can be parametrized by a feedforward neural network. We show that this model can be learned by an alternating back-propagation through time algorithm that iteratively samples the noise vectors and updates the parameters in the transition model and the generator model. We show that our training method can learn realistic models for dynamic textures and action patterns.

Submitted 26 December, 2018; originally announced December 2018.

Comments: 10 pages

Journal ref: The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI) 2019

arXiv:1812.09203 [pdf]

Pan-Cancer Epigenetic Biomarker Selection from Blood Samples Using SAS

Authors: Xi Chen, ** Xie, Qingcong Yuan

Abstract: A key focus in current cancer research is the discovery of cancer biomarkers that allow earlier detection with high accuracy and lower costs for both patients and hospitals. Blood samples have long been used as a health status indicator, but DNA methylation signatures in blood have not been fully appreciated in cancer research. Historically, analysis of cancer has been conducted directly with the patient's tumor or related tissues. Such analyses allow physicians to diagnose a patient's health and cancer status; however, physicians must observe certain symptoms that prompt them to use biopsies or imaging to verify the diagnosis. This is a post-hoc approach. Our study will focus on epigenetic information for cancer detection, specifically information about DNA methylation in human peripheral blood samples in cancer discordant monozygotic twin-pairs. This information might be able to help us detect cancer much earlier, before the first symptom appears. Several other types of epigenetic data can also be used, but here we demonstrate the potential of blood DNA methylation data as a biomarker for pan-cancer using SAS 9.3 and SAS EM. We report that 55 methylation CpG sites measurable in blood samples can be used as biomarkers for early cancer detection and classification.

Submitted 21 December, 2018; originally announced December 2018.

Comments: 9 pages, MWSUG 2018

Report number: HS-45

Journal ref: MWSUG 2018 conference proceedings

arXiv:1812.08674 [pdf, other]

A Method to Facilitate Cancer Detection and Type Classification from Gene Expression Data using a Deep Autoencoder and Neural Network

Authors: Xi Chen, ** Xie, Qingcong Yuan

Abstract: With the increased affordability and availability of whole-genome sequencing, large-scale and high-throughput gene expression is widely used to characterize diseases, including cancers. However, establishing specificity in cancer diagnosis using gene expression data continues to pose challenges due to the high dimensionality and complexity of the data. Here we present models of deep learning (DL) and apply them to gene expression data for the diagnosis and categorization of cancer. In this study, we have developed two DL models using messenger ribonucleic acid (mRNA) datasets available from the Genomic Data Commons repository. Our models achieved 98% accuracy in cancer detection, with false negative and false positive rates below 1.7%. In our results, we demonstrated that 18 out of 32 cancer-typing classifications achieved more than 90% accuracy. Due to the limitation of a small sample size (less than 50 observations), certain cancers could not achieve a higher accuracy in typing classification, but still achieved high accuracy for the cancer detection task. To validate our models, we compared them with traditional statistical models. The main advantage of our models over traditional cancer detection is the ability to use data from various cancer types to automatically form features to enhance the detection and diagnosis of a specific cancer type.

Submitted 20 December, 2018; originally announced December 2018.

Comments: 6 pages

arXiv:1810.05597 [pdf, other]

Learning Grid Cells as Vector Representation of Self-Position Coupled with Matrix Representation of Self-Motion

Authors: Ruiqi Gao, Jianwen Xie, Song-Chun Zhu, Ying Nian Wu

Abstract: This paper proposes a representational model for grid cells. In this model, the 2D self-position of the agent is represented by a high-dimensional vector, and the 2D self-motion or displacement of the agent is represented by a matrix that transforms the vector. Each component of the vector is a unit or a cell. The model consists of the following three sub-models. (1) Vector-matrix multiplication. The movement from the current position to the next position is modeled by matrix-vector multiplication, i.e., the vector of the next position is obtained by multiplying the matrix of the motion to the vector of the current position. (2) Magnified local isometry. The angle between two nearby vectors equals the Euclidean distance between the two corresponding positions multiplied by a magnifying factor. (3) Global adjacency kernel. The inner product between two vectors measures the adjacency between the two corresponding positions, which is defined by a kernel function of the Euclidean distance between the two positions. Our representational model has explicit algebra and geometry. It can learn hexagon patterns of grid cells, and it is capable of error correction, path integral and path planning.

Submitted 24 May, 2019; v1 submitted 12 October, 2018; originally announced October 2018.

arXiv:1808.09011 [pdf, other]

Cauchy combination test: a powerful test with analytic p-value calculation under arbitrary dependency structures

Authors: Yaowu Liu, Jun Xie

Abstract: Combining individual p-values to aggregate multiple small effects has a long-standing interest in statistics, dating back to the classic Fisher's combination test. In modern large-scale data analysis, correlation and sparsity are common features and efficient computation is a necessary requirement for dealing with massive data. To overcome these challenges, we propose a new test that takes advantage of the Cauchy distribution. Our test statistic has a very simple form and is defined as a weighted sum of Cauchy transformation of individual p-values. We prove a non-asymptotic result that the tail of the null distribution of our proposed test statistic can be well approximated by a Cauchy distribution under arbitrary dependency structures. Based on this theoretical result, the p-value calculation of our proposed test is not only accurate, but also as simple as the classic z-test or t-test, making our test well suited for analyzing massive data. We further show that the power of the proposed test is asymptotically optimal in a strong sparsity setting. Extensive simulations demonstrate that the proposed test has both strong power against sparse alternatives and a good accuracy with respect to p-value calculations, especially for very small p-values. The proposed test has also been applied to a genome-wide association study of Crohn's disease and compared with several existing tests.

Submitted 28 November, 2018; v1 submitted 27 August, 2018; originally announced August 2018.

Comments: To appear in Journal of the American Statistical Association, Theory and Methods

arXiv:1808.00961 [pdf, ps, other]

Impacts of Weather Conditions on District Heat System

Authors: Jiyang Xie, Zhanyu Ma, Jun Guo

Abstract: Using artificial neural network for the prediction of heat demand has attracted more and more attention. Weather conditions, such as ambient temperature, wind speed and direct solar irradiance, have been identified as key input parameters. In order to further improve the model accuracy, it is of great importance to understand the influence of different parameters. Based on an Elman neural network (ENN), this paper investigates the impact of direct solar irradiance and wind speed on predicting the heat demand of a district heating network. Results show that including wind speed can generally result in a lower overall mean absolute percentage error (MAPE) (6.43%) than including direct solar irradiance (6.47%); while including direct solar irradiance can achieve a lower maximum absolute deviation (71.8%) than including wind speed (81.53%). In addition, even though including both wind speed and direct solar irradiance shows the best overall performance (MAPE=6.35%).

Submitted 28 January, 2020; v1 submitted 2 August, 2018; originally announced August 2018.

Comments: Technical Report

arXiv:1808.00803 [pdf]

Mobile big data analysis with machine learning

Authors: Jiyang Xie, Zeyu Song, Yupeng Li, Zhanyu Ma

Abstract: This paper investigates to identify the requirement and the development of machine learning-based mobile big data analysis through discussing the insights of challenges in the mobile big data (MBD). Furthermore, it reviews the state-of-the-art applications of data analysis in the area of MBD. Firstly, we introduce the development of MBD. Secondly, the frequently adopted methods of data analysis are reviewed. Three typical applications of MBD analysis, namely wireless channel modeling, human online and offline behavior analysis, and speech recognition in the internet of vehicles, are introduced respectively. Finally, we summarize the main challenges and future development directions of mobile big data analysis.

Submitted 2 February, 2020; v1 submitted 2 August, 2018; originally announced August 2018.

Comments: Version 0.1

arXiv:1808.00331 [pdf]

SEA: A Combined Model for Heat Demand Prediction

Authors: Jiyang Xie, Jiaxin Guo, Zhanyu Ma, Jing-Hao Xue, Qie Sun, Hailong Li, Jun Guo

Abstract: Heat demand prediction is a prominent research topic in the area of intelligent energy networks. It has been well recognized that periodicity is one of the important characteristics of heat demand. Seasonal-trend decomposition based on LOESS (STL) algorithm can analyze the periodicity of a heat demand series, and decompose the series into seasonal and trend components. Then, predicting the seasonal and trend components respectively, and combining their predictions together as the heat demand prediction is a possible way to predict heat demand. In this paper, STL-ENN-ARIMA (SEA), a combined model, was proposed based on the combination of the Elman neural network (ENN) and the autoregressive integrated moving average (ARIMA) model, which are commonly applied to heat demand prediction. ENN and ARIMA are used to predict seasonal and trend components, respectively. Experimental results demonstrate that the proposed SEA model has a promising performance.

Submitted 28 July, 2018; originally announced August 2018.

arXiv:1807.02795 [pdf, ps, other]

BALSON: Bayesian Least Squares Optimization with Nonnegative L1-Norm Constraint

Authors: Jiyang Xie, Zhanyu Ma, Guoqiang Zhang, Jing-Hao Xue, Jen-Tzung Chien, Zhiqing Lin, Jun Guo

Abstract: A Bayesian approach termed BAyesian Least Squares Optimization with Nonnegative L1-norm constraint (BALSON) is proposed. The error distribution of data fitting is described by Gaussian likelihood. The parameter distribution is assumed to be a Dirichlet distribution. With the Bayes rule, searching for the optimal parameters is equivalent to finding the mode of the posterior distribution. In order to explicitly characterize the nonnegative L1-norm constraint of the parameters, we further approximate the true posterior distribution by a Dirichlet distribution. We estimate the statistics of the approximating Dirichlet posterior distribution by sampling methods. Four sampling methods have been introduced. With the estimated posterior distributions, the original parameters can be effectively reconstructed in polynomial fitting problems, and the BALSON framework is found to perform better than conventional methods.

Submitted 8 July, 2018; originally announced July 2018.

arXiv:1612.08365 [pdf, other]

Optimal model averaging forecasting in high-dimensional survival analysis

Authors: Xiaodong Yan, Hongni Wang, Wei Wang, **han Xie, Yanyan Ren, Xinjun Wang

Abstract: This article considers ultrahigh-dimensional forecasting problems with survival response variables. We propose a two-step model averaging procedure for improving the forecasting accuracy of the true conditional mean of a survival response variable. The first step is to construct a class of candidate models, each with low-dimensional covariates. For this, a feature screening procedure is developed to separate the active and inactive predictors through a marginal Buckley-James index, and to group covariates with a similar index size together to form regression models with survival response variables. The proposed screening method can select active predictors under covariate-dependent censoring, and enjoys sure screening consistency under mild regularity conditions. The second step is to find the optimal model weights for averaging by adapting a delete-one cross-validation criterion, without the standard constraint that the weights sum to one. The theoretical results show that the delete-one cross-validation criterion achieves the lowest possible forecasting loss asymptotically. Numerical studies demonstrate the superior performance of the proposed variable screening and model averaging procedures over existing methods.

Submitted 24 November, 2022; v1 submitted 26 December, 2016; originally announced December 2016.

arXiv:1609.09408 [pdf, other]

Cooperative Training of Descriptor and Generator Networks

Authors: Jianwen Xie, Yang Lu, Ruiqi Gao, Song-Chun Zhu, Ying Nian Wu

Abstract: This paper studies the cooperative training of two generative models for image modeling and synthesis. Both models are parametrized by convolutional neural networks (ConvNets). The first model is a deep energy-based model, whose energy function is defined by a bottom-up ConvNet, which maps the observed image to the energy. We call it the descriptor network. The second model is a generator network, which is a non-linear version of factor analysis. It is defined by a top-down ConvNet, which maps the latent factors to the observed image. The maximum likelihood learning algorithms of both models involve MCMC sampling such as Langevin dynamics. We observe that the two learning algorithms can be seamlessly interwoven into a cooperative learning algorithm that can train both models simultaneously. Specifically, within each iteration of the cooperative learning algorithm, the generator model generates initial synthesized examples to initialize a finite-step MCMC that samples and trains the energy-based descriptor model. After that, the generator model learns from how the MCMC changes its synthesized examples. That is, the descriptor model teaches the generator model by MCMC, so that the generator model accumulates the MCMC transitions and reproduces them by direct ancestral sampling. We call this scheme MCMC teaching. We show that the cooperative algorithm can learn highly realistic generative models.

Submitted 29 October, 2018; v1 submitted 29 September, 2016; originally announced September 2016.

Comments: 18 pages

arXiv:1609.04778 [pdf]

doi 10.1111/rssb.12288

False Discovery Rate Control for High-Dimensional Networks of Quantile Associations Conditioning on Covariates

Authors: Jichun Xie, Ruosha Li

Abstract: Motivated by the gene co-expression pattern analysis, we propose a novel sample quantile-based contingency (squac) statistic to infer quantile associations conditioning on covariates. It features enhanced flexibility in handling variables with both arbitrary distributions and complex association patterns conditioning on covariates. We first derive its asymptotic null distribution, and then develop a multiple testing procedure based on squac to simultaneously test the independence between one pair of variables conditioning on covariates for all $p(p-1)/2$ pairs. Here, $p$ is the length of the outcomes and could exceed the sample size. The testing procedure does not require resampling or perturbation, and thus is computationally efficient. We prove by theory and numerical experiments that this testing method asymptotically controls the false discovery rate (FDR). It outperforms all alternative methods when the complex association patterns exist. Applied to a gastric cancer data, this testing method successfully inferred the gene co-expression networks of early and late stage patients. It identified more changes in the networks which are associated with cancer survivals. We extend our method to the case that both the length of the outcomes and the length of covariates exceed the sample size, and show that the asymptotic theory still holds.

Submitted 23 August, 2018; v1 submitted 15 September, 2016; originally announced September 2016.

Comments: 31 pages, 1 figure

arXiv:1607.00435 [pdf, other]

Decoding the Encoding of Functional Brain Networks: an fMRI Classification Comparison of Non-negative Matrix Factorization (NMF), Independent Component Analysis (ICA), and Sparse Coding Algorithms

Authors: Jianwen Xie, Pamela K. Douglas, Ying Nian Wu, Arthur L. Brody, Ariana E. Anderson

Abstract: Brain networks in fMRI are typically identified using spatial independent component analysis (ICA), yet mathematical constraints such as sparse coding and positivity both provide alternate biologically-plausible frameworks for generating brain networks. Non-negative Matrix Factorization (NMF) would suppress negative BOLD signal by enforcing positivity. Spatial sparse coding algorithms ($L1$ Regularized Learning and K-SVD) would impose local specialization and a discouragement of multitasking, where the total observed activity in a single voxel originates from a restricted number of possible brain networks. The assumptions of independence, positivity, and sparsity to encode task-related brain networks are compared; the resulting brain networks for different constraints are used as basis functions to encode the observed functional activity at a given time point. These encodings are decoded using machine learning to compare both the algorithms and their assumptions, using the time series weights to predict whether a subject is viewing a video, listening to an audio cue, or at rest, in 304 fMRI scans from 51 subjects. For classifying cognitive activity, the sparse coding algorithm of $L1$ Regularized Learning consistently outperformed 4 variations of ICA across different numbers of networks and noise levels ($p<0.001$). The NMF algorithms, which suppressed negative BOLD signal, had the poorest accuracy. Within each algorithm, encodings using sparser spatial networks (containing more zero-valued voxels) had higher classification accuracy ($p<0.001$). The success of sparse coding algorithms may suggest that algorithms which enforce sparse coding, discourage multitasking, and promote local specialization may capture better the underlying source processes than those which allow inexhaustible local processes such as ICA.

Submitted 1 July, 2016; originally announced July 2016.

Showing 1–50 of 56 results for author: Xie, J