Search | arXiv e-print repository

Geodesic Optimal Transport Regression

Abstract: Classical regression models do not cover non-Euclidean data that reside in a general metric space, while the current literature on non-Euclidean regression by and large has focused on scenarios where either predictors or responses are random objects, i.e., non-Euclidean, but not both. In this paper we propose geodesic optimal transport regression models for the case where both predictors and responses lie in a common geodesic metric space and predictors may include not only one but also several random objects. This provides an extension of classical multiple regression to the case where both predictors and responses reside in non-Euclidean metric spaces, a scenario that has not been considered before. It is based on the concept of optimal geodesic transports, which we define as an extension of the notion of optimal transports in distribution spaces to more general geodesic metric spaces, where we characterize optimal transports as transports along geodesics. The proposed regression models cover the relation between non-Euclidean responses and vectors of non-Euclidean predictors in many spaces of practical statistical interest. These include one-dimensional distributions viewed as elements of the 2-Wasserstein space and multidimensional distributions with the Fisher-Rao metric that are represented as data on the Hilbert sphere. Also included are data on finite-dimensional Riemannian manifolds, with an emphasis on spheres, covering directional and compositional data, as well as data that consist of symmetric positive definite matrices. We illustrate the utility of geodesic optimal transport regression with data on summer temperature distributions and human mortality.

Submitted 23 December, 2023; originally announced December 2023.

arXiv:2312.08075 [pdf, other]

TERM Model: Tensor Ring Mixture Model for Density Estimation

Authors: Ruituo Wu, Jiani Liu, Ce Zhu, Anh-Huy Phan, Ivan V. Oseledets, Yipeng Liu

Abstract: Efficient probability density estimation is a core challenge in statistical machine learning. Tensor-based probabilistic graph methods address interpretability and stability concerns encountered in neural network approaches. However, a substantial number of potential tensor permutations can lead to a tensor network with the same structure but varying expressive capabilities. In this paper, we adopt tensor ring decomposition for the density estimator, which significantly reduces the number of permutation candidates while enhancing expressive capability compared with existing decompositions. Additionally, a mixture model that incorporates multiple permutation candidates with adaptive weights is further designed, resulting in increased expressive flexibility and comprehensiveness. Different from the prevailing directions of tensor network structure/permutation search, our approach provides a new viewpoint inspired by ensemble learning. This approach acknowledges that suboptimal permutations can offer distinctive information besides that of optimal permutations. Experiments show the superiority of the proposed approach in estimating probability density for moderately dimensional datasets and sampling to capture intricate details.

Submitted 13 December, 2023; originally announced December 2023.

arXiv:2308.11419 [pdf]

doi 10.1561/2200000087.

Tensor Regression

Authors: Jiani Liu, Ce Zhu, Zhen Long, Yipeng Liu

Abstract: Regression analysis is a key area of interest in the field of data analysis and machine learning which is devoted to exploring the dependencies between variables, often using vectors. The emergence of high dimensional data in technologies such as neuroimaging, computer vision, climatology and social networks has brought challenges to traditional data representation methods. Tensors, as high dimensional extensions of vectors, are considered natural representations of high dimensional data. In this book, the authors provide a systematic study and analysis of tensor-based regression models and their applications in recent years. It groups and illustrates the existing tensor-based regression methods and covers the basics, core ideas, and theoretical characteristics of most tensor-based regression methods. In addition, readers can learn how to use existing tensor-based regression methods to solve specific regression tasks with multiway data, what datasets can be selected, and what software packages are available to start related work as soon as possible. Tensor Regression is the first thorough overview of the fundamentals, motivations, popular algorithms, strategies for efficient implementation, related applications, available datasets, and software resources for tensor-based regression analysis. It is essential reading for all students, researchers and practitioners working on high dimensional data.

Submitted 22 August, 2023; originally announced August 2023.

Comments: 187 pages, 32 figures, 10 tables

Journal ref: Foundations and Trends in Machine Learning: Vol. 14: No. 4, pp 379-565 (2021)

arXiv:2307.04318 [pdf, other]

Two-Sample and Change-Point Inference for Non-Euclidean Valued Time Series

Authors: Feiyu Jiang, Changbo Zhu, Xiaofeng Shao

Abstract: Data objects taking value in a general metric space have become increasingly common in modern data analysis. In this paper, we study two important statistical inference problems, namely, two-sample testing and change-point detection, for such non-Euclidean data under temporal dependence. Typical examples of non-Euclidean valued time series include yearly mortality distributions, time-varying networks, and covariance matrix time series. To accommodate unknown temporal dependence, we advance the self-normalization (SN) technique (Shao, 2010) to the inference of non-Euclidean time series, which is substantially different from the existing SN-based inference for functional time series that reside in Hilbert space (Zhang et al., 2011). Theoretically, we propose new regularity conditions that could be easier to check than those in the recent literature, and derive the limiting distributions of the proposed test statistics under both null and local alternatives. For the change-point detection problem, we also derive the consistency of the change-point location estimator, and combine our proposed change-point test with wild binary segmentation to perform multiple change-point estimation. Numerical simulations demonstrate the effectiveness and robustness of our proposed tests compared with existing methods in the literature. Finally, we apply our tests to two-sample inference in mortality data and change-point detection in cryptocurrency data.

Submitted 9 July, 2023; originally announced July 2023.

arXiv:2211.17132 [pdf, other]

Targets in Reinforcement Learning to solve Stackelberg Security Games

Authors: Saptarashmi Bandyopadhyay, Chenqi Zhu, Philip Daniel, Joshua Morrison, Ethan Shay, John Dickerson

Abstract: Reinforcement Learning (RL) algorithms have been successfully applied to real world situations like illegal smuggling, poaching, deforestation, climate change, airport security, etc. These scenarios can be framed as Stackelberg security games (SSGs) where defenders and attackers compete to control target resources. The algorithm's competency is assessed by which agent is controlling the targets. This review investigates modeling of SSGs in RL with a focus on possible improvements of target representations in RL algorithms.

Submitted 30 November, 2022; originally announced November 2022.

Comments: Appears in Proceedings of AAAI FSS-22 Symposium "Lessons Learned for Autonomous Assessment of Machine Abilities (LLAAMA)"

arXiv:2210.12874 [pdf, other]

Global Contrastive Batch Sampling via Optimization on Sample Permutations

Authors: Vin Sachidananda, Ziyi Yang, Chenguang Zhu

Abstract: Contrastive Learning has recently achieved state-of-the-art performance in a wide range of tasks. Many contrastive learning approaches use mined hard negatives to make batches more informative during training but these approaches are inefficient as they increase epoch length proportional to the number of mined negatives and require frequent updates of nearest neighbor indices or mining from recent batches. In this work, we provide an alternative to hard negative mining, Global Contrastive Batch Sampling (GCBS), an efficient approximation to the batch assignment problem that upper bounds the gap between the global and training losses, $\mathcal{L}^{Global} - \mathcal{L}^{Train}$, in contrastive learning settings. Through experimentation we find GCBS improves state-of-the-art performance in sentence embedding and code-search tasks. Additionally, GCBS is easy to implement as it requires only a few additional lines of code, does not maintain external data structures such as nearest neighbor indices, is more computationally efficient than the most minimal hard negative mining approaches, and makes no changes to the model being trained.

Submitted 7 June, 2023; v1 submitted 23 October, 2022; originally announced October 2022.

Comments: ICML 2023; 21 pages, 7 figures

arXiv:2207.00941 [pdf, other]

Testing Homogeneity: The Trouble with Sparse Functional Data

Authors: Changbo Zhu, Jane-Ling Wang

Abstract: Testing the homogeneity between two samples of functional data is an important task. While this is feasible for intensively measured functional data, we explain why it is challenging for sparsely measured functional data and show what can be done for such data. In particular, we show that testing the marginal homogeneity based on point-wise distributions is feasible under some constraints and propose a new two-sample statistic that works well with both intensively and sparsely measured functional data. The proposed test statistic is formulated upon energy distance, and the critical value is obtained via the permutation test. The convergence rate of the test statistic to its population version is derived along with the consistency of the associated permutation test. To the best of our knowledge, this is the first paper that provides guaranteed consistency for testing the homogeneity for sparse functional data. The aptness of our method is demonstrated on both synthetic and real data sets.

Submitted 2 July, 2022; originally announced July 2022.

arXiv:2203.12783 [pdf, other]

Spherical Autoregressive Models, With Application to Distributional and Compositional Time Series

Authors: Changbo Zhu, Hans-Georg Müller

Abstract: We introduce a new class of autoregressive models for spherical time series, where the dimension of the spheres on which the observations of the time series are situated may be finite-dimensional or infinite-dimensional as in the case of a general Hilbert sphere. Spherical time series arise in various settings. We focus here on distributional and compositional time series. Applying a square root transformation to the densities of the observations of a distributional time series maps the distributional observations to the Hilbert sphere, equipped with the Fisher-Rao metric. Likewise, applying a square root transformation to the components of the observations of a compositional time series maps the compositional observations to a finite-dimensional sphere, equipped with the geodesic metric on spheres. The challenge in modeling such time series lies in the intrinsic non-linearity of spheres and Hilbert spheres, where conventional arithmetic operations such as addition or scalar multiplication are no longer available. To address this difficulty, we consider rotation operators to map observations on the sphere. Specifically, we introduce a class of skew-symmetric operators such that the associated exponential operators are rotation operators that for each given pair of points on the sphere map one of the points to the other one. We exploit the fact that the space of skew-symmetric operators is Hilbertian to develop autoregressive modeling of geometric differences that correspond to rotations of spherical and distributional time series. Motivating data for our methods include a time series of yearly observations of bivariate distributions of the minimum/maximum temperatures for a period of 120 days during each summer for the years 1990-2018 at Los Angeles (LAX) and John F. Kennedy (JFK) international airports.

Submitted 23 March, 2022; originally announced March 2022.

arXiv:2110.14363 [pdf, other]

VQ-GNN: A Universal Framework to Scale up Graph Neural Networks using Vector Quantization

Authors: Mucong Ding, Kezhi Kong, Jingling Li, Chen Zhu, John P Dickerson, Furong Huang, Tom Goldstein

Abstract: Most state-of-the-art Graph Neural Networks (GNNs) can be defined as a form of graph convolution which can be realized by message passing between direct neighbors or beyond. To scale such GNNs to large graphs, various neighbor-, layer-, or subgraph-sampling techniques are proposed to alleviate the "neighbor explosion" problem by considering only a small subset of messages passed to the nodes in a mini-batch. However, sampling-based methods are difficult to apply to GNNs that utilize many-hops-away or global context each layer, show unstable performance for different tasks and datasets, and do not speed up model inference. We propose a principled and fundamentally different approach, VQ-GNN, a universal framework to scale up any convolution-based GNNs using Vector Quantization (VQ) without compromising the performance. In contrast to sampling-based techniques, our approach can effectively preserve all the messages passed to a mini-batch of nodes by learning and updating a small number of quantized reference vectors of global node representations, using VQ within each GNN layer. Our framework avoids the "neighbor explosion" problem of GNNs using quantized representations combined with a low-rank version of the graph convolution matrix. We show that such a compact low-rank version of the gigantic convolution matrix is sufficient both theoretically and experimentally. In company with VQ, we design a novel approximated message passing algorithm and a nontrivial back-propagation rule for our framework. Experiments on various types of GNN backbones demonstrate the scalability and competitive performance of our framework on large-graph node classification and link prediction benchmarks.

Submitted 27 October, 2021; originally announced October 2021.

Comments: NeurIPS 2021

arXiv:2105.05439 [pdf, other]

Autoregressive Optimal Transport Models

Authors: Changbo Zhu, Hans-Georg Müller

Abstract: Series of univariate distributions indexed by equally spaced time points are ubiquitous in applications and their analysis constitutes one of the challenges of the emerging field of distributional data analysis. To quantify such distributional time series, we propose a class of intrinsic autoregressive models that operate in the space of optimal transport maps. The autoregressive transport models that we introduce here are based on regressing optimal transport maps on each other, where predictors can be transport maps from an overall barycenter to a current distribution or transport maps between past consecutive distributions of the distributional time series. Autoregressive transport models and their associated distributional regression models specify the link between predictor and response transport maps by moving along geodesics in Wasserstein space. These models emerge as natural extensions of the classical autoregressive models in Euclidean space. Unique stationary solutions of autoregressive transport models are shown to exist under a geometric moment contraction condition of Wu and Shao (2004), using properties of iterated random functions. We also discuss an extension to a varying coefficient model for first order autoregressive transport models. In addition to simulations, the proposed models are illustrated with distributional time series of house prices across U.S. counties and annual summer temperature distributions.

Submitted 19 May, 2023; v1 submitted 12 May, 2021; originally announced May 2021.

arXiv:2104.08894 [pdf, other]

The Intrinsic Dimension of Images and Its Impact on Learning

Authors: Phillip Pope, Chen Zhu, Ahmed Abdelkader, Micah Goldblum, Tom Goldstein

Abstract: It is widely believed that natural image data exhibits low-dimensional structure despite the high dimensionality of conventional pixel representations. This idea underlies a common intuition for the remarkable success of deep learning in computer vision. In this work, we apply dimension estimation tools to popular datasets and investigate the role of low-dimensional structure in deep learning. We find that common natural image datasets indeed have very low intrinsic dimension relative to the high number of pixels in the images. Additionally, we find that low dimensional datasets are easier for neural networks to learn, and models solving these tasks generalize better from training to test data. Along the way, we develop a technique for validating our dimension estimation tools on synthetic data generated by GANs allowing us to actively manipulate the intrinsic dimension by controlling the image generation process. Code for our experiments may be found here https://github.com/ppope/dimensions.

Submitted 18 April, 2021; originally announced April 2021.

Comments: To appear at ICLR 2021 (spotlight), 17 pages with appendix, 15 figures

ACM Class: I.2.6; I.5.1

arXiv:2103.04413 [pdf, other]

Escaping Saddle Points with Stochastically Controlled Stochastic Gradient Methods

Authors: Guannan Liang, Qianqian Tong, Chunjiang Zhu, Jinbo Bi

Abstract: Stochastically controlled stochastic gradient (SCSG) methods have been proved to converge efficiently to first-order stationary points which, however, can be saddle points in nonconvex optimization. It has been observed that a stochastic gradient descent (SGD) step introduces anisotropic noise around saddle points for deep learning and non-convex half space learning problems, which indicates that SGD satisfies the correlated negative curvature (CNC) condition for these problems. Therefore, we propose to use a separate SGD step to help the SCSG method escape from strict saddle points, resulting in the CNC-SCSG method. The SGD step plays a role similar to noise injection but is more stable. We prove that the resultant algorithm converges to a second-order stationary point with a convergence rate of $\tilde{O}(\epsilon^{-2} \log(1/\epsilon))$ where $\epsilon$ is the pre-specified error tolerance. This convergence rate is independent of the problem dimension, and is faster than that of CNC-SGD. A more general framework is further designed to incorporate the proposed CNC-SCSG into any first-order method for the method to escape saddle points. Simulation studies illustrate that the proposed algorithm can escape saddle points in far fewer epochs than the gradient descent methods perturbed by either noise injection or an SGD step.

Submitted 23 April, 2021; v1 submitted 7 March, 2021; originally announced March 2021.

arXiv:2010.09891 [pdf, other]

Robust Optimization as Data Augmentation for Large-scale Graphs

Authors: Kezhi Kong, Guohao Li, Mucong Ding, Zuxuan Wu, Chen Zhu, Bernard Ghanem, Gavin Taylor, Tom Goldstein

Abstract: Data augmentation helps neural networks generalize better by enlarging the training set, but it remains an open question how to effectively augment graph data to enhance the performance of GNNs (Graph Neural Networks). While most existing graph regularizers focus on manipulating graph topological structures by adding/removing edges, we offer a method to augment node features for better performance. We propose FLAG (Free Large-scale Adversarial Augmentation on Graphs), which iteratively augments node features with gradient-based adversarial perturbations during training. By making the model invariant to small fluctuations in input data, our method helps models generalize to out-of-distribution samples and boosts model performance at test time. FLAG is a general-purpose approach for graph data, which universally works in node classification, link prediction, and graph classification tasks. FLAG is also highly flexible and scalable, and is deployable with arbitrary GNN backbones and large-scale datasets. We demonstrate the efficacy and stability of our method through extensive experiments and ablation studies. We also provide intuitive observations for a deeper understanding of our method.

Submitted 29 March, 2022; v1 submitted 19 October, 2020; originally announced October 2020.

Comments: Accepted at CVPR 2022

arXiv:2007.13067 [pdf, other]

doi 10.1016/j.ins.2020.12.073

Deep Embedded Multi-view Clustering with Collaborative Training

Authors: Jie Xu, Yazhou Ren, Guofeng Li, Lili Pan, Ce Zhu, Zenglin Xu

Abstract: Multi-view clustering has attracted increasing attention recently by utilizing information from multiple views. However, existing multi-view clustering methods either have high computation and space complexities or lack representation capability. To address these issues, we propose deep embedded multi-view clustering with collaborative training (DEMVC) in this paper. Firstly, the embedded representations of multiple views are learned individually by deep autoencoders. Then, both the consensus and the complementarity of multiple views are taken into account and a novel collaborative training scheme is proposed. Concretely, the feature representations and cluster assignments of all views are learned collaboratively. A new consistency strategy for cluster center initialization is further developed to improve the multi-view clustering performance with collaborative training. Experimental results on several popular multi-view datasets show that DEMVC achieves significant improvements over state-of-the-art methods.

Submitted 26 July, 2020; originally announced July 2020.

arXiv:2007.10720 [pdf, ps, other]

doi 10.1109/TPAMI.2020.3010953

Unsupervised Heterogeneous Coupling Learning for Categorical Representation

Authors: Chengzhang Zhu, Longbing Cao, Jianping Yin

Abstract: Complex categorical data is often hierarchically coupled with heterogeneous relationships between attributes and attribute values and the couplings between objects. Such value-to-object couplings are heterogeneous with complementary and inconsistent interactions and distributions. The limited existing research on unlabeled categorical data representations ignores the heterogeneous and hierarchical couplings, underestimates data characteristics and complexities, and overuses redundant information. The deep representation learning of unlabeled categorical data is challenging, overlooking such value-to-object couplings, complementarity and inconsistency, and requiring large data, disentanglement, and high computational power. This work introduces a shallow but powerful UNsupervised heTerogeneous couplIng lEarning (UNTIE) approach for representing coupled categorical data by untying the interactions between couplings and revealing heterogeneous distributions embedded in each type of couplings. UNTIE is efficiently optimized w.r.t. a kernel k-means objective function for unsupervised representation learning of heterogeneous and hierarchical value-to-object couplings. Theoretical analysis shows that UNTIE can represent categorical data with maximal separability while effectively representing heterogeneous couplings and disclosing their roles in categorical data. The UNTIE-learned representations achieve significant performance improvements over the state-of-the-art categorical representations and deep representation models on 25 categorical data sets with diversified characteristics.

Submitted 21 July, 2020; originally announced July 2020.

Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020

arXiv:2007.01055 [pdf, other]

doi 10.1109/TIP.2021.3062195

Bayesian Low Rank Tensor Ring Model for Image Completion

Authors: Zhen Long, Ce Zhu, Jiani Liu, Yipeng Liu

Abstract: The low rank tensor ring model is powerful for image completion, which recovers missing entries in data acquisition and transformation. The recently proposed tensor ring (TR) based completion algorithms generally solve the low rank optimization problem by the alternating least squares method with predefined ranks, which may easily lead to overfitting when the unknown ranks are set too large and only a few measurements are available. In this paper, we present a Bayesian low rank tensor ring model for image completion by automatically learning the low rank structure of data. A multiplicative interaction model is developed for the low-rank tensor ring decomposition, where core factors are enforced to be sparse by assuming their entries obey a Student-T distribution. Compared with most of the existing methods, the proposed one is free of parameter-tuning, and the TR ranks can be obtained by Bayesian inference. Numerical experiments, including synthetic data, color images with different sizes and YaleFace dataset B with respect to one pose, show that the proposed approach outperforms state-of-the-art ones, especially in terms of recovery accuracy.

Submitted 28 June, 2020; originally announced July 2020.

arXiv:2006.14978 [pdf, other]

Online 3D Bin Packing with Constrained Deep Reinforcement Learning

Authors: Hang Zhao, Qi** She, Chenyang Zhu, Yin Yang, Kai Xu

Abstract: We solve a challenging yet practically useful variant of the 3D Bin Packing Problem (3D-BPP). In our problem, the agent has limited information about the items to be packed into the bin, and an item must be packed immediately after its arrival without buffering or readjusting. The item's placement is also subject to the constraints of collision avoidance and physical stability. We formulate this online 3D-BPP as a constrained Markov decision process. To solve the problem, we propose an effective and easy-to-implement constrained deep reinforcement learning (DRL) method under the actor-critic framework. In particular, we introduce a feasibility predictor to predict the feasibility mask for the placement actions and use it to modulate the action probabilities output by the actor during training. Such supervision and transformations to DRL help the agent learn feasible policies efficiently. Our method can also be generalized, e.g., with the ability to handle lookahead or items with different orientations. We have conducted extensive evaluation showing that the learned policy significantly outperforms the state-of-the-art methods. A user study suggests that our method attains human-level performance.

Submitted 13 January, 2022; v1 submitted 26 June, 2020; originally announced June 2020.

Comments: AAAI 2021

arXiv:2006.11918 [pdf, ps, other]

MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of Gradients

Authors: Chen Zhu, Yu Cheng, Zhe Gan, Furong Huang, **g**g Liu, Tom Goldstein

Abstract: Adaptive gradient methods such as RMSProp and Adam use an exponential moving estimate of the squared gradient to compute adaptive step sizes, achieving better convergence than SGD in the face of noisy objectives. However, Adam can have undesirable convergence behaviors due to unstable or extreme adaptive learning rates. Methods such as AMSGrad and AdaBound have been proposed to stabilize the adaptive learning rates of Adam in the later stage of training, but they do not outperform Adam in some practical tasks such as training Transformers. In this paper, we propose an adaptive learning rate principle, in which the running mean of squared gradient in Adam is replaced by a weighted mean, with weights chosen to maximize the estimated variance of each coordinate. This results in a faster adaptation to the local gradient variance, which leads to more desirable empirical convergence behaviors than Adam. We prove the proposed algorithm converges under mild assumptions for nonconvex stochastic optimization problems, and demonstrate the improved efficacy of our adaptive averaging approach on machine translation, natural language understanding and large-batch pretraining of BERT. The code is available at https://github.com/zhuchen03/MaxVA.

Submitted 4 July, 2021; v1 submitted 21 June, 2020; originally announced June 2020.

Comments: ECML PKDD 2021

arXiv:2005.14359 [pdf, other]

Unsupervised Feature Selection via Multi-step Markov Transition Probability

Authors: Yan Min, Mao Ye, Liang Tian, Yulin Jian, Ce Zhu, Shangming Yang

Abstract: Feature selection is a widely used dimension reduction technique to select feature subsets because of its interpretability. Many methods have been proposed and achieved good results, in which the relationships between adjacent data points are the main concern. But the possible associations between data pairs that may not be adjacent are always neglected. Different from previous methods, we propose a novel and very simple approach for unsupervised feature selection, named MMFS (Multi-step Markov transition probability for Feature Selection). The idea is to use multi-step Markov transition probability to describe the relation between any data pair. Two ways, from the positive and negative viewpoints respectively, are employed to keep the data structure after feature selection. From the positive viewpoint, the maximum transition probability that can be reached in a certain number of steps is used to describe the relation between two points. Then, the features which can keep the compact data structure are selected. From the negative viewpoint, the minimum transition probability that can be reached in a certain number of steps is used to describe the relation between two points. On the contrary, the features that least maintain the loose data structure are selected. And the two ways can also be combined. Thus three algorithms are proposed.
Our main contributions are a novel feature selection approach which uses multi-step transition probability to characterize the data structure, and three algorithms proposed from the positive and negative aspects for keeping data structure. The performance of our approach is compared with the state-of-the-art methods on eight real-world data sets, and the experimental results show that the proposed MMFS is effective in unsupervised feature selection.

Submitted 28 May, 2020; originally announced May 2020.

arXiv:2004.14088 [pdf, other]

Demographics Should Not Be the Reason of Toxicity: Mitigating Discrimination in Text Classifications with Instance Weighting

Authors: Guanhua Zhang, Bing Bai, Junqi Zhang, Kun Bai, Conghui Zhu, Tiejun Zhao

Abstract: With the recent proliferation of the use of text classifications, researchers have found that there are certain unintended biases in text classification datasets. For example, texts containing some demographic identity-terms (e.g., "gay", "black") are more likely to be abusive in existing abusive language detection datasets. As a result, models trained with these datasets may consider sentences like "She makes me happy to be gay" as abusive simply because of the word "gay." In this paper, we formalize the unintended biases in text classification datasets as a kind of selection bias from the non-discrimination distribution to the discrimination distribution. Based on this formalization, we further propose a model-agnostic debiasing training framework by recovering the non-discrimination distribution using instance weighting, which does not require any extra resources or annotations apart from a pre-defined set of demographic identity-terms. Experiments demonstrate that our method can effectively alleviate the impacts of the unintended biases without significantly hurting models' generalization ability.

Submitted 20 August, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

Comments: Accepted by ACL 2020

arXiv:2004.09007 [pdf, other]

doi 10.1109/ICASSP40776.2020.9053181

Headless Horseman: Adversarial Attacks on Transfer Learning Models

Authors: Ahmed Abdelkader, Michael J. Curry, Liam Fowl, Tom Goldstein, Avi Schwarzschild, Manli Shu, Christoph Studer, Chen Zhu

Abstract: Transfer learning facilitates the training of task-specific classifiers using pre-trained models as feature extractors. We present a family of transferable adversarial attacks against such classifiers, generated without access to the classification head; we call these headless attacks. We first demonstrate successful transfer attacks against a victim network using only its feature extractor. This motivates the introduction of a label-blind adversarial attack. This transfer attack method does not require any information about the class-label space of the victim. Our attack lowers the accuracy of a ResNet18 trained on CIFAR10 by over 40%.

Submitted 19 April, 2020; originally announced April 2020.

Comments: 5 pages, 2 figures. Accepted in ICASSP 2020. Code available on https://github.com/zhuchen03/headless-attack.git

arXiv:2003.11235 [pdf, other]

AutoFIS: Automatic Feature Interaction Selection in Factorization Models for Click-Through Rate Prediction

Authors: Bin Liu, Chenxu Zhu, Guilin Li, Weinan Zhang, **cai Lai, Ruiming Tang, Xiuqiang He, Zhenguo Li, Yong Yu

Abstract: Learning feature interactions is crucial for click-through rate (CTR) prediction in recommender systems. In most existing deep learning models, feature interactions are either manually designed or simply enumerated. However, enumerating all feature interactions incurs large memory and computation costs. Even worse, useless interactions may introduce noise and complicate the training process. In this work, we propose a two-stage algorithm called Automatic Feature Interaction Selection (AutoFIS). AutoFIS can automatically identify important feature interactions for factorization models with computational cost just equivalent to training the target model to convergence. In the search stage, instead of searching over a discrete set of candidate feature interactions, we relax the choices to be continuous by introducing the architecture parameters. By implementing a regularized optimizer over the architecture parameters, the model can automatically identify and remove the redundant feature interactions during the training process of the model. In the re-train stage, we keep the architecture parameters serving as an attention unit to further boost the performance. Offline experiments on three large-scale datasets (two public benchmarks, one private) demonstrate that AutoFIS can significantly improve various FM based models.
AutoFIS has been deployed onto the training platform of Huawei App Store recommendation service, where a 10-day online A/B test demonstrated that AutoFIS improved the DeepFM model by 20.3% and 20.1% in terms of CTR and CVR respectively.

Submitted 3 July, 2020; v1 submitted 25 March, 2020; originally announced March 2020.

Comments: KDD 2020 ADS track oral accepted

arXiv:2003.06693 [pdf, other]

Certified Defenses for Adversarial Patches

Authors: **-Yeh Chiang, Renkun Ni, Ahmed Abdelkader, Chen Zhu, Christoph Studer, Tom Goldstein

Abstract: Adversarial patch attacks are among the most practical threat models against real-world computer vision systems. This paper studies certified and empirical defenses against patch attacks. We begin with a set of experiments showing that most existing defenses, which work by pre-processing input images to mitigate adversarial patches, are easily broken by simple white-box adversaries. Motivated by this finding, we propose the first certified defense against patch attacks, and propose faster methods for its training. Furthermore, we experiment with different patch shapes for testing, obtaining surprisingly good robustness transfer across shapes, and present preliminary results on certified defense against sparse attacks. Our complete implementation can be found on: https://github.com/**-C/certifiedpatchdefense.

Submitted 25 September, 2020; v1 submitted 14 March, 2020; originally announced March 2020.

Comments: International Conference on Learning Representations, ICLR 2020

arXiv:2002.09841 [pdf, other]

SetRank: A Setwise Bayesian Approach for Collaborative Ranking from Implicit Feedback

Authors: Chao Wang, Hengshu Zhu, Chen Zhu, Chuan Qin, Hui Xiong

Abstract: The recent development of online recommender systems has a focus on collaborative ranking from implicit feedback, such as user clicks and purchases. Different from explicit ratings, which reflect graded user preferences, the implicit feedback only generates positive and unobserved labels. While considerable efforts have been made in this direction, the well-known pairwise and listwise approaches have still been limited by various challenges. Specifically, for the pairwise approaches, the assumption of independent pairwise preference does not always hold in practice. Also, the listwise approaches cannot efficiently accommodate "ties" due to the precondition of the entire list permutation. To this end, in this paper, we propose a novel setwise Bayesian approach for collaborative ranking, namely SetRank, to inherently accommodate the characteristics of implicit feedback in recommender systems. Specifically, SetRank aims at maximizing the posterior probability of novel setwise preference comparisons and can be implemented with matrix factorization and neural networks. Meanwhile, we also present the theoretical analysis of SetRank to show that the bound of excess risk can be proportional to $\sqrt{M/N}$, where $M$ and $N$ are the numbers of items and users, respectively. Finally, extensive experiments on four real-world datasets clearly validate the superiority of SetRank compared with various state-of-the-art baselines.

Submitted 23 February, 2020; originally announced February 2020.

Comments: This paper has been accepted in AAAI'20

Journal ref: The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI'20), New York, New York, USA, 2020

arXiv:2002.09766 [pdf, other]

Improving the Tightness of Convex Relaxation Bounds for Training Certifiably Robust Classifiers

Authors: Chen Zhu, Renkun Ni, **-yeh Chiang, Hengduo Li, Furong Huang, Tom Goldstein

Abstract: Convex relaxations are effective for training and certifying neural networks against norm-bounded adversarial attacks, but they leave a large gap between certifiable and empirical robustness. In principle, convex relaxation can provide tight bounds if the solution to the relaxed problem is feasible for the original non-convex problem. We propose two regularizers that can be used to train neural networks that yield tighter convex relaxation bounds for robustness. In all of our experiments, the proposed regularizers result in higher certified accuracy than non-regularized baselines.

Submitted 22 February, 2020; originally announced February 2020.

arXiv:2001.02810 [pdf, other]

A Unified Framework for Coupled Tensor Completion

Authors: Huyan Huang, Yipeng Liu, Ce Zhu

Abstract: Coupled tensor decomposition reveals the joint data structure by incorporating prior knowledge that comes from the latent coupled factors. The tensor ring (TR) decomposition is invariant under the permutation of tensors with different mode properties, which ensures the uniformity of decomposed factors and mode attributes. The TR has powerful expression ability and achieves success in some multi-dimensional data processing applications. To let coupled tensors help each other for missing component estimation, in this paper we utilize TR for coupled completion by sharing parts of the latent factors. The optimization model for coupled TR completion is developed with a novel Frobenius norm. It is solved by the block coordinate descent algorithm, which efficiently solves a series of quadratic problems resulting from the sampling pattern. The excess risk bound for this optimization model shows the theoretical performance enhancement in comparison with other coupled nuclear norm based methods. The proposed method is validated on numerical experiments on synthetic data, and experimental results on real-world data demonstrate its superiority over the state-of-the-art methods in terms of recovery accuracy.

Submitted 8 November, 2020; v1 submitted 8 January, 2020; originally announced January 2020.

arXiv:1905.08446 [pdf, other]

Inference for Change Points in High Dimensional Data via Self-Normalization

Authors: Runmin Wang, Changbo Zhu, Stanislav Volgushev, Xiaofeng Shao

Abstract: This article considers change point testing and estimation for a sequence of high-dimensional data. In the case of testing for a mean shift for high-dimensional independent data, we propose a new test which is based on the $U$-statistic in Chen and Qin (2010) and utilizes the self-normalization principle [Shao (2010), Shao and Zhang (2010)]. Our test targets dense alternatives in the high-dimensional setting and involves no tuning parameters. To extend to change point testing for high-dimensional time series, we introduce a trimming parameter and formulate a self-normalized test statistic with trimming to accommodate the weak temporal dependence. On the theory front, we derive the limiting distributions of self-normalized test statistics under both the null and alternatives for both independent and dependent high-dimensional data. At the core of our asymptotic theory, we obtain weak convergence of a sequential $U$-statistic based process for high-dimensional independent data, and weak convergence of sequential trimmed $U$-statistic based processes for high-dimensional linear processes, both of which are of independent interest. Additionally, we illustrate how our tests can be used in combination with wild binary segmentation to estimate the number and location of multiple change points. Numerical simulations demonstrate the competitiveness of our proposed testing and estimation procedures in comparison with several existing methods in the literature.

Submitted 8 August, 2021; v1 submitted 21 May, 2019; originally announced May 2019.

arXiv:1905.08232 [pdf, other]

Adversarially robust transfer learning

Authors: Ali Shafahi, Parsa Saadatpanah, Chen Zhu, Amin Ghiasi, Christoph Studer, David Jacobs, Tom Goldstein

Abstract: Transfer learning, in which a network is trained on one task and re-purposed on another, is often used to produce neural network classifiers when data is scarce or full-scale training is too costly. When the goal is to produce a model that is not only accurate but also adversarially robust, data scarcity and computational limitations become even more cumbersome. We consider robust transfer learning, in which we transfer not only performance but also robustness from a source model to a target domain. We start by observing that robust networks contain robust feature extractors. By training classifiers on top of these feature extractors, we produce new models that inherit the robustness of their parent networks. We then consider the case of fine tuning a network by re-training end-to-end in the target domain. When using lifelong learning strategies, this process preserves the robustness of the source network while achieving high accuracy. By using such strategies, it is possible to produce accurate and robust models with little data, and without the cost of adversarial training. Additionally, we can improve the generalization of adversarially trained models, while maintaining their robustness.

Submitted 21 February, 2020; v1 submitted 20 May, 2019; originally announced May 2019.

arXiv:1905.05897 [pdf, other]

Transferable Clean-Label Poisoning Attacks on Deep Neural Nets

Authors: Chen Zhu, W. Ronny Huang, Ali Shafahi, Hengduo Li, Gavin Taylor, Christoph Studer, Tom Goldstein

Abstract: Clean-label poisoning attacks inject innocuous looking (and "correctly" labeled) poison images into training data, causing a model to misclassify a targeted image after being trained on this data. We consider transferable poisoning attacks that succeed without access to the victim network's outputs, architecture, or (in some cases) training data. To achieve this, we propose a new "polytope attack" in which poison images are designed to surround the targeted image in feature space. We also demonstrate that using Dropout during poison creation helps to enhance transferability of this attack. We achieve transferable attack success rates of over 50% while poisoning only 1% of the training set.

Submitted 16 May, 2019; v1 submitted 14 May, 2019; originally announced May 2019.

Comments: Accepted to ICML2019

arXiv:1904.00435 [pdf, other]

doi 10.1109/TCI.2020.3006718

Robust Low-Rank Tensor Ring Completion

Authors: Huyan Huang, Yipeng Liu, Ce Zhu

Abstract: Low-rank tensor completion recovers missing entries based on different tensor decompositions. Due to its outstanding performance in exploiting some higher-order data structure, low rank tensor ring has been applied in tensor completion. To further deal with its sensitivity to sparse component as it does in tensor principal component analysis, we propose robust tensor ring completion (RTRC), which separates the latent low-rank tensor component from the sparse component with a limited number of measurements. The low rank tensor component is constrained by the weighted sum of nuclear norms of its balanced unfoldings, while the sparse component is regularized by its l1 norm. We analyze the RTRC model and give the exact recovery guarantee. The alternating direction method of multipliers is used to divide the problem into several sub-problems with fast solutions. In numerical experiments, we verify the recovery condition of the proposed method on synthetic data, and show the proposed method outperforms the state-of-the-art ones in terms of both accuracy and computational complexity in a number of real-world data based tasks, i.e., light-field image recovery, shadow removal in face images, and background extraction in color video.

Submitted 23 April, 2020; v1 submitted 31 March, 2019; originally announced April 2019.

arXiv:1903.04735 [pdf, other]

Low-rank Tensor Grid for Image Completion

Authors: Huyan Huang, Yipeng Liu, Ce Zhu

Abstract: Tensor completion estimates missing components by exploiting the low-rank structure of multi-way data. The recently proposed methods based on tensor train (TT) and tensor ring (TR) show better performance in image recovery than classical ones. Compared with TT and TR, the projected entangled pair state (PEPS), which is also called tensor grid (TG), allows more interactions between different dimensions, and may lead to more compact representation. In this paper, we propose to perform image completion based on low-rank tensor grid. A two-stage density matrix renormalization group algorithm is used for initialization of TG decomposition, which consists of multiple TT decompositions. The latent TG factors can be alternately obtained by solving alternating least squares problems. To further improve the computational efficiency, a multi-linear matrix factorization for low rank TG completion is developed by using parallel matrix factorization. Experimental results on synthetic data and real-world images show the proposed methods outperform the existing ones in terms of recovery accuracy.

Submitted 23 April, 2020; v1 submitted 12 March, 2019; originally announced March 2019.

arXiv:1903.03315 [pdf, other]

doi 10.1016/j.sigpro.2020.107486

Provable Tensor Ring Completion

Authors: Huyan Huang, Yipeng Liu, Ce Zhu

Abstract: Tensor completion recovers a multi-dimensional array from a limited number of measurements. Using the recently proposed tensor ring (TR) decomposition, in this paper we show that a $d$-order tensor of dimensional size $n$ and TR rank $r$ can be exactly recovered with high probability by solving a convex optimization program, given $n^{d/2} r^2 \ln^7(n^{d/2})$ samples. The proposed TR incoherence condition under which the result holds is similar to the matrix incoherence condition. The experiments on synthetic data verify the recovery guarantee for TR completion. Moreover, the experiments on real-world data show that our method improves the recovery performance compared with the state-of-the-art methods.

Submitted 8 January, 2020; v1 submitted 8 March, 2019; originally announced March 2019.

Journal ref: Signal Processing, vol. 171, p. 107486, 2020

arXiv:1902.07279 [pdf, ps, other]

Interpoint Distance Based Two Sample Tests in High Dimension

Authors: Changbo Zhu, Xiaofeng Shao

Abstract: In this paper, we study a class of two sample test statistics based on inter-point distances in the high dimensional and low sample size setting. Our test statistics include the well-known energy distance and maximum mean discrepancy with Gaussian and Laplacian kernels, and the critical values are obtained via permutations. We show that all these tests are inconsistent when the two high dimensional distributions correspond to the same marginal distributions but differ in other aspects of the distributions. The tests based on energy distance and maximum mean discrepancy are mainly targeting the differences between marginal means and variances, whereas the test based on $L^1$-distance can capture the difference in marginal distributions. Our theory sheds new light on the limitation of inter-point distance based tests, the impact of different distance metrics, and the behavior of permutation tests in high dimension. Some simulation results and a real data illustration are also presented to corroborate our theoretical findings.

Submitted 10 April, 2020; v1 submitted 19 February, 2019; originally announced February 2019.

arXiv:1902.03291 [pdf, other]

Distance-based and RKHS-based Dependence Metrics in High Dimension

Authors: Changbo Zhu, Shun Yao, Xianyang Zhang, Xiaofeng Shao

Abstract: In this paper, we study distance covariance, Hilbert-Schmidt covariance (aka Hilbert-Schmidt independence criterion [Gretton et al. (2008)]) and related independence tests under the high dimensional scenario. We show that the sample distance/Hilbert-Schmidt covariance between two random vectors can be approximated by the sum of squared componentwise sample cross-covariances up to an asymptotically constant factor, which indicates that the distance/Hilbert-Schmidt covariance based test can only capture linear dependence in high dimension. As a consequence, the distance correlation based t-test developed by Szekely and Rizzo (2013) for independence is shown to have trivial limiting power when the two random vectors are nonlinearly dependent but componentwise uncorrelated. This new and surprising phenomenon, which seems to be discovered for the first time, is further confirmed in our simulation study. As a remedy, we propose tests based on an aggregation of marginal sample distance/Hilbert-Schmidt covariances and show their superior power behavior against their joint counterparts in simulations. We further extend the distance correlation based t-test to those based on Hilbert-Schmidt covariance and marginal distance/Hilbert-Schmidt covariance. A novel unified approach is developed to analyze the studentized sample distance/Hilbert-Schmidt covariance as well as the studentized sample marginal distance covariance under both null and alternative hypotheses.
Our theoretical and simulation results shed light on the limitation of distance/Hilbert-Schmidt covariance when used jointly in the high dimensional setting and suggest the aggregation of marginal distance/Hilbert-Schmidt covariance as a useful alternative.

Submitted 8 February, 2019; originally announced February 2019.

arXiv:1806.01453

Calibration for computer experiments with binary responses and application to cell adhesion study

Authors: Chih-Li Sung, Ying Hung, William Rittase, Cheng Zhu, C. F. Jeff Wu

Abstract: Calibration refers to the estimation of unknown parameters which are present in computer experiments but not available in physical experiments. An accurate estimation of these parameters is important because it provides a scientific understanding of the underlying system which is not available in physical experiments. Most of the work in the literature is limited to the analysis of continuous responses. Motivated by a study of cell adhesion experiments, we propose a new calibration framework for binary responses. Its application to the T cell adhesion data provides insight into the unknown values of the kinetic parameters which are difficult to determine by physical experiments due to the limitation of the existing experimental techniques.

Submitted 20 March, 2019; v1 submitted 4 June, 2018; originally announced June 2018.

Comments: 39 pages, 7 figures

arXiv:1709.06134

Discrete Dynamic Causal Modeling and Its Relationship with Directed Information

Authors: Zhe Wang, Yu Zheng, David C. Zhu, Jian Ren, Tongtong Li

Abstract: This paper explores the discrete Dynamic Causal Modeling (DDCM) and its relationship with Directed Information (DI). We prove the conditional equivalence between DDCM and DI in characterizing the causal relationship between two brain regions. The theoretical results are demonstrated using fMRI data obtained under both resting state and stimulus based state. Our numerical analysis is consistent with that reported in previous study.

Submitted 18 September, 2017; originally announced September 2017.

arXiv:1705.02511

A generalized Gaussian process model for computer experiments with binary time series

Authors: Chih-Li Sung, Ying Hung, William Rittase, Cheng Zhu, C. F. Jeff Wu

Abstract: Non-Gaussian observations such as binary responses are common in some computer experiments. Motivated by the analysis of a class of cell adhesion experiments, we introduce a generalized Gaussian process model for binary responses, which shares some common features with standard GP models. In addition, the proposed model incorporates a flexible mean function that can capture different types of time series structures. Asymptotic properties of the estimators are derived, and an optimal predictor as well as its predictive distribution are constructed. Their performance is examined via two simulation studies. The methodology is applied to study computer simulations for cell adhesion experiments. The fitted model reveals important biological information in repeated cell bindings, which is not directly observable in lab experiments.

Submitted 24 September, 2018; v1 submitted 6 May, 2017; originally announced May 2017.

Comments: 49 pages, 4 figures

Showing 1–37 of 37 results for author: Zhu, C