Search | arXiv e-print repository

Gaussian Differential Privacy on Riemannian Manifolds

Authors: Yangdi Jiang, Xiaotian Chang, Yi Liu, Lei Ding, Linglong Kong, Bei Jiang

Abstract: We develop an advanced approach for extending Gaussian Differential Privacy (GDP) to general Riemannian manifolds. The concept of GDP stands out as a prominent privacy definition that strongly warrants extension to manifold settings, due to its central limit properties. By harnessing the power of the renowned Bishop-Gromov theorem in geometric analysis, we propose a Riemannian Gaussian distribution that integrates the Riemannian distance, allowing us to achieve GDP in Riemannian manifolds with bounded Ricci curvature. To the best of our knowledge, this work marks the first instance of extending the GDP framework to accommodate general Riemannian manifolds, encompassing curved spaces, and circumventing the reliance on tangent space summaries. We provide a simple algorithm to evaluate the privacy budget $μ$ on any one-dimensional manifold and introduce a versatile Markov Chain Monte Carlo (MCMC)-based algorithm to calculate $μ$ on any Riemannian manifold with constant curvature. Through simulations on one of the most prevalent manifolds in statistics, the unit sphere $S^d$, we demonstrate the superior utility of our Riemannian Gaussian mechanism in comparison to the previously proposed Riemannian Laplace mechanism for implementing GDP.

Submitted 8 November, 2023; originally announced November 2023.

arXiv:2310.06746 [pdf, other]

Causal Rule Learning: Enhancing the Understanding of Heterogeneous Treatment Effect via Weighted Causal Rules

Authors: Ying Wu, Hanzhong Liu, Kai Ren, Xiangyu Chang

Abstract: Interpretability is a key concern in estimating heterogeneous treatment effects using machine learning methods, especially for healthcare applications where high-stakes decisions are often made. Inspired by the Predictive, Descriptive, Relevant framework of interpretability, we propose causal rule learning, which finds a refined set of causal rules characterizing potential subgroups to estimate and enhance our understanding of heterogeneous treatment effects. Causal rule learning involves three phases: rule discovery, rule selection, and rule analysis. In the rule discovery phase, we utilize a causal forest to generate a pool of causal rules with corresponding subgroup average treatment effects. The selection phase then employs a D-learning method to select a subset of these rules to deconstruct individual-level treatment effects as a linear combination of the subgroup-level effects. This helps to answer a question ignored by previous literature: what if an individual simultaneously belongs to multiple groups with different average treatment effects? The rule analysis phase outlines a detailed procedure to further analyze each rule in the subset from multiple perspectives, revealing the most promising rules for further validation. The rules themselves, their corresponding subgroup treatment effects, and their weights in the linear combination give us more insights into heterogeneous treatment effects. Simulation and real-world data analysis demonstrate the superior performance of causal rule learning on the interpretable estimation of heterogeneous treatment effects when the ground truth is complex and the sample size is sufficient.

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2307.10572 [pdf, other]

doi 10.1016/j.csda.2024.107987

Spectral co-Clustering in Multi-layer Directed Networks

Authors: Wenqing Su, Xiao Guo, Xiangyu Chang, Ying Yang

Abstract: Modern network analysis often involves multi-layer network data in which the nodes are aligned, and the edges on each layer represent one of the multiple relations among the nodes. Current literature on multi-layer network data is mostly limited to undirected relations. However, directed relations are more common and may introduce extra information. This study focuses on community detection (or clustering) in multi-layer directed networks. To take into account the asymmetry, a novel spectral-co-clustering-based algorithm is developed to detect co-clusters, which capture the sending patterns and receiving patterns of nodes, respectively. Specifically, the eigendecomposition of the debiased sum of Gram matrices over the layer-wise adjacency matrices is computed, followed by k-means, where the sum of Gram matrices is used to avoid possible cancellation of clusters caused by direct summation. Theoretical analysis of the algorithm under the multi-layer stochastic co-block model is provided, where the common assumption that the cluster number is coupled with the rank of the model is relaxed. After a systematic analysis of the eigenvectors of the population version algorithm, the misclassification rates are derived, which show that multi-layers would bring benefits to the clustering performance. The experimental results of simulated data corroborate the theoretical predictions, and the analysis of a real-world trade network dataset provides interpretable results.

Submitted 16 June, 2024; v1 submitted 20 July, 2023; originally announced July 2023.

Journal ref: Computational Statistics & Data Analysis (2024) 107987
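
The "debiased sum of Gram matrices" step in the abstract above can be illustrated with a minimal sketch. It assumes binary adjacency matrices and a diagonal correction by out-degree, in the style of bias-adjusted sum-of-squares methods; the paper's exact debiasing may differ, and the function name `debiased_gram_sum` is hypothetical. Only the row (sending) side is shown; the eigendecomposition and k-means steps are omitted.

```python
def debiased_gram_sum(layers):
    """Sum over layers of A @ A.T, with each layer's out-degrees removed
    from the diagonal (assumed debiasing; see the note above)."""
    n = len(layers[0])
    total = [[0] * n for _ in range(n)]
    for A in layers:
        for i in range(n):
            for j in range(n):
                # (A A^T)_{ij}: number of common out-neighbours of i and j.
                gram_ij = sum(A[i][k] * A[j][k] for k in range(len(A[i])))
                if i == j:
                    # For a binary matrix, (A A^T)_{ii} equals the out-degree
                    # of node i, so this zeroes the diagonal.
                    gram_ij -= sum(A[i])
                total[i][j] += gram_ij
    return total


# Two small directed layers on three aligned nodes (illustrative data).
layer1 = [[0, 1, 1], [0, 0, 1], [1, 0, 0]]
layer2 = [[0, 0, 1], [1, 0, 1], [0, 1, 0]]
M = debiased_gram_sum([layer1, layer2])
```

Because every Gram entry is non-negative, contributions from different layers accumulate rather than cancel, which is the motivation the abstract gives for summing Gram matrices instead of the adjacency matrices themselves. The receiving side would use A.T @ A in the same way.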

arXiv:2306.15709 [pdf, other]

Privacy-Preserving Community Detection for Locally Distributed Multiple Networks

Authors: Xiao Guo, Xiang Li, Xiangyu Chang, Shujie Ma

Abstract: Modern multi-layer networks are commonly stored and analyzed in a local and distributed fashion because of the privacy, ownership, and communication costs. The literature on the model-based statistical methods for community detection based on these data is still limited. This paper proposes a new method for consensus community detection and estimation in a multi-layer stochastic block model using locally stored and computed network data with privacy protection. A novel algorithm named privacy-preserving Distributed Spectral Clustering (ppDSC) is developed. To preserve the edges' privacy, we adopt the randomized response (RR) mechanism to perturb the network edges, which satisfies the strong notion of differential privacy. The ppDSC algorithm is performed on the squared RR-perturbed adjacency matrices to prevent possible cancellation of communities among different layers. To remove the bias incurred by RR and the squared network matrices, we develop a two-step bias-adjustment procedure. Then we perform eigen-decomposition on the debiased matrices, aggregation of the local eigenvectors using an orthogonal Procrustes transformation, and k-means clustering. We provide theoretical analysis on the statistical errors of ppDSC in terms of eigen-vector estimation. In addition, the blessings and curses of network heterogeneity are well-explained by our bounds.

Submitted 27 June, 2023; originally announced June 2023.
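
The randomized response (RR) perturbation and the first bias adjustment mentioned in the abstract above can be sketched as follows. This is a simplified illustration, not the ppDSC implementation: it assumes a single privacy parameter `epsilon`, keeps each binary entry with probability e^ε / (1 + e^ε) and flips it otherwise, and perturbs every entry independently (a real undirected network would perturb only one triangle and mirror it). All function names are hypothetical.

```python
import math
import random


def rr_keep_prob(epsilon):
    """Probability of reporting the true bit under symmetric randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(A, epsilon, rng):
    """Flip each 0/1 entry independently with probability 1 - p."""
    p = rr_keep_prob(epsilon)
    return [[a if rng.random() < p else 1 - a for a in row] for row in A]


def debias(A_tilde, epsilon):
    """Invert E[A_tilde] = (2p - 1) * A + (1 - p) entrywise,
    giving an unbiased estimate of A."""
    p = rr_keep_prob(epsilon)
    return [[(a - (1.0 - p)) / (2.0 * p - 1.0) for a in row] for row in A_tilde]


A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
A_tilde = randomized_response(A, 1.0, random.Random(0))
A_hat = debias(A_tilde, 1.0)
```

The ratio p / (1 - p) equals e^ε, which is what gives the perturbed edges their ε-differential-privacy guarantee under this parametrization. The second bias adjustment in the abstract, for the squared matrices, is not shown here.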

arXiv:2304.06900 [pdf, other]

Subsampling-Based Modified Bayesian Information Criterion for Large-Scale Stochastic Block Models

Authors: Jiayi Deng, Danyang Huang, Xiangyu Chang, Bo Zhang

Abstract: Identifying the number of communities is a fundamental problem in community detection, which has received increasing attention recently. However, rapid advances in technology have led to the emergence of large-scale networks in various disciplines, thereby making existing methods computationally infeasible. To address this challenge, we propose a novel subsampling-based modified Bayesian information criterion (SM-BIC) for identifying the number of communities in a network generated via the stochastic block model and degree-corrected stochastic block model. We first propose a node-pair subsampling method to extract an informative subnetwork from the entire network, and then we derive a purely data-driven criterion to identify the number of communities for the subnetwork. In this way, the SM-BIC can identify the number of communities based on the subsampled network instead of the entire dataset. This leads to important computational advantages over existing methods. We theoretically investigate the computational complexity and identification consistency of the SM-BIC. Furthermore, the advantages of the SM-BIC are demonstrated by extensive numerical studies.

Submitted 13 April, 2023; originally announced April 2023.

arXiv:2303.05223 [pdf, other]

LEAP: The latent exchangeability prior for borrowing information from historical data

Authors: Ethan M. Alt, Xiuya Chang, Xun Jiang, Qing Liu, May Mo, H. Amy Xia, Joseph G. Ibrahim

Abstract: It is becoming increasingly popular to elicit informative priors on the basis of historical data. Popular existing priors, including the power prior, commensurate prior, and robust meta-analytic prior provide blanket discounting. Thus, if only a subset of participants in the historical data are exchangeable with the current data, these priors may not be appropriate. In order to combat this issue, propensity score (PS) approaches have been proposed. However, PS approaches are only concerned with the covariate distribution, whereas exchangeability is typically assessed with parameters pertaining to the outcome. In this paper, we introduce the latent exchangeability prior (LEAP), where observations in the historical data are classified into exchangeable and non-exchangeable groups. The LEAP discounts the historical data by identifying the most relevant subjects from the historical data. We compare our proposed approach against alternative approaches in simulations and present a case study using our proposed prior to augment a control arm in a phase 3 clinical trial in plaque psoriasis with an unbalanced randomization scheme.

Submitted 9 March, 2023; originally announced March 2023.

arXiv:2210.16835 [pdf, other]

Variance reduced Shapley value estimation for trustworthy data valuation

Authors: Mengmeng Wu, Ruoxi Jia, Changle Lin, Wei Huang, Xiangyu Chang

Abstract: Data valuation, especially quantifying data value in algorithmic prediction and decision-making, is a fundamental problem in data trading scenarios. The most widely used method is to define the data Shapley and approximate it by means of the permutation sampling algorithm. To make up for the large estimation variance of the permutation sampling that hinders the development of the data marketplace, we propose a more robust data valuation method using stratified sampling, named variance reduced data Shapley (VRDS for short). We theoretically show how to stratify and how many samples to take at each stratum, and we provide a sample complexity analysis of VRDS. Finally, the effectiveness of VRDS is illustrated in different types of datasets and data removal applications.

Submitted 22 May, 2023; v1 submitted 30 October, 2022; originally announced October 2022.
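
For context, the baseline permutation sampling estimator that the abstract above refers to can be sketched as below. This is the standard Monte Carlo approximation of Shapley values, not the stratified VRDS method; the function name, the toy additive utility, and the player labels are all illustrative assumptions.

```python
import random


def permutation_shapley(players, utility, num_permutations, rng):
    """Average each player's marginal contribution over random orderings."""
    values = {p: 0.0 for p in players}
    for _ in range(num_permutations):
        order = list(players)
        rng.shuffle(order)
        coalition = []
        prev = utility(coalition)
        for p in order:
            coalition.append(p)
            cur = utility(coalition)
            values[p] += cur - prev  # marginal contribution of p in this order
            prev = cur
    return {p: v / num_permutations for p, v in values.items()}


# Toy additive utility: each data point contributes a fixed amount, so its
# Shapley value equals that amount regardless of the sampled orderings.
weights = {"a": 1.0, "b": 2.0, "c": 3.0}


def additive_utility(coalition):
    return sum(weights[p] for p in coalition)


estimates = permutation_shapley(list(weights), additive_utility, 200, random.Random(0))
```

With a non-additive utility, such as model accuracy on a validation set, the marginal contributions vary across orderings, and that variation is the estimation variance the abstract sets out to reduce.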

arXiv:2209.13807 [pdf, other]

Asynchronous and Error-prone Longitudinal Data Analysis via Functional Calibration

Authors: Xinyue Chang, Yehua Li, Yi Li

Abstract: In many longitudinal settings, time-varying covariates may not be measured at the same time as responses and are often prone to measurement error. Naive last-observation-carried-forward methods incur estimation biases, and existing kernel-based methods suffer from slow convergence rates and large variations. To address these challenges, we propose a new functional calibration approach to efficiently learn longitudinal covariate processes based on sparse functional data with measurement error. Our approach, stemming from functional principal component analysis, calibrates the unobserved synchronized covariate values from the observed asynchronous and error-prone covariate values, and is broadly applicable to asynchronous longitudinal regression with time-invariant or time-varying coefficients. For regression with time-invariant coefficients, our estimator is asymptotically unbiased, root-n consistent, and asymptotically normal; for time-varying coefficient models, our estimator has the optimal varying coefficient model convergence rate with inflated asymptotic variance from the calibration. In both cases, our estimators present asymptotic properties superior to the existing methods. The feasibility and usability of the proposed methods are verified by simulations and an application to the Study of Women's Health Across the Nation, a large-scale multi-site longitudinal study on women's health during mid-life.

Submitted 8 March, 2023; v1 submitted 27 September, 2022; originally announced September 2022.

arXiv:2209.12401 [pdf, ps, other]

Elevator Optimization: Application of Spatial Process and Gibbs Random Field Approaches for Dumbwaiter Modeling and Multi-Dumbwaiter Systems

Authors: Zheng Cao, Benjamin Lu Davis, Wanchaloem Wunkaew, Xinyu Chang

Abstract: This research investigates analytical and quantitative methods for simulating elevator optimizations. To maximize overall elevator usage, we concentrate on creating a multiple-user positive-sum system that is inspired by agent-based game theory. We define and create basic "Dumbwaiter" models by attempting both the Spatial Process Approach and the Gibbs Random Field Approach. These two mathematical techniques approach the problem from different points of view: the spatial process can give an analytical solution in continuous space and the Gibbs Random Field provides a discrete framework to flexibly model the problem on a computer. Starting from the simplest case, we target the assumptions to provide concrete solutions to the models and develop a "Multi-Dumbwaiter System". This paper examines, evaluates, and proves the ultimate success of such implemented strategies to design the basic elevator's optimal policy; consequently, not only do we believe in the results' practicality for industry, but also their potential for application.

Submitted 23 December, 2022; v1 submitted 25 September, 2022; originally announced September 2022.

Comments: 14 pages

MSC Class: 93-10; 60J05; 90B36 ACM Class: G.1.6; G.3; I.6.5

arXiv:2206.15379 [pdf, ps, other]

On the efficacy of higher-order spectral clustering under weighted stochastic block models

Authors: Xiao Guo, Hai Zhang, Xiangyu Chang

Abstract: Higher-order structures of networks, namely, small subgraphs of networks (also called network motifs), are widely known to be crucial and essential to the organization of networks. A few works have studied the community detection problem, a fundamental problem in network analysis, at the level of motifs. In particular, higher-order spectral clustering has been developed, where the notion of motif adjacency matrix is introduced as the input of the algorithm. However, it remains largely unknown how higher-order spectral clustering works and when it performs better than its edge-based counterpart. To elucidate these problems, we investigate higher-order spectral clustering from a statistical perspective. In particular, we theoretically study the clustering performance of higher-order spectral clustering under a weighted stochastic block model and compare the resulting bounds with the corresponding results of edge-based spectral clustering. It turns out that when the network is dense with weak signal of weights, higher-order spectral clustering can indeed yield a performance gain in clustering. We also use simulations and real data experiments to support the findings.

Submitted 13 April, 2023; v1 submitted 30 June, 2022; originally announced June 2022.

arXiv:2205.05343 [pdf, other]

Learning Multitask Gaussian Bayesian Networks

Authors: Shuai Liu, Yixuan Qiu, Baojuan Li, Huaning Wang, Xiangyu Chang

Abstract: Major depressive disorder (MDD) requires study of brain functional connectivity alterations for patients, which can be uncovered by resting-state functional magnetic resonance imaging (rs-fMRI) data. We consider the problem of identifying alterations of brain functional connectivity for a single MDD patient. This is particularly difficult since the amount of data collected during an fMRI scan is too limited to provide sufficient information for individual analysis. Additionally, rs-fMRI data usually has the characteristics of incompleteness, sparsity, variability, high dimensionality and high noise. To address these problems, we propose a multitask Gaussian Bayesian network (MTGBN) framework capable of identifying individual disease-induced alterations for MDD patients. We assume that such disease-induced alterations show some degrees of similarity with the tool to learn such network structures from observations to understanding of how systems are structured jointly from related tasks. First, we treat each patient in a class of observation as a task and then learn the Gaussian Bayesian networks (GBNs) of this data class by learning from all tasks that share a default covariance matrix that encodes prior knowledge. This setting can help us to learn more information from limited data. Next, we derive a closed-form formula of the complete likelihood function and use the Monte-Carlo Expectation-Maximization (MCEM) algorithm to search for the approximately best Bayesian network structures efficiently. Finally, we assess the performance of our methods with simulated and real-world rs-fMRI data.

Submitted 8 June, 2022; v1 submitted 11 May, 2022; originally announced May 2022.

arXiv:2109.10053 [pdf, other]

Toward a Fairness-Aware Scoring System for Algorithmic Decision-Making

Authors: Yi Yang, Ying Wu, Mei Li, Xiangyu Chang, Yong Tan

Abstract: Scoring systems, as a type of predictive model, have significant advantages in interpretability and transparency and facilitate quick decision-making. As such, scoring systems have been extensively used in a wide variety of industries such as healthcare and criminal justice. However, the fairness issues in these models have long been criticized, and the use of big data and machine learning algorithms in the construction of scoring systems heightens this concern. In this paper, we propose a general framework to create fairness-aware, data-driven scoring systems. First, we develop a social welfare function that incorporates both efficiency and group fairness. Then, we transform the social welfare maximization problem into the risk minimization task in machine learning, and derive a fairness-aware scoring system with the help of mixed integer programming. Lastly, several theoretical bounds are derived for providing parameter selection suggestions. Our proposed framework provides a suitable solution to address group fairness concerns in the development of scoring systems. It enables policymakers to set and customize their desired fairness requirements as well as other application-specific constraints. We test the proposed algorithm with several empirical data sets. Experimental evidence supports the effectiveness of the proposed scoring system in achieving the optimal welfare of stakeholders and in balancing the needs for interpretability, fairness, and efficiency.

Submitted 22 November, 2022; v1 submitted 21 September, 2021; originally announced September 2021.

arXiv:2109.01326 [pdf, other]

Statistical Estimation and Inference via Local SGD in Federated Learning

Authors: Xiang Li, Jiadong Liang, Xiangyu Chang, Zhihua Zhang

Abstract: Federated Learning (FL) makes a large number of edge computing devices (e.g., mobile phones) jointly learn a global model without data sharing. In FL, data are generated in a decentralized manner with high heterogeneity. This paper studies how to perform statistical estimation and inference in the federated setting. We analyze the so-called Local SGD, a multi-round estimation procedure that uses intermittent communication to improve communication efficiency. We first establish a {\it functional central limit theorem} that shows the averaged iterates of Local SGD weakly converge to a rescaled Brownian motion. We next provide two iterative inference methods: the {\it plug-in} and the {\it random scaling}. Random scaling constructs an asymptotically pivotal statistic for inference by using the information along the whole Local SGD path. Both methods are communication efficient and applicable to online data. Our theoretical and empirical results show that Local SGD simultaneously achieves both statistical efficiency and communication efficiency.

Submitted 17 December, 2021; v1 submitted 3 September, 2021; originally announced September 2021.
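
The Local SGD procedure described in the abstract above (several local stochastic gradient steps per client, followed by an averaging round) can be sketched on a toy problem. Everything here is an illustrative assumption rather than the paper's setup: each client k observes noisy draws around its own mean c_k and minimizes the squared loss, so the global minimizer is the mean of the c_k; the step-size schedule and function name are hypothetical. Only estimation is shown, not the plug-in or random scaling inference methods.

```python
import random


def local_sgd_mean(client_means, rounds, local_steps, rng, noise_sd=1.0):
    """Toy Local SGD for min_x (1/K) * sum_k E[0.5 * (x - y_k)^2],
    where y_k ~ N(c_k, noise_sd^2). Returns (last iterate, averaged iterate)."""
    k = len(client_means)
    x_global = 0.0
    running_sum = 0.0
    t = 0
    for _ in range(rounds):
        # Every client starts the round from the shared global iterate.
        local = [x_global] * k
        for _ in range(local_steps):
            t += 1
            step = 0.5 / t ** 0.6  # decaying step size (assumed schedule)
            for i, c in enumerate(client_means):
                y = c + rng.gauss(0.0, noise_sd)   # fresh local sample
                local[i] -= step * (local[i] - y)  # stochastic gradient step
            running_sum += sum(local) / k
        # Communication: average the local iterates.
        x_global = sum(local) / k
    return x_global, running_sum / t


last, averaged = local_sgd_mean([1.0, 2.0, 3.0, 6.0], 2000, 5, random.Random(0))
```

Clients communicate once every `local_steps` updates instead of after every gradient step, which is the communication saving the abstract refers to. The averaged iterate is the quantity whose limiting behaviour the abstract's central limit theorem concerns, and in this toy problem it should settle near the global mean of 3.0.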

arXiv:2103.00704 [pdf, other]

FedPower: Privacy-Preserving Distributed Eigenspace Estimation

Authors: Xiao Guo, Xiang Li, Xiangyu Chang, Shusen Wang, Zhihua Zhang

Abstract: Eigenspace estimation is fundamental in machine learning and statistics, which has found applications in PCA, dimension reduction, and clustering, among others. The modern machine learning community usually assumes that data come from and belong to different organizations. The low communication power and the possible privacy breaches of data make the computation of eigenspace challenging. To address these challenges, we propose a class of algorithms called \textsf{FedPower} within the federated learning (FL) framework. \textsf{FedPower} leverages the well-known power method by alternating multiple local power iterations and a global aggregation step, thus improving communication efficiency. In the aggregation, we propose to weight each local eigenvector matrix with {\it Orthogonal Procrustes Transformation} (OPT) for better alignment. To ensure strong privacy protection, we add Gaussian noise in each iteration by adopting the notion of \emph{differential privacy} (DP). We provide convergence bounds for \textsf{FedPower} that are composed of different interpretable terms corresponding to the effects of Gaussian noise, parallelization, and random sampling of local machines. Additionally, we conduct experiments to demonstrate the effectiveness of our proposed algorithms.

Submitted 27 June, 2023; v1 submitted 28 February, 2021; originally announced March 2021.

arXiv:2101.09418 [pdf, other]

A Geospatial Functional Model For OCO-2 Data with Application on Imputation and Land Fraction Estimation

Authors: Xinyue Chang, Zhengyuan Zhu, Xiongtao Dai, Jonathan Hobbs

Abstract: Data from NASA's Orbiting Carbon Observatory-2 (OCO-2) satellite is essential to many carbon management strategies. A retrieval algorithm is used to estimate CO2 concentration using the radiance data measured by OCO-2. However, due to factors such as cloud cover and cosmic rays, the spatial coverage of the retrieval algorithm is limited in some areas of critical importance for carbon cycle science. Mixed land/water pixels along the coastline are also not used in the retrieval processing due to the lack of valid ancillary variables including land fraction. We propose an approach to model spatial spectral data to solve these two problems by radiance imputation and land fraction estimation. The spectral observations are modeled as spatially indexed functional data with footprint-specific parameters and are reduced to much lower dimensions by functional principal component analysis. The principal component scores are modeled as random fields to account for the spatial dependence, and the missing spectral observations are imputed by kriging the principal component scores. The proposed method is shown to impute spectral radiance with high accuracy for observations over the Pacific Ocean. An unmixing approach based on this model provides much more accurate land fraction estimates in our validation study along Greece coastlines.

Submitted 23 January, 2021; originally announced January 2021.

arXiv:2012.08749 [pdf, other]

Provable Benefits of Overparameterization in Model Compression: From Double Descent to Pruning Neural Networks

Authors: Xiangyu Chang, Yingcong Li, Samet Oymak, Christos Thrampoulidis

Abstract: Deep networks are typically trained with many more parameters than the size of the training dataset. Recent empirical evidence indicates that the practice of overparameterization not only benefits training large models, but also assists - perhaps counterintuitively - building lightweight models. Specifically, it suggests that overparameterization benefits model pruning / sparsification. This paper sheds light on these empirical findings by theoretically characterizing the high-dimensional asymptotics of model pruning in the overparameterized regime. The theory presented addresses the following core question: "should one train a small model from the beginning, or first train a large model and then prune?". We analytically identify regimes in which, even if the location of the most informative features is known, we are better off fitting a large model and then pruning rather than simply training with the known informative features. This leads to a new double descent in the training of sparse models: growing the original model, while preserving the target sparsity, improves the test accuracy as one moves beyond the overparameterization threshold. Our analysis further reveals the benefit of retraining by relating it to feature correlations. We find that the above phenomena are already present in linear and random-features models. Our technical approach advances the toolset of high-dimensional analysis and precisely characterizes the asymptotic distribution of over-parameterized least-squares. The intuition gained by analytically studying simpler models is numerically verified on neural networks.

Submitted 16 December, 2020; originally announced December 2020.

Comments: to appear at AAAI 2021

arXiv:2009.12362 [pdf, other]

Self-Weighted Robust LDA for Multiclass Classification with Edge Classes

Authors: Caixia Yan, Xiaojun Chang, Minnan Luo, Qinghua Zheng, Xiaoqin Zhang, Zhihui Li, Feiping Nie

Abstract: Linear discriminant analysis (LDA) is a popular technique to learn the most discriminative features for multi-class classification. A vast majority of existing LDA algorithms are prone to be dominated by the class with very large deviation from the others, i.e., edge class, which occurs frequently in multi-class classification. First, the existence of edge classes often makes the total mean biased in the calculation of between-class scatter matrix. Second, the exploitation of l2-norm based between-class distance criterion magnifies the extremely large distance corresponding to edge class. In this regard, a novel self-weighted robust LDA with l21-norm based pairwise between-class distance criterion, called SWRLDA, is proposed for multi-class classification especially with edge classes. SWRLDA can automatically avoid the optimal mean calculation and simultaneously learn adaptive weights for each class pair without setting any additional parameter. An efficient re-weighted algorithm is exploited to derive the global optimum of the challenging l21-norm maximization problem. The proposed SWRLDA is easy to implement, and converges fast in practice. Extensive experiments demonstrate that SWRLDA performs favorably against other compared methods on both synthetic and real-world datasets, while presenting superior computational efficiency in comparison with other techniques.

Submitted 24 September, 2020; originally announced September 2020.

Comments: 17 pages, has been accepted by ACM TIST

arXiv:2009.01514 [pdf, other]

Kernel Interpolation of High Dimensional Scattered Data

Authors: Shao-Bo Lin, Xiangyu Chang, Xingping Sun

Abstract: Data sites selected from modeling high-dimensional problems often appear scattered in non-paternalistic ways. Except for sporadic clustering at some spots, they become relatively far apart as the dimension of the ambient space grows. These features defy any theoretical treatment that requires local or global quasi-uniformity of distribution of data sites. Incorporating a recently-developed application of integral operator theory in machine learning, we propose and study in the current article a new framework to analyze kernel interpolation of high dimensional data, which features bounding stochastic approximation error by the spectrum of the underlying kernel matrix. Both theoretical analysis and numerical simulations show that spectra of kernel matrices are reliable and stable barometers for gauging the performance of kernel-interpolation methods for high dimensional data.

Submitted 27 September, 2021; v1 submitted 3 September, 2020; originally announced September 2020.

Comments: 33 pages, 5 figures

arXiv:2009.00236 [pdf, other]

A Survey of Deep Active Learning

Authors: Pengzhen Ren, Yun Xiao, Xiaojun Chang, Po-Yao Huang, Zhihui Li, Brij B. Gupta, Xiaojiang Chen, Xin Wang

Abstract: Active learning (AL) attempts to maximize the performance gain of the model by marking the fewest samples. Deep learning (DL) is greedy for data and requires a large amount of data supply to optimize massive parameters, so that the model learns how to extract high-quality features. In recent years, due to the rapid development of internet technology, we are in an era of information torrents and we have massive amounts of data. In this way, DL has aroused strong interest of researchers and has been rapidly developed. Compared with DL, researchers have relatively low interest in AL. This is mainly because before the rise of DL, traditional machine learning requires relatively few labeled samples. Therefore, early AL is difficult to reflect the value it deserves. Although DL has made breakthroughs in various fields, most of this success is due to the publicity of the large number of existing annotation datasets. However, the acquisition of a large number of high-quality annotated datasets consumes a lot of manpower, which is not allowed in some fields that require high expertise, especially in the fields of speech recognition, information extraction, medical images, etc. Therefore, AL has gradually received due attention. A natural idea is whether AL can be used to reduce the cost of sample annotations, while retaining the powerful learning capabilities of DL. Therefore, deep active learning (DAL) has emerged. Although the related research has been quite abundant, it lacks a comprehensive survey of DAL. This article aims to fill this gap: we provide a formal classification method for the existing work, and a comprehensive and systematic overview. In addition, we also analyzed and summarized the development of DAL from the perspective of application. Finally, we discussed the confusion and problems in DAL, and gave some possible development directions for DAL.

Submitted 5 December, 2021; v1 submitted 30 August, 2020; originally announced September 2020.

arXiv:2008.08844 [pdf, other]

Complete the Missing Half: Augmenting Aggregation Filtering with Diversification for Graph Convolutional Networks

Authors: Sitao Luan, Mingde Zhao, Chenqing Hua, Xiao-Wen Chang, Doina Precup

Abstract: The core operation of current Graph Neural Networks (GNNs) is the aggregation enabled by the graph Laplacian or message passing, which filters the neighborhood node information. Though effective for various tasks, in this paper, we show that they are potentially a problematic factor underlying all GNN methods for learning on certain datasets, as they force the node representations to be similar, making the nodes gradually lose their identity and become indistinguishable. Hence, we augment the aggregation operations with their dual, i.e. diversification operators that make the node more distinct and preserve the identity. Such augmentation replaces the aggregation with a two-channel filtering process that, in theory, is beneficial for enriching the node representations. In practice, the proposed two-channel filters can be easily patched on existing GNN methods with diverse training strategies, including spectral and spatial (message passing) methods. In the experiments, we observe desired characteristics of the models and significant performance boost upon the baselines on 9 node classification tasks.

Submitted 2 November, 2022; v1 submitted 20 August, 2020; originally announced August 2020.

Comments: New Frontiers in Graph Learning (GLFrontiers) Workshop (Oral), NeurIPS 2022

arXiv:2008.08838 [pdf, ps, other]

Training Matters: Unlocking Potentials of Deeper Graph Convolutional Neural Networks

Authors: Sitao Luan, Mingde Zhao, Xiao-Wen Chang, Doina Precup

Abstract: The performance limit of Graph Convolutional Networks (GCNs) and the fact that we cannot stack more of them to increase the performance, which we usually do for other deep learning paradigms, are pervasively thought to be caused by the limitations of the GCN layers, including insufficient expressive power, etc. However, if so, for a fixed architecture, it would be unlikely to lower the training difficulty and to improve performance by changing only the training procedure, which we show in this paper is not only possible but possible in several ways. This paper first identifies the training difficulty of GCNs from the perspective of graph signal energy loss. More specifically, we find that the loss of energy in the backward pass during training nullifies the learning of the layers closer to the input. Then, we propose several methodologies to mitigate the training problem by slightly modifying the GCN operator, from the energy perspective. After empirical validation, we confirm that these changes of operator lead to significant decrease in the training difficulties and notable performance boost, without changing the composition of parameters. With these, we conclude that the root cause of the problem is more likely the training difficulty than the others.

Submitted 3 November, 2023; v1 submitted 20 August, 2020; originally announced August 2020.

Comments: Accepted by 12th International Conference on Complex Networks and Their Applications

arXiv:2006.13681 [pdf, other]

Multi-view Drone-based Geo-localization via Style and Spatial Alignment

Authors: Siyi Hu, Xiaojun Chang

Abstract: In this paper, we focus on the task of multi-view multi-source geo-localization, which serves as an important auxiliary method of GPS positioning by matching drone-view image and satellite-view image with pre-annotated GPS tag. To solve this problem, most existing methods adopt metric loss with a weighted classification block to force the generation of common feature space shared by different view points and view sources. However, these methods fail to pay sufficient attention to spatial information (especially viewpoint variances). To address this drawback, we propose an elegant orientation-based method to align the patterns and introduce a new branch to extract aligned partial feature. Moreover, we provide a style alignment strategy to reduce the variance in image style and enhance the feature unification. To demonstrate the performance of the proposed approach, we conduct extensive experiments on the large-scale benchmark dataset. The experimental results confirm the superiority of the proposed approach compared to state-of-the-art alternatives.

Submitted 8 July, 2020; v1 submitted 23 June, 2020; originally announced June 2020.

Comments: 9 pages 9 figures. arXiv admin note: text overlap with arXiv:2002.12186 by other authors

ACM Class: I.4.7; I.2.10

arXiv:2006.02903 [pdf, other]

A Comprehensive Survey of Neural Architecture Search: Challenges and Solutions

Authors: Pengzhen Ren, Yun Xiao, Xiaojun Chang, Po-Yao Huang, Zhihui Li, Xiaojiang Chen, Xin Wang

Abstract: Deep learning has made substantial breakthroughs in many fields due to its powerful automatic representation capabilities. It has been proven that neural architecture design is crucial to the feature representation of data and the final performance. However, the design of the neural architecture heavily relies on the researchers' prior knowledge and experience. And due to the limitations of humans' inherent knowledge, it is difficult for people to jump out of their original thinking paradigm and design an optimal model. Therefore, an intuitive idea would be to reduce human intervention as much as possible and let the algorithm automatically design the neural architecture. Neural Architecture Search (NAS) is just such a revolutionary algorithm, and the related research work is complicated and rich. Therefore, a comprehensive and systematic survey on the NAS is essential. Previously related surveys have begun to classify existing work mainly based on the key components of NAS: search space, search strategy, and evaluation strategy. While this classification method is more intuitive, it is difficult for readers to grasp the challenges and the landmark work involved. Therefore, in this survey, we provide a new perspective: beginning with an overview of the characteristics of the earliest NAS algorithms, summarizing the problems in these early NAS algorithms, and then providing solutions for subsequent related research work. Besides, we conduct a detailed and comprehensive analysis, comparison, and summary of these works. Finally, we provide some possible future research directions.

Submitted 2 March, 2021; v1 submitted 1 June, 2020; originally announced June 2020.

Comments: Accepted by ACM Computing Surveys 2021

arXiv:2005.11650 [pdf, other]

Connecting the Dots: Multivariate Time Series Forecasting with Graph Neural Networks

Authors: Zonghan Wu, Shirui Pan, Guodong Long, Jing Jiang, Xiaojun Chang, Chengqi Zhang

Abstract: Modeling multivariate time series has long been a subject that has attracted researchers from a diverse range of fields including economics, finance, and traffic. A basic assumption behind multivariate time series forecasting is that its variables depend on one another but, upon looking closely, it is fair to say that existing methods fail to fully exploit latent spatial dependencies between pairs of variables. In recent years, meanwhile, graph neural networks (GNNs) have shown high capability in handling relational dependencies. GNNs require well-defined graph structures for information propagation which means they cannot be applied directly for multivariate time series where the dependencies are not known in advance. In this paper, we propose a general graph neural network framework designed specifically for multivariate time series data. Our approach automatically extracts the uni-directed relations among variables through a graph learning module, into which external knowledge like variable attributes can be easily integrated. A novel mix-hop propagation layer and a dilated inception layer are further proposed to capture the spatial and temporal dependencies within the time series. The graph learning, graph convolution, and temporal convolution modules are jointly learned in an end-to-end framework. Experimental results show that our proposed model outperforms the state-of-the-art baseline methods on 3 of 4 benchmark datasets and achieves on-par performance with other approaches on two traffic datasets which provide extra structural information.

Submitted 24 May, 2020; originally announced May 2020.

Comments: Accepted by KDD 2020

arXiv:2004.12164 [pdf, other]

Randomized spectral co-clustering for large-scale directed networks

Authors: Xiao Guo, Yixuan Qiu, Hai Zhang, Xiangyu Chang

Abstract: Directed networks are broadly used to represent asymmetric relationships among units. Co-clustering aims to cluster the senders and receivers of directed networks simultaneously. In particular, the well-known spectral clustering algorithm could be modified as the spectral co-clustering to co-cluster directed networks. However, large-scale networks pose great computational challenges to it. In this paper, we leverage sketching techniques and derive two randomized spectral co-clustering algorithms, one random-projection-based and the other random-sampling-based, to accelerate the co-clustering of large-scale directed networks. We theoretically analyze the resulting algorithms under two generative models -- the stochastic co-block model and the degree-corrected stochastic co-block model, and establish their approximation error rates and misclustering error rates, indicating better bounds than the state-of-the-art results of co-clustering literature. Numerically, we design and conduct simulations to support our theoretical results and test the efficiency of the algorithms on real networks with up to millions of nodes. A publicly available R package RandClust is developed for better usability and reproducibility of the proposed methods.

Submitted 9 April, 2022; v1 submitted 25 April, 2020; originally announced April 2020.

arXiv:2004.10956 [pdf, other]

Few-Shot Class-Incremental Learning

Authors: Xiaoyu Tao, Xiaopeng Hong, Xinyuan Chang, Songlin Dong, Xing Wei, Yihong Gong

Abstract: The ability to incrementally learn new classes is crucial to the development of real-world artificial intelligence systems. In this paper, we focus on a challenging but practical few-shot class-incremental learning (FSCIL) problem. FSCIL requires CNN models to incrementally learn new classes from very few labelled samples, without forgetting the previously learned ones. To address this problem, we represent the knowledge using a neural gas (NG) network, which can learn and preserve the topology of the feature manifold formed by different classes. On this basis, we propose the TOpology-Preserving knowledge InCrementer (TOPIC) framework. TOPIC mitigates the forgetting of the old classes by stabilizing NG's topology and improves the representation learning for few-shot new classes by growing and adapting NG to new training samples. Comprehensive experimental results demonstrate that our proposed method significantly outperforms other state-of-the-art class-incremental learning methods on CIFAR100, miniImageNet, and CUB200 datasets.

Submitted 23 April, 2020; v1 submitted 22 April, 2020; originally announced April 2020.

Comments: Accepted by CVPR 2020 (oral)

arXiv:2003.07017 [pdf, ps, other]

Uncertainty Quantification for Demand Prediction in Contextual Dynamic Pricing

Authors: Yining Wang, Xi Chen, Xiangyu Chang, Dongdong Ge

Abstract: Data-driven sequential decision has found a wide range of applications in modern operations management, such as dynamic pricing, inventory control, and assortment optimization. Most existing research on data-driven sequential decision focuses on designing an online policy to maximize the revenue. However, the research on uncertainty quantification on the underlying true model function (e.g., demand function), a critical problem for practitioners, has not been well explored. In this paper, using the problem of demand function prediction in dynamic pricing as the motivating example, we study the problem of constructing accurate confidence intervals for the demand function. The main challenge is that sequentially collected data leads to significant distributional bias in the maximum likelihood estimator or the empirical risk minimization estimate, making classical statistics approaches such as Wald's test no longer valid. We address this challenge by developing a debiased approach and provide the asymptotic normality guarantee of the debiased estimator. Based on this debiased estimator, we provide both point-wise and uniform confidence intervals of the demand function.

Submitted 31 August, 2020; v1 submitted 16 March, 2020; originally announced March 2020.

arXiv:2003.03691 [pdf, other]

Angle-Based Cost-Sensitive Multicategory Classification

Authors: Yi Yang, Yuxuan Guo, Xiangyu Chang

Abstract: Many real-world classification problems come with costs which can vary for different types of misclassification. It is thus important to develop cost-sensitive classifiers which minimize the total misclassification cost. Although binary cost-sensitive classifiers have been well-studied, solving multicategory classification problems is still challenging. A popular approach to address this issue is to construct K classification functions for a K-class problem and remove the redundancy by imposing a sum-to-zero constraint. However, such a method usually results in higher computational complexity and inefficient algorithms. In this paper, we propose a novel angle-based cost-sensitive classification framework for multicategory classification without the sum-to-zero constraint. Loss functions included in the angle-based cost-sensitive classification framework are further justified to be Fisher consistent. To show the usefulness of the framework, two cost-sensitive multicategory boosting algorithms are derived as concrete instances. Numerical experiments demonstrate that the proposed boosting algorithms yield competitive classification performances against other existing boosting approaches.

Submitted 7 March, 2020; originally announced March 2020.

arXiv:2002.00839 [pdf, other]

Randomized Spectral Clustering in Large-Scale Stochastic Block Models

Authors: Hai Zhang, Xiao Guo, Xiangyu Chang

Abstract: Spectral clustering has been one of the widely used methods for community detection in networks. However, large-scale networks bring computational challenges to the eigenvalue decomposition therein. In this paper, we study the spectral clustering using randomized sketching algorithms from a statistical perspective, where we typically assume the network data are generated from a stochastic block model that is not necessarily of full rank. To do this, we first use the recently developed sketching algorithms to obtain two randomized spectral clustering algorithms, namely, the random projection-based and the random sampling-based spectral clustering. Then we study the theoretical bounds of the resulting algorithms in terms of the approximation error for the population adjacency matrix, the misclassification error, and the estimation error for the link probability matrix. It turns out that, under mild conditions, the randomized spectral clustering algorithms lead to the same theoretical bounds as those of the original spectral clustering algorithm. We also extend the results to degree-corrected stochastic block models. Numerical experiments support our theoretical findings and show the efficiency of randomized methods. A new R package called Rclust is developed and made available to the public.

Submitted 6 January, 2022; v1 submitted 19 January, 2020; originally announced February 2020.

arXiv:2001.02879

Adaptive Stopping Rule for Kernel-based Gradient Descent Algorithms

Authors: Xiangyu Chang, Shao-Bo Lin

Abstract: In this paper, we propose an adaptive stopping rule for kernel-based gradient descent (KGD) algorithms. We introduce the empirical effective dimension to quantify the increments of iterations in KGD and derive an implementable early stopping strategy. We analyze the performance of the adaptive stopping rule in the framework of learning theory. Using the recently developed integral operator approach, we rigorously prove the optimality of the adaptive stopping rule in terms of showing the optimal learning rates for KGD equipped with this rule. Furthermore, a sharp bound on the number of iterations in KGD equipped with the proposed early stopping rule is also given to demonstrate its computational advantage.

Submitted 13 June, 2023; v1 submitted 9 January, 2020; originally announced January 2020.

Comments: There is a critical error in the proof

arXiv:1906.09205 [pdf, other]

Continual Reinforcement Learning with Diversity Exploration and Adversarial Self-Correction

Authors: Fengda Zhu, Xiaojun Chang, Runhao Zeng, Mingkui Tan

Abstract: Deep reinforcement learning has made significant progress in the field of continuous control, such as physical control and autonomous driving. However, it is challenging for a reinforcement model to learn a policy for each task sequentially due to catastrophic forgetting. Specifically, the model would forget knowledge it learned in the past when trained on a new task. We consider this challenge from two perspectives: i) acquiring task-specific skills is difficult since task information and rewards are not highly related; ii) learning knowledge from previous experience is difficult in continuous control domains. In this paper, we introduce an end-to-end framework namely Continual Diversity Adversarial Network (CDAN). We first develop an unsupervised diversity exploration method to learn task-specific skills using an unsupervised objective. Then, we propose an adversarial self-correction mechanism to learn knowledge by exploiting past experience. The two learning procedures are presumably reciprocal. To evaluate the proposed method, we propose a new continuous reinforcement learning environment named Continual Ant Maze (CAM) and a new metric termed Normalized Shorten Distance (NSD). The experimental results confirm the effectiveness of diversity exploration and self-correction. It is worthwhile noting that our final result outperforms baseline by 18.35% in terms of NSD, and 0.61 according to the average reward.

Submitted 21 June, 2019; originally announced June 2019.

arXiv:1906.02174 [pdf, other]

Break the Ceiling: Stronger Multi-scale Deep Graph Convolutional Networks

Authors: Sitao Luan, Mingde Zhao, Xiao-Wen Chang, Doina Precup

Abstract: Recently, neural network based approaches have achieved significant improvement for solving large, complex, graph-structured problems. However, their bottlenecks still need to be addressed, and the advantages of multi-scale information and deep architectures have not been sufficiently exploited. In this paper, we theoretically analyze how existing Graph Convolutional Networks (GCNs) have limited expressive power due to the constraint of the activation functions and their architectures. We generalize spectral graph convolution and deep GCN in block Krylov subspace forms and devise two architectures, both with the potential to be scaled deeper but each making use of the multi-scale information in different ways. We further show that the equivalence of these two architectures can be established under certain conditions. On several node classification tasks, with or without the help of validation, the two new architectures achieve better performance compared to many state-of-the-art methods.

Submitted 8 September, 2019; v1 submitted 5 June, 2019; originally announced June 2019.

Comments: Accepted and to be published by NeurIPS 2019

arXiv:1702.08701 [pdf, ps, other]

Learning rates for classification with Gaussian kernels

Authors: Shao-Bo Lin, Jinshan Zeng, Xiangyu Chang

Abstract: This paper aims at refined error analysis for binary classification using support vector machine (SVM) with Gaussian kernel and convex loss. Our first result shows that for some loss functions such as the truncated quadratic loss and quadratic loss, SVM with Gaussian kernel can reach the almost optimal learning rate, provided the regression function is smooth. Our second result shows that, for a large number of loss functions, under some Tsybakov noise assumption, if the regression function is infinitely smooth, then SVM with Gaussian kernel can achieve the learning rate of order $m^{-1}$, where $m$ is the number of samples.

Submitted 5 October, 2017; v1 submitted 28 February, 2017; originally announced February 2017.

Comments: This paper has been accepted by Neural Computation

arXiv:1702.01229 [pdf, other]

Simple to Complex Cross-modal Learning to Rank

Authors: Minnan Luo, Xiaojun Chang, Zhihui Li, Liqiang Nie, Alexander G. Hauptmann, Qinghua Zheng

Abstract: The heterogeneity-gap between different modalities brings a significant challenge to multimedia information retrieval. Some studies formalize the cross-modal retrieval tasks as a ranking problem and learn a shared multi-modal embedding space to measure the cross-modality similarity. However, previous methods often establish the shared embedding space based on linear mapping functions which might not be sophisticated enough to reveal more complicated inter-modal correspondences. Additionally, current studies assume that the rankings are of equal importance, and thus all rankings are used simultaneously, or a small number of rankings are selected randomly to train the embedding space at each iteration. Such strategies, however, always suffer from outliers as well as reduced generalization capability due to their lack of insightful understanding of the procedure of human cognition. In this paper, we involve the self-paced learning theory with diversity into the cross-modal learning to rank and learn an optimal multi-modal embedding space based on non-linear mapping functions. This strategy enhances the model's robustness to outliers and achieves better generalization via training the model gradually from easy rankings by diverse queries to more complex ones. An efficient alternative algorithm is exploited to solve the proposed challenging problem with fast convergence in practice. Extensive experimental results on several benchmark datasets indicate that the proposed method achieves significant improvements over the state of the art in this literature.

Submitted 7 July, 2017; v1 submitted 3 February, 2017; originally announced February 2017.

Comments: 14 pages; Accepted by Computer Vision and Image Understanding

arXiv:1405.0212 [pdf, other]

doi 10.1109/TVT.2014.2339734

Mobile Localization in Non-Line-of-Sight Using Constrained Square-Root Unscented Kalman Filter

Authors: Siamak Yousefi, Xiao-Wen Chang, Benoit Champagne

Abstract: Localization and tracking of a mobile node (MN) in non-line-of-sight (NLOS) scenarios, based on time of arrival (TOA) measurements, is considered in this work. To this end, we develop a constrained form of square root unscented Kalman filter (SRUKF), where the sigma points of the unscented transformation are projected onto the feasible region by solving constrained optimization problems. The feasible region is the intersection of several discs formed by the NLOS measurements. We show how we can reduce the size of the optimization problem and formulate it as a convex quadratically constrained quadratic program (QCQP), which depends on the Cholesky factor of the \textit{a posteriori} error covariance matrix of SRUKF. As a result of these modifications, the proposed constrained SRUKF (CSRUKF) is more efficient and has better numerical stability compared to the constrained UKF. Through simulations, we also show that the CSRUKF achieves a smaller localization error compared to other techniques and that its performance is robust under different NLOS conditions.

Submitted 1 May, 2014; originally announced May 2014.

Comments: Under review by IEEE Trans. on Vehicular Technology

arXiv:1403.7890 [pdf, other]

Sparse K-Means with $\ell_{\infty}/\ell_0$ Penalty for High-Dimensional Data Clustering

Authors: Xiangyu Chang, Yu Wang, Rongjian Li, Zongben Xu

Abstract: Sparse clustering, which aims to find a proper partition of an extremely high-dimensional data set with redundant noise features, has attracted more and more interest in recent years. The existing studies commonly solve the problem in a framework of maximizing the weighted feature contributions subject to a $\ell_2/\ell_1$ penalty. Nevertheless, this framework has two serious drawbacks: one is that the solution of the framework unavoidably involves a considerable portion of redundant noise features in many situations, and the other is that the framework neither offers intuitive explanations on why this framework can select relevant features nor leads to any theoretical guarantee for feature selection consistency. In this article, we attempt to overcome those drawbacks through developing a new sparse clustering framework which uses a $\ell_{\infty}/\ell_0$ penalty. First, we introduce new concepts on optimal partitions and noise features for the high-dimensional data clustering problems, based on which the previously known framework can be intuitively explained in principle. Then, we apply the suggested $\ell_{\infty}/\ell_0$ framework to formulate a new sparse k-means model with the $\ell_{\infty}/\ell_0$ penalty ($\ell_0$-k-means for short). We propose an efficient iterative algorithm for solving the $\ell_0$-k-means. To deeply understand the behavior of $\ell_0$-k-means, we prove that the solution yielded by the $\ell_0$-k-means algorithm has feature selection consistency whenever the data matrix is generated from a high-dimensional Gaussian mixture model. Finally, we provide experiments with both synthetic data and the Allen Developing Mouse Brain Atlas data to support that the proposed $\ell_0$-k-means exhibits better noise feature detection capacity over the previously known sparse k-means with the $\ell_2/\ell_1$ penalty ($\ell_1$-k-means for short).

Submitted 31 March, 2014; originally announced March 2014.

Comments: 36 pages, 4 figures, Present the paper at ICSA 2013

Report number: SS-2015-0261

Journal ref: Statistica Sinica 28 (2018) 1265-1284

Showing 1–36 of 36 results for author: Chang, X