Search | arXiv e-print repository

On the Limitation of Kernel Dependence Maximization for Feature Selection

Abstract: A simple and intuitive method for feature selection consists of choosing the feature subset that maximizes a nonparametric measure of dependence between the response and the features. A popular proposal from the literature uses the Hilbert-Schmidt Independence Criterion (HSIC) as the nonparametric dependence measure. The rationale behind this approach to feature selection is that important features will exhibit a high dependence with the response and their inclusion in the set of selected features will increase the HSIC. Through counterexamples, we demonstrate that this rationale is flawed and that feature selection via HSIC maximization can miss critical features.

Submitted 10 June, 2024; originally announced June 2024.
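
The HSIC-based subset scoring described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it uses the standard biased empirical estimate tr(KHLH)/n^2 (some references normalize by (n-1)^2 instead), and the Gaussian kernel, the fixed `bandwidth`, and the function names `rbf_gram` and `hsic_biased` are assumptions made here for illustration.

```python
import numpy as np


def rbf_gram(x, bandwidth=1.0):
    """Gaussian (RBF) Gram matrix for samples stored row-wise in x."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D array as n samples of one feature
    sq_norms = np.sum(x ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def hsic_biased(x, y, bandwidth=1.0):
    """Biased empirical HSIC: trace(K H L H) / n^2 with centering matrix H."""
    K = rbf_gram(x, bandwidth)
    L = rbf_gram(y, bandwidth)
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / n ** 2


if __name__ == "__main__":
    # Toy data (assumed for illustration): the response depends only on feature 0.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 2))
    y = x[:, 0] + 0.1 * rng.normal(size=200)

    # Score each candidate feature subset by its empirical HSIC with the response.
    for subset in ([0], [1], [0, 1]):
        print(subset, hsic_biased(x[:, subset], y))
```

A selection procedure of the kind the abstract critiques would keep the subset with the largest score; the sketch only shows how the score is computed and does not reproduce the paper's counterexamples.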

arXiv:2404.19145 [pdf, other]

Orthogonal Bootstrap: Efficient Simulation of Input Uncertainty

Authors: Kaizhao Liu, Jose Blanchet, Lexing Ying, Yi** Lu

Abstract: Bootstrap is a popular methodology for simulating input uncertainty. However, it can be computationally expensive when the number of samples is large. We propose a new approach called Orthogonal Bootstrap that reduces the number of required Monte Carlo replications. We decompose the target being simulated into two parts: the non-orthogonal part, which has a closed-form result known as the Infinitesimal Jackknife, and the orthogonal part, which is easier to simulate. We theoretically and numerically show that Orthogonal Bootstrap significantly reduces the computational cost of Bootstrap while improving empirical accuracy and maintaining the same width of the constructed interval.

Submitted 30 April, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.15207 [pdf, other]

doi 10.1063/5.0195232

Simulation-Free Determination of Microstructure Representative Volume Element Size via Fisher Scores

Authors: Wei Liu, Satyajit Mojumder, Wing Kam Liu, Wei Chen, Daniel W. Apley

Abstract: A representative volume element (RVE) is a reasonably small unit of microstructure that can be simulated to obtain the same effective properties as the entire microstructure sample. Finite element (FE) simulation of RVEs, as opposed to much larger samples, saves computational expense, especially in multiscale modeling. Therefore, it is desirable to have a framework that determines RVE size prior to FE simulations. Existing methods select the RVE size based on when the FE-simulated properties of samples of increasing size converge with insignificant statistical variations, with the drawback that many samples must be simulated. We propose a simulation-free alternative that determines RVE size based only on a micrograph. The approach utilizes a machine learning model trained to implicitly characterize the stochastic nature of the input micrograph. The underlying rationale is to view RVE size as the smallest moving window size for which the stochastic nature of the microstructure within the window is stationary as the window moves across a large micrograph. For this purpose, we adapt a recently developed Fisher score-based framework for microstructure nonstationarity monitoring. Because the resulting RVE size is based solely on the micrograph and does not involve any FE simulation of specific properties, it constitutes an RVE for any property of interest that solely depends on the microstructure characteristics. Through numerical experiments of simple and complex microstructures, we validate our approach and show that our selected RVE sizes are consistent with when the chosen FE-simulated properties converge.

Submitted 7 April, 2024; originally announced April 2024.

Journal ref: APL Mach. Learn. 2(2): 026101 (2024)

arXiv:2404.10004 [pdf]

A Strategy Transfer and Decision Support Approach for Epidemic Control in Experience Shortage Scenarios

Authors: X. Xiao, P. Chen, X. Cao, K. Liu, L. Deng, D. Zhao, Z. Chen, Q. Deng, F. Yu, H. Zhang

Abstract: Epidemic outbreaks can cause critical health concerns and severe global economic crises. For countries or regions with new infectious disease outbreaks, it is essential to generate preventive strategies by learning lessons from others with similar risk profiles. A Strategy Transfer and Decision Support Approach (STDSA) is proposed based on the profile similarity evaluation. There are four steps in this method: (1) The similarity evaluation indicators are determined from three dimensions, i.e., the Basis of National Epidemic Prevention & Control, Social Resilience, and Infection Situation. (2) The data related to the indicators are collected and preprocessed. (3) The first round of screening on the preprocessed dataset is conducted through an improved collaborative filtering algorithm to calculate the preliminary similarity result from the perspective of the infection situation. (4) Finally, the K-Means model is used for the second round of screening to obtain the final similarity values. The approach will be applied to decision-making support in the context of COVID-19. Our results demonstrate that the recommendations generated by the STDSA model are more accurate and aligned better with the actual situation than those produced by pure K-means models. This study will provide new insights into preventing and controlling epidemics in regions that lack experience.

Submitted 9 April, 2024; originally announced April 2024.

Comments: 20 pages, 9 figures

arXiv:2404.00220 [pdf, other]

Partially-Observable Sequential Change-Point Detection for Autocorrelated Data via Upper Confidence Region

Authors: Haijie Xu, Xiaochen Xian, Chen Zhang, Kaibo Liu

Abstract: Sequential change point detection for multivariate autocorrelated data is a very common problem in practice. However, when the sensing resources are limited, only a subset of variables from the multivariate system can be observed at each sensing time point. This raises the problem of partially observable multi-sensor sequential change point detection. To address it, we propose a detection scheme called adaptive upper confidence region with state space model (AUCRSS). It models multivariate time series via a state space model (SSM), and uses an adaptive sampling policy for efficient change point detection and localization. A partially-observable Kalman filter algorithm is developed for online inference of the SSM, and accordingly, a change point detection scheme based on a generalized likelihood ratio test is developed. How its detection power relates to the adaptive sampling strategy is analyzed. Meanwhile, by treating the detection power as a reward, its connection with the online combinatorial multi-armed bandit (CMAB) problem is formulated and an adaptive upper confidence region algorithm is proposed for adaptive sampling policy design. Theoretical analysis of the asymptotic average detection delay is performed, and thorough numerical studies with synthetic data and real-world data are conducted to demonstrate the effectiveness of our method.

Submitted 29 March, 2024; originally announced April 2024.

arXiv:2403.07185 [pdf, other]

Uncertainty in Graph Neural Networks: A Survey

Authors: Fangxin Wang, Yuqing Liu, Kay Liu, Yibo Wang, Sourav Medya, Philip S. Yu

Abstract: Graph Neural Networks (GNNs) have been extensively used in various real-world applications. However, the predictive uncertainty of GNNs stemming from diverse sources such as inherent randomness in data and model training errors can lead to unstable and erroneous predictions. Therefore, identifying, quantifying, and utilizing uncertainty are essential to enhance the performance of the model for the downstream tasks as well as the reliability of the GNN predictions. This survey aims to provide a comprehensive overview of GNNs from the perspective of uncertainty with an emphasis on its integration in graph learning. We compare and summarize existing graph uncertainty theory and methods, alongside the corresponding downstream tasks. Thereby, we bridge the gap between theory and practice, while connecting different GNN communities. Moreover, our work provides valuable insights into promising directions in this field.

Submitted 11 March, 2024; originally announced March 2024.

Comments: 13 main pages, 3 figures, 1 table. Under review

arXiv:2402.10062 [pdf, other]

Optimal Parameter and Neuron Pruning for Out-of-Distribution Detection

Authors: Chao Chen, Zhihang Fu, Kai Liu, Ze Chen, Mingyuan Tao, Jie** Ye

Abstract: For a machine learning model deployed in real-world scenarios, the ability to detect out-of-distribution (OOD) samples is indispensable and challenging. Most existing OOD detection methods focus on exploring advanced training skills or training-free tricks to prevent the model from yielding overconfident scores for unknown samples. The training-based methods incur expensive training cost and rely on OOD samples which are not always available, while most training-free methods cannot efficiently utilize the prior information from the training data. In this work, we propose an Optimal Parameter and Neuron Pruning (OPNP) approach, which aims to identify and remove those parameters and neurons that lead to over-fitting. The main method is divided into two steps. In the first step, we evaluate the sensitivity of the model parameters and neurons by averaging gradients over all training samples. In the second step, the parameters and neurons with exceptionally large or close to zero sensitivities are removed for prediction. Our proposal is training-free, compatible with other post-hoc methods, and exploits the information from all training data. Extensive experiments are performed on multiple OOD detection tasks and model architectures, showing that our proposed OPNP consistently outperforms the existing methods by a large margin.

Submitted 4 February, 2024; originally announced February 2024.

Comments: Accepted by NeurIPS 2023. 19 pages

Journal ref: NeurIPS 2023

arXiv:2402.05395 [pdf, other]

Efficient Estimation for Functional Accelerated Failure Time Model

Authors: Changyu Liu, Wen Su, Kin-Yat Liu, Guosheng Yin, Xingqiu Zhao

Abstract: We propose a functional accelerated failure time model to characterize effects of both functional and scalar covariates on the time to event of interest, and provide regularity conditions to guarantee model identifiability. For efficient estimation of model parameters, we develop a sieve maximum likelihood approach where parametric and nonparametric coefficients are bundled with an unknown baseline hazard function in the likelihood function. Not only do the bundled parameters cause immense numerical difficulties, but they also result in new challenges in theoretical development. By developing a general theoretical framework, we overcome the challenges arising from the bundled parameters and derive the convergence rate of the proposed estimator. Furthermore, we prove that the finite-dimensional estimator is $\sqrt{n}$-consistent, asymptotically normal and achieves the semiparametric information bound. The proposed inference procedures are evaluated by extensive simulation studies and illustrated with an application to the sequential organ failure assessment data from the Improving Care of Acute Lung Injury Patients study.

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2312.00359 [pdf, other]

Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training

Authors: Yefan Zhou, Tianyu Pang, Keqin Liu, Charles H. Martin, Michael W. Mahoney, Yaoqing Yang

Abstract: Regularization in modern machine learning is crucial, and it can take various forms in algorithmic design: training set, model family, error function, regularization terms, and optimizations. In particular, the learning rate, which can be interpreted as a temperature-like parameter within the statistical mechanics of learning, plays a crucial role in neural network training. Indeed, many widely adopted training strategies basically just define the decay of the learning rate over time. This process can be interpreted as decreasing a temperature, using either a global learning rate (for the entire model) or a learning rate that varies for each parameter. This paper proposes TempBalance, a straightforward yet effective layer-wise learning rate method. TempBalance is based on Heavy-Tailed Self-Regularization (HT-SR) Theory, an approach which characterizes the implicit self-regularization of different layers in trained models. We demonstrate the efficacy of using HT-SR-motivated metrics to guide the scheduling and balancing of temperature across all network layers during model training, resulting in improved performance during testing. We implement TempBalance on CIFAR10, CIFAR100, SVHN, and TinyImageNet datasets using ResNets, VGGs, and WideResNets with various depths and widths. Our results show that TempBalance significantly outperforms ordinary SGD and carefully-tuned spectral norm regularization. We also show that TempBalance outperforms a number of state-of-the-art optimizers and learning rate schedulers.

Submitted 1 December, 2023; originally announced December 2023.

Comments: NeurIPS 2023 Spotlight, first two authors contributed equally

arXiv:2311.15221 [pdf, other]

The Local Landscape of Phase Retrieval Under Limited Samples

Authors: Kaizhao Liu, Zihao Wang, Lei Wu

Abstract: In this paper, we provide a fine-grained analysis of the local landscape of phase retrieval under the regime with limited samples. Our aim is to ascertain the minimal sample size necessary to guarantee a benign local landscape surrounding global minima in high dimensions. Let $n$ and $d$ denote the sample size and input dimension, respectively. We first explore the local convexity and establish that when $n=o(d\log d)$, for almost every fixed point in the local ball, the Hessian matrix must have negative eigenvalues as long as $d$ is sufficiently large. Consequently, the local landscape is highly non-convex. We next consider the one-point strong convexity and show that as long as $n=\omega(d)$, with high probability, the landscape is one-point strongly convex in the local annulus: $\{w\in\mathbb{R}^d: o_d(1)\leqslant \|w-w^*\|\leqslant c\}$, where $w^*$ is the ground truth and $c$ is an absolute constant. This implies that gradient descent initialized from any point in this domain can converge to an $o_d(1)$-loss solution exponentially fast. Furthermore, we show that when $n=o(d\log d)$, there is a radius of $\widetilde{\Theta}\left(\sqrt{1/d}\right)$ such that one-point convexity breaks in the corresponding smaller local ball. This indicates an impossibility to establish a convergence to exact $w^*$ for gradient descent under limited samples by relying solely on one-point convexity.

Submitted 26 November, 2023; originally announced November 2023.

Comments: 41 pages

arXiv:2310.11736 [pdf, ps, other]

Kernel Learning in Ridge Regression "Automatically" Yields Exact Low Rank Solution

Authors: Yunlu Chen, Yang Li, Keli Liu, Feng Ruan

Abstract: We consider kernels of the form $(x,x') \mapsto \phi(\|x-x'\|^2_\Sigma)$ parametrized by $\Sigma$. For such kernels, we study a variant of the kernel ridge regression problem which simultaneously optimizes the prediction function and the parameter $\Sigma$ of the reproducing kernel Hilbert space. The eigenspace of the $\Sigma$ learned from this kernel ridge regression problem can inform us which directions in covariate space are important for prediction. Assuming that the covariates have nonzero explanatory power for the response only through a low dimensional subspace (central mean subspace), we find that the global minimizer of the finite sample kernel learning objective is also low rank with high probability. More precisely, the rank of the minimizing $\Sigma$ is with high probability bounded by the dimension of the central mean subspace. This phenomenon is interesting because the low rankness property is achieved without using any explicit regularization of $\Sigma$, e.g., nuclear norm penalization. Our theory draws a correspondence between the observed phenomenon and the notion of low rank set identifiability from the optimization literature. The low rankness property of the finite sample solutions exists because the population kernel learning objective grows "sharply" when moving away from its minimizers in any direction perpendicular to the central mean subspace.

Submitted 27 November, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

Comments: Add code links and correct a figure

arXiv:2309.05925 [pdf, other]

On Regularized Sparse Logistic Regression

Authors: Mengyuan Zhang, Kai Liu

Abstract: Sparse logistic regression performs classification and feature selection simultaneously. Although many studies have been done to solve $\ell_1$-regularized logistic regression, there is no equivalently abundant work on solving sparse logistic regression with a nonconvex regularization term. In this paper, we propose a unified framework to solve $\ell_1$-regularized logistic regression, which can be naturally extended to a nonconvex regularization term, as long as a certain requirement is satisfied. In addition, we also utilize different line search criteria to guarantee monotone convergence for various regularization terms. Empirical experiments on binary classification tasks with real-world datasets demonstrate that our proposed algorithms are capable of performing classification and feature selection effectively at a lower computational cost.

Submitted 11 October, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

Comments: Accepted to ICDM2023

arXiv:2307.03034 [pdf, ps, other]

PCL-Indexability and Whittle Index for Restless Bandits with General Observation Models

Authors: Keqin Liu, Chengzhong Zhang

Abstract: In this paper, we consider a general observation model for restless multi-armed bandit problems. The operation of the player needs to be based on a certain feedback mechanism that is error-prone due to resource constraints or environmental or intrinsic noises. By establishing a general probabilistic model for dynamics of feedback/observation, we formulate the problem as a restless bandit with a countable belief state space starting from an arbitrary initial belief (a priori information). We apply the achievable region method with partial conservation law (PCL) to the infinite-state problem and analyze its indexability and priority index (Whittle index). Finally, we propose an approximation process to transform the problem into one to which the AG algorithm of Niño-Mora and Bertsimas for finite-state problems can be applied. Numerical experiments show that our algorithm has an excellent performance.

Submitted 3 July, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

arXiv:2305.01034 [pdf, other]

Model-agnostic Measure of Generalization Difficulty

Authors: Akhilan Boopathy, Kevin Liu, Jaedong Hwang, Shu Ge, Asaad Mohammedsaleh, Ila Fiete

Abstract: The measure of a machine learning algorithm is the difficulty of the tasks it can perform, and sufficiently difficult tasks are critical drivers of strong machine learning models. However, quantifying the generalization difficulty of machine learning benchmarks has remained challenging. We propose what is to our knowledge the first model-agnostic measure of the inherent generalization difficulty of tasks. Our inductive bias complexity measure quantifies the total information required to generalize well on a task minus the information provided by the data. It does so by measuring the fractional volume occupied by hypotheses that generalize on a task given that they fit the training data. It scales exponentially with the intrinsic dimensionality of the space over which the model must generalize but only polynomially in resolution per dimension, showing that tasks which require generalizing over many dimensions are drastically more difficult than tasks involving more detail in fewer dimensions. Our measure can be applied to compute and compare supervised learning, reinforcement learning and meta-learning generalization difficulties against each other. We show that applied empirically, it formally quantifies intuitively expected trends, e.g. that in terms of required inductive bias, MNIST < CIFAR10 < Imagenet and fully observable Markov decision processes (MDPs) < partially observable MDPs. Further, we show that classification of complex images < few-shot meta-learning with simple images. Our measure provides a quantitative metric to guide the construction of more complex tasks requiring greater inductive bias, and thereby encourages the development of more sophisticated architectures and learning algorithms with more powerful generalization capabilities.

Submitted 2 June, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

Comments: Published at ICML 2023, 28 pages, 6 figures

arXiv:2303.00288

The Race of mRNA therapy: Evidence from Patent Landscape

Authors: Jianxiong Ren, Xiaoming Zhang, Xingyong Si, Xiangjun Kong, **yu Cong, **** Wang, Xiang Li, Qianru Zhang, Peifen Yao, Mengyao Li, Yuanqi Cai, Zhaocai Sun, Kunmeng Liu, Benzheng Wei

Abstract: mRNA therapy is gaining worldwide attention as an emerging therapeutic approach. The widespread use of mRNA vaccines during the COVID-19 outbreak has demonstrated the potential of mRNA therapy. As mRNA-based drugs have expanded and their indications have broadened, more patents for mRNA innovations have emerged. The global patent landscape for mRNA therapy has not yet been analyzed, indicating a research gap in need of filling, from new technology to productization. This study uses social network analysis with the patent quality assessment to investigate the temporal trends, citation relationship, and significant litigation for 16,101 mRNA therapy patents and summarizes the hot topics and potential future directions for this industry. The information obtained in this study not only may be utilized as a tool of knowledge for researchers in a comprehensive and integrated way but can also provide inspiration for efficient production methods for mRNA drugs. This study shows that infectious diseases and cancer are currently the primary applications for mRNA drugs. Emerging patent activity and lawsuits in this field are demonstrating that delivery technology remains one of the key challenges in the field and that drug-targeting research in combination with vector technology will be one of the major directions for the industry going forward. With significant funding, new organizations have developed novel delivery technologies in an attempt to break into the patent thicket established by companies such as Arbutus. The global mRNA therapeutic landscape is undergoing a multifaceted development pattern, and the monopoly of giant companies is being challenged.

Submitted 15 March, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

Comments: I have received requests from co-authors and funding agencies to withdraw the manuscript

arXiv:2212.03515 [pdf, other]

FPGA Implementation of Multi-Layer Machine Learning Equalizer with On-Chip Training

Authors: Keren Liu, Erik Börjeson, Christian Häger, Per Larsson-Edefors

Abstract: We design and implement an adaptive machine learning equalizer that alternates multiple linear and nonlinear computational layers on an FPGA. On-chip training via gradient backpropagation is shown to allow for real-time adaptation to time-varying channel impairments.

Submitted 7 December, 2022; originally announced December 2022.

Comments: To be presented at the 2023 Optical Fiber Communication Conference (OFC)

arXiv:2210.02192 [pdf, other]

Are All Losses Created Equal: A Neural Collapse Perspective

Authors: **xin Zhou, Chong You, Xiao Li, Kangning Liu, Sheng Liu, Qing Qu, Zhihui Zhu

Abstract: While cross entropy (CE) is the most commonly used loss to train deep neural networks for classification tasks, many alternative losses have been developed to obtain better empirical performance. Among them, which one is the best to use is still a mystery, because there seem to be multiple factors affecting the answer, such as properties of the dataset, the choice of network architecture, and so on. This paper studies the choice of loss function by examining the last-layer features of deep networks, drawing inspiration from a recent line of work showing that the global optimal solution of CE and mean-square-error (MSE) losses exhibits a Neural Collapse phenomenon. That is, for sufficiently large networks trained until convergence, (i) all features of the same class collapse to the corresponding class mean and (ii) the means associated with different classes are in a configuration where their pairwise distances are all equal and maximized. We extend such results and show through global solution and landscape analyses that a broad family of loss functions including commonly used label smoothing (LS) and focal loss (FL) exhibits Neural Collapse. Hence, all relevant losses (i.e., CE, LS, FL, MSE) produce equivalent features on training data. Based on the unconstrained feature model assumption, we provide either the global landscape analysis for LS loss or the local landscape analysis for FL loss and show that the (only!) global minimizers are neural collapse solutions, while all other critical points are strict saddles whose Hessians exhibit negative curvature directions either in the global scope for LS loss or in the local scope for FL loss near the optimal solution. The experiments further show that Neural Collapse features obtained from all relevant losses lead to largely identical performance on test data as well, provided that the network is sufficiently large and trained until convergence.

Submitted 8 October, 2022; v1 submitted 3 October, 2022; originally announced October 2022.

Comments: 32 pages, 10 figures, NeurIPS 2022
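
Illustration (not from the paper): condition (ii) in the abstract above, class means whose pairwise distances are all equal, can be checked numerically on a small example. The sketch below assumes the simplex equiangular tight frame (ETF), the configuration usually associated with Neural Collapse in the literature; the function names `simplex_etf` and `pairwise_distances` are hypothetical helpers, not code from the authors.

```python
import math

def simplex_etf(num_classes):
    """Return K class means m_k = sqrt(K/(K-1)) * (e_k - ones/K) as rows.

    Assumption: this is the standard simplex ETF construction; each row
    has unit norm and the rows sum to the zero vector.
    """
    K = num_classes
    scale = math.sqrt(K / (K - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / K) for j in range(K)]
        for i in range(K)
    ]

def pairwise_distances(means):
    """Euclidean distance between every unordered pair of class means."""
    return [
        math.dist(means[i], means[j])
        for i in range(len(means))
        for j in range(i + 1, len(means))
    ]

means = simplex_etf(4)
dists = pairwise_distances(means)
# Every pair of class means is the same distance apart.
print(max(dists) - min(dists) < 1e-12)
```

For K classes the common distance in this construction is sqrt(2K/(K-1)), which can be confirmed by comparing `dists[0]` against that value.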

arXiv:2209.05356 [pdf, ps, other]

The E-Bayesian Estimation and its E-MSE of Lomax distribution under different loss functions

Authors: Kaiwei Liu, Yuxuan Zhang

Abstract: This paper studies the E-Bayesian (expectation of the Bayesian estimation) estimation of the parameter of Lomax distribution based on different loss functions. Under different loss functions, we calculate the Bayesian estimation of the parameter and then calculate the expectation of the estimated value to get the E-Bayesian estimation. To measure the estimated error, the E-MSE (expected mean squared error) is introduced. And the formulas of E-Bayesian estimation and E-MSE are given. By applying Markov Chain Monte Carlo technology, we analyze the performances of the proposed methods. Results are compared on the basis of E-MSE. Then, cases of samples in real data sets are presented for illustration. In order to test whether the Lomax distribution can be used in analyzing the datasets, Kolmogorov Smirnov tests are conducted. Using real data, we can get the maximum likelihood estimation at the same time and compare it with E-Bayesian estimation. At last, we get the results of the comparison between Bayesian and E-Bayesian estimation methods under three different loss functions.

Submitted 2 September, 2022; originally announced September 2022.

arXiv:2202.08695 [pdf, other]

Article's Scientific Prestige: measuring the impact of individual articles in the Web of Science

Authors: Ying Chen, Thorsten Koch, Nazgul Zakiyeva, Kailiang Liu, Zhitong Xu, Chun-houh Chen, Junji Nakano, Keisuke Honda

Abstract: We performed a citation analysis on the Web of Science publications consisting of more than 63 million articles and 1.45 billion citations on 254 subjects from 1981 to 2020. We proposed the Article's Scientific Prestige (ASP) metric and compared this metric to number of citations (#Cit) and journal grade in measuring the scientific impact of individual articles in the large-scale hierarchical and multi-disciplined citation network. In contrast to #Cit, ASP, which is computed based on the eigenvector centrality, considers both direct and indirect citations, and provides steady-state evaluation across different disciplines. We found that ASP and #Cit are not aligned for most articles, with a growing mismatch amongst the less cited articles. While both metrics are reliable for evaluating the prestige of articles such as Nobel Prize winning articles, ASP tends to provide more persuasive rankings than #Cit when the articles are not highly cited. The journal grade, which is eventually determined by a few highly cited articles, is unable to properly reflect the scientific impact of individual articles. The number of references and coauthors are less relevant to scientific impact, but subjects do make a difference.

Submitted 17 February, 2022; originally announced February 2022.

arXiv:2111.09473 [pdf]

Number of New Top 2% Researchers from China and USA Over Time

Authors: Lei Liu, Song Yao, Kevin Liu

Abstract: In this paper we compare the numbers of new top 2% researchers from China and USA annually since 1980. We find that the log ratio of the numbers decreases almost linearly over time. As early as 2009, the total number of new top 2% researchers across all subfields from China exceeds that of USA. In particular, such trend is more striking in many subfields, e.g., Engineering, Chemistry, and Enabling & Strategic Technologies.

Submitted 17 November, 2021; originally announced November 2021.

arXiv:2110.05852 [pdf, other]

On the Self-Penalization Phenomenon in Feature Selection

Authors: Michael I. Jordan, Keli Liu, Feng Ruan

Abstract: We describe an implicit sparsity-inducing mechanism based on minimization over a family of kernels: \begin{equation*} \min_{β, f}~\widehat{\mathbb{E}}[L(Y, f(β^{1/q} \odot X))] + λ_n \|f\|_{\mathcal{H}_q}^2~~\text{subject to}~~β\ge 0, \end{equation*} where $L$ is the loss, $\odot$ is coordinate-wise multiplication and $\mathcal{H}_q$ is the reproducing kernel Hilbert space based on the kernel $k_q(x, x') = h(\|x-x'\|_q^q)$, where $\|\cdot\|_q$ is the $\ell_q$ norm. Using gradient descent to optimize this objective with respect to $β$ leads to exactly sparse stationary points with high probability. The sparsity is achieved without using any of the well-known explicit sparsification techniques such as penalization (e.g., $\ell_1$), early stopping or post-processing (e.g., clipping). As an application, we use this sparsity-inducing mechanism to build algorithms consistent for feature selection.

Submitted 12 October, 2021; originally announced October 2021.

Comments: 54 pages
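
Illustration (not from the paper): the kernel in the abstract above, $k_q(x, x') = h(\|x-x'\|_q^q)$ evaluated on the reweighted input $β^{1/q} \odot X$, reduces to $h(\sum_j β_j |x_j - x'_j|^q)$, so a coordinate with $β_j = 0$ drops out of the kernel entirely. The sketch below assumes $h(t) = e^{-t}$ purely for illustration; the abstract does not specify $h$, and the function name `k_q` is a hypothetical helper.

```python
import math

def k_q(x, x_prime, beta, q, h=lambda t: math.exp(-t)):
    """Kernel on reweighted inputs: h(|| beta^(1/q) * (x - x') ||_q^q).

    Raising the l_q norm to the q-th power cancels the 1/q exponent on
    beta, leaving a beta-weighted sum of |x_j - x'_j|^q.
    Assumption: h(t) = exp(-t) is an illustrative choice only.
    """
    weighted = sum(b * abs(a - c) ** q for a, c, b in zip(x, x_prime, beta))
    return h(weighted)

beta = [1.0, 0.0]  # second feature is switched off
same_first_coord = k_q([0.0, 0.0], [1.0, 5.0], beta, q=1)
other_second_coord = k_q([0.0, 0.0], [1.0, -3.0], beta, q=1)
# The kernel value ignores the coordinate whose weight is zero.
print(same_first_coord == other_second_coord)
```

With this choice of $h$, $q = 1$ gives a Laplace-type kernel and $q = 2$ a Gaussian-type kernel.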

arXiv:2106.09387 [pdf, other]

Taming Nonconvexity in Kernel Feature Selection -- Favorable Properties of the Laplace Kernel

Authors: Feng Ruan, Keli Liu, Michael I. Jordan

Abstract: Kernel-based feature selection is an important tool in nonparametric statistics. Despite many practical applications of kernel-based feature selection, there is little statistical theory available to support the method. A core challenge is that the objective functions of the optimization problems used to define kernel-based feature selection are nonconvex. The literature has only studied the statistical properties of the \emph{global optima}, which is a mismatch, given that the gradient-based algorithms available for nonconvex optimization are only able to guarantee convergence to local minima. Studying the full landscape associated with kernel-based methods, we show that feature selection objectives using the Laplace kernel (and other $\ell_1$ kernels) come with statistical guarantees that other kernels, including the ubiquitous Gaussian kernel (or other $\ell_2$ kernels) do not possess. Based on a sharp characterization of the gradient of the objective function, we show that $\ell_1$ kernels eliminate unfavorable stationary points that appear when using an $\ell_2$ kernel. Armed with this insight, we establish statistical guarantees for $\ell_1$ kernel-based feature selection which do not require reaching the global minima. In particular, we establish model-selection consistency of $\ell_1$-kernel-based feature selection in recovering main effects and hierarchical interactions in the nonparametric setting with $n \sim \log p$ samples.

Submitted 25 May, 2022; v1 submitted 17 June, 2021; originally announced June 2021.

Comments: 26 pages main text; 74 pages total; appendix rewritten (typo fixed; proof structure reorganized)

arXiv:2105.09557 [pdf, other]

Power-law escape rate of SGD

Authors: Takashi Mori, Liu Ziyin, Kangqiao Liu, Masahito Ueda

Abstract: Stochastic gradient descent (SGD) undergoes complicated multiplicative noise for the mean-square loss. We use this property of SGD noise to derive a stochastic differential equation (SDE) with simpler additive noise by performing a random time change. Using this formalism, we show that the log loss barrier $Δ\log L=\log[L(θ^s)/L(θ^*)]$ between a local minimum $θ^*$ and a saddle $θ^s$ determines the escape rate of SGD from the local minimum, contrary to the previous results borrowing from physics that the linear loss barrier $ΔL=L(θ^s)-L(θ^*)$ decides the escape rate. Our escape-rate formula strongly depends on the typical magnitude $h^*$ and the number $n$ of the outlier eigenvalues of the Hessian. This result explains an empirical fact that SGD prefers flat minima with low effective dimensions, giving an insight into implicit biases of SGD.

Submitted 29 January, 2022; v1 submitted 20 May, 2021; originally announced May 2021.

Comments: 17+8 pages
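
Illustration (not from the paper): the two barrier definitions in the abstract above, $Δ\log L=\log[L(θ^s)/L(θ^*)]$ and $ΔL=L(θ^s)-L(θ^*)$, can rank the same pair of minima differently. The loss values below are hypothetical numbers chosen only to show this; the escape-rate formula itself is not reproduced here.

```python
import math

def linear_barrier(loss_saddle, loss_min):
    """Linear loss barrier: L(theta_s) - L(theta_star)."""
    return loss_saddle - loss_min

def log_barrier(loss_saddle, loss_min):
    """Log loss barrier: log[L(theta_s) / L(theta_star)]."""
    return math.log(loss_saddle / loss_min)

# Hypothetical (saddle loss, minimum loss) pairs with the same linear gap.
deep_minimum = (0.11, 0.01)
shallow_minimum = (1.10, 1.00)

for name, (l_s, l_min) in [("deep", deep_minimum), ("shallow", shallow_minimum)]:
    print(name, linear_barrier(l_s, l_min), log_barrier(l_s, l_min))
```

Both pairs have a linear barrier of about 0.1, but the log barrier is much larger for the minimum with the smaller loss value, so the two criteria disagree about which minimum is harder to escape.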

arXiv:2102.05375 [pdf, other]

Strength of Minibatch Noise in SGD

Authors: Liu Ziyin, Kangqiao Liu, Takashi Mori, Masahito Ueda

Abstract: The noise in stochastic gradient descent (SGD), caused by minibatch sampling, is poorly understood despite its practical importance in deep learning. This work presents the first systematic study of the SGD noise and fluctuations close to a local minimum. We first analyze the SGD noise in linear regression in detail and then derive a general formula for approximating SGD noise in different types of minima. For application, our results (1) provide insight into the stability of training a neural network, (2) suggest that a large learning rate can help generalization by introducing an implicit regularization, (3) explain why the linear learning rate-batchsize scaling law fails at a large learning rate or at a small batchsize and (4) can provide an understanding of how discrete-time nature of SGD affects the recently discovered power-law phenomenon of SGD.

Submitted 8 March, 2022; v1 submitted 10 February, 2021; originally announced February 2021.

Comments: ICLR 2022 spotlight

arXiv:2012.03636 [pdf, other]

Noise and Fluctuation of Finite Learning Rate Stochastic Gradient Descent

Authors: Kangqiao Liu, Liu Ziyin, Masahito Ueda

Abstract: In the vanishing learning rate regime, stochastic gradient descent (SGD) is now relatively well understood. In this work, we propose to study the basic properties of SGD and its variants in the non-vanishing learning rate regime. The focus is on deriving exactly solvable results and discussing their implications. The main contributions of this work are to derive the stationary distribution for discrete-time SGD in a quadratic loss function with and without momentum; in particular, one implication of our result is that the fluctuation caused by discrete-time dynamics takes a distorted shape and is dramatically larger than a continuous-time theory could predict. Examples of applications of the proposed theory considered in this work include the approximation error of variants of SGD, the effect of minibatch noise, the optimal Bayesian inference, the escape rate from a sharp minimum, and the stationary covariance of a few second-order methods including damped Newton's method, natural gradient descent, and Adam.

Submitted 11 June, 2021; v1 submitted 7 December, 2020; originally announced December 2020.

Comments: Camera-ready version for the Thirty-eighth International Conference on Machine Learning (ICML 2021). 12 + 14 pages, 6 + 3 figures, 1 + 0 table. *First two authors contributed equally

arXiv:2011.12215 [pdf, other]

A Self-Penalizing Objective Function for Scalable Interaction Detection

Authors: Keli Liu, Feng Ruan

Abstract: We tackle the problem of nonparametric variable selection with a focus on discovering interactions between variables. With $p$ variables there are $O(p^s)$ possible order-$s$ interactions making exhaustive search infeasible. It is nonetheless possible to identify the variables involved in interactions with only linear computation cost, $O(p)$. The trick is to maximize a class of parametrized nonparametric dependence measures which we call metric learning objectives; the landscape of these nonconvex objective functions is sensitive to interactions but the objectives themselves do not explicitly model interactions. Three properties make metric learning objectives highly attractive: (a) The stationary points of the objective are automatically sparse (i.e. performs selection) -- no explicit $\ell_1$ penalization is needed. (b) All stationary points of the objective exclude noise variables with high probability. (c) Guaranteed recovery of all signal variables without needing to reach the objective's global maxima or special stationary points. The second and third properties mean that all our theoretical results apply in the practical case where one uses gradient ascent to maximize the metric learning objective. While not all metric learning objectives enjoy good statistical power, we design an objective based on $\ell_1$ kernels that does exhibit favorable power: it recovers (i) main effects with $n \sim \log p$ samples, (ii) hierarchical interactions with $n \sim \log p$ samples and (iii) order-$s$ pure interactions with $n \sim p^{2(s-1)}\log p$ samples.

Submitted 12 December, 2020; v1 submitted 24 November, 2020; originally announced November 2020.

Comments: 34 pages; the Appendix can be found on the authors' personal websites (the url is in the pdf)

arXiv:2010.02506 [pdf, other]

Interactive Reinforcement Learning for Feature Selection with Decision Tree in the Loop

Authors: Wei Fan, Kunpeng Liu, Hao Liu, Yong Ge, Hui Xiong, Yanjie Fu

Abstract: We study the problem of balancing effectiveness and efficiency in automated feature selection. After exploring many feature selection methods, we observe a computational dilemma: 1) traditional feature selection is mostly efficient, but difficult to identify the best subset; 2) the emerging reinforced feature selection automatically navigates to the best subset, but is usually inefficient. Can we bridge the gap between effectiveness and efficiency under automation? Motivated by this dilemma, we aim to develop a novel feature space navigation method. In our preliminary work, we leveraged interactive reinforcement learning to accelerate feature selection by external trainer-agent interaction. In this journal version, we propose a novel interactive and closed-loop architecture to simultaneously model interactive reinforcement learning (IRL) and decision tree feedback (DTF). Specifically, IRL is to create an interactive feature selection loop and DTF is to feed structured feature knowledge back to the loop. First, the tree-structured feature hierarchy from decision tree is leveraged to improve state representation. In particular, we represent the selected feature subset as an undirected graph of feature-feature correlations and a directed tree of decision features. We propose a new embedding method capable of empowering graph convolutional network to jointly learn state representation from both the graph and the tree. Second, the tree-structured feature hierarchy is exploited to develop a new reward scheme. In particular, we personalize reward assignment of agents based on decision tree feature importance. In addition, observing agents' actions can be feedback, we devise another reward scheme, to weigh and assign reward based on the feature selected frequency ratio in historical action records. Finally, we present extensive experiments on real-world datasets to show the improved performance.

Submitted 2 October, 2020; originally announced October 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:2008.12001

arXiv:2009.10337 [pdf, other]

Learning Task-Agnostic Action Spaces for Movement Optimization

Authors: Amin Babadi, Michiel van de Panne, C. Karen Liu, Perttu Hämäläinen

Abstract: We propose a novel method for exploring the dynamics of physically based animated characters, and learning a task-agnostic action space that makes movement optimization easier. Like several previous papers, we parameterize actions as target states, and learn a short-horizon goal-conditioned low-level control policy that drives the agent's state towards the targets. Our novel contribution is that with our exploration data, we are able to learn the low-level policy in a generic manner and without any reference movement data. Trained once for each agent or simulation environment, the policy improves the efficiency of optimizing both trajectories and high-level policies across multiple tasks and optimization algorithms. We also contribute novel visualizations that show how using target states as actions makes optimized trajectories more robust to disturbances; this manifests as wider optima that are easy to find. Due to its simplicity and generality, our proposed approach should provide a building block that can improve a large variety of movement optimization methods and applications.

Submitted 23 July, 2021; v1 submitted 22 September, 2020; originally announced September 2020.

Comments: Accepted as a regular paper by IEEE Transactions on Visualization and Computer Graphics (TVCG) in July 2021

arXiv:2009.09283 [pdf, other]

Subverting Privacy-Preserving GANs: Hiding Secrets in Sanitized Images

Authors: Kang Liu, Benjamin Tan, Siddharth Garg

Abstract: Unprecedented data collection and sharing have exacerbated privacy concerns and led to increasing interest in privacy-preserving tools that remove sensitive attributes from images while maintaining useful information for other tasks. Currently, state-of-the-art approaches use privacy-preserving generative adversarial networks (PP-GANs) for this purpose, for instance, to enable reliable facial expression recognition without leaking users' identity. However, PP-GANs do not offer formal proofs of privacy and instead rely on experimentally measuring information leakage using classification accuracy on the sensitive attributes of deep learning (DL)-based discriminators. In this work, we question the rigor of such checks by subverting existing privacy-preserving GANs for facial expression recognition. We show that it is possible to hide the sensitive identification data in the sanitized output images of such PP-GANs for later extraction, which can even allow for reconstruction of the entire input images, while satisfying privacy checks. We demonstrate our approach via a PP-GAN-based architecture and provide qualitative and quantitative evaluations using two public datasets. Our experimental results raise fundamental questions about the need for more rigorous privacy checks of PP-GANs, and we provide insights into the social impact of these.

Submitted 19 September, 2020; originally announced September 2020.

arXiv:2009.09230 [pdf, other]

Simplifying Reinforced Feature Selection via Restructured Choice Strategy of Single Agent

Authors: Xiaosa Zhao, Kunpeng Liu, Wei Fan, Lu Jiang, Xiaowei Zhao, Minghao Yin, Yanjie Fu

Abstract: Feature selection aims to select a subset of features to optimize the performances of downstream predictive tasks. Recently, multi-agent reinforced feature selection (MARFS) has been introduced to automate feature selection, by creating agents for each feature to select or deselect corresponding features. Although MARFS enjoys the automation of the selection process, MARFS suffers from not just the data complexity in terms of contents and dimensionality, but also the exponentially-increasing computational costs with regard to the number of agents. The raised concern leads to a new research question: Can we simplify the selection process of agents under reinforcement learning context so as to improve the efficiency and costs of feature selection? To address the question, we develop a single-agent reinforced feature selection approach integrated with restructured choice strategy. Specifically, the restructured choice strategy includes: 1) we exploit only one single agent to handle the selection task of multiple features, instead of using multiple agents. 2) we develop a scanning method to empower the single agent to make multiple selection/deselection decisions in each round of scanning. 3) we exploit the relevance to predictive labels of features to prioritize the scanning orders of the agent for multiple features. 4) we propose a convolutional auto-encoder algorithm, integrated with the encoded index information of features, to improve state representation. 5) we design a reward scheme that takes into account both prediction accuracy and feature redundancy to facilitate the exploration process. Finally, we present extensive experimental results to demonstrate the efficiency and effectiveness of the proposed method.

Submitted 19 September, 2020; originally announced September 2020.

arXiv:2008.12001 [pdf, other]

AutoFS: Automated Feature Selection via Diversity-aware Interactive Reinforcement Learning

Authors: Wei Fan, Kunpeng Liu, Hao Liu, Pengyang Wang, Yong Ge, Yanjie Fu

Abstract: In this paper, we study the problem of balancing effectiveness and efficiency in automated feature selection. Feature selection is a fundamental intelligence for machine learning and predictive analysis. After exploring many feature selection methods, we observe a computational dilemma: 1) traditional feature selection methods (e.g., mRMR) are mostly efficient, but difficult to identify the best subset; 2) the emerging reinforced feature selection methods automatically navigate feature space to explore the best subset, but are usually inefficient. Are automation and efficiency always apart from each other? Can we bridge the gap between effectiveness and efficiency under automation? Motivated by such a computational dilemma, this study is to develop a novel feature space navigation method. To that end, we propose an Interactive Reinforced Feature Selection (IRFS) framework that guides agents by not just self-exploration experience, but also diverse external skilled trainers to accelerate learning for feature exploration. Specifically, we formulate the feature selection problem into an interactive reinforcement learning framework. In this framework, we first model two trainers skilled at different searching strategies: (1) KBest based trainer; (2) Decision Tree based trainer. We then develop two strategies: (1) to identify assertive and hesitant agents to diversify agent training, and (2) to enable the two trainers to take the teaching role in different stages to fuse the experiences of the trainers and diversify teaching process. Such a hybrid teaching strategy can help agents to learn broader knowledge, and, thereafter, be more effective. Finally, we present extensive experiments on real-world datasets to demonstrate the improved performances of our method: more efficient than existing reinforced selection and more effective than classic selection.

Submitted 16 September, 2020; v1 submitted 27 August, 2020; originally announced August 2020.

Comments: Accepted by ICDM 2020. In this version, we revised some typos or mistakes for camera-ready

arXiv:2008.03392 [pdf, other]

Grouping effects of sparse CCA models in variable selection

Authors: Kefei Liu, Qi Long, Li Shen

Abstract: The sparse canonical correlation analysis (SCCA) is a bi-multivariate association model that finds sparse linear combinations of two sets of variables that are maximally correlated with each other. In addition to the standard SCCA model, a simplified SCCA criterion which maximizes the cross-covariance between a pair of canonical variables instead of their cross-correlation, is widely used in the literature due to its computational simplicity. However, the behaviors/properties of the solutions of these two models remain unknown in theory. In this paper, we analyze the grouping effect of the standard and simplified SCCA models in variable selection. In high-dimensional settings, the variables often form groups with high within-group correlation and low between-group correlation. Our theoretical analysis shows that for grouped variable selection, the simplified SCCA jointly selects or deselects a group of variables together, while the standard SCCA randomly selects a few dominant variables from each relevant group of correlated variables. Empirical results on synthetic data and real imaging genetics data verify the finding of our theoretical analysis.

Submitted 7 August, 2020; originally announced August 2020.

arXiv:2007.03383 [pdf, other]

RGCF: Refined Graph Convolution Collaborative Filtering with concise and expressive embedding

Authors: Kang Liu, Feng Xue, Richang Hong

Abstract: Graph Convolution Network (GCN) has attracted significant attention and become the most popular method for learning graph representations. In recent years, many efforts have been focused on integrating GCN into the recommender tasks and have made remarkable progress. At its core is to explicitly capture high-order connectivities between the nodes in user-item bipartite graph. However, we theoretically and empirically find an inherent drawback in these GCN-based recommendation methods: directly applying GCN to aggregate neighboring nodes introduces noise and information redundancy. Consequently, these models' capability of capturing high-order connectivities among different nodes is limited, leading to suboptimal performance of the recommender tasks. The main reason is that the nonlinear network layer inside GCN structure is not suitable for extracting non-semantic features (such as one-hot ID feature) in the collaborative filtering scenarios. In this work, we develop a new GCN-based Collaborative Filtering model, named Refined Graph convolution Collaborative Filtering (RGCF), where the construction of the embeddings of users (items) is delicately redesigned from several aspects during the aggregation on the graph. Compared to the state-of-the-art GCN-based recommendation, RGCF is more capable of capturing the implicit high-order connectivities inside the graph and the resultant vector representations are more expressive. We conduct extensive experiments on three public million-size datasets, demonstrating that our RGCF significantly outperforms state-of-the-art models. We release our code at https://github.com/hfutmars/RGCF.

Submitted 11 July, 2020; v1 submitted 7 July, 2020; originally announced July 2020.

arXiv:2006.12715 [pdf, other]

doi 10.1145/3394486.3403358

Hybrid Spatio-Temporal Graph Convolutional Network: Improving Traffic Prediction with Navigation Data

Authors: Rui Dai, Shenkun Xu, Qian Gu, Chenguang Ji, Kaikui Liu

Abstract: Traffic forecasting has recently attracted increasing interest due to the popularity of online navigation services, ridesharing and smart city projects. Owing to the non-stationary nature of road traffic, forecasting accuracy is fundamentally limited by the lack of contextual information. To address this issue, we propose the Hybrid Spatio-Temporal Graph Convolutional Network (H-STGCN), which is able to "deduce" future travel time by exploiting the data of upcoming traffic volume. Specifically, we propose an algorithm to acquire the upcoming traffic volume from an online navigation engine. Taking advantage of the piecewise-linear flow-density relationship, a novel transformer structure converts the upcoming volume into its equivalent in travel time. We combine this signal with the commonly-utilized travel-time signal, and then apply graph convolution to capture the spatial dependency. Particularly, we construct a compound adjacency matrix which reflects the innate traffic proximity. We conduct extensive experiments on real-world datasets. The results show that H-STGCN remarkably outperforms state-of-the-art methods in various metrics, especially for the prediction of non-recurring congestion.

Submitted 22 June, 2020; originally announced June 2020.

arXiv:2004.12492 [pdf, other]

Bias Busters: Robustifying DL-based Lithographic Hotspot Detectors Against Backdooring Attacks

Authors: Kang Liu, Benjamin Tan, Gaurav Rajavendra Reddy, Siddharth Garg, Yiorgos Makris, Ramesh Karri

Abstract: Deep learning (DL) offers potential improvements throughout the CAD tool-flow, one promising application being lithographic hotspot detection. However, DL techniques have been shown to be especially vulnerable to inference and training time adversarial attacks. Recent work has demonstrated that a small fraction of malicious physical designers can stealthily "backdoor" a DL-based hotspot detector during its training phase such that it accurately classifies regular layout clips but predicts hotspots containing a specially crafted trigger shape as non-hotspots. We propose a novel training data augmentation strategy as a powerful defense against such backdooring attacks. The defense works by eliminating the intentional biases introduced in the training data but does not require knowledge of which training samples are poisoned or the nature of the backdoor trigger. Our results show that the defense can drastically reduce the attack success rate from 84% to ~0%.

Submitted 26 April, 2020; originally announced April 2020.

arXiv:2003.10484 [pdf, other]

Large-P Variable Selection in Two-Stage Models

Authors: Haim Bar, Kangyan Liu

Abstract: Model selection in the large-P small-N scenario is discussed in the framework of two-stage models. Two specific models are considered, namely, two-stage least squares (TSLS) involving instrumental variables (IVs), and mediation models. In both cases, the number of putative variables (e.g. instruments or mediators) is large, but only a small subset should be included in the two-stage model. We use two variable selection methods which are designed for high-dimensional settings, and compare their performance in terms of their ability to find the true IVs or mediators. Our approach is demonstrated via simulations and case studies.

Submitted 23 March, 2020; originally announced March 2020.

arXiv:2002.07613 [pdf, other]

An interpretable classifier for high-resolution breast cancer screening images utilizing weakly supervised localization

Authors: Yiqiu Shen, Nan Wu, Jason Phang, Jungkyu Park, Kangning Liu, Sudarshini Tyagi, Laura Heacock, S. Gene Kim, Linda Moy, Kyunghyun Cho, Krzysztof J. Geras

Abstract: Medical images differ from natural images in significantly higher resolutions and smaller regions of interest. Because of these differences, neural network architectures that work well for natural images might not be applicable to medical image analysis. In this work, we extend the globally-aware multiple instance classifier, a framework we proposed to address these unique properties of medical images. This model first uses a low-capacity, yet memory-efficient, network on the whole image to identify the most informative regions. It then applies another higher-capacity network to collect details from chosen regions. Finally, it employs a fusion module that aggregates global and local information to make a final prediction. While existing methods often require lesion segmentation during training, our model is trained with only image-level labels and can generate pixel-level saliency maps indicating possible malignant findings. We apply the model to screening mammography interpretation: predicting the presence or absence of benign and malignant lesions. On the NYU Breast Cancer Screening Dataset, consisting of more than one million images, our model achieves an AUC of 0.93 in classifying breasts with malignant findings, outperforming ResNet-34 and Faster R-CNN. Compared to ResNet-34, our model is 4.1x faster for inference while using 78.4% less GPU memory. Furthermore, we demonstrate, in a reader study, that our model surpasses radiologist-level AUC by a margin of 0.11. The proposed model is available online: https://github.com/nyukat/GMIC.

Submitted 13 February, 2020; originally announced February 2020.

arXiv:2002.03419 [pdf, other]

The Alzheimer's Disease Prediction Of Longitudinal Evolution (TADPOLE) Challenge: Results after 1 Year Follow-up

Authors: Razvan V. Marinescu, Neil P. Oxtoby, Alexandra L. Young, Esther E. Bron, Arthur W. Toga, Michael W. Weiner, Frederik Barkhof, Nick C. Fox, Arman Eshaghi, Tina Toni, Marcin Salaterski, Veronika Lunina, Manon Ansart, Stanley Durrleman, Pascal Lu, Samuel Iddi, Dan Li, Wesley K. Thompson, Michael C. Donohue, Aviv Nahon, Yarden Levy, Dan Halbersberg, Mariya Cohen, Huiling Liao, Tengfei Li , et al. (71 additional authors not shown)

Abstract: We present the findings of "The Alzheimer's Disease Prediction Of Longitudinal Evolution" (TADPOLE) Challenge, which compared the performance of 92 algorithms from 33 international teams at predicting the future trajectory of 219 individuals at risk of Alzheimer's disease. Challenge participants were required to make a prediction, for each month of a 5-year future time period, of three key outcomes: clinical diagnosis, Alzheimer's Disease Assessment Scale Cognitive Subdomain (ADAS-Cog13), and total volume of the ventricles. The methods used by challenge participants included multivariate linear regression, machine learning methods such as support vector machines and deep neural networks, as well as disease progression models. No single submission was best at predicting all three outcomes. For clinical diagnosis and ventricle volume prediction, the best algorithms strongly outperform simple baselines in predictive ability. However, for ADAS-Cog13 no single submitted prediction method was significantly better than random guesswork. Two ensemble methods based on taking the mean and median over all predictions, obtained top scores on almost all tasks. Better than average performance at diagnosis prediction was generally associated with the additional inclusion of features from cerebrospinal fluid (CSF) samples and diffusion tensor imaging (DTI). On the other hand, better performance at ventricle volume prediction was associated with inclusion of summary statistics, such as the slope or maxima/minima of biomarkers. TADPOLE's unique results suggest that current prediction algorithms provide sufficient accuracy to exploit biomarkers related to clinical diagnosis and ventricle volume, for cohort refinement in clinical trials for Alzheimer's disease. However, results call into question the usage of cognitive test scores for patient selection and as a primary endpoint in clinical trials.

Submitted 27 December, 2021; v1 submitted 9 February, 2020; originally announced February 2020.

Comments: Presents final results of the TADPOLE competition. 60 pages, 7 tables, 14 figures

Journal ref: Machine Learning for Biomedical Imaging (MELBA), Dec 2021

arXiv:2001.07646 [pdf]

How Fast You Can Actually Fly: A Comparative Investigation of Flight Airborne Time in China and the U.S

Authors: Ke Liu, Zhe Zheng, Bo Zou, Mark Hansen

Abstract: Actual airborne time (AAT) is the time between wheels-off and wheels-on of a flight. Understanding the behavior of AAT is increasingly important given the ever growing demand for air travel and flight delays becoming more rampant. As no research on AAT exists, this paper performs the first empirical analysis of AAT behavior, comparatively for the U.S. and China. The focus is on how AAT is affected by scheduled block time (SBT), origin-destination (OD) distance, and the possible pressure to reduce AAT from other parts of flight operations. Multiple econometric models are developed. The estimation results show that in both countries AAT is highly correlated with SBT and OD distance. Flights in the U.S. are faster than in China. On the other hand, facing ground delay prior to takeoff, a flight has limited capability to speed up. The pressure from short turnaround time after landing to reduce AAT is immaterial. Sensitivity analysis of AAT to flight length and aircraft utilization is further conducted. Given the more abundant airspace, flexible routing networks, and efficient ATFM procedures, a counterfactual that the AAT behavior in the U.S. were adopted in China is examined. We find that by doing so significant efficiency gains could be achieved in the Chinese air traffic system. On average, 11.8 minutes of AAT per flight would be saved, coming from both reduction in SBT and reduction in AAT relative to the new SBT. Systemwide fuel saving would amount to over 300 million gallons with direct airline operating cost saving of nearly $1.3 billion nationwide in 2016.

Submitted 21 January, 2020; originally announced January 2020.

Comments: 44 pages, 11 figures

MSC Class: 62P30

arXiv:1911.05142 [pdf, other]

Incentivized Exploration for Multi-Armed Bandits under Reward Drift

Authors: Zhiyuan Liu, Huazheng Wang, Fan Shen, Kai Liu, Lijun Chen

Abstract: We study incentivized exploration for the multi-armed bandit (MAB) problem where the players receive compensation for exploring arms other than the greedy choice and may provide biased feedback on reward. We seek to understand the impact of this drifted reward feedback by analyzing the performance of three instantiations of the incentivized MAB algorithm: UCB, $\varepsilon$-Greedy, and Thompson Sampling. Our results show that they all achieve $\mathcal{O}(\log T)$ regret and compensation under the drifted reward, and are therefore effective in incentivizing exploration. Numerical examples are provided to complement the theoretical analysis.

Submitted 15 December, 2019; v1 submitted 12 November, 2019; originally announced November 2019.

Comments: 10 pages, 2 figures, AAAI 2020

arXiv:1909.07869 [pdf, other]

Visualizing Movement Control Optimization Landscapes

Authors: Perttu Hämäläinen, Juuso Toikka, Amin Babadi, C. Karen Liu

Abstract: A large body of animation research focuses on optimization of movement control, either as action sequences or policy parameters. However, as closed-form expressions of the objective functions are often not available, our understanding of the optimization problems is limited. Building on recent work on analyzing neural network training, we contribute novel visualizations of high-dimensional control optimization landscapes; this yields insights into why control optimization is hard and why common practices like early termination and spline-based action parameterizations make optimization easier. For example, our experiments show how trajectory optimization can become increasingly ill-conditioned with longer trajectories, but parameterizing control as partial target states---e.g., target angles converted to torques using a PD-controller---can act as an efficient preconditioner. Both our visualizations and quantitative empirical data also indicate that neural network policy optimization scales better than trajectory optimization for long planning horizons. Our work advances the understanding of movement optimization and our visualizations should also provide value in educational use.

Submitted 22 August, 2020; v1 submitted 17 September, 2019; originally announced September 2019.

Comments: Accepted to IEEE Transactions on Visualization and Computer Graphics (IEEE TVCG)

arXiv:1908.10999 [pdf, other]

Spectral Regularization for Combating Mode Collapse in GANs

Authors: Kanglin Liu, Wenming Tang, Fei Zhou, Guoping Qiu

Abstract: Despite excellent progress in recent years, mode collapse remains a major unsolved problem in generative adversarial networks (GANs). In this paper, we present spectral regularization for GANs (SR-GANs), a new and robust method for combating the mode collapse problem in GANs. Theoretical analysis shows that the optimal solution to the discriminator has a strong relationship to the spectral distributions of the weight matrix. Therefore, we monitor the spectral distribution in the discriminator of spectral normalized GANs (SN-GANs), and discover a phenomenon which we refer to as spectral collapse, where a large number of singular values of the weight matrices drop dramatically when mode collapse occurs. We show that there is strong evidence linking mode collapse to spectral collapse; and based on this link, we set out to tackle spectral collapse as a surrogate of mode collapse. We have developed a spectral regularization method where we compensate the spectral distributions of the weight matrices to prevent them from collapsing, which in turn successfully prevents mode collapse in GANs. We provide theoretical explanations for why SR-GANs are more stable and can provide better performances than SN-GANs. We also present extensive experimental results and analysis to show that SR-GANs not only always outperform SN-GANs but also always succeed in combating mode collapse where SN-GANs fail. The code is available at https://github.com/max-liu-112/SRGANs-Spectral-Regularization-GANs-.

Submitted 12 October, 2019; v1 submitted 28 August, 2019; originally announced August 2019.

Comments: 24 pages, 33 figures

arXiv:1908.01052 [pdf, other]

Weight Friction: A Simple Method to Overcome Catastrophic Forgetting and Enable Continual Learning

Authors: Gabrielle K. Liu

Abstract: In recent years, deep neural networks have found success in replicating human-level cognitive skills, yet they suffer from several major obstacles. One significant limitation is the inability to learn new tasks without forgetting previously learned tasks, a shortcoming known as catastrophic forgetting. In this research, we propose a simple method to overcome catastrophic forgetting and enable continual learning in neural networks. We draw inspiration from principles in neurology and physics to develop the concept of weight friction. Weight friction operates by a modification to the update rule in the gradient descent optimization method. It converges at a rate comparable to that of the stochastic gradient descent algorithm and can operate over multiple task domains. It performs comparably to current methods while offering improvements in computation and memory efficiency.

Submitted 17 August, 2019; v1 submitted 2 August, 2019; originally announced August 2019.

Comments: 9 pages, 6 figures, 1 table

arXiv:1907.07129 [pdf, other]

Topology Based Scalable Graph Kernels

Authors: Kin Sum Liu, Chien-Chun Ni, Yu-Yao Lin, Jie Gao

Abstract: We propose a new graph kernel for graph classification and comparison using Ollivier Ricci curvature. The Ricci curvature of an edge in a graph describes the connectivity in the local neighborhood. An edge in a densely connected neighborhood has positive curvature and an edge serving as a local bridge has negative curvature. We use the edge curvature distribution to form a graph kernel which is then used to compare and cluster graphs. The curvature kernel uses purely the graph topology and thereby works for settings when node attributes are not available.

Submitted 14 July, 2019; originally announced July 2019.

arXiv:1906.10773 [pdf, other]

doi 10.1145/3408288

Are Adversarial Perturbations a Showstopper for ML-Based CAD? A Case Study on CNN-Based Lithographic Hotspot Detection

Authors: Kang Liu, Haoyu Yang, Yuzhe Ma, Benjamin Tan, Bei Yu, Evangeline F. Y. Young, Ramesh Karri, Siddharth Garg

Abstract: There is substantial interest in the use of machine learning (ML) based techniques throughout the electronic computer-aided design (CAD) flow, particularly those based on deep learning. However, while deep learning methods have surpassed state-of-the-art performance in several applications, they have exhibited intrinsic susceptibility to adversarial perturbations --- small but deliberate alterations to the input of a neural network, precipitating incorrect predictions. In this paper, we seek to investigate whether adversarial perturbations pose risks to ML-based CAD tools, and if so, how these risks can be mitigated. To this end, we use a motivating case study of lithographic hotspot detection, for which convolutional neural networks (CNN) have shown great promise. In this context, we show the first adversarial perturbation attacks on state-of-the-art CNN-based hotspot detectors; specifically, we show that small (on average 0.5% modified area), functionality preserving and design-constraint satisfying changes to a layout can nonetheless trick a CNN-based hotspot detector into predicting the modified layout as hotspot free (with up to 99.7% success). We propose an adversarial retraining strategy to improve the robustness of CNN-based hotspot detection and show that this strategy significantly improves robustness (by a factor of ~3) against adversarial attacks without compromising classification accuracy.

Submitted 25 June, 2019; originally announced June 2019.

Journal ref: ACM Trans. Des. Autom. Electron. Syst. 25, 5, Article 48 (August 2020)

arXiv:1904.13007 [pdf, other]

Reconstruction of Natural Visual Scenes from Neural Spikes with Deep Neural Networks

Authors: Yichen Zhang, Shanshan Jia, Yajing Zheng, Zhaofei Yu, Yonghong Tian, Siwei Ma, Tiejun Huang, Jian K. Liu

Abstract: Neural coding is one of the central questions in systems neuroscience for understanding how the brain processes stimulus from the environment; moreover, it is also a cornerstone for designing algorithms of brain-machine interface, where decoding incoming stimulus is highly demanded for better performance of physical devices. Traditionally researchers have focused on functional magnetic resonance imaging (fMRI) data as the neural signals of interest for decoding visual scenes. However, our visual perception operates on a fast time scale of milliseconds in terms of an event termed neural spike. There are few studies of decoding by using spikes. Here we fulfill this aim by developing a novel decoding framework based on deep neural networks, named spike-image decoder (SID), for reconstructing natural visual scenes, including static images and dynamic videos, from experimentally recorded spikes of a population of retinal ganglion cells. The SID is an end-to-end decoder with one end as neural spikes and the other end as images, which can be trained directly such that visual scenes are reconstructed from spikes in a highly accurate fashion. Our SID also outperforms existing fMRI decoding models on the reconstruction of visual stimulus. In addition, with the aid of a spike encoder, we show that SID can be generalized to arbitrary visual scenes by using the image datasets of MNIST, CIFAR10, and CIFAR100. Furthermore, with a pre-trained SID, one can decode any dynamic videos to achieve real-time encoding and decoding of visual scenes by spikes. Altogether, our results shed new light on neuromorphic computing for artificial visual systems, such as event-based visual cameras and visual neuroprostheses.

Submitted 28 January, 2020; v1 submitted 29 April, 2019; originally announced April 2019.

Comments: 35 pages, 10 figures

ACM Class: I.2.6

arXiv:1903.06877 [pdf, other]

Spherical Principal Component Analysis

Authors: Kai Liu, Qiuwei Li, Hua Wang, Gongguo Tang

Abstract: Principal Component Analysis (PCA) is one of the most important methods to handle high dimensional data. However, most of the studies on PCA aim to minimize the loss after projection, which usually measures the Euclidean distance, though in some fields, angle distance is known to be more important and critical for analysis. In this paper, we propose a method by adding constraints on factors to unify the Euclidean distance and angle distance. However, due to the nonconvexity of the objective and constraints, the optimized solution is not easy to obtain. We propose an alternating linearized minimization method to solve it with provable convergence rate and guarantee. Experiments on synthetic data and real-world datasets have validated the effectiveness of our method and demonstrated its advantages over state-of-art clustering methods.

Submitted 16 March, 2019; originally announced March 2019.

arXiv:1903.01048 [pdf, other]

Early Detection of Influenza outbreaks in the United States

Authors: Kai Liu, Ravi Srinivasan, Lauren Ancel Meyers

Abstract: Public health surveillance systems often fail to detect emerging infectious diseases, particularly in resource limited settings. By integrating relevant clinical and internet-source data, we can close critical gaps in coverage and accelerate outbreak detection. Here, we present a multivariate algorithm that uses freely available online data to provide early warning of emerging influenza epidemics in the US. We evaluated 240 candidate predictors and found that the most predictive combination does \textit{not} include surveillance or electronic health records data, but instead consists of eight Google search and Wikipedia pageview time series reflecting changing levels of interest in influenza-related topics. In cross validation on 2010-2016 data, this algorithm sounds alarms an average of 16.4 weeks prior to influenza activity reaching the Center for Disease Control and Prevention (CDC) threshold for declaring the start of the season. In an out-of-sample test on data from the rapidly-emerging fall wave of the 2009 H1N1 pandemic, it recognized the threat five weeks in advance of this surveillance threshold. Simpler algorithms, including fixed week-of-the-year triggers, lag the optimized alarms by only a few weeks when detecting seasonal influenza, but fail to provide early warning in the 2009 pandemic scenario. This demonstrates a robust method for designing next generation outbreak detection algorithms. By combining scan statistics with machine learning, it identifies tractable combinations of data sources (from among thousands of candidates) that can provide early warning of emerging infectious disease threats worldwide.

Submitted 3 March, 2019; originally announced March 2019.

arXiv:1902.08411 [pdf, other]

doi 10.1016/j.neunet.2020.03.003

Probabilistic Inference of Binary Markov Random Fields in Spiking Neural Networks through Mean-field Approximation

Authors: Yajing Zheng, Shanshan Jia, Zhaofei Yu, Tiejun Huang, Jian K. Liu, Yonghong Tian

Abstract: Recent studies have suggested that the cognitive process of the human brain is realized as probabilistic inference and can be further modeled by probabilistic graphical models like Markov random fields. Nevertheless, it remains unclear how probabilistic inference can be implemented by a network of spiking neurons in the brain. Previous studies have tried to relate the inference equation of binary Markov random fields to the dynamic equation of spiking neural networks through belief propagation algorithm and reparameterization, but they are valid only for Markov random fields with limited network structure. In this paper, we propose a spiking neural network model that can implement inference of arbitrary binary Markov random fields. Specifically, we design a spiking recurrent neural network and prove that its neuronal dynamics are mathematically equivalent to the inference process of Markov random fields by adopting mean-field theory. Furthermore, our mean-field approach unifies previous works. Theoretical analysis and experimental results, together with the application to image denoising, demonstrate that our proposed spiking neural network can get comparable results to that of mean-field inference.

Submitted 12 March, 2020; v1 submitted 22 February, 2019; originally announced February 2019.

Comments: Accepted in Neural Networks

arXiv:1811.05642 [pdf, other]

Dropping Symmetry for Fast Symmetric Nonnegative Matrix Factorization

Authors: Zhihui Zhu, Xiao Li, Kai Liu, Qiuwei Li

Abstract: Symmetric nonnegative matrix factorization (NMF), a special but important class of the general NMF, is demonstrated to be useful for data analysis and in particular for various clustering tasks. Unfortunately, designing fast algorithms for Symmetric NMF is not as easy as for the nonsymmetric counterpart, the latter admitting the splitting property that allows efficient alternating-type algorithms. To overcome this issue, we transfer the symmetric NMF to a nonsymmetric one, then we can adopt the idea from the state-of-the-art algorithms for nonsymmetric NMF to design fast algorithms solving symmetric NMF. We rigorously establish that solving nonsymmetric reformulation returns a solution for symmetric NMF and then apply fast alternating based algorithms for the corresponding reformulated problem. Furthermore, we show these fast algorithms admit strong convergence guarantee in the sense that the generated sequence is convergent at least at a sublinear rate and it converges globally to a critical point of the symmetric NMF. We conduct experiments on both synthetic data and image clustering to support our result.

Submitted 14 November, 2018; originally announced November 2018.

Comments: Accepted in NIPS 2018

MSC Class: 65K10; 90C26; 68Q25; 68W40;

Showing 1–50 of 79 results for author: Liu, K