OAML: Outlier Aware Metric Learning for OOD Detection Enhancement

Authors: Heng Gao, Zhuolin He, Shoumeng Qiu, Jian Pu

Abstract: Out-of-distribution (OOD) detection methods have been developed to identify objects that a model has not seen during training. The Outlier Exposure (OE) methods use auxiliary datasets to train OOD detectors directly. However, the collection and learning of representative OOD samples may pose challenges. To tackle these issues, we propose the Outlier Aware Metric Learning (OAML) framework. The main idea of our method is to use the k-NN algorithm and Stable Diffusion model to generate outliers for training at the feature level without making any distributional assumptions. To increase feature discrepancies in the semantic space, we develop a mutual information-based contrastive learning approach for learning from OOD data effectively. Both theoretical and empirical results confirm the effectiveness of this contrastive learning technique. Furthermore, we incorporate knowledge distillation into our learning framework to prevent degradation of in-distribution classification accuracy. The combination of contrastive learning and knowledge distillation algorithms significantly enhances the performance of OOD detection. Experimental results across various datasets show that our method significantly outperforms previous OE methods.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2404.01153 [pdf, other]

TransFusion: Covariate-Shift Robust Transfer Learning for High-Dimensional Regression

Authors: Zelin He, Ying Sun, Jingyuan Liu, Runze Li

Abstract: The main challenge that sets transfer learning apart from traditional supervised learning is the distribution shift, reflected as the shift between the source and target models and that between the marginal covariate distributions. In this work, we tackle model shifts in the presence of covariate shifts in the high-dimensional regression setting. Specifically, we propose a two-step method with a novel fused-regularizer that effectively leverages samples from source tasks to improve the learning performance on a target task with limited samples. Nonasymptotic bound is provided for the estimation error of the target model, showing the robustness of the proposed method to covariate shifts. We further establish conditions under which the estimator is minimax-optimal. Additionally, we extend the method to a distributed setting, allowing for a pretraining-finetuning strategy, requiring just one round of communication while retaining the estimation rate of the centralized version. Numerical tests validate our theory, highlighting the method's robustness to covariate shifts.

Submitted 1 April, 2024; originally announced April 2024.

Comments: Accepted by the 27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024)

arXiv:2404.00481 [pdf, other]

Convolutional Bayesian Filtering

Authors: Wenhan Cao, Shiqi Liu, Chang Liu, Zeyu He, Stephen S. -T. Yau, Shengbo Eben Li

Abstract: Bayesian filtering serves as the mainstream framework of state estimation in dynamic systems. Its standard version utilizes total probability rule and Bayes' law alternatively, where how to define and compute conditional probability is critical to state distribution inference. Previously, the conditional probability is assumed to be exactly known, which represents a measure of the occurrence probability of one event, given the second event. In this paper, we find that by adding an additional event that stipulates an inequality condition, we can transform the conditional probability into a special integration that is analogous to convolution. Based on this transformation, we show that both transition probability and output probability can be generalized to convolutional forms, resulting in a more general filtering framework that we call convolutional Bayesian filtering. This new framework encompasses standard Bayesian filtering as a special case when the distance metric of the inequality condition is selected as Dirac delta function. It also allows for a more nuanced consideration of model mismatch by choosing different types of inequality conditions. For instance, when the distance metric is defined in a distributional sense, the transition probability and output probability can be approximated by simply rescaling them into fractional powers. Under this framework, a robust version of Kalman filter can be constructed by only altering the noise covariance matrix, while maintaining the conjugate nature of Gaussian distributions. Finally, we exemplify the effectiveness of our approach by reshaping classic filtering algorithms into convolutional versions, including Kalman filter, extended Kalman filter, unscented Kalman filter and particle filter.

Submitted 30 March, 2024; originally announced April 2024.
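
Note on the fractional-power remark in the abstract above: the following is a short worked identity, not the paper's derivation. It illustrates why raising a Gaussian output probability to a fractional power only changes its covariance. The symbols y (measurement), x (state), H (measurement matrix), R (measurement noise covariance), d (measurement dimension) and the exponent \alpha are assumptions introduced here for illustration; the abstract does not specify them.

```latex
% Assumed linear-Gaussian output model: p(y | x) = N(y; Hx, R), with 0 < \alpha \le 1.
\begin{aligned}
p(y \mid x)^{\alpha}
  &= \bigl[(2\pi)^{-d/2}\,\lvert R\rvert^{-1/2}\bigr]^{\alpha}
     \exp\!\Bigl(-\tfrac{\alpha}{2}\,(y - Hx)^{\top} R^{-1} (y - Hx)\Bigr) \\
  &\propto \exp\!\Bigl(-\tfrac{1}{2}\,(y - Hx)^{\top} (R/\alpha)^{-1} (y - Hx)\Bigr) \\
  &\propto \mathcal{N}\bigl(y;\; Hx,\; R/\alpha\bigr).
\end{aligned}
```

Under these assumptions the rescaled likelihood is still Gaussian in x up to a constant, so the usual Kalman update would apply with R replaced by the inflated covariance R/\alpha. This is consistent with the abstract's statement that only the noise covariance matrix is altered, though the paper's exact choice of exponent may differ.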

arXiv:2403.13565 [pdf, other]

AdaTrans: Feature-wise and Sample-wise Adaptive Transfer Learning for High-dimensional Regression

Authors: Zelin He, Ying Sun, Jingyuan Liu, Runze Li

Abstract: We consider the transfer learning problem in the high dimensional setting, where the feature dimension is larger than the sample size. To learn transferable information, which may vary across features or the source samples, we propose an adaptive transfer learning method that can detect and aggregate the feature-wise (F-AdaTrans) or sample-wise (S-AdaTrans) transferable structures. We achieve this by employing a novel fused-penalty, coupled with weights that can adapt according to the transferable structure. To choose the weight, we propose a theoretically informed, data-driven procedure, enabling F-AdaTrans to selectively fuse the transferable signals with the target while filtering out non-transferable signals, and S-AdaTrans to obtain the optimal combination of information transferred from each source sample. The non-asymptotic rates are established, which recover existing near-minimax optimal rates in special cases. The effectiveness of the proposed method is validated using both synthetic and real data.

Submitted 20 March, 2024; originally announced March 2024.

Comments: Technical Report

arXiv:2402.13934 [pdf, other]

Do Efficient Transformers Really Save Computation?

Authors: Kai Yang, Jan Ackermann, Zhenyu He, Guhao Feng, Bohang Zhang, Yunzhen Feng, Qiwei Ye, Di He, Liwei Wang

Abstract: As transformer-based language models are trained on increasingly large datasets and with vast numbers of parameters, finding more efficient alternatives to the standard Transformer has become very valuable. While many efficient Transformers and Transformer alternatives have been proposed, none provide theoretical guarantees that they are a suitable replacement for the standard Transformer. This makes it challenging to identify when to use a specific model and what directions to prioritize for further investigation. In this paper, we aim to understand the capabilities and limitations of efficient Transformers, specifically the Sparse Transformer and the Linear Transformer. We focus on their reasoning capability as exhibited by Chain-of-Thought (CoT) prompts and follow previous works to model them as Dynamic Programming (DP) problems. Our results show that while these models are expressive enough to solve general DP tasks, contrary to expectations, they require a model size that scales with the problem size. Nonetheless, we identify a class of DP problems for which these models can be more efficient than the standard Transformer. We confirm our theoretical results through experiments on representative DP tasks, adding to the understanding of efficient Transformers' practical strengths and weaknesses.

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.12724 [pdf, other]

Controlled Variable Selection from Summary Statistics Only? A Solution via GhostKnockoffs and Penalized Regression

Authors: Zhaomeng Chen, Zihuai He, Benjamin B. Chu, Jiaqi Gu, Tim Morrison, Chiara Sabatti, Emmanuel Candès

Abstract: Identifying which variables do influence a response while controlling false positives pervades statistics and data science. In this paper, we consider a scenario in which we only have access to summary statistics, such as the values of marginal empirical correlations between each dependent variable of potential interest and the response. This situation may arise due to privacy concerns, e.g., to avoid the release of sensitive genetic information. We extend GhostKnockoffs (He et al. [2022]) and introduce variable selection methods based on penalized regression achieving false discovery rate (FDR) control. We report empirical results in extensive simulation studies, demonstrating enhanced performance over previous work. We also apply our methods to genome-wide association studies of Alzheimer's disease, and evidence a significant improvement in power.

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2401.16776 [pdf, other]

Leveraging Nested MLMC for Sequential Neural Posterior Estimation with Intractable Likelihoods

Authors: Xiliang Yang, Yifei Xiong, Zhijian He

Abstract: Sequential neural posterior estimation (SNPE) techniques have been recently proposed for dealing with simulation-based models with intractable likelihoods. They are devoted to learning the posterior from adaptively proposed simulations using neural network-based conditional density estimators. As a SNPE technique, the automatic posterior transformation (APT) method proposed by Greenberg et al. (2019) performs notably and scales to high dimensional data. However, the APT method bears the computation of an expectation of the logarithm of an intractable normalizing constant, i.e., a nested expectation. Although atomic APT was proposed to solve this by discretizing the normalizing constant, it remains challenging to analyze the convergence of learning. In this paper, we propose a nested APT method to estimate the involved nested expectation instead. This facilitates establishing the convergence analysis. Since the nested estimators for the loss function and its gradient are biased, we make use of unbiased multi-level Monte Carlo (MLMC) estimators for debiasing. To further reduce the excessive variance of the unbiased estimators, this paper also develops some truncated MLMC estimators by taking account of the trade-off between the bias and the average cost. Numerical experiments for approximating complex posteriors with multimodal in moderate dimensions are provided.

Submitted 30 January, 2024; originally announced January 2024.

Comments: 28 pages, 4 figures

arXiv:2401.16421 [pdf, other]

Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation

Authors: Zhenyu He, Guhao Feng, Shengjie Luo, Kai Yang, Liwei Wang, Jingjing Xu, Zhi Zhang, Hongxia Yang, Di He

Abstract: In this work, we leverage the intrinsic segmentation of language sequences and design a new positional encoding method called Bilevel Positional Encoding (BiPE). For each position, our BiPE blends an intra-segment encoding and an inter-segment encoding. The intra-segment encoding identifies the locations within a segment and helps the model capture the semantic information therein via absolute positional encoding. The inter-segment encoding specifies the segment index, models the relationships between segments, and aims to improve extrapolation capabilities via relative positional encoding. Theoretical analysis shows this disentanglement of positional information makes learning more effective. The empirical results also show that our BiPE has superior length extrapolation capabilities across a wide range of tasks in diverse text modalities.

Submitted 17 June, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

Comments: 17 pages, 7 figures, 8 tables; ICML 2024 Camera Ready version; Code: https://github.com/zhenyuhe00/BiPE

arXiv:2401.08941

A Powerful and Precise Feature-level Filter using Group Knockoffs

Authors: Jiaqi Gu, Zihuai He

Abstract: Selecting important features that have substantial effects on the response with provable type-I error rate control is a fundamental concern in statistics, with wide-ranging practical applications. Existing knockoff filters, although shown to provide theoretical guarantee on false discovery rate (FDR) control, often struggle to strike a balance between high power and precision in pinpointing important features when there exist large groups of strongly correlated features. To address this challenge, we develop a new filter using group knockoffs to achieve both powerful and precise selection of important features. Via experiments of simulated data and analysis of a real Alzheimer's disease genetic dataset, it is found that the proposed filter can not only control the proportion of false discoveries but also identify important features with comparable power and greater precision than the existing group knockoffs filter.

Submitted 27 February, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

Comments: We need a major revision of this paper

arXiv:2401.02154 [pdf, other]

Disentangle Estimation of Causal Effects from Cross-Silo Data

Authors: Yuxuan Liu, Haozhao Wang, Shuang Wang, Zhiming He, Wenchao Xu, Jialiang Zhu, Fan Yang

Abstract: Estimating causal effects among different events is of great importance to critical fields such as drug development. Nevertheless, the data features associated with events may be distributed across various silos and remain private within respective parties, impeding direct information exchange between them. This, in turn, can result in biased estimations of local causal effects, which rely on the characteristics of only a subset of the covariates. To tackle this challenge, we introduce an innovative disentangle architecture designed to facilitate the seamless cross-silo transmission of model parameters, enriched with causal mechanisms, through a combination of shared and private branches. Besides, we introduce global constraints into the equation to effectively mitigate bias within the various missing domains, thereby elevating the accuracy of our causal effect estimation. Extensive experiments conducted on new semi-synthetic datasets show that our method outperforms state-of-the-art baselines.

Submitted 4 January, 2024; originally announced January 2024.

Comments: Accepted by ICASSP 2024

arXiv:2401.00461 [pdf, other]

A Penalized Functional Linear Cox Regression Model for Spatially-defined Environmental Exposure with an Estimated Buffer Distance

Authors: Jooyoung Lee, Zhibing He, Charlotte Roscoe, Peter James, Li Xu, Donna Spiegelman, David Zucker, Molin Wang

Abstract: In environmental health research, it is of interest to understand the effect of the neighborhood environment on health. Researchers have shown a protective association between green space around a person's residential address and depression outcomes. In measuring exposure to green space, distance buffers are often used. However, buffer distances differ across studies. Typically, the buffer distance is determined by researchers a priori. It is unclear how to identify an appropriate buffer distance for exposure assessment. To address geographic uncertainty problem for exposure assessment, we present a domain selection algorithm based on the penalized functional linear Cox regression model. The theoretical properties of our proposed method are studied and simulation studies are conducted to evaluate finite sample performances of our method. The proposed method is illustrated in a study of associations of green space exposure with depression and/or antidepressant use in the Nurses' Health Study.

Submitted 31 December, 2023; originally announced January 2024.

Comments: 27 pages, 5 figures

arXiv:2311.12530 [pdf, other]

An efficient likelihood-free Bayesian inference method based on sequential neural posterior estimation

Authors: Yifei Xiong, Xiliang Yang, Sanguo Zhang, Zhijian He

Abstract: Sequential neural posterior estimation (SNPE) techniques have been recently proposed for dealing with simulation-based models with intractable likelihoods. Unlike approximate Bayesian computation, SNPE techniques learn the posterior from sequential simulation using neural network-based conditional density estimators by minimizing a specific loss function. The SNPE method proposed by Lueckmann et al. (2017) used a calibration kernel to boost the sample weights around the observed data, resulting in a concentrated loss function. However, the use of calibration kernels may increase the variances of both the empirical loss and its gradient, making the training inefficient. To improve the stability of SNPE, this paper proposes to use an adaptive calibration kernel and several variance reduction techniques. The proposed method greatly speeds up the process of training, and provides a better approximation of the posterior than the original SNPE method and some existing competitors as confirmed by numerical experiments.

Submitted 27 November, 2023; v1 submitted 21 November, 2023; originally announced November 2023.

Comments: 30 pages, 7 figures

arXiv:2310.15069 [pdf, other]

Second-order group knockoffs with applications to GWAS

Authors: Benjamin B Chu, Jiaqi Gu, Zhaomeng Chen, Tim Morrison, Emmanuel Candes, Zihuai He, Chiara Sabatti

Abstract: Conditional testing via the knockoff framework allows one to identify -- among large number of possible explanatory variables -- those that carry unique information about an outcome of interest, and also provides a false discovery rate guarantee on the selection. This approach is particularly well suited to the analysis of genome wide association studies (GWAS), which have the goal of identifying genetic variants which influence traits of medical relevance. While conditional testing can be both more powerful and precise than traditional GWAS analysis methods, its vanilla implementation encounters a difficulty common to all multivariate analysis methods: it is challenging to distinguish among multiple, highly correlated regressors. This impasse can be overcome by shifting the object of inference from single variables to groups of correlated variables. To achieve this, it is necessary to construct "group knockoffs." While successful examples are already documented in the literature, this paper substantially expands the set of algorithms and software for group knockoffs. We focus in particular on second-order knockoffs, for which we describe correlation matrix approximations that are appropriate for GWAS data and that result in considerable computational savings. We illustrate the effectiveness of the proposed methods with simulations and with the analysis of albuminuria data from the UK Biobank. The described algorithms are implemented in an open-source Julia package Knockoffs.jl, for which both R and Python wrappers are available.

Submitted 3 March, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

Comments: 46 pages, 10 figures, 2 tables, 3 algorithms

arXiv:2310.09493 [pdf, other]

Summary Statistics Knockoffs Inference with Family-wise Error Rate Control

Authors: Catherine Xinrui Yu, Jiaqi Gu, Zhaomeng Chen, Zihuai He

Abstract: Testing multiple hypotheses of conditional independence with provable error rate control is a fundamental problem with various applications. To infer conditional independence with family-wise error rate (FWER) control when only summary statistics of marginal dependence are accessible, we adopt GhostKnockoff to directly generate knockoff copies of summary statistics and propose a new filter to select features conditionally dependent to the response with provable FWER control. In addition, we develop a computationally efficient algorithm to greatly reduce the computational cost of knockoff copies generation without sacrificing power and FWER control. Experiments on simulated data and a real dataset of Alzheimer's disease genetics demonstrate the advantage of proposed method over the existing alternatives in both statistical power and computational efficiency.

Submitted 14 October, 2023; originally announced October 2023.

Comments: 35 pages

arXiv:2310.04030 [pdf]

Robust inference with GhostKnockoffs in genome-wide association studies

Authors: Xinran Qi, Michael E. Belloy, Jiaqi Gu, Xiaoxia Liu, Hua Tang, Zihuai He

Abstract: Genome-wide association studies (GWASs) have been extensively adopted to depict the underlying genetic architecture of complex diseases. Motivated by GWASs' limitations in identifying small effect loci to understand complex traits' polygenicity and fine-mapping putative causal variants from proxy ones, we propose a knockoff-based method which only requires summary statistics from GWASs and demonstrate its validity in the presence of relatedness. We show that GhostKnockoffs inference is robust to its input Z-scores as long as they are from valid marginal association tests and their correlations are consistent with the correlations among the corresponding genetic variants. The property generalizes GhostKnockoffs to other GWASs settings, such as the meta-analysis of multiple overlapping studies and studies based on association test statistics deviated from score tests. We demonstrate GhostKnockoffs' performance using empirical simulation and a meta-analysis of nine European ancestral genome-wide association studies and whole exome/genome sequencing studies. Both results demonstrate that GhostKnockoffs identify more putative causal variants with weak genotype-phenotype associations that are missed by conventional GWASs.

Submitted 6 October, 2023; originally announced October 2023.

arXiv:2308.04368 [pdf, other]

Multiple Testing of Local Extrema for Detection of Structural Breaks in Piecewise Linear Models

Authors: Zhibing He, Dan Cheng, Yunpeng Zhao

Abstract: In this paper, we propose a new generic method for detecting the number and locations of structural breaks or change points in piecewise linear models under stationary Gaussian noise. Our method transforms the change point detection problem into identifying local extrema (local maxima and local minima) through kernel smoothing and differentiation of the data sequence. By computing p-values for all local extrema based on peak height distributions of smooth Gaussian processes, we utilize the Benjamini-Hochberg procedure to identify significant local extrema as the detected change points. Our method can distinguish between two types of change points: continuous breaks (Type I) and jumps (Type II). We study three scenarios of piecewise linear signals, namely pure Type I, pure Type II and a mixture of Type I and Type II change points. The results demonstrate that our proposed method ensures asymptotic control of the False Discovery Rate (FDR) and power consistency, as sequence length, slope changes, and jump size increase. Furthermore, compared to traditional change point detection methods based on recursive segmentation, our approach only requires a single test for all candidate local extrema, thereby achieving the smallest computational complexity proportionate to the data sequence length. Additionally, numerical studies illustrate that our method maintains FDR control and power consistency, even in non-asymptotic cases when the size of slope changes or jumps is not large. We have implemented our method in the R package "dSTEM" (available from https://cran.r-project.org/web/packages/dSTEM).

Submitted 4 December, 2023; v1 submitted 8 August, 2023; originally announced August 2023.
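
Note on the Benjamini-Hochberg step mentioned in the abstract above: the following is a minimal sketch of the generic BH step-up procedure only. It is not the dSTEM implementation and does not perform the kernel smoothing, differentiation, or peak-height p-value computation the abstract describes. The function name `benjamini_hochberg`, the level `q`, and the p-values in the usage note are hypothetical and chosen for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the sorted indices of hypotheses rejected by the BH step-up rule.

    p_values: one p-value per candidate (here imagined as one per local extremum;
              this mapping is an assumption, not taken from the paper's code).
    q:        target false discovery rate level.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Rank candidates from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= q * k / m (step-up rule).
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            k = rank
    # Reject every hypothesis ranked at or below k.
    return sorted(order[:k])
```

For example, with the hypothetical p-values `[0.02, 0.03, 0.2]` and `q=0.05`, the smallest p-value misses its own threshold (0.05/3), but the second one meets its threshold (0.10/3), so the step-up rule rejects both of the first two candidates.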

arXiv:2308.03785 [pdf, other]

Network Inference Using the Hub Model and Variants

Authors: Zhibing He, Yunpeng Zhao, Peter Bickel, Charles Weko, Dan Cheng, Jirui Wang

Abstract: Statistical network analysis primarily focuses on inferring the parameters of an observed network. In many applications, especially in the social sciences, the observed data is the groups formed by individual subjects. In these applications, the network is itself a parameter of a statistical model. Zhao and Weko (2019) propose a model-based approach, called the hub model, to infer implicit networks from grouping behavior. The hub model assumes that each member of the group is brought together by a member of the group called the hub. The set of members which can serve as a hub is called the hub set. The hub model belongs to the family of Bernoulli mixture models. Identifiability of Bernoulli mixture model parameters is a notoriously difficult problem. This paper proves identifiability of the hub model parameters and estimation consistency under mild conditions. Furthermore, this paper generalizes the hub model by introducing a model component that allows hubless groups in which individual nodes spontaneously appear independent of any other individual. We refer to this additional component as the null component. The new model bridges the gap between the hub model and the degenerate case of the mixture model -- the Bernoulli product. Identifiability and consistency are also proved for the new model. In addition, a penalized likelihood approach is proposed to estimate the hub set when it is unknown.

Submitted 3 August, 2023; originally announced August 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2004.09709

arXiv:2307.12189 [pdf, other]

Speed Limit: Obey, or Not Obey?

Authors: Zhengbing He, Mirco Nanni, Luca Pappalardo, Paolo Santi, Carlo Ratti

Abstract: It is commonly expected that drivers maintain a driving speed that is lower than or around the posted speed limit, as failure to obey may result in safety risks and fines. By taking randomly selected road segments as examples, this study compares the percentages of speeding vehicles in five countries worldwide, namely, two European countries (Germany and Italy), two Asian countries (Japan and China), and one North American country (the United States). Contrary to expectations, our results show that more than 80% of drivers violate the posted speed limits in the studied road segments in Italy, Japan, and the United States. In particular, a significant portion (45.3%) of drivers in Italy exceed the posted speed limit by a substantial margin (30 km/h), while few speeding vehicles are observed in the road segment examined in China. Meanwhile, it is found that drivers on low-speed-limit roads are more likely to exceed the posted speed limit, particularly when there are fewer on-road vehicles. The comparison of different countries' speeding fines indicates that for the purpose of preventing speeding, increasing fines (as Italy has done) is less effective than enhancing supervision (as China has done). The findings remind law enforcement agencies and traffic authorities of the importance of the supervision of driver's behavior and the necessity of revisiting the rationale for the current speed limit settings.

Submitted 27 November, 2023; v1 submitted 22 July, 2023; originally announced July 2023.

arXiv:2307.07346 [pdf, other]

A testing-based approach to assess the clusterability of categorical data

Authors: Lianyu Hu, Junjie Dong, Mudi Jiang, Yan Liu, Zengyou He

Abstract: The objective of clusterability evaluation is to check whether a clustering structure exists within the data set. As a crucial yet often-overlooked issue in cluster analysis, it is essential to conduct such a test before applying any clustering algorithm. If a data set is unclusterable, any subsequent clustering analysis would not yield valid results. Despite its importance, the majority of existing studies focus on numerical data, leaving the clusterability evaluation issue for categorical data as an open problem. Here we present TestCat, a testing-based approach to assess the clusterability of categorical data in terms of an analytical $p$-value. The key idea underlying TestCat is that clusterable categorical data possess many strongly correlated attribute pairs and hence the sum of chi-squared statistics of all attribute pairs is employed as the test statistic for $p$-value calculation. We apply our method to a set of benchmark categorical data sets, showing that TestCat outperforms those solutions based on existing clusterability evaluation methods for numeric data. To the best of our knowledge, our work provides the first way to effectively recognize the clusterability of categorical data in a statistically sound manner.

Submitted 14 July, 2023; originally announced July 2023.

Comments: 19 pages, 13 figures

arXiv:2303.04095 [pdf, other]

Investigating and modeling day-to-day route choices based on laboratory experiments. Part II: A route-dependent attraction-based stochastic process model

Authors: Hang Qi, Ning Jia, Xiaobo Qu, Zhengbing He

Abstract: To explain day-to-day (DTD) route-choice behaviors and traffic dynamics observed in a series of lab experiments, Part I of this research proposed a discrete choice-based analytical dynamic model (Qi et al., 2023). Although the deterministic model could well reproduce the experimental observations, it converges to a stable equilibrium of route flow while the observed DTD evolution is apparently with random oscillations. To overcome the limitation, the paper proposes a route-dependent attraction-based stochastic process (RDAB-SP) model based on the same behavioral assumptions in Part I of this research. Through careful comparison between the model-based estimation and experimental observations, it is demonstrated that the proposed RDAB-SP model can accurately reproduce the random oscillations both in terms of flow switching and route flow evolution. To the best of our knowledge, this is the first attempt to explain and model experimental observations by using stochastic process DTD models, and it is interesting to find that the seemingly unanticipated phenomena (i.e., random route switching behavior) is actually dominated by simple rules, i.e., independent and probability-based route-choice behavior. Finally, an approximated model is developed to help simulate the stochastic process and evaluate the equilibrium distribution in a simple and efficient manner, making the proposed model a useful and practical tool in transportation policy design.

Submitted 7 March, 2023; originally announced March 2023.

arXiv:2303.04088 [pdf, other]

doi 10.1016/j.tra.2022.11.013

Investigating day-to-day route choices based on multi-scenario laboratory experiments. Part I: Route-dependent attraction and its modeling

Authors: Hang Qi, Ning Jia, Xiaobo Qu, Zhengbing He

Abstract: In the area of urban transportation networks, a growing number of day-to-day (DTD) traffic dynamic theories have been proposed to describe the network flow evolution, and an increasing amount of laboratory experiments have been conducted to observe travelers' behavior regularities. However, the "communication" between theorists and experimentalists has not been made well. This paper devotes to 1) detecting unanticipated behavior regularities by conducting a series of laboratory experiments, and 2) improving existing DTD dynamics theories by embedding the observed behavior regularities into a route choice model. First, 312 subjects participated in one of the eight decision-making scenarios and make route choices repeatedly in congestible parallel-route networks. Second, three route-switching behavior patterns that cannot be fully explained by the classic route-choice models are observed. Third, to enrich the explanation power of a discrete route-choice model, behavioral assumptions of route-dependent attractions, i.e., route-dependent inertia and preference, are introduced. An analytical DTD dynamic model is accordingly proposed and proven to steadily converge to a unique equilibrium state. Finally, the proposed DTD model could satisfactorily reproduce the observations in various datasets. The research results can help transportation science theorists to make the best use of laboratory experimentation and to build network equilibrium or DTD dynamic models with both real behavioral basis and neat mathematical properties.

Submitted 7 March, 2023; originally announced March 2023.

Journal ref: Transportation Research Part A, 2023

arXiv:2211.03956 [pdf, other]

Significance-Based Categorical Data Clustering

Authors: Lianyu Hu, Mudi Jiang, Yan Liu, Zengyou He

Abstract: Although numerous algorithms have been proposed to solve the categorical data clustering problem, how to access the statistical significance of a set of categorical clusters remains unaddressed. To fulfill this void, we employ the likelihood ratio test to derive a test statistic that can serve as a significance-based objective function in categorical data clustering. Consequently, a new clustering algorithm is proposed in which the significance-based objective function is optimized via a Monte Carlo search procedure. As a by-product, we can further calculate an empirical $p$-value to assess the statistical significance of a set of clusters and develop an improved gap statistic for estimating the cluster number. Extensive experimental studies suggest that our method is able to achieve comparable performance to state-of-the-art categorical data clustering algorithms. Moreover, the effectiveness of such a significance-based formulation on statistical cluster validation and cluster number estimation is demonstrated through comprehensive empirical results.

Submitted 7 November, 2022; originally announced November 2022.

Comments: 36 pages, 6 figures

arXiv:2206.00381 [pdf, ps, other]

The statistical nature of h-index of a network node

Authors: Yan Liu, Mudi Jiang, Lianyu Hu, Zengyou He

Abstract: Evaluating the importance of a network node is a crucial task in network science and graph data mining. H-index is a popular centrality measure for this task, however, there is still a lack of its interpretation from a rigorous statistical aspect. Here we show the statistical nature of h-index from the perspective of order statistics, and we obtain a new family of centrality indices by generalizing the h-index along this direction. The theoretical and empirical evidences show that such a statistical interpretation enables us to obtain a general and versatile framework for quantifying the importance of a network node. Under this framework, many new centrality indices can be derived and some of which can be more accurate and robust than h-index. We believe that this research opens up new avenues for developing more effective indices for node importance quantification from a viewpoint that still remains unexplored.

Submitted 19 May, 2023; v1 submitted 1 June, 2022; originally announced June 2022.

arXiv:2202.10991 [pdf]

Temporal Subtyping of Alzheimer's Disease Using Medical Conditions Preceding Alzheimer's Disease Onset in Electronic Health Records

Authors: Zhe He, Shubo Tian, Arslan Erdengasileng, Neil Charness, Jiang Bian

Abstract: Subtyping of Alzheimer's disease (AD) can facilitate diagnosis, treatment, prognosis and disease management. It can also support the testing of new prevention and treatment strategies through clinical trials. In this study, we employed spectral clustering to cluster 29,922 AD patients in the OneFlorida Data Trust using their longitudinal EHR data of diagnosis and conditions into four subtypes. These subtypes exhibit different patterns of progression of other conditions prior to the first AD diagnosis. In addition, according to the results of various statistical tests, these subtypes are also significantly different with respect to demographics, mortality, and prescription medications after the AD diagnosis. This study could potentially facilitate early detection and personalized treatment of AD as well as data-driven generalizability assessment of clinical trials for AD.

Submitted 22 February, 2022; originally announced February 2022.

Comments: 10 pages

arXiv:2109.14719 [pdf]

Deep neural networks with controlled variable selection for the identification of putative causal genetic variants

Authors: Peyman H. Kassani, Fred Lu, Yann Le Guen, Zihuai He

Abstract: Deep neural networks (DNN) have been used successfully in many scientific problems for their high prediction accuracy, but their application to genetic studies remains challenging due to their poor interpretability. In this paper, we consider the problem of scalable, robust variable selection in DNN for the identification of putative causal genetic variants in genome sequencing studies. We identified a pronounced randomness in feature selection in DNN due to its stochastic nature, which may hinder interpretability and give rise to misleading results. We propose an interpretable neural network model, stabilized using ensembling, with controlled variable selection for genetic studies. The merit of the proposed method includes: (1) flexible modelling of the non-linear effect of genetic variants to improve statistical power; (2) multiple knockoffs in the input layer to rigorously control false discovery rate; (3) hierarchical layers to substantially reduce the number of weight parameters and activations to improve computational efficiency; (4) de-randomized feature selection to stabilize identified signals. We evaluated the proposed method in extensive simulation studies and applied it to the analysis of Alzheimer disease genetics. We showed that the proposed method, when compared to conventional linear and nonlinear methods, can lead to substantially more discoveries.

Submitted 29 September, 2021; originally announced September 2021.

arXiv:2106.00173 [pdf, ps, other]

doi 10.1007/s00521-021-05888-w

Enhancing Trajectory Prediction using Sparse Outputs: Application to Team Sports

Authors: Brandon Victor, Aiden Nibali, Zhen He, David L. Carey

Abstract: Sophisticated trajectory prediction models that effectively mimic team dynamics have many potential uses for sports coaches, broadcasters and spectators. However, through experiments on soccer data we found that it can be surprisingly challenging to train a deep learning model for player trajectory prediction which outperforms linear extrapolation on average distance between predicted and true future trajectories. We propose and test a novel method for improving training by predicting a sparse trajectory and interpolating using constant acceleration, which improves performance for several models. This interpolation can also be used on models that aren't trained with sparse outputs, and we find that this consistently improves performance for all tested models. Additionally, we find that the accuracy of predicted trajectories for a subset of players can be improved by conditioning on the full trajectories of the other players, and that this is further improved when combined with sparse predictions. We also propose a novel architecture using graph networks and multi-head attention (GraN-MA) which achieves better performance than other tested state-of-the-art models on our dataset and is trivially adapted for both sparse trajectories and full-trajectory conditioned trajectory prediction.

Submitted 31 May, 2021; originally announced June 2021.

Comments: 10 pages (not including references), 7 figures. Published in Neural Computing and Applications on 20 March 2021

ACM Class: I.2.6

arXiv:2104.12476 [pdf, other]

EigenGAN: Layer-Wise Eigen-Learning for GANs

Authors: Zhenliang He, Meina Kan, Shiguang Shan

Abstract: Recent studies on Generative Adversarial Network (GAN) reveal that different layers of a generative CNN hold different semantics of the synthesized images. However, few GAN models have explicit dimensions to control the semantic attributes represented in a specific layer. This paper proposes EigenGAN which is able to unsupervisedly mine interpretable and controllable dimensions from different generator layers. Specifically, EigenGAN embeds one linear subspace with orthogonal basis into each generator layer. Via generative adversarial training to learn a target distribution, these layer-wise subspaces automatically discover a set of "eigen-dimensions" at each layer corresponding to a set of semantic attributes or interpretable variations. By traversing the coefficient of a specific eigen-dimension, the generator can produce samples with continuous changes corresponding to a specific semantic attribute. Taking the human face for example, EigenGAN can discover controllable dimensions for high-level concepts such as pose and gender in the subspace of deep layers, as well as low-level concepts such as hue and color in the subspace of shallow layers. Moreover, in the linear case, we theoretically prove that our algorithm derives the principal components as PCA does. Codes can be found in https://github.com/LynnHo/EigenGAN-Tensorflow.

Submitted 9 August, 2021; v1 submitted 26 April, 2021; originally announced April 2021.

Comments: ICCV 2021. Code: https://github.com/LynnHo/EigenGAN-Tensorflow

arXiv:2103.17236 [pdf, other]

High-Dimensional Uncertainty Quantification via Tensor Regression with Rank Determination and Adaptive Sampling

Authors: Zichang He, Zheng Zhang

Abstract: Fabrication process variations can significantly influence the performance and yield of nano-scale electronic and photonic circuits. Stochastic spectral methods have achieved great success in quantifying the impact of process variations, but they suffer from the curse of dimensionality. Recently, low-rank tensor methods have been developed to mitigate this issue, but two fundamental challenges remain open: how to automatically determine the tensor rank and how to adaptively pick the informative simulation samples. This paper proposes a novel tensor regression method to address these two challenges. We use a $\ell_{q}/ \ell_{2}$ group-sparsity regularization to determine the tensor rank. The resulting optimization problem can be efficiently solved via an alternating minimization solver. We also propose a two-stage adaptive sampling method to reduce the simulation cost. Our method considers both exploration and exploitation via the estimated Voronoi cell volume and nonlinearity measurement respectively. The proposed model is verified with synthetic and some realistic circuit benchmarks, on which our method can well capture the uncertainty caused by 19 to 100 random variables with only 100 to 600 simulation samples.

Submitted 27 June, 2021; v1 submitted 31 March, 2021; originally announced March 2021.

Comments: 12 pages, accepted by IEEE Trans. Components, Packaging and Manufacturing Technology

arXiv:2103.12345 [pdf, other]

The Success of AdaBoost and Its Application in Portfolio Management

Authors: Yijian Chuan, Chaoyi Zhao, Zhenrui He, Lan Wu

Abstract: We develop a novel approach to explain why AdaBoost is a successful classifier. By introducing a measure of the influence of the noise points (ION) in the training data for the binary classification problem, we prove that there is a strong connection between the ION and the test error. We further identify that the ION of AdaBoost decreases as the iteration number or the complexity of the base learners increases. We confirm that it is impossible to obtain a consistent classifier without deep trees as the base learners of AdaBoost in some complicated situations. We apply AdaBoost in portfolio management via empirical studies in the Chinese market, which corroborates our theoretical propositions.

Submitted 23 March, 2021; originally announced March 2021.

arXiv:2007.13140

Fully Bayesian Analysis of the Relevance Vector Machine Classification for Imbalanced Data

Authors: Wenyang Wang, Dongchu Sun, Zhuoqiong He

Abstract: Relevance Vector Machine (RVM) is a supervised learning algorithm extended from Support Vector Machine (SVM) based on the Bayesian sparsity model. Compared with the regression problem, RVM classification is difficult to be conducted because there is no closed-form solution for the weight parameter posterior. Original RVM classification algorithm used Newton's method in optimization to obtain the mode of weight parameter posterior then approximated it by a Gaussian distribution in Laplace's method. It would work but just applied the frequency methods in a Bayesian framework. This paper proposes a Generic Bayesian approach for the RVM classification. We conjecture that our algorithm achieves convergent estimates of the quantities of interest compared with the nonconvergent estimates of the original RVM classification algorithm. Furthermore, a Fully Bayesian approach with the hierarchical hyperprior structure for RVM classification is proposed, which improves the classification performance, especially in the imbalanced data problem. By the numeric studies, our proposed algorithms obtain high classification accuracy rates. The Fully Bayesian hierarchical hyperprior method outperforms the Generic one for the imbalanced data classification.

Submitted 27 October, 2022; v1 submitted 26 July, 2020; originally announced July 2020.

Comments: The extended and final version of this paper has been published with open access modality in the CAAI Transactions on Intelligence Technology and can be found at link https://ietresearch.onlinelibrary.wiley.com/doi/full/10.1049/cit2.12111. Please refer to the TRIT published version in your scientific papers

arXiv:2007.12336 [pdf, other]

T-BFA: Targeted Bit-Flip Adversarial Weight Attack

Authors: Adnan Siraj Rakin, Zhezhi He, Jingtao Li, Fan Yao, Chaitali Chakrabarti, Deliang Fan

Abstract: Traditional Deep Neural Network (DNN) security is mostly related to the well-known adversarial input example attack. Recently, another dimension of adversarial attack, namely, attack on DNN weight parameters, has been shown to be very powerful. As a representative one, the Bit-Flip-based adversarial weight Attack (BFA) injects an extremely small amount of faults into weight parameters to hijack the executing DNN function. Prior works of BFA focus on un-targeted attack that can hack all inputs into a random output class by flipping a very small number of weight bits stored in computer memory. This paper proposes the first work of targeted BFA based (T-BFA) adversarial weight attack on DNNs, which can intentionally mislead selected inputs to a target output class. The objective is achieved by identifying the weight bits that are highly associated with classification of a targeted output through a class-dependent weight bit ranking algorithm. Our proposed T-BFA performance is successfully demonstrated on multiple DNN architectures for image classification tasks. For example, by merely flipping 27 out of 88 million weight bits of ResNet-18, our T-BFA can misclassify all the images from 'Hen' class into 'Goose' class (i.e., 100 % attack success rate) in ImageNet dataset, while maintaining 59.35 % validation accuracy. Moreover, we successfully demonstrate our T-BFA attack in a real computer prototype system running DNN computation, with Ivy Bridge-based Intel i7 CPU and 8GB DDR3 memory.

Submitted 7 January, 2021; v1 submitted 23 July, 2020; originally announced July 2020.

arXiv:2004.03481 [pdf, other]

doi 10.1007/s11067-021-09542-9

Routine pattern discovery and anomaly detection in individual travel behavior

Authors: Lijun Sun, Xinyu Chen, Zhaocheng He, Luis F. Miranda-Moreno

Abstract: Discovering patterns and detecting anomalies in individual travel behavior is a crucial problem in both research and practice. In this paper, we address this problem by building a probabilistic framework to model individual spatiotemporal travel behavior data (e.g., trip records and trajectory data). We develop a two-dimensional latent Dirichlet allocation (LDA) model to characterize the generative mechanism of spatiotemporal trip records of each traveler. This model introduces two separate factor matrices for the spatial dimension and the temporal dimension, respectively, and use a two-dimensional core structure at the individual level to effectively model the joint interactions and complex dependencies. This model can efficiently summarize travel behavior patterns on both spatial and temporal dimensions from very sparse trip sequences in an unsupervised way. In this way, complex travel behavior can be modeled as a mixture of representative and interpretable spatiotemporal patterns. By applying the trained model on future/unseen spatiotemporal records of a traveler, we can detect her behavior anomalies by scoring those observations using perplexity. We demonstrate the effectiveness of the proposed modeling framework on a real-world license plate recognition (LPR) data set. The results confirm the advantage of statistical learning methods in modeling sparse individual travel behavior data. This type of pattern discovery and anomaly detection applications can provide useful insights for traffic monitoring, law enforcement, and individual travel behavior profiling.

Submitted 5 April, 2020; originally announced April 2020.

Journal ref: Networks and Spatial Economics (2021)

arXiv:2004.02359 [pdf, other]

Deep Neural Network in Cusp Catastrophe Model

Authors: Ranadeep Daw, Zhuoqiong He

Abstract: Catastrophe theory was originally proposed to study dynamical systems that exhibit sudden shifts in behavior arising from small changes in input. These models can generate reasonable explanation behind abrupt jumps in nonlinear dynamic models. Among the different catastrophe models, the Cusp Catastrophe model attracted the most attention due to it's relatively simpler dynamics and rich domain of application. Due to the complex behavior of the response, the parameter space becomes highly non-convex and hence it becomes very hard to optimize to figure out the generating parameters. Instead of solving for these generating parameters, we demonstrated how a Machine learning model can be trained to learn the dynamics of the Cusp catastrophe models, without ever really solving for the generating model parameters. Simulation studies and application on a few famous datasets are used to validate our approach. To our knowledge, this is the first paper of such kind where a neural network based approach has been applied in Cusp Catastrophe model.

Submitted 21 April, 2020; v1 submitted 5 April, 2020; originally announced April 2020.

arXiv:2002.12663 [pdf, other]

HOTCAKE: Higher Order Tucker Articulated Kernels for Deeper CNN Compression

Authors: Rui Lin, Ching-Yun Ko, Zhuolun He, Cong Chen, Yuan Cheng, Hao Yu, Graziano Chesi, Ngai Wong

Abstract: The emerging edge computing has promoted immense interests in compacting a neural network without sacrificing much accuracy. In this regard, low-rank tensor decomposition constitutes a powerful tool to compress convolutional neural networks (CNNs) by decomposing the 4-way kernel tensor into multi-stage smaller ones. Building on top of Tucker-2 decomposition, we propose a generalized Higher Order Tucker Articulated Kernels (HOTCAKE) scheme comprising four steps: input channel decomposition, guided Tucker rank selection, higher order Tucker decomposition and fine-tuning. By subjecting each CONV layer to HOTCAKE, a highly compressed CNN model with graceful accuracy trade-off is obtained. Experiments show HOTCAKE can compress even pre-compressed models and produce state-of-the-art lightweight networks.

Submitted 28 February, 2020; originally announced February 2020.

Comments: 6 pages, 5 figures

arXiv:2001.06325 [pdf, other]

Universal Adversarial Attack on Attention and the Resulting Dataset DAmageNet

Authors: Sizhe Chen, Zhengbao He, Chengjin Sun, Jie Yang, Xiaolin Huang

Abstract: Adversarial attacks on deep neural networks (DNNs) have been known for several years. However, the existing adversarial attacks have high success rates only when the information of the victim DNN is well-known or could be estimated by structural similarity or massive queries. In this paper, we propose Attack on Attention (AoA), which targets attention, a semantic property commonly shared by DNNs. AoA enjoys a significant increase in transferability when the traditional cross entropy loss is replaced with the attention loss. Since AoA alters the loss function only, it could be easily combined with other transferability-enhancement techniques and then achieve SOTA performance. We apply AoA to generate 50000 adversarial samples from the ImageNet validation set to defeat many neural networks, and thus name the dataset DAmageNet. 13 well-trained DNNs are tested on DAmageNet, and all of them have an error rate over 85%. Even with defenses or adversarial training, most models still maintain an error rate over 70% on DAmageNet. DAmageNet is the first universal adversarial dataset. It can be downloaded freely and serve as a benchmark for robustness testing and adversarial training.

Submitted 21 October, 2020; v1 submitted 16 January, 2020; originally announced January 2020.

Comments: accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

arXiv:1912.07160 [pdf, other]

DAmageNet: A Universal Adversarial Dataset

Authors: Sizhe Chen, Xiaolin Huang, Zhengbao He, Chengjin Sun

Abstract: It is now well known that deep neural networks (DNNs) are vulnerable to adversarial attack. Adversarial samples are similar to the clean ones, but are able to cheat the attacked DNN into producing incorrect predictions with high confidence. But most of the existing adversarial attacks have a high success rate only when the information of the attacked DNN is well-known or could be estimated by massive queries. A promising way is to generate adversarial samples with high transferability. In this way, we generate 96020 transferable adversarial samples from original ones in ImageNet. The average difference, measured by root mean squared deviation, is only around 3.8. However, the adversarial samples are misclassified by various models with an error rate up to 90%. Since the images are generated independently of the attacked DNNs, this is essentially a zero-query adversarial attack. We call the dataset DAmageNet, which is the first universal adversarial dataset that beats many models trained on ImageNet. By exposing these drawbacks, DAmageNet could serve as a benchmark to study and improve the robustness of DNNs. DAmageNet can be downloaded at http://www.pami.sjtu.edu.cn/Show/56/122.

Submitted 15 December, 2019; originally announced December 2019.

arXiv:1910.11148 [pdf]

Learning Priors in High-frequency Domain for Inverse Imaging Reconstruction

Authors: Zhuonan He, Jinjie Zhou, Dong Liang, Yuhao Wang, Qiegen Liu

Abstract: Ill-posed inverse problems in imaging have remained an active research topic for several decades, with new approaches constantly emerging. Recognizing that the popular dictionary learning and convolutional sparse coding are both essentially modeling the high-frequency component of an image, which conveys most of the semantic information such as texture details, in this work we propose a novel multi-profile high-frequency transform-guided denoising autoencoder as prior (HF-DAEP). To achieve this goal, we first extract a set of multi-profile high-frequency components via a specific transformation and add artificial Gaussian noise to these high-frequency components as training samples. Then, once the high-frequency prior information is learned, we incorporate it into the classical iterative reconstruction process by the proximal gradient descent technique. Preliminary results on highly under-sampled magnetic resonance imaging and sparse-view computed tomography reconstruction demonstrate that the proposed method can efficiently reconstruct feature details and presents advantages over state-of-the-art methods.

Submitted 23 October, 2019; originally announced October 2019.

arXiv:1910.10897 [pdf, other]

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning

Authors: Tianhe Yu, Deirdre Quillen, Zhanpeng He, Ryan Julian, Avnish Narayan, Hayden Shively, Adithya Bellathur, Karol Hausman, Chelsea Finn, Sergey Levine

Abstract: Meta-reinforcement learning algorithms can enable robots to acquire new skills much more quickly, by leveraging prior experience to learn how to learn. However, much of the current research on meta-reinforcement learning focuses on task distributions that are very narrow. For example, a commonly used meta-reinforcement learning benchmark uses different running velocities for a simulated robot as different tasks. When policies are meta-trained on such narrow task distributions, they cannot possibly generalize to more quickly acquire entirely new tasks. Therefore, if the aim of these methods is to enable faster acquisition of entirely new behaviors, we must evaluate them on task distributions that are sufficiently broad to enable generalization to new behaviors. In this paper, we propose an open-source simulated benchmark for meta-reinforcement learning and multi-task learning consisting of 50 distinct robotic manipulation tasks. Our aim is to make it possible to develop algorithms that generalize to accelerate the acquisition of entirely new, held-out tasks. We evaluate 7 state-of-the-art meta-reinforcement learning and multi-task learning algorithms on these tasks. Surprisingly, while each task and its variations (e.g., with different object positions) can be learned with reasonable success, these algorithms struggle to learn with multiple tasks at the same time, even with as few as ten distinct training tasks. Our analysis and open-source environments pave the way for future research in multi-task learning and meta-learning that can enable meaningful generalization, thereby unlocking the full potential of these methods.

Submitted 14 June, 2021; v1 submitted 23 October, 2019; originally announced October 2019.

Comments: This is an updated version of a manuscript that originally appeared at CoRL 2019. Videos are here: meta-world.github.io, open-source code is available at: https://github.com/rlworkgroup/metaworld, and the baselines can be found at https://github.com/rlworkgroup/garage

arXiv:1909.09148 [pdf, other]

Data Augmentation Revisited: Rethinking the Distribution Gap between Clean and Augmented Data

Authors: Zhuoxun He, Lingxi Xie, Xin Chen, Ya Zhang, Yanfeng Wang, Qi Tian

Abstract: Data augmentation has been widely applied as an effective methodology to improve generalization, in particular when training deep neural networks. Recently, researchers proposed a few intensive data augmentation techniques, which indeed improved accuracy, yet we notice that these methods have also caused a considerable gap between clean and augmented data. In this paper, we revisit this problem from an analytical perspective, for which we estimate the upper bound of expected risk using two terms, namely, empirical risk and generalization error. We develop an understanding of data augmentation as regularization, which highlights the major features. As a result, data augmentation significantly reduces the generalization error, but meanwhile leads to a slightly higher empirical risk. On the assumption that data augmentation helps models converge to a better region, the model can benefit from a lower empirical risk achieved by a simple method, i.e., using less-augmented data to refine the model trained on fully-augmented data. Our approach achieves consistent accuracy gains on a few standard image classification benchmarks, and the gain transfers to object detection.

Submitted 21 November, 2019; v1 submitted 19 September, 2019; originally announced September 2019.

arXiv:1909.02902 [pdf, other]

Dynamic Spatial-Temporal Representation Learning for Traffic Flow Prediction

Authors: Lingbo Liu, Jiajie Zhen, Guanbin Li, Geng Zhan, Zhaocheng He, Bowen Du, Liang Lin

Abstract: As a crucial component in intelligent transportation systems, traffic flow prediction has recently attracted widespread research interest in the field of artificial intelligence (AI) with the increasing availability of massive traffic mobility data. Its key challenge lies in how to integrate diverse factors (such as temporal rules and spatial dependencies) to infer the evolution trend of traffic flow. To address this problem, we propose a unified neural network called Attentive Traffic Flow Machine (ATFM), which can effectively learn the spatial-temporal feature representations of traffic flow with an attention mechanism. In particular, our ATFM is composed of two progressive Convolutional Long Short-Term Memory (ConvLSTM) units connected with a convolutional layer. Specifically, the first ConvLSTM unit takes normal traffic flow features as input and generates a hidden state at each time-step, which is further fed into the connected convolutional layer for spatial attention map inference. The second ConvLSTM unit aims at learning the dynamic spatial-temporal representations from the attentionally weighted traffic flow features. Further, we develop two deep learning frameworks based on ATFM to predict citywide short-term/long-term traffic flow by adaptively incorporating the sequential and periodic data as well as other external influences. Extensive experiments on two standard benchmarks demonstrate the superiority of the proposed method for traffic flow prediction. Moreover, to verify the generalization of our method, we also apply the customized framework to forecast the passenger pickup/dropoff demands in traffic prediction and show its superior performance. Our code and data are available at https://github.com/liulingbo918/ATFM.

Submitted 12 June, 2020; v1 submitted 1 September, 2019; originally announced September 2019.

Comments: Accepted by IEEE Transactions on Intelligent Transportation Systems. arXiv admin note: text overlap with arXiv:1809.00101

arXiv:1908.07232 [pdf, other]

Sensitivity estimation of conditional value at risk using randomized quasi-Monte Carlo

Authors: Zhijian He

Abstract: Conditional value at risk (CVaR) is a popular measure for quantifying portfolio risk. Sensitivity analysis of CVaR is very useful in risk management and gradient-based optimization algorithms. In this paper, we study the infinitesimal perturbation analysis estimator for CVaR sensitivity using randomized quasi-Monte Carlo (RQMC) simulation. We first prove that the RQMC-based estimator is strongly consistent under very mild conditions. Under some technical conditions, RQMC that uses $d$-dimensional points in CVaR sensitivity estimation yields a mean error rate of $O(n^{-1/2-1/(4d-2)+ε})$ for arbitrarily small $ε>0$. The numerical results show that the RQMC method performs better than the Monte Carlo method for all cases. The gain of plain RQMC deteriorates as the dimension $d$ increases, as predicted by the established theoretical error rate.

Submitted 21 September, 2020; v1 submitted 20 August, 2019; originally announced August 2019.
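
As a rough illustration of the estimator this abstract describes, the following sketch computes CVaR and an infinitesimal-perturbation-style sensitivity from a randomized low-discrepancy point set. Everything specific here is an assumption made for illustration: the toy loss `theta*Z1 + 0.8*Z2`, the randomly shifted Halton points (the paper may use a different RQMC construction, such as scrambled nets), and all function names.

```python
import random
from statistics import NormalDist


def radical_inverse(i, base):
    """Van der Corput radical inverse of the integer i in the given base."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r


def shifted_halton(n, bases=(2, 3), seed=0):
    """Halton points with one random shift modulo 1 (a simple RQMC scheme)."""
    rng = random.Random(seed)
    shift = [rng.random() for _ in bases]
    eps = 1e-12  # keep points strictly inside (0, 1) for the inverse CDF
    return [
        [min(max((radical_inverse(i, b) + s) % 1.0, eps), 1.0 - eps)
         for b, s in zip(bases, shift)]
        for i in range(1, n + 1)
    ]


def cvar_and_sensitivity(theta, n=4096, alpha=0.95, seed=0):
    """Estimate CVaR_alpha of L = theta*Z1 + 0.8*Z2 and its derivative in theta.

    HYPOTHETICAL toy loss: Z1, Z2 are independent standard normals and
    dL/dtheta = Z1. The sensitivity estimate averages dL/dtheta over the
    samples whose loss lies in the upper (1 - alpha) tail.
    """
    inv_cdf = NormalDist().inv_cdf
    samples = []
    for u1, u2 in shifted_halton(n, seed=seed):
        z1, z2 = inv_cdf(u1), inv_cdf(u2)
        samples.append((theta * z1 + 0.8 * z2, z1))
    samples.sort()                      # ascending by loss
    tail = samples[int(alpha * n):]     # worst (1 - alpha) share of outcomes
    cvar = sum(loss for loss, _ in tail) / len(tail)
    sens = sum(grad for _, grad in tail) / len(tail)
    return cvar, sens
```

For `theta = 0.6` the toy loss is standard normal, so the exact values are known in closed form: CVaR at level 0.95 is about 2.063 and its derivative with respect to `theta` is about 1.238. This gives a simple check on the estimates returned by `cvar_and_sensitivity(0.6)`.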

arXiv:1908.06951 [pdf, ps, other]

Gradient Boosting Machine: A Survey

Authors: Zhiyuan He, Danchen Lin, Thomas Lau, Mike Wu

Abstract: In this survey, we discuss several different types of gradient boosting algorithms and illustrate their mathematical frameworks in detail: 1. introduction of gradient boosting leads to 2. objective function optimization, 3. loss function estimations, and 4. model constructions. 5. application of boosting in ranking.

Submitted 19 August, 2019; originally announced August 2019.

arXiv:1907.06356 [pdf, other]

Motorway Traffic Flow Prediction using Advanced Deep Learning

Authors: Adriana-Simona Mihaita, Haowen Li, Zongyang He, Marian-Andrei Rizoiu

Abstract: Congestion prediction represents a major priority for traffic management centres around the world to ensure timely incident response handling. The increasing amounts of generated traffic data have been used to train machine learning predictors for traffic; however, this is a challenging task due to inter-dependencies of traffic flow both in time and space. Recently, deep learning techniques have shown significant prediction improvements over traditional models; however, open questions remain around their applicability, accuracy and parameter tuning. This paper proposes an advanced deep learning framework for simultaneously predicting the traffic flow on a large number of monitoring stations along a highly circulated motorway in Sydney, Australia, including exit and entry loop count stations, and over varying training and prediction time horizons. The spatial and temporal features extracted from the 36.34 million data points are used in various deep learning architectures that exploit their spatial structure (convolutional neuronal networks), their temporal dynamics (recurrent neuronal networks), or both through a hybrid spatio-temporal modelling (CNN-LSTM). We show that our deep learning models consistently outperform traditional methods, and we conduct a comparative analysis of the optimal time horizon of historical data required to predict traffic flow at different time points in the future.

Submitted 16 July, 2019; v1 submitted 15 July, 2019; originally announced July 2019.

Comments: Published in the Proceedings of the 22nd IEEE Intelligent Transportation Systems Conference (ITSC'19). Auckland, New Zealand

arXiv:1907.02124 [pdf, other]

Non-Structured DNN Weight Pruning -- Is It Beneficial in Any Platform?

Authors: Xiaolong Ma, Sheng Lin, Shaokai Ye, Zhezhi He, Linfeng Zhang, Geng Yuan, Sia Huat Tan, Zhengang Li, Deliang Fan, Xuehai Qian, Xue Lin, Kaisheng Ma, Yanzhi Wang

Abstract: Large deep neural network (DNN) models pose the key challenge to energy efficiency due to the significantly higher energy consumption of off-chip DRAM accesses than arithmetic or SRAM operations. This motivates intensive research on model compression with two main approaches. Weight pruning leverages the redundancy in the number of weights and can be performed in a non-structured manner, which has higher flexibility and pruning rate but incurs index accesses due to irregular weights, or in a structured manner, which preserves the full matrix structure with a lower pruning rate. Weight quantization leverages the redundancy in the number of bits in weights. Compared to pruning, quantization is much more hardware-friendly, and has become a "must-do" step for FPGA and ASIC implementations. This paper provides a definitive answer to the question for the first time. First, we build ADMM-NN-S by extending and enhancing ADMM-NN, a recently proposed joint weight pruning and quantization framework. Second, we develop a methodology for fair and fundamental comparison of non-structured and structured pruning in terms of both storage and computation efficiency. Our results show that ADMM-NN-S consistently outperforms the prior art: (i) it achieves 348x, 36x, and 8x overall weight pruning on LeNet-5, AlexNet, and ResNet-50, respectively, with (almost) zero accuracy loss; (ii) we demonstrate that the first fully binarized (for all layers) DNNs can be lossless in accuracy in many cases. These results provide a strong baseline and credibility for our study. Based on the proposed comparison framework, with the same accuracy and quantization, the results show that non-structured pruning is not competitive in terms of both storage and computation efficiency. Thus, we conclude that non-structured pruning is considered harmful. We urge the community not to continue the DNN inference acceleration for non-structured sparsity.

Submitted 7 January, 2020; v1 submitted 3 July, 2019; originally announced July 2019.

arXiv:1906.10163 [pdf]

Assessing the Validity of a a priori Patient-Trial Generalizability Score using Real-world Data from a Large Clinical Data Research Network: A Colorectal Cancer Clinical Trial Case Study

Authors: Qian Li, Zhe He, Yi Guo, Hansi Zhang, Thomas J George Jr, William Hogan, Neil Charness, Jiang Bian

Abstract: Existing trials have not taken enough consideration of their population representativeness, which can lower the effectiveness when the treatment is applied in real-world clinical practice. We analyzed the eligibility criteria of Bevacizumab colorectal cancer treatment trials, assessed their a priori generalizability, and examined how it affects patient outcomes when applied in real-world clinical settings. To do so, we extracted patient-level data from a large collection of electronic health records (EHRs) from the OneFlorida consortium. We built a zero-inflated negative binomial model using a composite patient-trial generalizability (cPTG) score to predict patients' clinical outcomes (i.e., the number of serious adverse events (SAEs)). Our study results provide a body of evidence that 1) the cPTG scores can predict patient outcomes; and 2) patients who are more similar to the study population in the trials that were used to develop the treatment have a significantly lower likelihood of experiencing serious adverse events.

Submitted 24 June, 2019; originally announced June 2019.

arXiv:1906.04734 [pdf]

Incremental Classifier Learning Based on PEDCC-Loss and Cosine Distance

Authors: Qiuyu Zhu, Zikuang He, Xin Ye

Abstract: The main purpose of incremental learning is to learn new knowledge while not forgetting the knowledge that has been learned before. At present, the main challenge in this area is catastrophic forgetting, namely that the network loses its performance on the old tasks after training on new tasks. In this paper, we introduce an ensemble method of incremental classifiers to alleviate this problem, which is based on the cosine distance between the output feature and the pre-defined center, and lets each task be preserved in a different network. During training, we make use of PEDCC-Loss to train the CNN network. In the testing stage, the prediction is determined by the cosine distance between the network latent features and the pre-defined center. The experimental results on EMNIST and CIFAR100 show that our method outperforms the recent LwF method, which uses knowledge distillation, and the iCaRL method, which keeps some old samples while training for a new task. The method can achieve the goal of not forgetting old knowledge while training new classes, and better solves the problem of catastrophic forgetting.

Submitted 11 June, 2019; originally announced June 2019.
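
The test-time rule mentioned in this abstract, predicting by cosine distance to pre-defined class centers, can be sketched as follows. This is a minimal illustration only: the center vectors, labels, and function names are hypothetical, and the sketch does not reproduce PEDCC-Loss training or the per-task ensemble of networks.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def predict(feature, centers):
    """Return the label whose pre-defined center is closest in cosine distance.

    `centers` maps each class label to a fixed center vector (hypothetical
    values in the example below). Maximizing cosine similarity is the same
    as minimizing cosine distance, defined as 1 - similarity.
    """
    return max(centers, key=lambda label: cosine_similarity(feature, centers[label]))
```

For example, with `centers = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}`, the feature `[0.9, 0.2, 0.1]` is assigned to `"cat"` because it points mostly along the first axis.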

arXiv:1905.12469 [pdf]

Understanding Perceptions and Attitudes in Breast Cancer Discussions on Twitter

Authors: Francois Modave, Yunpeng Zhao, Janice Krieger, Zhe He, Yi Guo, Jinhai Huo, Mattia Prosperi, Jiang Bian

Abstract: Among American women, the rate of breast cancer is second only to lung cancer. An estimated 12.4% of women will develop breast cancer over the course of their lifetime. The widespread use of social media across the socio-economic spectrum offers unparalleled ways to facilitate information sharing, in particular as it pertains to health. Social media is also used by many healthcare stakeholders, ranging from government agencies to the healthcare industry, to disseminate health information and to engage patients. The purpose of this study is to investigate people's perceptions and attitudes related to breast cancer, especially those that are related to physical activities, on Twitter. To achieve this, we first identified and collected tweets related to breast cancer, and then used topic modeling and sentiment analysis techniques to understand discussion themes and quantify Twitter users' perceptions and emotions with respect to breast cancer to answer 5 research questions.

Submitted 22 May, 2019; originally announced May 2019.

Comments: 5 pages, 10 figures, The 17th World Congress of Medical and Health Informatics

arXiv:1905.07188 [pdf, other]

doi 10.1109/ACCESS.2020.3042757

Reference-Based Sequence Classification

Authors: Zengyou He, Guangyao Xu, Chaohua Sheng, Bo Xu, Quan Zou

Abstract: Sequence classification is an important data mining task in many real world applications. Over the past few decades, many sequence classification methods have been proposed from different aspects. In particular, the pattern-based method is one of the most important and widely studied sequence classification methods in the literature. In this paper, we present a reference-based sequence classification framework, which can unify existing pattern-based sequence classification methods under the same umbrella. More importantly, this framework can be used as a general platform for developing new sequence classification algorithms. By utilizing this framework as a tool, we propose new sequence classification algorithms that are quite different from existing solutions. Experimental results show that new methods developed under the proposed framework are capable of achieving comparable classification accuracy to those state-of-the-art sequence classification algorithms.

Submitted 13 December, 2020; v1 submitted 17 May, 2019; originally announced May 2019.

Journal ref: in IEEE Access, vol. 8, pp. 218199-218214, 2020

arXiv:1905.05849 [pdf, other]

Consensus-based Interpretable Deep Neural Networks with Application to Mortality Prediction

Authors: Shaeke Salman, Seyedeh Neelufar Payrovnaziri, Xiuwen Liu, Pablo Rengifo-Moreno, Zhe He

Abstract: Deep neural networks have achieved remarkable success in various challenging tasks. However, the black-box nature of such networks is not acceptable to critical applications, such as healthcare. In particular, the existence of adversarial examples and their overgeneralization to irrelevant, out-of-distribution inputs with high confidence makes it difficult, if not impossible, to explain decisions by such networks. In this paper, we analyze the underlying mechanism of generalization of deep neural networks and propose an ($n$, $k$) consensus algorithm which is insensitive to adversarial examples and can reliably reject out-of-distribution samples. Furthermore, the consensus algorithm is able to improve classification accuracy by using multiple trained deep neural networks. To handle the complexity of deep neural networks, we cluster linear approximations of individual models and identify highly correlated clusters among different models to capture feature importance robustly, resulting in improved interpretability. Motivated by the importance of building accurate and interpretable prediction models for healthcare, our experimental results on an ICU dataset show the effectiveness of our algorithm in enhancing both the prediction accuracy and the interpretability of deep neural network models on one-year patient mortality prediction. In particular, while the proposed method maintains similar interpretability as conventional shallow models such as logistic regression, it improves the prediction accuracy significantly.

Submitted 11 September, 2019; v1 submitted 14 May, 2019; originally announced May 2019.

Comments: 8 pages, 6 figures

arXiv:1904.12383 [pdf]

Enhancing Prediction Models for One-Year Mortality in Patients with Acute Myocardial Infarction and Post Myocardial Infarction Syndrome

Authors: Seyedeh Neelufar Payrovnaziri, Laura A. Barrett, Daniel Bis, Jiang Bian, Zhe He

Abstract: Predicting the risk of mortality for patients with acute myocardial infarction (AMI) using electronic health records (EHRs) data can help identify risky patients who might need more tailored care. In our previous work, we built computational models to predict one-year mortality of patients admitted to an intensive care unit (ICU) with AMI or post myocardial infarction syndrome. Our prior work only used the structured clinical data from MIMIC-III, a publicly available ICU clinical database. In this study, we enhanced our work by adding the word embedding features from free-text discharge summaries. Using a richer set of features resulted in significant improvement in the performance of our deep learning models. The average accuracy of our deep learning models was 92.89% and the average F-measure was 0.928. We further reported the impact of different combinations of features extracted from structured and/or unstructured data on the performance of the deep learning models.

Submitted 28 April, 2019; originally announced April 2019.

Showing 1–50 of 75 results for author: He, Z