
Monte Carlo Expectation-Maximization algorithm to detect imprinting and maternal effects for discordant sib-pair data

Authors: Ruwani Herath, Alex Trindade, Fangyuan Zhang

Abstract: Numerous statistical methods have been developed to explore genomic imprinting and maternal effects, which are causes of parent-of-origin patterns in complex human diseases. Most of the methods, however, either only model one of these two confounded epigenetic effects, or make strong yet unrealistic assumptions about the population to avoid over-parameterization. A recent partial likelihood method (LIMEDSP) can identify both epigenetic effects based on discordant sibpair family data without those assumptions. Theoretical and empirical studies have shown its validity and robustness. As LIMEDSP method obtains parameter estimation by maximizing partial likelihood, it is interesting to compare its efficiency with full likelihood maximizer. To overcome the difficulty in over-parameterization when using full likelihood, this study proposes a discordant sib-pair design based Monte Carlo Expectation Maximization (MCEMDSP) method to detect imprinting and maternal effects jointly. Those unknown mating type probabilities, the nuisance parameters, are considered as latent variables in EM algorithm. Monte Carlo samples are used to numerically approximate the expectation function that cannot be solved algebraically. Our simulation results show that though this MCEMDSP algorithm takes longer computation time, it can generally detect both epigenetic effects with higher power, which demonstrates that it can be a good complement of LIMEDSP method.

Submitted 31 December, 2023; originally announced January 2024.

arXiv:2401.00517 [pdf, other]

Detecting Imprinting and Maternal Effects Using Monte Carlo Expectation Maximization Algorithm

Authors: Pooya Aavani, Alexandre Trindade, Fangyuan Zhang

Abstract: Numerous statistical methods have been developed to explore genomic imprinting and maternal effects, which are causes of parent-of-origin patterns in complex human diseases. However, most of them either only model one of these two confounded epigenetic effects, or make strong yet unrealistic assumptions about the population to avoid over-parameterization. A recent partial likelihood method (LIME) can identify both epigenetic effects based on case-control family data without those assumptions. Theoretical and empirical studies have shown its validity and robustness. However, because LIME obtains parameter estimation by maximizing partial likelihood, it is interesting to compare its efficiency with full likelihood maximizer. To overcome the difficulty in over-parameterization when using full likelihood, in this study we propose a Monte Carlo Expectation Maximization (MCEM) method to detect imprinting and maternal effects jointly. Those unknown mating type probabilities, the nuisance parameters, can be considered as latent variables in EM algorithm. Monte Carlo samples are used to numerically approximate the expectation function that cannot be solved algebraically. Our simulation results show that though this MCEM algorithm takes longer computational time, and can give higher bias in some simulations compared to LIME, it can generally detect both epigenetic effects with higher power and smaller standard error which demonstrates that it can be a good complement of LIME method.

Submitted 31 December, 2023; originally announced January 2024.

arXiv:2312.07636 [pdf, other]

Go beyond End-to-End Training: Boosting Greedy Local Learning with Context Supply

Authors: Chengting Yu, Fengzhao Zhang, Hanzhi Ma, Aili Wang, Er** Li

Abstract: Traditional end-to-end (E2E) training of deep networks necessitates storing intermediate activations for back-propagation, resulting in a large memory footprint on GPUs and restricted model parallelization. As an alternative, greedy local learning partitions the network into gradient-isolated modules and trains supervisely based on local preliminary losses, thereby providing asynchronous and parallel training methods that substantially reduce memory cost. However, empirical experiments reveal that as the number of segmentations of the gradient-isolated module increases, the performance of the local learning scheme degrades substantially, severely limiting its expansibility. To avoid this issue, we theoretically analyze the greedy local learning from the standpoint of information theory and propose a ContSup scheme, which incorporates context supply between isolated modules to compensate for information loss. Experiments on benchmark datasets (i.e. CIFAR, SVHN, STL-10) achieve SOTA results and indicate that our proposed method can significantly improve the performance of greedy local learning with minimal memory and computational overhead, allowing for the boost of the number of isolated modules. Our codes are available at https://github.com/Tab-ct/ContSup.

Submitted 12 December, 2023; originally announced December 2023.

Comments: 9 figures, 12 tables

arXiv:2310.17531 [pdf, ps, other]

Learning Regularized Graphon Mean-Field Games with Unknown Graphons

Authors: Fengzhuo Zhang, Vincent Y. F. Tan, Zhaoran Wang, Zhuoran Yang

Abstract: We design and analyze reinforcement learning algorithms for Graphon Mean-Field Games (GMFGs). In contrast to previous works that require the precise values of the graphons, we aim to learn the Nash Equilibrium (NE) of the regularized GMFGs when the graphons are unknown. Our contributions are threefold. First, we propose the Proximal Policy Optimization for GMFG (GMFG-PPO) algorithm and show that it converges at a rate of $O(T^{-1/3})$ after $T$ iterations with an estimation oracle, improving on a previous work by Xie et al. (ICML, 2021). Second, using kernel embedding of distributions, we design efficient algorithms to estimate the transition kernels, reward functions, and graphons from sampled agents. Convergence rates are then derived when the positions of the agents are either known or unknown. Results for the combination of the optimization algorithm GMFG-PPO and the estimation algorithm are then provided. These algorithms are the first specifically designed for learning graphons from sampled agents. Finally, the efficacy of the proposed algorithms are corroborated through simulations. These simulations demonstrate that learning the unknown graphons reduces the exploitability effectively.

Submitted 26 October, 2023; originally announced October 2023.

arXiv:2310.08089 [pdf, other]

Learning Regularized Monotone Graphon Mean-Field Games

Authors: Fengzhuo Zhang, Vincent Y. F. Tan, Zhaoran Wang, Zhuoran Yang

Abstract: This paper studies two fundamental problems in regularized Graphon Mean-Field Games (GMFGs). First, we establish the existence of a Nash Equilibrium (NE) of any $λ$-regularized GMFG (for $λ\geq 0$). This result relies on weaker conditions than those in previous works for analyzing both unregularized GMFGs ($λ=0$) and $λ$-regularized MFGs, which are special cases of GMFGs. Second, we propose provably efficient algorithms to learn the NE in weakly monotone GMFGs, motivated by Lasry and Lions [2007]. Previous literature either only analyzed continuous-time algorithms or required extra conditions to analyze discrete-time algorithms. In contrast, we design a discrete-time algorithm and derive its convergence rate solely under weakly monotone conditions. Furthermore, we develop and analyze the action-value function estimation procedure during the online learning process, which is absent from algorithms for monotone GMFGs. This serves as a sub-module in our optimization algorithm. The efficiency of the designed algorithm is corroborated by empirical evaluations.

Submitted 12 October, 2023; originally announced October 2023.

arXiv:2307.13371 [pdf, other]

Learning Regions of Interest for Bayesian Optimization with Adaptive Level-Set Estimation

Authors: Fengxue Zhang, Jialin Song, James Bowden, Alexander Ladd, Yisong Yue, Thomas A. Desautels, Yuxin Chen

Abstract: We study Bayesian optimization (BO) in high-dimensional and non-stationary scenarios. Existing algorithms for such scenarios typically require extensive hyperparameter tuning, which limits their practical effectiveness. We propose a framework, called BALLET, which adaptively filters for a high-confidence region of interest (ROI) as a superlevel-set of a nonparametric probabilistic model such as a Gaussian process (GP). Our approach is easy to tune, and is able to focus on local region of the optimization space that can be tackled by existing BO methods. The key idea is to use two probabilistic models: a coarse GP to identify the ROI, and a localized GP for optimization within the ROI. We show theoretically that BALLET can efficiently shrink the search space, and can exhibit a tighter regret bound than standard BO without ROI filtering. We demonstrate empirically the effectiveness of BALLET on both synthetic and real-world optimization tasks.

Submitted 25 July, 2023; originally announced July 2023.

arXiv:2307.01389 [pdf, other]

Identification of Causal Relationship between Amyloid-beta Accumulation and Alzheimer's Disease Progression via Counterfactual Inference

Authors: Haixing Dai, Mengxuan Hu, Qing Li, Lu Zhang, Lin Zhao, Dajiang Zhu, Ibai Diez, Jorge Sepulcre, Fan Zhang, Xingyu Gao, Manhua Liu, Quanzheng Li, Sheng Li, Tianming Liu, Xiang Li

Abstract: Alzheimer's disease (AD) is a neurodegenerative disorder that is beginning with amyloidosis, followed by neuronal loss and deterioration in structure, function, and cognition. The accumulation of amyloid-beta in the brain, measured through 18F-florbetapir (AV45) positron emission tomography (PET) imaging, has been widely used for early diagnosis of AD. However, the relationship between amyloid-beta accumulation and AD pathophysiology remains unclear, and causal inference approaches are needed to uncover how amyloid-beta levels can impact AD development. In this paper, we propose a graph varying coefficient neural network (GVCNet) for estimating the individual treatment effect with continuous treatment levels using a graph convolutional neural network. We highlight the potential of causal inference approaches, including GVCNet, for measuring the regional causal connections between amyloid-beta accumulation and AD pathophysiology, which may serve as a robust tool for early diagnosis and tailored care.

Submitted 3 July, 2023; originally announced July 2023.

arXiv:2305.19420 [pdf, ps, other]

What and How does In-Context Learning Learn? Bayesian Model Averaging, Parameterization, and Generalization

Authors: Yufeng Zhang, Fengzhuo Zhang, Zhuoran Yang, Zhaoran Wang

Abstract: In this paper, we conduct a comprehensive study of In-Context Learning (ICL) by addressing several open questions: (a) What type of ICL estimator is learned by large language models? (b) What is a proper performance metric for ICL and what is the error rate? (c) How does the transformer architecture enable ICL? To answer these questions, we adopt a Bayesian view and formulate ICL as a problem of predicting the response corresponding to the current covariate, given a number of examples drawn from a latent variable model. To answer (a), we show that, without updating the neural network parameters, ICL implicitly implements the Bayesian model averaging algorithm, which is proven to be approximately parameterized by the attention mechanism. For (b), we analyze the ICL performance from an online learning perspective and establish a $\mathcal{O}(1/T)$ regret bound for perfectly pretrained ICL, where $T$ is the number of examples in the prompt. To answer (c), we show that, in addition to encoding Bayesian model averaging via attention, the transformer architecture also enables a fine-grained statistical analysis of pretraining under realistic assumptions. In particular, we prove that the error of pretrained model is bounded by a sum of an approximation error and a generalization error, where the former decays to zero exponentially as the depth grows, and the latter decays to zero sublinearly with the number of tokens in the pretraining dataset.
Our results provide a unified understanding of the transformer and its ICL ability with bounds on ICL regret, approximation, and generalization, which deepens our knowledge of these essential aspects of modern language models.

Submitted 10 October, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

arXiv:2305.02552 [pdf, other]

Understand Waiting Time in Transaction Fee Mechanism: An Interdisciplinary Perspective

Authors: Luyao Zhang, Fan Zhang

Abstract: Blockchain enables peer-to-peer transactions in cyberspace without a trusted third party. The rapid growth of Ethereum and smart contract blockchains generally calls for well-designed Transaction Fee Mechanisms (TFMs) to allocate limited storage and computation resources. However, existing research on TFMs must consider the waiting time for transactions, which is essential for computer security and economic efficiency. Integrating data from the Ethereum blockchain and memory pool (mempool), we explore how two types of events affect transaction latency. First, we apply regression discontinuity design (RDD) to study the causal inference of the Merge, the most recent significant upgrade of Ethereum. Our results show that the Merge significantly reduces the long waiting time, network loads, and market congestion. In addition, we verify our results' robustness by inspecting other compounding factors, such as censorship and unobserved delays of transactions via private changes. Second, examining three major protocol changes during the merge, we identify block interval shortening as the most plausible cause for our empirical results. Furthermore, in a mathematical model, we show block interval as a unique mechanism design choice for EIP1559 TFM to achieve better security and efficiency, generally applicable to the market congestion caused by demand surges. Finally, we apply time series analysis to research the interaction of Non-Fungible token (NFT) drops and market congestion using Facebook Prophet, an open-source algorithm for generating time-series models.
Our study identified NFT drops as a unique source of market congestion -- holiday effects -- beyond trend and season effects. Finally, we envision three future research directions of TFM.

Submitted 4 May, 2023; originally announced May 2023.

ACM Class: J.4

arXiv:2303.02566 [pdf, other]

MFAI: A Scalable Bayesian Matrix Factorization Approach to Leveraging Auxiliary Information

Authors: Zhiwei Wang, Fa Zhang, Cong Zheng, Xianghong Hu, Mingxuan Cai, Can Yang

Abstract: In various practical situations, matrix factorization methods suffer from poor data quality, such as high data sparsity and low signal-to-noise ratio (SNR). Here, we consider a matrix factorization problem by utilizing auxiliary information, which is massively available in real-world applications, to overcome the challenges caused by poor data quality. Unlike existing methods that mainly rely on simple linear models to combine auxiliary information with the main data matrix, we propose to integrate gradient boosted trees in the probabilistic matrix factorization framework to effectively leverage auxiliary information (MFAI). Thus, MFAI naturally inherits several salient features of gradient boosted trees, such as the capability of flexibly modeling nonlinear relationships and robustness to irrelevant features and missing values in auxiliary information. The parameters in MFAI can be automatically determined under the empirical Bayes framework, making it adaptive to the utilization of auxiliary information and immune to overfitting. Moreover, MFAI is computationally efficient and scalable to large datasets by exploiting variational inference. We demonstrate the advantages of MFAI through comprehensive numerical results from simulation studies and real data analyses. Our approach is implemented in the R package mfair available at https://github.com/YangLabHKUST/mfair.

Submitted 12 February, 2024; v1 submitted 4 March, 2023; originally announced March 2023.

arXiv:2212.08018 [pdf, ps, other]

Privately Estimating a Gaussian: Efficient, Robust and Optimal

Authors: Daniel Alabi, Pravesh K. Kothari, Pranay Tankala, Prayaag Venkat, Fred Zhang

Abstract: In this work, we give efficient algorithms for privately estimating a Gaussian distribution in both pure and approximate differential privacy (DP) models with optimal dependence on the dimension in the sample complexity. In the pure DP setting, we give an efficient algorithm that estimates an unknown $d$-dimensional Gaussian distribution up to an arbitrary tiny total variation error using $\widetilde{O}(d^2 \log κ)$ samples while tolerating a constant fraction of adversarial outliers. Here, $κ$ is the condition number of the target covariance matrix. The sample bound matches best non-private estimators in the dependence on the dimension (up to a polylogarithmic factor). We prove a new lower bound on differentially private covariance estimation to show that the dependence on the condition number $κ$ in the above sample bound is also tight. Prior to our work, only identifiability results (yielding inefficient super-polynomial time algorithms) were known for the problem. In the approximate DP setting, we give an efficient algorithm to estimate an unknown Gaussian distribution up to an arbitrarily tiny total variation error using $\widetilde{O}(d^2)$ samples while tolerating a constant fraction of adversarial outliers. Prior to our work, all efficient approximate DP algorithms incurred a super-quadratic sample cost or were not outlier-robust. For the special case of mean estimation, our algorithm achieves the optimal sample complexity of $\widetilde O(d)$, improving on a $\widetilde O(d^{1.5})$ bound from prior work.
Our pure DP algorithm relies on a recursive private preconditioning subroutine that utilizes the recent work on private mean estimation [Hopkins et al., 2022]. Our approximate DP algorithms are based on a substantial upgrade of the method of stabilizing convex relaxations introduced in [Kothari et al., 2022].

Submitted 1 June, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

arXiv:2210.01862 [pdf, other]

Composite Likelihoods with Bounded Weights in Extrapolation of Data

Authors: Margaret Gamalo, Yoonji Kim, Fan Zhang, Jun**g Lin

Abstract: Among many efforts to facilitate timely access to safe and effective medicines to children, increased attention has been given to extrapolation. Loosely, it is the leveraging of conclusions or available data from adults or older age groups to draw conclusions for the target pediatric population when it can be assumed that the course of the disease and the expected response to a medicinal product would be sufficiently similar in the pediatric and the reference population. Extrapolation then can be characterized as a statistical mapping of information from the reference (adults or older age groups) to the target pediatric population. The translation, or loosely mapping of information, can be through a composite likelihood approach where the likelihood of the reference population is weighted by exponentiation and that this exponent is related to the value of the mapped information in the target population. The weight is bounded above and below recognizing the fact that similarity (of the disease and the expected response) is still valid despite variability of response between the cohorts. Maximum likelihood approaches are then used for estimation of parameters and asymptotic theory is used to derive distributions of estimates for use in inference. Hence, the estimation of effects in the target population borrows information from reference population. In addition, this manuscript also talks about how this method is related to the Bayesian statistical paradigm.

Submitted 4 October, 2022; originally announced October 2022.

Comments: 28 pages, 4 figures, 3 tables

arXiv:2209.09845 [pdf, other]

Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL

Authors: Fengzhuo Zhang, Boyi Liu, Kaixin Wang, Vincent Y. F. Tan, Zhuoran Yang, Zhaoran Wang

Abstract: The cooperative Multi-Agent Reinforcement Learning (MARL) with permutation invariant agents framework has achieved tremendous empirical successes in real-world applications. Unfortunately, the theoretical understanding of this MARL problem is lacking due to the curse of many agents and the limited exploration of the relational reasoning in existing works. In this paper, we verify that the transformer implements complex relational reasoning, and we propose and analyze model-free and model-based offline MARL algorithms with the transformer approximators. We prove that the suboptimality gaps of the model-free and model-based algorithms are independent of and logarithmic in the number of agents respectively, which mitigates the curse of many agents. These results are consequences of a novel generalization error bound of the transformer and a novel analysis of the Maximum Likelihood Estimate (MLE) of the system dynamics with the transformer. Our model-based algorithm is the first provably efficient MARL algorithm that explicitly exploits the permutation invariance of the agents. Our improved generalization bound may be of independent interest and is applicable to other regression problems related to the transformer beyond MARL.

Submitted 16 October, 2022; v1 submitted 20 September, 2022; originally announced September 2022.

arXiv:2203.09611 [pdf, other]

doi 10.1080/13658816.2022.2053980

STICC: A multivariate spatial clustering method for repeated geographic pattern discovery with consideration of spatial contiguity

Authors: Yuhao Kang, Kunlin Wu, Song Gao, Ignavier Ng, **meng Rao, Shan Ye, Fan Zhang, Teng Fei

Abstract: Spatial clustering has been widely used for spatial data mining and knowledge discovery. An ideal multivariate spatial clustering should consider both spatial contiguity and aspatial attributes. Existing spatial clustering approaches may face challenges for discovering repeated geographic patterns with spatial contiguity maintained. In this paper, we propose a Spatial Toeplitz Inverse Covariance-Based Clustering (STICC) method that considers both attributes and spatial relationships of geographic objects for multivariate spatial clustering. A subregion is created for each geographic object serving as the basic unit when performing clustering. A Markov random field is then constructed to characterize the attribute dependencies of subregions. Using a spatial consistency strategy, nearby objects are encouraged to belong to the same cluster. To test the performance of the proposed STICC algorithm, we apply it in two use cases. The comparison results with several baseline methods show that the STICC outperforms others significantly in terms of adjusted rand index and macro-F1 score. Join count statistics is also calculated and shows that the spatial contiguity is well preserved by STICC. Such a spatial clustering method may benefit various applications in the fields of geography, remote sensing, transportation, and urban planning, etc.

Submitted 30 March, 2022; v1 submitted 17 March, 2022; originally announced March 2022.

Journal ref: International Journal of Geographical Information Science, Year 2022

arXiv:2110.14341 [pdf, ps, other]

Active-LATHE: An Active Learning Algorithm for Boosting the Error Exponent for Learning Homogeneous Ising Trees

Authors: Fengzhuo Zhang, Anshoo Tandon, Vincent Y. F. Tan

Abstract: The Chow-Liu algorithm (IEEE Trans. Inform. Theory, 1968) has been a mainstay for the learning of tree-structured graphical models from i.i.d. sampled data vectors. Its theoretical properties have been well-studied and are well-understood. In this paper, we focus on the class of trees that are arguably even more fundamental, namely homogeneous trees in which each pair of nodes that forms an edge has the same correlation $ρ$. We ask whether we are able to further reduce the error probability of learning the structure of the homogeneous tree model when active learning or active sampling of nodes or variables is allowed. Our figure of merit is the error exponent, which quantifies the exponential rate of decay of the error probability with an increasing number of data samples. At first sight, an improvement in the error exponent seems impossible, as all the edges are statistically identical. We design and analyze an algorithm Active Learning Algorithm for Trees with Homogeneous Edge (Active-LATHE), which surprisingly boosts the error exponent by at least 40% when $ρ$ is at least $0.8$. For all other values of $ρ$, we also observe commensurate, but more modest, improvements in the error exponent. Our analysis hinges on judiciously exploiting the minute but detectable statistical variation of the samples to allocate more data to parts of the graph in which we are less confident of being correct.

Submitted 28 October, 2021; v1 submitted 27 October, 2021; originally announced October 2021.

arXiv:2106.00885 [pdf, ps, other]

Robustifying Algorithms of Learning Latent Trees with Vector Variables

Authors: Fengzhuo Zhang, Vincent Y. F. Tan

Abstract: We consider learning the structures of Gaussian latent tree models with vector observations when a subset of them are arbitrarily corrupted. First, we present the sample complexities of Recursive Grouping (RG) and Chow-Liu Recursive Grouping (CLRG) without the assumption that the effective depth is bounded in the number of observed nodes, significantly generalizing the results in Choi et al. (2011). We show that Chow-Liu initialization in CLRG greatly reduces the sample complexity of RG from being exponential in the diameter of the tree to only logarithmic in the diameter for the hidden Markov model (HMM). Second, we robustify RG, CLRG, Neighbor Joining (NJ) and Spectral NJ (SNJ) by using the truncated inner product. These robustified algorithms can tolerate a number of corruptions up to the square root of the number of clean samples. Finally, we derive the first known instance-dependent impossibility result for structure learning of latent trees. The optimalities of the robust version of CLRG and NJ are verified by comparing their sample complexities and the impossibility result.

Submitted 25 October, 2021; v1 submitted 1 June, 2021; originally announced June 2021.
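
Illustrative note on the entry above: the abstract names the truncated inner product without defining it. A minimal sketch, assuming the form commonly used in the robust-statistics literature (compute the elementwise products, discard the ones of largest magnitude, sum the rest), might look as follows. The function name and the parameter `n_trunc` are hypothetical, and the paper's exact definition and choice of truncation level may differ.

```python
def truncated_inner_product(a, b, n_trunc):
    """Sum of a[i] * b[i] after dropping the n_trunc largest-magnitude products.

    Assumption: this mirrors the usual robust-statistics construction;
    it is not taken from the paper itself.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    if not 0 <= n_trunc <= len(a):
        raise ValueError("n_trunc must lie between 0 and len(a)")
    # Sort products by absolute value so the suspicious ones come last.
    products = sorted((x * y for x, y in zip(a, b)), key=abs)
    return sum(products[: len(products) - n_trunc])


if __name__ == "__main__":
    clean_a = [1, 2, 3]
    clean_b = [1, 1, 1]
    # One corrupted coordinate dominates the plain inner product.
    print(truncated_inner_product(clean_a + [100], clean_b + [100], 0))
    print(truncated_inner_product(clean_a + [100], clean_b + [100], 1))
```

Dropping the largest products bounds the influence that a small number of arbitrarily corrupted samples can have on the estimated correlation.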

arXiv:2103.16785 [pdf, other]

Individually Fair Gradient Boosting

Authors: Alexander Vargo, Fan Zhang, Mikhail Yurochkin, Yuekai Sun

Abstract: We consider the task of enforcing individual fairness in gradient boosting. Gradient boosting is a popular method for machine learning from tabular data, which arise often in applications where algorithmic fairness is a concern. At a high level, our approach is a functional gradient descent on a (distributionally) robust loss function that encodes our intuition of algorithmic fairness for the ML task at hand. Unlike prior approaches to individual fairness that only work with smooth ML models, our approach also works with non-smooth models such as decision trees. We show that our algorithm converges globally and generalizes. We also demonstrate the efficacy of our algorithm on three ML problems susceptible to algorithmic bias.

Submitted 30 March, 2021; originally announced March 2021.

Comments: ICLR Camera-Ready Version

arXiv:2103.16451 [pdf, other]

Robustifying Conditional Portfolio Decisions via Optimal Transport

Authors: Viet Anh Nguyen, Fan Zhang, Shanshan Wang, Jose Blanchet, Erick Delage, Yinyu Ye

Abstract: We propose a data-driven portfolio selection model that integrates side information, conditional estimation and robustness using the framework of distributionally robust optimization. Conditioning on the observed side information, the portfolio manager solves an allocation problem that minimizes the worst-case conditional risk-return trade-off, subject to all possible perturbations of the covariate-return probability distribution in an optimal transport ambiguity set. Despite the non-linearity of the objective function in the probability measure, we show that the distributionally robust portfolio allocation with side information problem can be reformulated as a finite-dimensional optimization problem. If portfolio decisions are made based on either the mean-variance or the mean-Conditional Value-at-Risk criterion, the resulting reformulation can be further simplified to second-order or semi-definite cone programs. Empirical studies in the US equity market demonstrate the advantage of our integrative framework against other benchmarks.

Submitted 9 April, 2024; v1 submitted 30 March, 2021; originally announced March 2021.

Comments: 1 figure

arXiv:2010.08601 [pdf]

Information Coefficient as a Performance Measure of Stock Selection Models

Authors: Feng Zhang, Ruite Guo, Honggao Cao

Abstract: Information coefficient (IC) is a widely used metric for measuring investment managers' skills in selecting stocks. However, its adequacy and effectiveness for evaluating stock selection models have not been clearly understood, as IC from a realistic stock selection model can hardly be materially different from zero and is often accompanied by high volatility. In this paper, we investigate the behavior of IC as a performance measure of stock selection models. Through simulation and simple statistical modeling, we examine the IC behavior both statically and dynamically. The examination helps us propose two practical procedures that one may use for IC-based ongoing performance monitoring of stock selection models.

Submitted 16 October, 2020; originally announced October 2020.

Comments: 15 pages, 2 figures, and 8 tables

MSC Class: 91-08; 91-11
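
Illustrative note on the entry above: the abstract does not state which correlation its IC uses. A common convention, assumed here, is the rank (Spearman) correlation between a model's predicted scores and the subsequently realized returns for one period. The sketch below uses only the standard library; the function names and the toy numbers are hypothetical.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; raises if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def information_coefficient(predicted, realized):
    """Rank IC for one period: Spearman correlation of scores and returns."""
    return pearson(ranks(predicted), ranks(realized))


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.5, 0.3]           # hypothetical model scores
    returns = [0.02, -0.01, 0.00, 0.01]     # hypothetical realized returns
    print(information_coefficient(scores, returns))
```

Computing this value period by period yields the IC time series whose mean and volatility are the quantities the abstract is concerned with.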

arXiv:2010.05373 [pdf, other]

Distributionally Robust Local Non-parametric Conditional Estimation

Authors: Viet Anh Nguyen, Fan Zhang, Jose Blanchet, Erick Delage, Yinyu Ye

Abstract: Conditional estimation given specific covariate values (i.e., local conditional estimation or functional estimation) is ubiquitously useful with applications in engineering, social and natural sciences. Existing data-driven non-parametric estimators mostly focus on structured homogeneous data (e.g., weakly independent and stationary data), thus they are sensitive to adversarial noise and may perform poorly under a low sample size. To alleviate these issues, we propose a new distributionally robust estimator that generates non-parametric local estimates by minimizing the worst-case conditional expected loss over all adversarial distributions in a Wasserstein ambiguity set. We show that despite being generally intractable, the local estimator can be efficiently found via convex optimization under broadly applicable settings, and it is robust to the corruption and heterogeneity of the data. Experiments with synthetic and MNIST datasets show the competitive performance of this new class of estimators.

Submitted 11 October, 2020; originally announced October 2020.

arXiv:2009.03969 [pdf, ps, other]

Convergence Rates of Empirical Bayes Posterior Distributions: A Variational Perspective

Authors: Fengshuo Zhang, Chao Gao

Abstract: We study the convergence rates of empirical Bayes posterior distributions for nonparametric and high-dimensional inference. We show that as long as the hyperparameter set is discrete, the empirical Bayes posterior distribution induced by the maximum marginal likelihood estimator can be regarded as a variational approximation to a hierarchical Bayes posterior distribution. This connection between empirical Bayes and variational Bayes allows us to leverage the recent results in the variational Bayes literature, and directly obtain the convergence rates of empirical Bayes posterior distributions from a variational perspective. For a more general hyperparameter set that is not necessarily discrete, we introduce a new technique called "prior decomposition" to deal with prior distributions that can be written as convex combinations of probability measures whose supports are low-dimensional subspaces. This leads to generalized versions of the classical "prior mass and testing" conditions for the convergence rates of empirical Bayes. Our theory is applied to a number of statistical estimation problems including nonparametric density estimation and sparse linear regression.

Submitted 8 September, 2020; originally announced September 2020.

arXiv:2007.15839 [pdf, ps, other]

Robust and Heavy-Tailed Mean Estimation Made Simple, via Regret Minimization

Authors: Samuel B. Hopkins, Jerry Li, Fred Zhang

Abstract: We study the problem of estimating the mean of a distribution in high dimensions when either the samples are adversarially corrupted or the distribution is heavy-tailed. Recent developments in robust statistics have established efficient and (near) optimal procedures for both settings. However, the algorithms developed on each side tend to be sophisticated and do not directly transfer to the other, with many of them having ad-hoc or complicated analyses. In this paper, we provide a meta-problem and a duality theorem that lead to a new unified view on robust and heavy-tailed mean estimation in high dimensions. We show that the meta-problem can be solved either by a variant of the Filter algorithm from the recent literature on robust estimation or by the quantum entropy scoring scheme (QUE), due to Dong, Hopkins and Li (NeurIPS '19). By leveraging our duality theorem, these results translate into simple and efficient algorithms for both robust and heavy-tailed settings. Furthermore, the QUE-based procedure has run-time that matches the fastest known algorithms on both fronts. Our analysis of Filter is through the classic regret bound of the multiplicative weights update method. This connection allows us to avoid the technical complications in previous works and improve upon the run-time analysis of a gradient-descent-based algorithm for robust mean estimation by Cheng, Diakonikolas, Ge and Soltanolkotabi (ICML '20).

Submitted 18 January, 2021; v1 submitted 31 July, 2020; originally announced July 2020.

Comments: 40 pages

arXiv:2007.09312 [pdf, other]

DWMD: Dimensional Weighted Orderwise Moment Discrepancy for Domain-specific Hidden Representation Matching

Authors: Rongzhe Wei, Fa Zhang, Bo Dong, Qinghua Zheng

Abstract: Knowledge transfer from a source domain to a different but semantically related target domain has long been an important topic in the context of unsupervised domain adaptation (UDA). A key challenge in this field is establishing a metric that can exactly measure the data distribution discrepancy between two homogeneous domains and adopt it in distribution alignment, especially in the matching of feature representations in the hidden activation space. Existing distribution matching approaches can be interpreted as failing to either explicitly orderwise align higher-order moments or satisfy the prerequisite of certain assumptions in practical uses. We propose a novel moment-based probability distribution metric termed dimensional weighted orderwise moment discrepancy (DWMD) for feature representation matching in the UDA scenario. Our metric function takes advantage of a series for high-order moment alignment, and we theoretically prove that our DWMD metric function is error-free, which means that it can strictly reflect the distribution differences between domains and is valid without any feature distribution assumption. In addition, since the discrepancies between probability distributions in each feature dimension are different, dimensional weighting is considered in our function. We further calculate the error bound of the empirical estimate of the DWMD metric in practical applications. Comprehensive experiments on benchmark datasets illustrate that our method yields state-of-the-art distribution metrics.

Submitted 17 July, 2020; originally announced July 2020.

arXiv:2006.08218 [pdf, other]

doi 10.1109/TKDE.2021.3090866

Self-supervised Learning: Generative or Contrastive

Authors: Xiao Liu, Fanjin Zhang, Zhenyu Hou, Zhaoyu Wang, Li Mian, Jing Zhang, Jie Tang

Abstract: Deep supervised learning has achieved great success in the last decade. However, its deficiencies of dependence on manual labels and vulnerability to attacks have driven people to explore a better solution. As an alternative, self-supervised learning attracts many researchers for its soaring performance on representation learning in the last several years. Self-supervised representation learning leverages input data itself as supervision and benefits almost all types of downstream tasks. In this survey, we take a look into new self-supervised learning methods for representation in computer vision, natural language processing, and graph learning. We comprehensively review the existing empirical methods and summarize them into three main categories according to their objectives: generative, contrastive, and generative-contrastive (adversarial). We further investigate related theoretical analysis work to provide deeper thoughts on how self-supervised learning works. Finally, we briefly discuss open problems and future directions for self-supervised learning. An outline slide for the survey is provided.

Submitted 20 March, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

Comments: 24 pages, 19 figures

arXiv:2006.05630 [pdf, other]

Distributionally Robust Batch Contextual Bandits

Authors: Nian Si, Fan Zhang, Zhengyuan Zhou, Jose Blanchet

Abstract: Policy learning using historical observational data is an important problem that has found widespread applications. Examples include selecting offers, prices, advertisements to send to customers, as well as selecting which medication to prescribe to a patient. However, existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment that has generated the data -- an assumption that is often false or too coarse an approximation. In this paper, we lift this assumption and aim to learn a distributionally robust policy with incomplete observational data. We first present a policy evaluation procedure that allows us to assess how well the policy does under the worst-case environment shift. We then establish a central limit theorem type guarantee for this proposed policy evaluation scheme. Leveraging this evaluation scheme, we further propose a novel learning algorithm that is able to learn a policy that is robust to adversarial perturbations and unknown covariate shifts with a performance guarantee based on the theory of uniform convergence. Finally, we empirically test the effectiveness of our proposed algorithm in synthetic datasets and demonstrate that it provides the robustness that is missing using standard policy learning algorithms. We conclude the paper by providing a comprehensive application of our methods in the context of a real-world voting dataset.

Submitted 11 September, 2023; v1 submitted 9 June, 2020; originally announced June 2020.

Comments: The short version has been accepted in ICML 2020

arXiv:2006.00234 [pdf, other]

Integrating global spatial features in CNN based Hyperspectral/SAR imagery classification

Authors: Fan Zhang, MinChao Yan, Chen Hu, Jun Ni, Fei Ma

Abstract: Land cover classification has played an important role in remote sensing because it can intelligently identify things in one huge remote sensing image to reduce the work of humans. However, a lot of classification methods are designed based on the pixel feature or limited spatial feature of the remote sensing image, which limits the classification accuracy and universality of their methods. This paper proposes a novel method that takes into account the information of the remote sensing image, i.e., geographic latitude-longitude information. In addition, a dual-branch convolutional neural network (CNN) classification method is designed in combination with the global information to mine the pixel features of the image. Then, the features of the two neural networks are fused with another fully neural network to realize the classification of remote sensing images. Finally, two remote sensing images are used to verify the effectiveness of our method, including hyperspectral imaging (HSI) and polarimetric synthetic aperture radar (PolSAR) imagery. The result of the proposed method is superior to the traditional single-channel convolutional neural network.

Submitted 15 June, 2020; v1 submitted 30 May, 2020; originally announced June 2020.

arXiv:2005.12154 [pdf, other]

doi 10.1109/TCYB.2015.2415032

Adversarial Feature Selection against Evasion Attacks

Authors: Fei Zhang, Patrick P. K. Chan, Battista Biggio, Daniel S. Yeung, Fabio Roli

Abstract: Pattern recognition and machine learning techniques have been increasingly adopted in adversarial settings such as spam, intrusion and malware detection, although their security against well-crafted attacks that aim to evade detection by manipulating data at test time has not yet been thoroughly assessed. While previous work has been mainly focused on devising adversary-aware classification algorithms to counter evasion attempts, only a few authors have considered the impact of using reduced feature sets on classifier security against the same attacks. An interesting, preliminary result is that classifier security to evasion may be even worsened by the application of feature selection. In this paper, we provide a more detailed investigation of this aspect, shedding some light on the security properties of feature selection against evasion attacks. Inspired by previous work on adversary-aware classifiers, we propose a novel adversary-aware feature selection model that can improve classifier security against evasion attacks, by incorporating specific assumptions on the adversary's data manipulation strategy. We focus on an efficient, wrapper-based implementation of our approach, and experimentally validate its soundness on different application examples, including spam and malware detection.

Submitted 25 May, 2020; originally announced May 2020.

Journal ref: IEEE Transactions on Cybernetics, vol. 46, no. 3, March 2016

arXiv:2003.01575 [pdf, other]

Evaluation Framework For Large-scale Federated Learning

Authors: Lifeng Liu, Fengda Zhang, Jun Xiao, Chao Wu

Abstract: Federated learning is proposed as a machine learning setting to enable distributed edge devices, such as mobile phones, to collaboratively learn a shared prediction model while keeping all the training data on device, which can not only take full advantage of data distributed across millions of nodes to train a good model but also protect data privacy. However, learning in the scenario above poses new challenges. In fact, data across a massive number of unreliable devices is likely to be non-IID (identically and independently distributed), which may make the performance of models trained by federated learning unstable. In this paper, we introduce a framework designed for large-scale federated learning which consists of approaches to generating datasets and a modular evaluation framework. Firstly, we construct a suite of open-source non-IID datasets covering three aspects, namely covariate shift, prior probability shift, and concept shift, which are grounded in real-world assumptions. In addition, we design several rigorous evaluation metrics including the number of network nodes, the size of datasets, the number of communication rounds and communication resources etc. Finally, we present an open-source benchmark for large-scale federated learning research.

Submitted 11 March, 2020; v1 submitted 3 March, 2020; originally announced March 2020.

arXiv:2002.07349 [pdf, other]

Correlation-aware Deep Generative Model for Unsupervised Anomaly Detection

Authors: Haoyi Fan, Fengbin Zhang, Ruidong Wang, Liang Xi, Zuoyong Li

Abstract: Unsupervised anomaly detection aims to identify anomalous samples from highly complex and unstructured data, which is pervasive in both fundamental research and industrial applications. However, most existing methods neglect the complex correlation among data samples, which is important for capturing normal patterns from which the abnormal ones deviate. In this paper, we propose a method of Correlation aware unsupervised Anomaly detection via Deep Gaussian Mixture Model (CADGMM), which captures the complex correlation among data points for high-quality low-dimensional representation learning. Specifically, the relations among data samples are first represented as a graph structure, in which a node denotes a sample and an edge denotes the correlation between two samples in the feature space. Then, a dual-encoder that consists of a graph encoder and a feature encoder is employed to encode both the feature and correlation information of samples into the low-dimensional latent space jointly, followed by a decoder for data reconstruction. Finally, a separate estimation network as a Gaussian Mixture Model is utilized to estimate the density of the learned latent vector, and the anomalies can be detected by measuring the energy of the samples. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed method.

Submitted 19 October, 2020; v1 submitted 17 February, 2020; originally announced February 2020.

Comments: (Updating code and data) Accepted by PAKDD2020. Copyright (c) 2020 Springer. The source code and dataset are available at https://haoyfan.github.io/. Only personal use of these materials is permitted

MSC Class: 68T30 ACM Class: I.5.4

arXiv:2002.03665 [pdf, other]

AnomalyDAE: Dual autoencoder for anomaly detection on attributed networks

Authors: Haoyi Fan, Fengbin Zhang, Zuoyong Li

Abstract: Anomaly detection on attributed networks aims at finding nodes whose patterns deviate significantly from the majority of reference nodes, which is pervasive in many applications such as network intrusion detection and social spammer detection. However, most existing methods neglect the complex cross-modality interactions between network structure and node attribute. In this paper, we propose a deep joint representation learning framework for anomaly detection through a dual autoencoder (AnomalyDAE), which captures the complex interactions between network structure and node attribute for high-quality embeddings. Specifically, AnomalyDAE consists of a structure autoencoder and an attribute autoencoder to learn both node embedding and attribute embedding jointly in latent space. Moreover, attention mechanism is employed in structure encoder to learn the importance between a node and its neighbors for an effective capturing of structure pattern, which is important to anomaly detection. Besides, by taking both the node embedding and attribute embedding as inputs of attribute decoder, the cross-modality interactions between network structure and node attribute are learned during the reconstruction of node attribute. Finally, anomalies can be detected by measuring the reconstruction errors of nodes from both the structure and attribute perspectives. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed method.

Submitted 12 February, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

MSC Class: 68T30 ACM Class: I.5.4

arXiv:1911.05441 [pdf, other]

Regression via Arbitrary Quantile Modeling

Authors: Faen Zhang, Xinyu Fan, Hui Xu, Pengcheng Zhou, Yujian He, Junlong Liu

Abstract: In the regression problem, L1 and L2 are the most commonly used loss functions, which produce mean predictions with different biases. However, the predictions are neither robust nor adequate enough since they only capture a few conditional distributions instead of the whole distribution, especially for small datasets. To address this problem, we proposed arbitrary quantile modeling to regulate the prediction, which achieved better performance compared to traditional loss functions. More specifically, a new distribution regression method, Deep Distribution Regression (DDR), is proposed to estimate arbitrary quantiles of the response variable. Our DDR method consists of two models: a Q model, which predicts the corresponding value for arbitrary quantile, and an F model, which predicts the corresponding quantile for arbitrary value. Furthermore, the duality between Q and F models enables us to design a novel loss function for joint training and perform a dual inference mechanism. Our experiments demonstrate that our DDR-joint and DDR-disjoint methods outperform previous methods such as AdaBoost, random forest, LightGBM, and neural networks both in terms of mean and quantile prediction.

Submitted 13 November, 2019; originally announced November 2019.
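
Illustrative note on the entry above: quantile modeling is usually trained with the pinball (quantile) loss, which penalizes under- and over-prediction asymmetrically according to the target quantile level. The sketch below shows that standard loss only. It is not the joint Q/F loss the abstract describes, and the function name and the level parameter `tau` are hypothetical.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball loss at quantile level tau, with 0 < tau < 1.

    Under-prediction (y > q) costs tau per unit of error and
    over-prediction (y < q) costs (1 - tau) per unit of error.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0, 10.0]
    # A constant prediction near the upper tail scores better at tau = 0.9
    # than a constant prediction at the median.
    print(pinball_loss(observed, [3.0] * 5, 0.9))
    print(pinball_loss(observed, [10.0] * 5, 0.9))
```

At `tau = 0.5` the loss reduces to half the mean absolute error, so minimizing it recovers the conditional median; other levels target other quantiles.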

arXiv:1910.09090 [pdf]

A game method for improving the interpretability of convolution neural network

Authors: Jinwei Zhao, Qizhou Wang, Fuqiang Zhang, Wanli Qiu, Yufei Wang, Yu Liu, Guo Xie, Weigang Ma, Bin Wang, Xinhong Hei

Abstract: Real artificial intelligence has always been a focus of many machine learning researchers, especially in the area of deep learning. However, a deep neural network is hard to understand and explain, and sometimes even seems metaphysical. The reason, we believe, is that the network is essentially a perceptual model. Therefore, we believe that in order to complete complex intelligent activities from simple perception, it is necessary to construct another interpretable logical network to form accurate and reasonable responses and explanations to external things. Researchers like Bolei Zhou and Quanshi Zhang have found many explanatory rules for deep feature extraction aimed at the feature extraction stage of convolution neural network. However, although researchers like Marco Gori have also made great efforts to improve the interpretability of the fully connected layers of the network, the problem is also very difficult. This paper firstly analyzes its reason. Then a method of constructing logical network based on the fully connected layers and extracting logical relation between input and output of the layers is proposed. The game process between perceptual learning and logical abstract cognitive learning is implemented to improve the interpretable performance of deep learning process and deep learning model. The benefits of our approach are illustrated on benchmark data sets and in real-world experiments.

Submitted 20 October, 2019; originally announced October 2019.

arXiv:1909.06730 [pdf, other]

Machine Discovery of Partial Differential Equations from Spatiotemporal Data

Authors: Ye Yuan, Junlin Li, Liang Li, Frank Jiang, Xiuchuan Tang, Fumin Zhang, Sheng Liu, Jorge Goncalves, Henning U. Voss, Xiuting Li, Jürgen Kurths, Han Ding

Abstract: The study presents a general framework for discovering underlying Partial Differential Equations (PDEs) using measured spatiotemporal data. The method, called Sparse Spatiotemporal System Discovery ($\text{S}^3\text{d}$), decides which physical terms are necessary and which can be removed (because they are physically negligible in the sense that they do not affect the dynamics too much) from a pool of candidate functions. The method is built on the recent development of Sparse Bayesian Learning, which enforces the sparsity in the to-be-identified PDEs, and therefore can balance the model complexity and fitting error with theoretical guarantees. Without leveraging prior knowledge or assumptions in the discovery process, we use an automated approach to discover ten types of PDEs, including the famous Navier-Stokes and sine-Gordon equations, from simulation data alone. Moreover, we demonstrate our data-driven discovery process with the Complex Ginzburg-Landau Equation (CGLE) using data measured from a traveling-wave convection experiment. Our machine discovery approach presents solutions that have the potential to inspire, support and assist physicists for the establishment of physical laws from measured spatiotemporal data, especially in notorious fields that are often too complex to allow a straightforward establishment of physical law, such as biophysics, fluid dynamics, neuroscience or nonlinear optics.

Submitted 15 September, 2019; originally announced September 2019.

arXiv:1909.00122 [pdf, other]

HM-NAS: Efficient Neural Architecture Search via Hierarchical Masking

Authors: Shen Yan, Biyi Fang, Faen Zhang, Yu Zheng, Xiao Zeng, Hui Xu, Mi Zhang

Abstract: The use of automatic methods, often referred to as Neural Architecture Search (NAS), in designing neural network architectures has recently drawn considerable attention. In this work, we present an efficient NAS approach, named HM-NAS, that generalizes existing weight sharing based NAS approaches. Existing weight sharing based NAS approaches still adopt hand-designed heuristics to generate architecture candidates. As a consequence, the space of architecture candidates is constrained to a subset of all possible architectures, making the architecture search results sub-optimal. HM-NAS addresses this limitation via two innovations. First, HM-NAS incorporates a multi-level architecture encoding scheme to enable searching for more flexible network architectures. Second, it discards the hand-designed heuristics and incorporates a hierarchical masking scheme that automatically learns and determines the optimal architecture. Compared to state-of-the-art weight sharing based approaches, HM-NAS is able to achieve better architecture search performance and competitive model evaluation accuracy. Without the constraint imposed by the hand-designed heuristics, our searched networks contain more flexible and meaningful architectures that existing weight sharing based NAS approaches are not able to discover.

Submitted 7 September, 2019; v1 submitted 31 August, 2019; originally announced September 2019.

Comments: 9 pages, 6 figures, 6 tables. Nominated for ICCV 2019 Neural Architects Workshop Best Paper Award

arXiv:1908.04468 [pdf, ps, other]

A Fast Spectral Algorithm for Mean Estimation with Sub-Gaussian Rates

Authors: Zhixian Lei, Kyle Luh, Prayaag Venkat, Fred Zhang

Abstract: We study the algorithmic problem of estimating the mean of a heavy-tailed random vector in $\mathbb{R}^d$, given $n$ i.i.d. samples. The goal is to design an efficient estimator that attains the optimal sub-gaussian error bound, only assuming that the random vector has bounded mean and covariance. Polynomial-time solutions to this problem are known but have high runtime due to their use of semi-definite programming (SDP). Conceptually, it remains open whether convex relaxation is truly necessary for this problem. In this work, we show that it is possible to go beyond SDP and achieve better computational efficiency. In particular, we provide a spectral algorithm that achieves the optimal statistical performance and runs in time $\widetilde O\left(n^2 d \right)$, improving upon the previous fastest runtime $\widetilde O\left(n^{3.5}+ n^2d\right)$ by Cherapanamjeri et al. (COLT '19). Our algorithm is spectral in that it only requires (approximate) eigenvector computations, which can be implemented very efficiently by, for example, power iteration or the Lanczos method. At the core of our algorithm is a novel connection between the furthest hyperplane problem introduced by Karnin et al. (COLT '12) and a structural lemma on heavy-tailed distributions by Lugosi and Mendelson (Ann. Stat. '19). This allows us to iteratively reduce the estimation error at a geometric rate using only the information derived from the top singular vector of the data matrix, leading to a significantly faster running time.

Submitted 17 February, 2020; v1 submitted 12 August, 2019; originally announced August 2019.
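
The abstract above names power iteration as one way to carry out the approximate eigenvector computations. The following is a minimal sketch of generic power iteration on a small symmetric matrix, not the paper's mean estimator; the function name `top_eigenpair` and its parameters are hypothetical and chosen only for illustration.

```python
import math
import random


def top_eigenpair(matrix, iterations=200, seed=0):
    """Approximate the dominant eigenvalue and eigenvector of a symmetric
    positive semi-definite matrix (given as a list of rows) by power iteration.

    Hypothetical helper for illustration; assumes the largest eigenvalue is
    strictly larger than the others so the iteration converges.
    """
    rng = random.Random(seed)
    n = len(matrix)

    # Start from a random unit vector.
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]

    for _ in range(iterations):
        # Multiply by the matrix, then renormalise.
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # v lies in the null space; nothing more to do
        v = [x / norm for x in w]

    # Rayleigh quotient v^T M v gives the eigenvalue estimate for unit v.
    mv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * mv[i] for i in range(n))
    return eigenvalue, v


if __name__ == "__main__":
    value, vector = top_eigenpair([[2.0, 0.0], [0.0, 1.0]])
    print(value, vector)
```

Each iteration costs one matrix-vector product, which is why spectral methods of this kind can be cheaper than solving a semi-definite program.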

arXiv:1907.01099 [pdf, other]

Predicting Treatment Initiation from Clinical Time Series Data via Graph-Augmented Time-Sensitive Model

Authors: Fan Zhang, Tong Wu, Yunlong Wang, Yong Cai, Cao Xiao, Emily Zhao, Lucas Glass, Jimeng Sun

Abstract: Many computational models have been proposed to extract temporal patterns from clinical time series for each patient and among patient groups for predictive healthcare. However, the common relations among patients (e.g., sharing the same doctor) have rarely been considered. In this paper, we represent patient-clinician relations by bipartite graphs that capture, for example, from whom a patient gets a diagnosis. We then solve for the top eigenvectors of the graph Laplacian, and include the eigenvectors as latent representations of the similarity between patient-clinician pairs in a time-sensitive prediction model. We conducted experiments using real-world data to predict the initiation of first-line treatment for Chronic Lymphocytic Leukemia (CLL) patients. Results show that relational similarity can improve prediction over multiple baselines, for example a 5% improvement over a long short-term memory baseline in terms of area under the precision-recall curve.

Submitted 1 July, 2019; originally announced July 2019.

Comments: 5 pages, 3 figures, accepted by ICML 2019 Time Series Workshop

arXiv:1906.01198 [pdf, ps, other]

Tensor Restricted Isometry Property Analysis For a Large Class of Random Measurement Ensembles

Authors: Feng Zhang, Wendong Wang, Jingyao Hou, Jianjun Wang, Jianwen Huang

Abstract: In previous work, theoretical analysis based on the tensor Restricted Isometry Property (t-RIP) established robust recovery guarantees for a low-tubal-rank tensor. The obtained sufficient conditions depend strongly on the assumption that the linear measurement maps satisfy the t-RIP. In this paper, by exploiting probabilistic arguments, we prove that such linear measurement maps exist under suitable conditions on the number of measurements in terms of the tubal rank r and the size of the third-order tensor n1, n2, n3. The obtained minimal possible number of linear measurements is nearly optimal compared with the degrees of freedom of a tensor with tubal rank r. Specifically, we consider a random sub-Gaussian distribution that includes Gaussian, Bernoulli and all bounded distributions and construct a large class of linear maps that satisfy a t-RIP with high probability. Moreover, the validity of the required number of measurements is verified by numerical experiments.

Submitted 15 September, 2019; v1 submitted 4 June, 2019; originally announced June 2019.

arXiv:1905.11604 [pdf, other]

SGD on Neural Networks Learns Functions of Increasing Complexity

Authors: Preetum Nakkiran, Gal Kaplun, Dimitris Kalimeris, Tristan Yang, Benjamin L. Edelman, Fred Zhang, Boaz Barak

Abstract: We perform an experimental study of the dynamics of Stochastic Gradient Descent (SGD) in learning deep neural networks for several real and synthetic classification tasks. We show that in the initial epochs, almost all of the performance improvement of the classifier obtained by SGD can be explained by a linear classifier. More generally, we give evidence for the hypothesis that, as iterations progress, SGD learns functions of increasing complexity. This hypothesis can be helpful in explaining why SGD-learned classifiers tend to generalize well even in the over-parameterized regime. We also show that the linear classifier learned in the initial stages is "retained" throughout the execution even if training is continued to the point of zero training error, and complement this with a theoretical result in a simplified model. Key to our work is a new measure of how well one classifier explains the performance of another, based on conditional mutual information.

Submitted 28 May, 2019; originally announced May 2019.

Comments: Submitted to NeurIPS 2019

arXiv:1905.10954 [pdf, other]

doi 10.1145/3219819.3219962

Transcribing Content from Structural Images with Spotlight Mechanism

Authors: Yu Yin, Zhenya Huang, Enhong Chen, Qi Liu, Fuzheng Zhang, Xing Xie, Guoping Hu

Abstract: Transcribing content from structural images, e.g., writing notes from music scores, is a challenging task as not only the content objects should be recognized, but the internal structure should also be preserved. Existing image recognition methods mainly work on images with simple content (e.g., text lines with characters), but are not capable of identifying ones with more complex content (e.g., structured symbols), which often follow a fine-grained grammar. To this end, in this paper, we propose a hierarchical Spotlight Transcribing Network (STN) framework followed by a two-stage "where-to-what" solution. Specifically, we first decide "where-to-look" through a novel spotlight mechanism to focus on different areas of the original image following its structure. Then, we decide "what-to-write" by developing a GRU based network with the spotlight areas for transcribing the content accordingly. Moreover, we propose two implementations on the basis of STN, i.e., STNM and STNR, where the spotlight movement follows the Markov property and Recurrent modeling, respectively. We also design a reinforcement method to refine the framework by self-improving the spotlight mechanism. We conduct extensive experiments on many structural image datasets, where the results clearly demonstrate the effectiveness of the STN framework.

Submitted 26 May, 2019; originally announced May 2019.

Comments: Accepted by KDD2018 Research Track. In proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'18)

arXiv:1905.07845 [pdf, other]

doi 10.1109/WSC40007.2019.9004804

A Distributionally Robust Boosting Algorithm

Authors: Jose Blanchet, Yang Kang, Fan Zhang, Zhangyi Hu

Abstract: Distributionally Robust Optimization (DRO) has been shown to provide a flexible framework for decision making under uncertainty and statistical estimation. For example, recent works in DRO have shown that popular statistical estimators can be interpreted as the solutions of suitably formulated data-driven DRO problems. In turn, this connection is used to optimally select tuning parameters in terms of a principled approach informed by robustness considerations. This paper contributes to this growing literature, connecting DRO and statistics, by showing how boosting algorithms can be studied via DRO. We propose a boosting type algorithm, named DRO-Boosting, as a procedure to solve our DRO formulation. Our DRO-Boosting algorithm recovers Adaptive Boosting (AdaBoost) in particular, thus showing that AdaBoost is effectively solving a DRO problem. We apply our algorithm to a financial dataset on credit card default payment prediction. We find that our approach compares favorably to alternative boosting methods which are widely used in practice.

Submitted 19 May, 2019; originally announced May 2019.

Comments: 13 pages, 1 figure

arXiv:1905.04413 [pdf, other]

Knowledge-aware Graph Neural Networks with Label Smoothness Regularization for Recommender Systems

Authors: Hongwei Wang, Fuzheng Zhang, Mengdi Zhang, Jure Leskovec, Miao Zhao, Wenjie Li, Zhongyuan Wang

Abstract: Knowledge graphs capture structured information and relations between a set of entities or items. As such, knowledge graphs represent an attractive source of information that could help improve recommender systems. However, existing approaches in this domain rely on manual feature engineering and do not allow for end-to-end training. Here we propose Knowledge-aware Graph Neural Networks with Label Smoothness regularization (KGNN-LS) to provide better recommendations. Conceptually, our approach computes user-specific item embeddings by first applying a trainable function that identifies important knowledge graph relationships for a given user. This way we transform the knowledge graph into a user-specific weighted graph and then apply a graph neural network to compute personalized item embeddings. To provide better inductive bias, we rely on the label smoothness assumption, which posits that adjacent items in the knowledge graph are likely to have similar user relevance labels/scores. Label smoothness provides regularization over the edge weights and we prove that it is equivalent to a label propagation scheme on a graph. We also develop an efficient implementation that shows strong scalability with respect to the knowledge graph size. Experiments on four datasets show that our method outperforms state-of-the-art baselines. KGNN-LS also achieves strong performance in cold-start scenarios where user-item interactions are sparse.

Submitted 13 June, 2019; v1 submitted 10 May, 2019; originally announced May 2019.
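
The abstract above relates label smoothness to a label propagation scheme on a graph. As a rough illustration of what label propagation means in general, here is a minimal sketch on a small weighted graph with clamped seed labels. It is a generic textbook-style scheme, not the KGNN-LS implementation; the function `propagate_labels` and its dictionary-based graph format are assumptions made for this example.

```python
def propagate_labels(adjacency, seed_labels, iterations=100):
    """Generic label propagation with clamped seeds (illustrative only).

    adjacency:   dict mapping node -> dict of neighbour -> edge weight
    seed_labels: dict mapping a subset of nodes -> known score in [0, 1]
    Unlabelled nodes are repeatedly set to the weighted average of their
    neighbours' scores, while seed nodes keep their known score.
    """
    # Unlabelled nodes start from a neutral score of 0.5.
    labels = {node: seed_labels.get(node, 0.5) for node in adjacency}

    for _ in range(iterations):
        updated = {}
        for node, neighbours in adjacency.items():
            if node in seed_labels:
                updated[node] = seed_labels[node]  # clamp known labels
                continue
            total_weight = sum(neighbours.values())
            if total_weight == 0:
                updated[node] = labels[node]  # isolated node: keep score
                continue
            updated[node] = sum(
                weight * labels[nb] for nb, weight in neighbours.items()
            ) / total_weight
        labels = updated
    return labels


if __name__ == "__main__":
    # Path graph a - b - c - d with unit edge weights.
    graph = {
        "a": {"b": 1.0},
        "b": {"a": 1.0, "c": 1.0},
        "c": {"b": 1.0, "d": 1.0},
        "d": {"c": 1.0},
    }
    print(propagate_labels(graph, {"a": 1.0, "d": 0.0}))
```

On the path graph in the usage example, the scores of the unlabelled nodes settle between the two seed values, so adjacent items end up with similar scores, which is the intuition behind the smoothness assumption.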

arXiv:1904.13036 [pdf, other]

doi 10.1109/TGRS.2018.2828161

Optimal Clustering Framework for Hyperspectral Band Selection

Authors: Qi Wang, Fahong Zhang, Xuelong Li

Abstract: Band selection, by choosing a set of representative bands in a hyperspectral image (HSI), is an effective method to reduce the redundant information without compromising the original contents. Recently, various unsupervised band selection methods have been proposed, but most of them are based on approximation algorithms which can only obtain suboptimal solutions toward a specific objective function. This paper focuses on clustering-based band selection, and proposes a new framework to solve the above dilemma, claiming the following contributions: 1) An optimal clustering framework (OCF), which can obtain the optimal clustering result for a particular form of objective function under a reasonable constraint. 2) A rank on clusters strategy (RCS), which provides an effective criterion to select bands on an existing clustering structure. 3) An automatic method to determine the number of required bands, which can better evaluate the distinctive information produced by a certain number of bands. In experiments, the proposed algorithm is compared to some state-of-the-art competitors. According to the experimental results, the proposed algorithm is robust and significantly outperforms the other methods on various data sets.

Submitted 29 April, 2019; originally announced April 2019.

Journal ref: IEEE Trans. Geoscience and Remote Sensing, vol. 56, no. 10, pp. 5910-5922, 2018

arXiv:1901.08907 [pdf, other]

Multi-Task Feature Learning for Knowledge Graph Enhanced Recommendation

Authors: Hongwei Wang, Fuzheng Zhang, Miao Zhao, Wenjie Li, Xing Xie, Minyi Guo

Abstract: Collaborative filtering often suffers from sparsity and cold start problems in real recommendation scenarios; therefore, researchers and engineers usually use side information to address these issues and improve the performance of recommender systems. In this paper, we consider knowledge graphs as the source of side information. We propose MKR, a Multi-task feature learning approach for Knowledge graph enhanced Recommendation. MKR is a deep end-to-end framework that utilizes a knowledge graph embedding task to assist the recommendation task. The two tasks are associated by cross&compress units, which automatically share latent features and learn high-order interactions between items in recommender systems and entities in the knowledge graph. We prove that cross&compress units have sufficient capability of polynomial approximation, and show that MKR is a generalized framework over several representative methods of recommender systems and multi-task learning. Through extensive experiments on real-world datasets, we demonstrate that MKR achieves substantial gains in movie, book, music, and news recommendation, over state-of-the-art baselines. MKR is also shown to be able to maintain decent performance even if user-item interactions are sparse.

Submitted 23 January, 2019; originally announced January 2019.

Comments: In Proceedings of The 2019 Web Conference (WWW 2019)

arXiv:1901.08150 [pdf, other]

Hypergraph Convolution and Hypergraph Attention

Authors: Song Bai, Feihu Zhang, Philip H. S. Torr

Abstract: Recently, graph neural networks have attracted great attention and achieved prominent performance in various research fields. Most of those algorithms have assumed pairwise relationships of objects of interest. However, in many real applications, the relationships between objects are of higher order, beyond a pairwise formulation. To efficiently learn deep embeddings on high-order graph-structured data, we introduce two end-to-end trainable operators to the family of graph neural networks, i.e., hypergraph convolution and hypergraph attention. Whilst hypergraph convolution defines the basic formulation of performing convolution on a hypergraph, hypergraph attention further enhances the capacity of representation learning by leveraging an attention module. With the two operators, a graph neural network is readily extended to a more flexible model and applied to diverse applications where non-pairwise relationships are observed. Extensive experimental results with semi-supervised node classification demonstrate the effectiveness of hypergraph convolution and hypergraph attention.

Submitted 10 October, 2020; v1 submitted 23 January, 2019; originally announced January 2019.

Comments: Accepted by Pattern Recognition

arXiv:1810.02225 [pdf, other]

Memristor-based Deep Convolution Neural Network: A Case Study

Authors: Fan Zhang, Miao Hu

Abstract: In this paper, we first introduce a method to efficiently implement large-scale high-dimensional convolution with realistic memristor-based circuit components. An experimentally verified simulator is adapted for accurate prediction of analog crossbar behavior. An improved conversion algorithm is developed to convert convolution kernels to memristor-based circuits, which minimizes the error with consideration of the data and kernel patterns in CNNs. With circuit simulation for all convolution layers in ResNet-20, we found that an 8-bit ADC/DAC is necessary to preserve software-level classification accuracy.

Submitted 14 September, 2018; originally announced October 2018.

arXiv:1807.08125 [pdf, other]

FDR-HS: An Empirical Bayesian Identification of Heterogenous Features in Neuroimage Analysis

Authors: Xinwei Sun, Lingjing Hu, Fandong Zhang, Yuan Yao, Yizhou Wang

Abstract: Recent studies found that in voxel-based neuroimage analysis, detecting and differentiating "procedural bias" that is introduced during the preprocessing steps from lesion features not only can help boost accuracy but also can improve interpretability. To the best of our knowledge, GSplit LBI is the first model proposed in the literature to simultaneously capture both procedural bias and lesion features. Despite the fact that it can improve prediction power by leveraging the procedural bias, it may select spurious features due to the multicollinearity in high dimensional space. Moreover, it does not take into account the heterogeneity of these two types of features. In fact, the procedural bias and lesion features differ in terms of volumetric change and spatial correlation pattern. To address these issues, we propose a "two-groups" Empirical-Bayes method called "FDR-HS" (False-Discovery-Rate Heterogenous Smoothing). This method is able to not only avoid multicollinearity, but also exploit the heterogenous spatial patterns of features. In addition, it is simple to implement: introducing hidden variables turns the problem into a convex optimization scheme that can be solved efficiently by the expectation-maximization (EM) algorithm. Empirical experiments were conducted on the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. The advantage of the proposed model is verified by improved interpretability and prediction power using features selected by FDR-HS.

Submitted 21 July, 2018; originally announced July 2018.

Comments: Accepted at MICCAI 2018

arXiv:1807.00943 [pdf, other]

Segmented correspondence curve regression model for quantifying reproducibility of high-throughput experiments

Authors: Feipeng Zhang, Frank Shen, Tao Yang, Qunhua Li

Abstract: The reliability of a high-throughput biological experiment relies highly on the settings of the operational factors in its experimental and data-analytic procedures. Understanding how operational factors influence the reproducibility of the experimental outcome is critical for constructing robust workflows and obtaining reliable results. One challenge in this area is that candidates at different levels of significance may respond to the operational factors differently. To model this heterogeneity, we develop a novel segmented regression model, based on the rank concordance between candidates from different replicate samples, to characterize the varying effects of operational factors for candidates at different levels of significance. A grid search method is developed to identify the change point in response to the operational factors and estimate the covariate effects accounting for the change. A sup-likelihood-ratio-type test is proposed to test the existence of a change point. Simulation studies show that our method yields a well-calibrated type I error, is powerful in detecting the difference in reproducibility, and achieves a better model fit than the existing method. An application to a ChIP-seq dataset reveals interesting insights into how sequencing depth affects the reproducibility of experimental results, demonstrating the usefulness of our method in designing cost-effective and reliable high-throughput workflows.

Submitted 2 July, 2018; originally announced July 2018.
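
The abstract above mentions a grid search for identifying a change point. The sketch below shows the general idea on a deliberately simplified problem: a two-segment piecewise-constant fit scored by the sum of squared errors. It is not the segmented correspondence curve regression model from the paper; the functions `sum_squared_error` and `grid_search_change_point` are hypothetical names introduced only for this example.

```python
def sum_squared_error(values):
    """Sum of squared deviations of the values from their own mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def grid_search_change_point(xs, ys, candidates):
    """Pick the candidate threshold that best splits the data in two.

    For every candidate tau, the data are split into x <= tau and x > tau,
    each side is fitted by its mean, and the total squared error is recorded.
    Returns (best_tau, best_loss), or None if no candidate yields two
    non-empty segments. Simplified stand-in for a likelihood-based search.
    """
    best = None
    for tau in candidates:
        left = [y for x, y in zip(xs, ys) if x <= tau]
        right = [y for x, y in zip(xs, ys) if x > tau]
        if not left or not right:
            continue  # a valid change point needs data on both sides
        loss = sum_squared_error(left) + sum_squared_error(right)
        if best is None or loss < best[1]:
            best = (tau, loss)
    return best


if __name__ == "__main__":
    xs = list(range(10))
    ys = [0.0] * 5 + [1.0] * 5          # level shift between x = 4 and x = 5
    grid = [k + 0.5 for k in range(9)]  # candidate thresholds 0.5, ..., 8.5
    print(grid_search_change_point(xs, ys, grid))
```

In a likelihood-based setting the squared-error loss would be replaced by the negative log-likelihood of the segmented model, with the same outer loop over candidate change points.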

arXiv:1805.07777 [pdf, other]

doi 10.1093/bioinformatics/bty241

DLBI: Deep learning guided Bayesian inference for structure reconstruction of super-resolution fluorescence microscopy

Authors: Yu Li, Fan Xu, Fa Zhang, Pingyong Xu, Mingshu Zhang, Ming Fan, Lihua Li, Xin Gao, Renmin Han

Abstract: Super-resolution fluorescence microscopy, with a resolution beyond the diffraction limit of light, has become an indispensable tool to directly visualize biological structures in living cells at a nanometer-scale resolution. Despite advances in high-density super-resolution fluorescent techniques, existing methods still have bottlenecks, including extremely long execution time, artificial thinning and thickening of structures, and lack of ability to capture latent structures. Here we propose a novel deep learning guided Bayesian inference approach, DLBI, for the time-series analysis of high-density fluorescent images. Our method combines the strength of deep learning and statistical inference, where deep learning captures the underlying distribution of the fluorophores that are consistent with the observed time-series fluorescent images by exploring local features and correlation along time-axis, and statistical inference further refines the ultrastructure extracted by deep learning and endues physical meaning to the final image. Comprehensive experimental results on both real and simulated datasets demonstrate that our method provides more accurate and realistic local patch and large-field reconstruction than the state-of-the-art method, the 3B analysis, while our method is more than two orders of magnitude faster. The main program is available at https://github.com/lykaust15/DLBI

Submitted 1 September, 2018; v1 submitted 20 May, 2018; originally announced May 2018.

Comments: Accepted by ISMB 2018

Journal ref: Bioinformatics, Volume 34, Issue 13, 1 July 2018

arXiv:1804.00684 [pdf, other]

Graph-Based Deep Modeling and Real Time Forecasting of Sparse Spatio-Temporal Data

Authors: Bao Wang, Xiyang Luo, Fangbo Zhang, Baichuan Yuan, Andrea L. Bertozzi, P. Jeffrey Brantingham

Abstract: We present a generic framework for spatio-temporal (ST) data modeling, analysis, and forecasting, with a special focus on data that is sparse in both space and time. Our multi-scaled framework is a seamless coupling of two major components: a self-exciting point process that models the macroscale statistical behaviors of the ST data and a graph structured recurrent neural network (GSRNN) to discover the microscale patterns of the ST data on the inferred graph. This novel deep neural network (DNN) incorporates the real time interactions of the graph nodes to enable more accurate real time forecasting. The effectiveness of our method is demonstrated on both crime and traffic forecasting.

Submitted 2 April, 2018; originally announced April 2018.

Comments: 9 pages, 19 figures

MSC Class: 65-06

arXiv:1803.07519 [pdf, other]

doi 10.1145/3238147.3238202

DeepGauge: Multi-Granularity Testing Criteria for Deep Learning Systems

Authors: Lei Ma, Felix Juefei-Xu, Fuyuan Zhang, Jiyuan Sun, Minhui Xue, Bo Li, Chunyang Chen, Ting Su, Li Li, Yang Liu, Jianjun Zhao, Yadong Wang

Abstract: Deep learning (DL) defines a new data-driven programming paradigm that constructs the internal system logic of a crafted neuron network through a set of training data. We have seen wide adoption of DL in many safety-critical scenarios. However, a plethora of studies have shown that the state-of-the-art DL systems suffer from various vulnerabilities which can lead to severe consequences when applied to real-world applications. Currently, the testing adequacy of a DL system is usually measured by the accuracy of test data. Considering the limited availability of high-quality test data, good accuracy on test data can hardly provide confidence in the testing adequacy and generality of DL systems. Unlike traditional software systems that have clear and controllable logic and functionality, the lack of interpretability in a DL system makes system analysis and defect detection difficult, which could potentially hinder its real-world deployment. In this paper, we propose DeepGauge, a set of multi-granularity testing criteria for DL systems, which aims at rendering a multi-faceted portrayal of the testbed. The in-depth evaluation of our proposed testing criteria is demonstrated on two well-known datasets, five DL systems, and with four state-of-the-art adversarial attack techniques against DL. The potential usefulness of DeepGauge sheds light on the construction of more generic and robust DL systems.

Submitted 14 August, 2018; v1 submitted 20 March, 2018; originally announced March 2018.

Comments: The 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE 2018)

Journal ref: DeepGauge: Multi-Granularity Testing Criteria for Deep Learning Systems. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering (ASE 18), September 3-7, 2018, Montpellier, France

Showing 1–50 of 65 results for author: Zhang, F