-
Categorization of 31 computational methods to detect spatially variable genes from spatially resolved transcriptomics data
Authors:
Guanao Yan,
Shuo Harper Hua,
Jingyi Jessica Li
Abstract:
In the analysis of spatially resolved transcriptomics data, detecting spatially variable genes (SVGs) is crucial. Numerous computational methods exist, but varying SVG definitions and methodologies lead to incomparable results. We review 31 state-of-the-art methods, categorizing SVGs into three types: overall, cell-type-specific, and spatial-domain-marker SVGs. Our review explains the intuitions underlying these methods, summarizes their applications, and categorizes the hypothesis tests they use in the trade-off between generality and specificity for SVG detection. We discuss challenges in SVG detection and propose future directions for improvement. Our review offers insights for method developers and users, advocating for category-specific benchmarking.
Submitted 29 May, 2024;
originally announced May 2024.
-
High-Dimensional Directional Brain Network Analysis for Focal Epileptic Seizures
Authors:
Yaotian Wang,
Guofen Yan,
Seiji Tanabe,
Chang-Chia Liu,
Shayan Moosa,
Mark S. Quigg,
Tingting Zhang
Abstract:
The brain is a high-dimensional directional network system consisting of many regions as network nodes that influence each other. The directional influence from one region to another is referred to as directional connectivity. Epilepsy is a directional network disorder, as epileptic activity spreads from a seizure onset zone (SOZ) to many other regions after seizure onset. However, directional network studies of epilepsy have been mostly limited to low-dimensional directional networks between the SOZ and contiguous regions due to the lack of efficient methods for analyzing high-dimensional directional brain networks. To address this gap, we study high-dimensional directional networks in epileptic brains by using a clustering-enabled multivariate autoregressive state-space model (MARSS) to analyze multi-channel intracranial EEG recordings of focal seizures. This new MARSS characterizes the SOZ, nearby regions, and many other non-SOZ regions as one integrated high-dimensional directional network system with a clustering feature. Using the new MARSS, we reveal changes in high-dimensional directional brain networks throughout seizure development. We simultaneously identify directional connections and the SOZ cluster, regions most affected by SOZ activity, in different seizure periods. We found that, after seizure onset, the numbers of directional connections of the SOZ and regions in the SOZ cluster increase significantly. We also reveal that many regions outside the SOZ cluster have no changes in directional connections, although these regions' EEG data signal ictal activity. Lastly, we use these high-dimensional network results to localize the SOZ and achieve 100% true positive rates and less than 3% false positive rates for different SOZ locations.
Submitted 16 August, 2022;
originally announced August 2022.
-
Critical Learning Periods in Federated Learning
Authors:
Gang Yan,
Hao Wang,
Jian Li
Abstract:
Federated learning (FL) is a popular technique to train machine learning (ML) models with decentralized data. Extensive works have studied the performance of the global model; however, it is still unclear how the training process affects the final test accuracy. Exacerbating this problem is the fact that FL executions differ significantly from traditional ML, with heterogeneous data characteristics across clients and more hyperparameters. In this work, we show that the final test accuracy of FL is dramatically affected by the early phase of the training process, i.e., FL exhibits critical learning periods, in which small gradient errors can have an irrecoverable impact on the final test accuracy. To further explain this phenomenon, we generalize the trace of the Fisher Information Matrix (FIM) to FL and define a new notion called FedFIM, a quantity reflecting the local curvature of each client from the beginning of the training in FL. Our findings suggest that the initial learning phase plays a critical role in understanding the FL performance. This is in contrast to many existing works, which generally do not connect the final accuracy of FL to the early phase of training. Finally, seizing critical learning periods in FL is of independent interest and could be useful for other problems, such as choosing hyperparameters (the number of clients selected per round, batch size, and more), so as to improve the performance of FL training and testing.
Submitted 12 September, 2021;
originally announced September 2021.
-
Straggler-Resilient Distributed Machine Learning with Dynamic Backup Workers
Authors:
Guojun Xiong,
Gang Yan,
Rahul Singh,
Jian Li
Abstract:
With the increasing demand for large-scale training of machine learning models, consensus-based distributed optimization methods have recently been advocated as alternatives to the popular parameter server framework. In this paradigm, each worker maintains a local estimate of the optimal parameter vector, and iteratively updates it by waiting for and averaging all estimates obtained from its neighbors, and then corrects it on the basis of its local dataset. However, the synchronization phase can be time-consuming due to the need to wait for stragglers, i.e., slower workers. An efficient way to mitigate this effect is to let each worker wait only for updates from the fastest neighbors before updating its local parameter. The remaining neighbors are called backup workers. To minimize the global training time over the network, we propose a fully distributed algorithm to dynamically determine the number of backup workers for each worker. We show that our algorithm achieves a linear speedup for convergence (i.e., convergence performance increases linearly with respect to the number of workers). We conduct extensive experiments on MNIST and CIFAR-10 to verify our theoretical results.
Submitted 11 February, 2021;
originally announced February 2021.
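To make the update described in the abstract above concrete, the following is a minimal sketch of one worker's step: average the local estimate with the estimates from the fastest neighbors, then correct with a local gradient. It is not the authors' algorithm. The function name `consensus_step`, the uniform averaging weights, the `(arrival_time, estimate)` input format, and the fixed `num_wait` are all assumptions made for illustration; in particular, the paper's contribution of choosing the number of backup workers dynamically is not implemented here.

```python
def consensus_step(own, neighbor_updates, num_wait, gradient, step_size):
    """One illustrative consensus update for a single worker.

    own              -- this worker's current parameter estimate (list of floats)
    neighbor_updates -- list of (arrival_time, estimate) pairs, one per neighbor
    num_wait         -- how many of the fastest neighbors to wait for (assumed fixed)
    gradient         -- local gradient evaluated on this worker's data
    step_size        -- learning rate for the local correction
    """
    # Keep only the num_wait earliest arrivals; the rest act as backup workers.
    fastest = sorted(neighbor_updates, key=lambda update: update[0])[:num_wait]

    # Uniform average of the local estimate and the received estimates
    # (uniform weights are an assumption of this sketch).
    estimates = [own] + [estimate for _, estimate in fastest]
    averaged = [
        sum(estimate[i] for estimate in estimates) / len(estimates)
        for i in range(len(own))
    ]

    # Local correction: a plain gradient step on the averaged estimate.
    return [a - step_size * g for a, g in zip(averaged, gradient)]


updates = [(3.0, [9.0, 9.0]), (1.0, [2.0, 4.0]), (2.0, [4.0, 2.0])]
new_estimate = consensus_step([0.0, 0.0], updates, 2, [1.0, 1.0], 0.5)
```

In this toy call the neighbor that arrives at time 3.0 is the straggler, so its estimate is ignored for this round.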
-
A Bayesian State-Space Approach to Mapping Directional Brain Networks
Authors:
Huazhang Li,
Yaotian Wang,
Guofen Yan,
Yinge Sun,
Seiji Tanabe,
Chang-Chia Liu,
Mark Quigg,
Tingting Zhang
Abstract:
The human brain is a directional network system of brain regions involving directional connectivity. Seizures are a directional network phenomenon as abnormal neuronal activities start from a seizure onset zone (SOZ) and propagate to otherwise healthy regions. To localize the SOZ of an epileptic patient, clinicians use iEEG to record the patient's intracranial brain activity in many small regions. iEEG data are high-dimensional multivariate time series. We build a state-space multivariate autoregression (SSMAR) for iEEG data to model the underlying directional brain network. To produce scientifically interpretable network results, we incorporate into the SSMAR the scientific knowledge that the underlying brain network tends to have a cluster structure. Specifically, we assign to the SSMAR parameters a stochastic-blockmodel-motivated prior, which reflects the cluster structure. We develop a Bayesian framework to estimate the SSMAR, infer directional connections, and identify clusters for the unobserved network edges. The new method is robust to violations of model assumptions and outperforms existing network methods. By applying the new method to an epileptic patient's iEEG data, we reveal seizure initiation and propagation in the patient's brain network. Our method can also accurately localize the SOZ. Overall, this paper provides a tool to study the human brain network.
Submitted 20 December, 2020;
originally announced December 2020.
-
Robust Estimation and Shrinkage in Ultrahigh Dimensional Expectile Regression with Heavy Tails and Variance Heterogeneity
Authors:
Jun Zhao,
Guan'ao Yan,
Yi Zhang
Abstract:
High-dimensional data subject to heavy-tailed phenomena and heterogeneity are commonly encountered in various scientific fields and bring new challenges to classical statistical methods. In this paper, we combine the asymmetric square loss and a Huber-type robust technique to develop robust expectile regression for ultrahigh-dimensional heavy-tailed heterogeneous data. Different from the classical Huber method, we introduce two different tuning parameters on both sides to account for possible asymmetry and allow them to diverge to reduce the bias induced by the robust approximation. In the regularized framework, we adopt a general folded concave penalty function, such as the SCAD or MCP penalty, for the sake of bias reduction. We investigate the finite sample property of the corresponding estimator and show how our method trades off estimation accuracy against the heavy-tailed distribution. Also, noting that the robust asymmetric loss function is everywhere differentiable, based on our theoretical study, we propose an efficient first-order optimization algorithm after locally linear approximation of the non-convex problem. Simulation studies under various distributions demonstrate the satisfactory performance of our method in coefficient estimation, model selection, and heterogeneity detection.
Submitted 1 October, 2019; v1 submitted 19 September, 2019;
originally announced September 2019.
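The asymmetric square loss named in the abstract above is the standard expectile loss, which weights squared residuals by `tau` when the residual is non-negative and by `1 - tau` otherwise. The sketch below shows only that plain loss and a sample expectile computed by iteratively reweighted averaging. It does not include the Huber-type robustification, the penalty, or the regression setting from the paper, and the names `expectile_loss`, `sample_expectile`, and `tau` are assumptions chosen for illustration.

```python
def expectile_loss(residual, tau):
    """Asymmetric square loss: |tau - 1{residual < 0}| * residual**2."""
    weight = tau if residual >= 0 else 1.0 - tau
    return weight * residual ** 2


def sample_expectile(values, tau, iterations=100):
    """Minimize the summed expectile loss over a constant m.

    Uses iteratively reweighted averaging: with the weights held fixed,
    the minimizer is a weighted mean, so the weights and the mean are
    updated in turn.
    """
    m = sum(values) / len(values)  # start from the ordinary mean
    for _ in range(iterations):
        weights = [tau if v >= m else 1.0 - tau for v in values]
        m = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return m


data = [1.0, 2.0, 3.0, 10.0]
center = sample_expectile(data, 0.5)   # tau = 0.5 recovers the mean
upper = sample_expectile(data, 0.9)    # tau > 0.5 is pulled toward large values
```

Varying `tau` between 0 and 1 traces out different parts of the distribution, which is the property that makes expectiles useful for studying heterogeneity.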
-
Semiparametric Expectile Regression for High-dimensional Heavy-tailed and Heterogeneous Data
Authors:
Jun Zhao,
Guan'ao Yan,
Yi Zhang
Abstract:
Recently, high-dimensional heterogeneous data have attracted a lot of attention and discussion. Under heterogeneity, semiparametric regression is a popular choice to model data in statistics. In this paper, we take advantage of expectile regression in computation and analysis of heterogeneity, and propose regularized partially linear additive expectile regression with a nonconvex penalty, for example, SCAD or MCP, for such high-dimensional heterogeneous data. We focus on a more realistic scenario: the regression error is heavy-tailed and only has finite moments, which violates the classical sub-Gaussian distribution assumption and is more common in practice. Under some regularity conditions, we show that with probability tending to one, the oracle estimator is one of the local minima of our optimization problem. The theoretical study indicates that the dimensionality of the linear covariates our procedure can handle is essentially restricted by the moment condition of the regression error. For computation, since the corresponding optimization problem is nonconvex and nonsmooth, we derive a two-step algorithm to solve this problem. Finally, we demonstrate that the proposed method performs well in estimation accuracy and model selection through Monte Carlo simulation studies and a real data example. Moreover, by taking different expectile weights $α$, we are able to detect heterogeneity and explore the entire conditional distribution of the response variable, which indicates the usefulness of our proposed method for analyzing high-dimensional heterogeneous data.
Submitted 18 August, 2019;
originally announced August 2019.
-
Visual Analytics of Anomalous User Behaviors: A Survey
Authors:
Yang Shi,
Yuyin Liu,
Hanghang Tong,
Jingrui He,
Gang Yan,
Nan Cao
Abstract:
The increasing accessibility of data provides substantial opportunities for understanding user behaviors. Unearthing anomalies in user behaviors is of particular importance as it helps signal harmful incidents such as network intrusions, terrorist activities, and financial fraud. Many visual analytics methods have been proposed to help understand user behavior-related data in various application domains. In this work, we survey the state of the art in visual analytics of anomalous user behaviors and classify the methods into four categories: social interaction, travel, network communication, and transaction. We further examine the research works in each category in terms of data types, anomaly detection techniques, visualization techniques, and interaction methods. Finally, we discuss the findings and potential research directions.
Submitted 21 May, 2019; v1 submitted 13 May, 2019;
originally announced May 2019.
-
AxTrain: Hardware-Oriented Neural Network Training for Approximate Inference
Authors:
Xin He,
Liu Ke,
Wenyan Lu,
Guihai Yan,
Xuan Zhang
Abstract:
The intrinsic error tolerance of neural networks (NNs) makes approximate computing a promising technique to improve the energy efficiency of NN inference. Conventional approximate computing focuses on balancing the efficiency-accuracy trade-off for existing pre-trained networks, which can lead to suboptimal solutions. In this paper, we propose AxTrain, a hardware-oriented training framework to facilitate approximate computing for NN inference. Specifically, AxTrain leverages the synergy between two orthogonal methods: one actively searches for a network parameter distribution with high error tolerance, and the other passively learns resilient weights by numerically incorporating the noise distributions of the approximate hardware in the forward pass during the training phase. Experimental results from various datasets with near-threshold computing and approximation multiplication strategies demonstrate AxTrain's ability to obtain resilient neural network parameters and improve system energy efficiency.
Submitted 21 May, 2018;
originally announced May 2018.