-
Spectral Algorithms on Manifolds through Diffusion
Authors:
Weichun Xia,
Lei Shi
Abstract:
The existing research on spectral algorithms, applied within a Reproducing Kernel Hilbert Space (RKHS), has primarily focused on general kernel functions, often neglecting the inherent structure of the input feature space. Our paper introduces a new perspective, asserting that input data are situated within a low-dimensional manifold embedded in a higher-dimensional Euclidean space. We study the convergence performance of spectral algorithms in the RKHSs, specifically those generated by the heat kernels, known as diffusion spaces. Incorporating the manifold structure of the input, we employ integral operator techniques to derive tight convergence upper bounds concerning generalized norms, which indicate that the estimators converge to the target function in a strong sense, entailing the simultaneous convergence of the function itself and its derivatives. These bounds offer two significant advantages: firstly, they are exclusively contingent on the intrinsic dimension of the input manifolds, thereby providing a more focused analysis. Secondly, they enable the efficient derivation of convergence rates for derivatives of any k-th order, all of which can be accomplished within the ambit of the same spectral algorithms. Furthermore, we establish minimax lower bounds to demonstrate the asymptotic optimality of these conclusions in specific contexts. Our study confirms that the spectral algorithms are practically significant in the broader context of high-dimensional approximation.
Submitted 7 March, 2024; v1 submitted 6 March, 2024;
originally announced March 2024.
-
Towards Word-Level End-to-End Neural Speaker Diarization with Auxiliary Network
Authors:
Yiling Huang,
Weiran Wang,
Guanlong Zhao,
Hank Liao,
Wei Xia,
Quan Wang
Abstract:
While standard speaker diarization attempts to answer the question "who spoke when", most relevant real-world applications are more interested in determining "who spoke what". Whether it is the conventional modularized approach or the more recent end-to-end neural diarization (EEND), an additional automatic speech recognition (ASR) model and an orchestration algorithm are required to associate the speaker labels with recognized words. In this paper, we propose Word-level End-to-End Neural Diarization (WEEND) with auxiliary network, a multi-task learning algorithm that performs end-to-end ASR and speaker diarization in the same neural architecture. That is, while speech is being recognized, speaker labels are predicted simultaneously for each recognized word. Experimental results demonstrate that WEEND outperforms the turn-based diarization baseline system on all 2-speaker short-form scenarios and has the capability to generalize to audio lengths of 5 minutes. Although 3+speaker conversations are harder, we find that with enough in-domain training data, WEEND has the potential to deliver high quality diarized text.
Submitted 15 September, 2023;
originally announced September 2023.
-
Early warning indicators via latent stochastic dynamical systems
Authors:
Lingyu Feng,
Ting Gao,
Wang Xiao,
Jinqiao Duan
Abstract:
Detecting early warning indicators for abrupt dynamical transitions in complex systems or high-dimensional observation data is essential in many real-world applications, such as brain diseases, natural disasters, and engineering reliability. To this end, we develop a novel approach: the directed anisotropic diffusion map that captures the latent evolutionary dynamics in the low-dimensional manifold. Then three effective warning signals (Onsager-Machlup Indicator, Sample Entropy Indicator, and Transition Probability Indicator) are derived through the latent coordinates and the latent stochastic dynamical systems. To validate our framework, we apply this methodology to authentic electroencephalogram (EEG) data. We find that our early warning indicators are capable of detecting the tipping point during state transition. This framework not only bridges the latent dynamics with real-world data but also shows the potential ability for automatic labeling on complex high-dimensional time series.
Submitted 5 April, 2024; v1 submitted 7 September, 2023;
originally announced September 2023.
-
Interpreting Neural Policies with Disentangled Tree Representations
Authors:
Tsun-Hsuan Wang,
Wei Xiao,
Tim Seyde,
Ramin Hasani,
Daniela Rus
Abstract:
The advancement of robots, particularly those functioning in complex human-centric environments, relies on control solutions that are driven by machine learning. Understanding how learning-based controllers make decisions is crucial since robots are often safety-critical systems. This calls for a formal and quantitative understanding of the explanatory factors in the interpretability of robot learning. In this paper, we aim to study interpretability of compact neural policies through the lens of disentangled representation. We leverage decision trees to obtain factors of variation [1] for disentanglement in robot learning; these encapsulate skills, behaviors, or strategies toward solving tasks. To assess how well networks uncover the underlying task dynamics, we introduce interpretability metrics that measure disentanglement of learned neural dynamics from the perspectives of decision concentration, mutual information, and modularity. We showcase the effectiveness of the connection between interpretability and disentanglement consistently across extensive experimental analysis.
Submitted 12 November, 2023; v1 submitted 12 October, 2022;
originally announced October 2022.
-
A Vertical Federated Learning Framework for Horizontally Partitioned Labels
Authors:
Wensheng Xia,
Ying Li,
Lan Zhang,
Zhonghai Wu,
Xiaoyong Yuan
Abstract:
Vertical federated learning is a collaborative machine learning framework to train deep learning models on vertically partitioned data with privacy-preservation. It attracts much attention from both academia and industry. Unfortunately, applying most existing vertical federated learning methods in real-world applications still faces two daunting challenges. First, most existing vertical federated learning methods have a strong assumption that at least one party holds the complete set of labels of all data samples, while this assumption is not satisfied in many practical scenarios, where labels are horizontally partitioned and the parties only hold partial labels. Existing vertical federated learning methods can only utilize partial labels, which may lead to inadequate model update in end-to-end backpropagation. Second, computational and communication resources vary across parties. Some parties with limited computational and communication resources will become stragglers and slow down the convergence of training. This straggler problem is exacerbated in the scenarios of horizontally partitioned labels in vertical federated learning. To address these challenges, we propose a novel vertical federated learning framework named Cascade Vertical Federated Learning (CVFL) to fully utilize all horizontally partitioned labels to train neural networks with privacy-preservation. To mitigate the straggler problem, we design a novel optimization objective which can increase the stragglers' contribution to the trained models. We conduct a series of qualitative experiments to rigorously verify the effectiveness of CVFL. It is demonstrated that CVFL can achieve comparable performance (e.g., accuracy for classification tasks) with centralized training. The new optimization objective can further mitigate the straggler problem compared with only using the asynchronous aggregation mechanism during training.
Submitted 18 June, 2021;
originally announced June 2021.
-
Deep Retrieval: Learning A Retrievable Structure for Large-Scale Recommendations
Authors:
Weihao Gao,
Xiangjun Fan,
Chong Wang,
Jiankai Sun,
Kai Jia,
Wenzhi Xiao,
Ruofan Ding,
Xingyan Bin,
Hui Yang,
Xiaobing Liu
Abstract:
One of the core problems in large-scale recommendations is to retrieve top relevant candidates accurately and efficiently, preferably in sub-linear time. Previous approaches are mostly based on a two-step procedure: first learn an inner-product model, and then use some approximate nearest neighbor (ANN) search algorithm to find top candidates. In this paper, we present Deep Retrieval (DR), to learn a retrievable structure directly with user-item interaction data (e.g. clicks) without resorting to the Euclidean space assumption in ANN algorithms. DR's structure encodes all candidate items into a discrete latent space. Those latent codes for the candidates are model parameters and learnt together with other neural network parameters to maximize the same objective function. With the model learnt, a beam search over the structure is performed to retrieve the top candidates for reranking. Empirically, we first demonstrate that DR, with sub-linear computational complexity, can achieve almost the same accuracy as the brute-force baseline on two public datasets. Moreover, we show that, in a live production recommendation system, a deployed DR approach significantly outperforms a well-tuned ANN baseline in terms of engagement metrics. To the best of our knowledge, DR is among the first non-ANN algorithms successfully deployed at the scale of hundreds of millions of items for industrial recommendation systems.
Submitted 18 May, 2021; v1 submitted 12 July, 2020;
originally announced July 2020.
-
Real-Time Regression with Dividing Local Gaussian Processes
Authors:
Armin Lederer,
Alejandro Jose Ordonez Conejo,
Korbinian Maier,
Wenxin Xiao,
Jonas Umlauft,
Sandra Hirche
Abstract:
The increased demand for online prediction and the growing availability of large data sets drive the need for computationally efficient models. While exact Gaussian process regression shows various favorable theoretical properties (uncertainty estimate, unlimited expressive power), the poor scaling with respect to the training set size prohibits its application in big data regimes in real-time. Therefore, this paper proposes dividing local Gaussian processes, which are a novel, computationally efficient modeling approach based on Gaussian process regression. Due to an iterative, data-driven division of the input space, they achieve a sublinear computational complexity in the total number of training points in practice, while providing excellent predictive distributions. A numerical evaluation on real-world data sets shows their advantages over other state-of-the-art methods in terms of accuracy as well as prediction and update speed.
Submitted 30 July, 2021; v1 submitted 16 June, 2020;
originally announced June 2020.
-
Efficient Synthesis of Compact Deep Neural Networks
Authors:
Wenhan Xia,
Hongxu Yin,
Niraj K. Jha
Abstract:
Deep neural networks (DNNs) have been deployed in myriad machine learning applications. However, advances in their accuracy are often achieved with increasingly complex and deep network architectures. These large, deep models are often unsuitable for real-world applications, due to their massive computational cost, high memory bandwidth, and long latency. For example, autonomous driving requires fast inference based on Internet-of-Things (IoT) edge devices operating under run-time energy and memory storage constraints. In such cases, compact DNNs can facilitate deployment due to their reduced energy consumption, memory requirement, and inference latency. Long short-term memories (LSTMs) are a type of recurrent neural network that have also found widespread use in the context of sequential data modeling. They also face a model size vs. accuracy trade-off. In this paper, we review major approaches for automatically synthesizing compact, yet accurate, DNN/LSTM models suitable for real-world applications. We also outline some challenges and future areas of exploration.
Submitted 18 April, 2020;
originally announced April 2020.
-
Analyses of Multi-collection Corpora via Compound Topic Modeling
Authors:
Clint P. George,
Wei Xia,
George Michailidis
Abstract:
As electronically stored data grow in daily life, obtaining novel and relevant information becomes challenging in text mining. Thus people have sought statistical methods based on term frequency, matrix algebra, or topic modeling for text mining. Popular topic models have centered on one single text collection, which is deficient for comparative text analyses. We consider a setting where one can partition the corpus into subcollections. Each subcollection shares a common set of topics, but there exists relative variation in topic proportions among collections. Including any prior knowledge about the corpus (e.g. organization structure), we propose the compound latent Dirichlet allocation (cLDA) model, improving on previous work, encouraging generalizability, and depending less on user-input parameters. To identify the parameters of interest in cLDA, we study Markov chain Monte Carlo (MCMC) and variational inference approaches extensively, and suggest an efficient MCMC method. We evaluate cLDA qualitatively and quantitatively using both synthetic and real-world corpora. The usability study on some real-world corpora illustrates that cLDA can not only explore the underlying topics automatically but also model their connections and variations across multiple collections.
Submitted 17 June, 2019;
originally announced July 2019.
-
Biologically-plausible learning algorithms can scale to large datasets
Authors:
Will Xiao,
Honglin Chen,
Qianli Liao,
Tomaso Poggio
Abstract:
The backpropagation (BP) algorithm is often thought to be biologically implausible in the brain. One of the main reasons is that BP requires symmetric weight matrices in the feedforward and feedback pathways. To address this "weight transport problem" (Grossberg, 1987), two more biologically plausible algorithms, proposed by Liao et al. (2016) and Lillicrap et al. (2016), relax BP's weight symmetry requirements and demonstrate learning capabilities comparable to those of BP on small datasets. However, a recent study by Bartunov et al. (2018) evaluates variants of target-propagation (TP) and feedback alignment (FA) on MNIST, CIFAR, and ImageNet datasets, and finds that although many of the proposed algorithms perform well on MNIST and CIFAR, they perform significantly worse than BP on ImageNet. Here, we additionally evaluate the sign-symmetry algorithm (Liao et al., 2016), which differs from both BP and FA in that the feedback and feedforward weights share signs but not magnitudes. We examine the performance of sign-symmetry and feedback alignment on ImageNet and MS COCO datasets using different network architectures (ResNet-18 and AlexNet for ImageNet, RetinaNet for MS COCO). Surprisingly, networks trained with sign-symmetry can attain classification performance approaching that of BP-trained networks. These results complement the study by Bartunov et al. (2018), and establish a new benchmark for future biologically plausible learning algorithms on more difficult datasets and more complex architectures.
Submitted 20 December, 2018; v1 submitted 8 November, 2018;
originally announced November 2018.
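The distinction drawn in the abstract above, feedback weights that share signs but not magnitudes with the feedforward weights, can be illustrated on a single linear layer. The following is a minimal sketch and not the authors' implementation: the function names, the `rule` strings, and the toy weight matrix are all hypothetical, and the sketch assumes the feedback matrix is simply the transposed element-wise sign of the forward weights.

```python
def transpose(matrix):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*matrix)]


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def feedback_matrix(weights, rule):
    """Matrix used to send error signals backward through one linear layer.

    weights: forward matrix as a list of rows (outputs x inputs).
    rule: "bp" uses the exact transpose (symmetric weights);
          "sign_symmetry" keeps only the signs of the forward weights.
    """
    if rule == "bp":
        return transpose(weights)
    if rule == "sign_symmetry":
        return transpose([[sign(w) for w in row] for row in weights])
    raise ValueError(f"unknown rule: {rule}")


def backward(weights, delta_out, rule):
    """Propagate the output error delta_out back to the layer input."""
    fb = feedback_matrix(weights, rule)
    return [sum(b * d for b, d in zip(row, delta_out)) for row in fb]


# Toy example (values chosen for illustration only).
W = [[0.5, -2.0], [-0.1, 3.0]]
delta = [1.0, -1.0]
bp_signal = backward(W, delta, "bp")
ss_signal = backward(W, delta, "sign_symmetry")
```

In this toy case the two error signals point in the same direction per coordinate but differ in magnitude, which is the kind of relaxation of weight symmetry the abstract describes.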
-
An Online Algorithm for Nonparametric Correlations
Authors:
Wei Xiao
Abstract:
Nonparametric correlations such as Spearman's rank correlation and Kendall's tau correlation are widely applied in scientific and engineering fields. This paper investigates the problem of computing nonparametric correlations on the fly for streaming data. Standard batch algorithms are generally too slow to handle real-world big data applications. They also require too much memory because all the data need to be stored in the memory before processing. This paper proposes a novel online algorithm for computing nonparametric correlations. The algorithm has O(1) time complexity and O(1) memory cost and is quite suitable for edge devices, where only limited memory and processing power are available. You can seek a balance between speed and accuracy by changing the number of cutpoints specified in the algorithm. The online algorithm can compute the nonparametric correlations 10 to 1,000 times faster than the corresponding batch algorithm, and it can compute them based either on all past observations or on fixed-size sliding windows.
Submitted 5 December, 2017;
originally announced December 2017.
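To make the role of cutpoints in the abstract above concrete, here is a minimal sketch of one way a fixed grid of cutpoints can give a streaming rank-correlation estimate whose memory and per-update cost do not grow with the number of observations. It is an assumption-laden illustration, not the paper's algorithm: the class name `BinnedSpearman`, its methods, and the choice to compute Spearman's rho from bin midranks are all hypothetical.

```python
from bisect import bisect_right


class BinnedSpearman:
    """Streaming Spearman estimate on a fixed grid of cutpoints (illustrative)."""

    def __init__(self, x_cuts, y_cuts):
        self.x_cuts = sorted(x_cuts)
        self.y_cuts = sorted(y_cuts)
        rows = len(self.x_cuts) + 1
        cols = len(self.y_cuts) + 1
        # Count matrix: its size depends on the cutpoints, not on the stream.
        self.counts = [[0] * cols for _ in range(rows)]
        self.n = 0

    def update(self, x, y):
        """Bin one observation and increment the matching cell."""
        i = bisect_right(self.x_cuts, x)
        j = bisect_right(self.y_cuts, y)
        self.counts[i][j] += 1
        self.n += 1

    @staticmethod
    def _midranks(marginal):
        """Average rank shared by all observations falling in each bin."""
        ranks, seen = [], 0
        for count in marginal:
            ranks.append(seen + (count + 1) / 2)
            seen += count
        return ranks

    def rho(self):
        """Pearson correlation of bin midranks, weighted by cell counts."""
        row_tot = [sum(row) for row in self.counts]
        col_tot = [sum(col) for col in zip(*self.counts)]
        rx, ry = self._midranks(row_tot), self._midranks(col_tot)
        mean = (self.n + 1) / 2  # mean rank of n observations
        cov = sum(
            self.counts[i][j] * (rx[i] - mean) * (ry[j] - mean)
            for i in range(len(rx))
            for j in range(len(ry))
        )
        var_x = sum(row_tot[i] * (rx[i] - mean) ** 2 for i in range(len(rx)))
        var_y = sum(col_tot[j] * (ry[j] - mean) ** 2 for j in range(len(ry)))
        if var_x == 0 or var_y == 0:
            return float("nan")
        return cov / (var_x * var_y) ** 0.5


# Usage: a perfectly increasing stream and a decreasing one.
up = BinnedSpearman([0.25, 0.5, 0.75], [0.25, 0.5, 0.75])
down = BinnedSpearman([0.25, 0.5, 0.75], [0.25, 0.5, 0.75])
for k in range(100):
    up.update(k / 100, k / 100)
    down.update(k / 100, 1 - k / 100)
```

In this sketch, more cutpoints give a finer grid and a closer approximation to the exact rank correlation at the price of a larger count matrix, which mirrors the speed/accuracy balance mentioned in the abstract.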
-
Online Robust Principal Component Analysis with Change Point Detection
Authors:
Wei Xiao,
Xiaolin Huang,
Jorge Silva,
Saba Emrani,
Arin Chaudhuri
Abstract:
Robust PCA methods are typically batch algorithms that require loading all observations into memory before processing. This makes them inefficient for processing big data. In this paper, we develop an efficient online robust principal component method, namely online moving window robust principal component analysis (OMWRPCA). Unlike existing algorithms, OMWRPCA can successfully track not only slowly changing subspaces but also abruptly changed subspaces. By embedding hypothesis testing into the algorithm, OMWRPCA can detect change points of the underlying subspaces. Extensive simulation studies demonstrate the superior performance of OMWRPCA compared with other state-of-the-art approaches. We also apply the algorithm to real-time background subtraction of surveillance video.
Submitted 20 March, 2017; v1 submitted 18 February, 2017;
originally announced February 2017.
-
Sampling Method for Fast Training of Support Vector Data Description
Authors:
Arin Chaudhuri,
Deovrat Kakde,
Maria Jahja,
Wei Xiao,
Hansi Jiang,
Seunghyun Kong,
Sergiy Peredriy
Abstract:
Support Vector Data Description (SVDD) is a popular outlier detection technique that constructs a flexible description of the input data. SVDD computation time is high for large training datasets, which limits its use in big-data process-monitoring applications. We propose a new iterative sampling-based method for SVDD training. The method incrementally learns the training data description at each iteration by computing SVDD on an independent random sample selected with replacement from the training data set. The experimental results indicate that the proposed method is extremely fast and provides a good data description.
Submitted 25 September, 2016; v1 submitted 16 June, 2016;
originally announced June 2016.
-
Robust regression for optimal individualized treatment rules
Authors:
Wei Xiao,
Hao Helen Zhang,
Wenbin Lu
Abstract:
Because different patients may respond quite differently to the same drug or treatment, there is increasing interest in discovering individualized treatment rules. In particular, people are eager to find the optimal individualized treatment rules, which if followed by the whole patient population would lead to the "best" outcome. In this paper, we propose new estimators based on robust regression with general loss functions to estimate the optimal individualized treatment rules. The new estimators possess the following nice properties: first, they are robust against skewed, heterogeneous, heavy-tailed errors or outliers; second, they are robust against misspecification of the baseline function; third, under certain situations, the new estimator coupled with pinball loss approximately maximizes the outcome's conditional quantile instead of conditional mean, which leads to a different optimal individualized treatment rule compared with traditional Q- and A-learning. Consistency and asymptotic normality of the proposed estimators are established. Their empirical performance is demonstrated via extensive simulation studies and an analysis of an AIDS data set.
Submitted 13 April, 2016;
originally announced April 2016.
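For readers unfamiliar with the pinball loss mentioned in the abstract above, its standard textbook form is sketched below. The notation ($\rho_\tau$, $u$, $\tau$, $q_\tau$) is an assumption made here for illustration and is not taken from the paper; the paper's estimators may use a different parametrization.

```latex
% Pinball (quantile) loss at level \tau \in (0, 1), applied to a residual u
\rho_\tau(u) \;=\; u\,\bigl(\tau - \mathbf{1}\{u < 0\}\bigr)
           \;=\; \max\bigl(\tau u,\; (\tau - 1)\,u\bigr)

% Minimizing the expected pinball loss recovers the conditional \tau-quantile
q_\tau(Y \mid X) \;\in\; \arg\min_{q}\; \mathbb{E}\bigl[\rho_\tau(Y - q) \,\big|\, X\bigr]
```

With $\tau = 1/2$ the loss is half the absolute error and the minimizer is the conditional median, which is why a rule built on this loss targets a quantile of the outcome rather than its mean.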
-
A Probabilistic Machine Learning Approach to Detect Industrial Plant Faults
Authors:
Wei Xiao
Abstract:
Fault detection in industrial plants is a hot research area as more and more sensor data are being collected throughout the industrial process. Automatic data-driven approaches are widely needed and seen as a promising area of investment. This paper proposes an effective machine learning algorithm to predict industrial plant faults based on classification methods such as penalized logistic regression, random forest and gradient boosted tree. A fault's start time and end time are predicted sequentially in two steps by formulating the original prediction problems as classification problems. The algorithms described in this paper won first place in the Prognostics and Health Management Society 2015 Data Challenge.
Submitted 18 March, 2016;
originally announced March 2016.
-
MiRank: A bioinformatics tool for gene/miRNA ranking and pathway profiling with TCGA-KEGG data sets
Authors:
Siddharth G. Reddy,
Weimin Xiao,
Preethi H. Gunaratne
Abstract:
The Cancer Genome Atlas (TCGA) provides researchers with clinicopathological data and genomic characterizations of various carcinomas. These data sets include expression microarrays for genes and microRNAs -- short, non-coding strands of RNA that downregulate gene expression through RNA interference -- as well as days_to_death and days_to_last_followup fields for each tumor sample. Our aim is to develop a software tool that screens TCGA data sets for genes/miRNAs with functional involvement in specific cancers. Furthermore, our computational pipeline is intended to produce a set of visualizations, or profiles, that place our screened outputs in a pathway-centric context.
We accomplish our 'screening' by ranking genes/miRNAs by the correlation of their expression misregulation with differential patient survival. In other words, if a gene/miRNA is consistently misregulated in patients with poor survival rates and, on the other hand, is expressed more 'normally' in patients with longer survival rates, then it is ranked highly; if its misregulation has no such correlation with good/bad survival in patients, then its rank is low. Our pathway profiling pipeline produces several outputs, which allow us to examine the functional roles played by highly ranked genes discovered by our screening.
Running the OV (ovarian serous cystadenocarcinoma) data set through our analysis pipeline, we find that several highly ranked pathways and functional groups of genes (VEGF, Jun, Fos, etc.) have already been shown to play some part in the development of epithelial ovarian carcinomas. We also observe that the dysfunction of the Wnt signaling pathway, which regulates cell-fate specification and progenitor cell differentiation, has a disproportionate impact on the survival of ovarian cancer patients.
Submitted 4 July, 2013; v1 submitted 1 October, 2012;
originally announced October 2012.