Search | arXiv e-print repository

Evaluating the Robustness of Test Selection Methods for Deep Neural Networks

Authors: Qiang Hu, Yuejun Guo, Xiaofei Xie, Maxime Cordy, Wei Ma, Mike Papadakis, Yves Le Traon

Abstract: Testing deep learning-based systems is crucial but challenging due to the required time and labor for labeling collected raw data. To alleviate the labeling effort, multiple test selection methods have been proposed where only a subset of test data needs to be labeled while satisfying testing requirements. However, we observe that such methods with reported promising results are only evaluated und… ▽ More Testing deep learning-based systems is crucial but challenging due to the required time and labor for labeling collected raw data. To alleviate the labeling effort, multiple test selection methods have been proposed where only a subset of test data needs to be labeled while satisfying testing requirements. However, we observe that such methods with reported promising results are only evaluated under simple scenarios, e.g., testing on original test data. This brings a question to us: are they always reliable? In this paper, we explore when and to what extent test selection methods fail for testing. Specifically, first, we identify potential pitfalls of 11 selection methods from top-tier venues based on their construction. Second, we conduct a study on five datasets with two model architectures per dataset to empirically confirm the existence of these pitfalls. Furthermore, we demonstrate how pitfalls can break the reliability of these methods. Concretely, methods for fault detection suffer from test data that are: 1) correctly classified but uncertain, or 2) misclassified but confident. Remarkably, the test relative coverage achieved by such methods drops by up to 86.85%. On the other hand, methods for performance estimation are sensitive to the choice of intermediate-layer output. The effectiveness of such methods can be even worse than random selection when using an inappropriate layer. △ Less

Submitted 29 July, 2023; originally announced August 2023.

Comments: 12 pages

arXiv:2304.02688 [pdf, other]

Going Further: Flatness at the Rescue of Early Stop** for Adversarial Example Transferability

Authors: Martin Gubri, Maxime Cordy, Yves Le Traon

Abstract: Transferability is the property of adversarial examples to be misclassified by other models than the surrogate model for which they were crafted. Previous research has shown that early stop** the training of the surrogate model substantially increases transferability. A common hypothesis to explain this is that deep neural networks (DNNs) first learn robust features, which are more generic, thus… ▽ More Transferability is the property of adversarial examples to be misclassified by other models than the surrogate model for which they were crafted. Previous research has shown that early stop** the training of the surrogate model substantially increases transferability. A common hypothesis to explain this is that deep neural networks (DNNs) first learn robust features, which are more generic, thus a better surrogate. Then, at later epochs, DNNs learn non-robust features, which are more brittle, hence worst surrogate. First, we tend to refute this hypothesis, using transferability as a proxy for representation similarity. We then establish links between transferability and the exploration of the loss landscape in parameter space, focusing on sharpness, which is affected by early stop**. This leads us to evaluate surrogate models trained with seven minimizers that minimize both loss value and loss sharpness. Among them, SAM consistently outperforms early stop** by up to 28.8 percentage points. We discover that the strong SAM regularization from large flat neighborhoods tightly links to transferability. Finally, the best sharpness-aware minimizers prove competitive with other training methods and complement existing transferability techniques. △ Less

Submitted 20 February, 2024; v1 submitted 5 April, 2023; originally announced April 2023.

Comments: Version 2: originally submitted in April 2023 and revised in February 2024

arXiv:2207.13129 [pdf, other]

LGV: Boosting Adversarial Example Transferability from Large Geometric Vicinity

Authors: Martin Gubri, Maxime Cordy, Mike Papadakis, Yves Le Traon, Koushik Sen

Abstract: We propose transferability from Large Geometric Vicinity (LGV), a new technique to increase the transferability of black-box adversarial attacks. LGV starts from a pretrained surrogate model and collects multiple weight sets from a few additional training epochs with a constant and high learning rate. LGV exploits two geometric properties that we relate to transferability. First, models that belon… ▽ More We propose transferability from Large Geometric Vicinity (LGV), a new technique to increase the transferability of black-box adversarial attacks. LGV starts from a pretrained surrogate model and collects multiple weight sets from a few additional training epochs with a constant and high learning rate. LGV exploits two geometric properties that we relate to transferability. First, models that belong to a wider weight optimum are better surrogates. Second, we identify a subspace able to generate an effective surrogate ensemble among this wider optimum. Through extensive experiments, we show that LGV alone outperforms all (combinations of) four established test-time transformations by 1.8 to 59.9 percentage points. Our findings shed new light on the importance of the geometry of the weight space to explain the transferability of adversarial examples. △ Less

Submitted 26 July, 2022; originally announced July 2022.

Comments: Accepted at ECCV 2022

arXiv:2111.15379 [pdf]

Text classification problems via BERT embedding method and graph convolutional neural network

Authors: Loc Hoang Tran, Tuan Tran, An Mai

Abstract: This paper presents the novel way combining the BERT embedding method and the graph convolutional neural network. This combination is employed to solve the text classification problem. Initially, we apply the BERT embedding method to the texts (in the BBC news dataset and the IMDB movie reviews dataset) in order to transform all the texts to numerical vector. Then, the graph convolutional neural n… ▽ More This paper presents the novel way combining the BERT embedding method and the graph convolutional neural network. This combination is employed to solve the text classification problem. Initially, we apply the BERT embedding method to the texts (in the BBC news dataset and the IMDB movie reviews dataset) in order to transform all the texts to numerical vector. Then, the graph convolutional neural network will be applied to these numerical vectors to classify these texts into their ap-propriate classes/labels. Experiments show that the performance of the graph convolutional neural network model is better than the perfor-mances of the combination of the BERT embedding method with clas-sical machine learning models. △ Less

Submitted 3 September, 2022; v1 submitted 30 November, 2021; originally announced November 2021.

arXiv:2102.01934 [pdf]

Noise-robust classification with hypergraph neural network

Authors: Nguyen Trinh Vu Dang, Loc Tran, Linh Tran

Abstract: This paper presents a novel version of the hypergraph neural network method. This method is utilized to solve the noisy label learning problem. First, we apply the PCA dimensional reduction technique to the feature matrices of the image datasets in order to reduce the "noise" and the redundant features in the feature matrices of the image datasets and to reduce the runtime constructing the hypergr… ▽ More This paper presents a novel version of the hypergraph neural network method. This method is utilized to solve the noisy label learning problem. First, we apply the PCA dimensional reduction technique to the feature matrices of the image datasets in order to reduce the "noise" and the redundant features in the feature matrices of the image datasets and to reduce the runtime constructing the hypergraph of the hypergraph neural network method. Then, the classic graph-based semi-supervised learning method, the classic hypergraph based semi-supervised learning method, the graph neural network, the hypergraph neural network, and our proposed hypergraph neural network are employed to solve the noisy label learning problem. The accuracies of these five methods are evaluated and compared. Experimental results show that the hypergraph neural network methods achieve the best performance when the noise level increases. Moreover, the hypergraph neural network methods are at least as good as the graph neural network. △ Less

Submitted 3 September, 2022; v1 submitted 3 February, 2021; originally announced February 2021.

MSC Class: 68Txx

arXiv:2011.05074 [pdf, other]

Efficient and Transferable Adversarial Examples from Bayesian Neural Networks

Authors: Martin Gubri, Maxime Cordy, Mike Papadakis, Yves Le Traon, Koushik Sen

Abstract: An established way to improve the transferability of black-box evasion attacks is to craft the adversarial examples on an ensemble-based surrogate to increase diversity. We argue that transferability is fundamentally related to uncertainty. Based on a state-of-the-art Bayesian Deep Learning technique, we propose a new method to efficiently build a surrogate by sampling approximately from the poste… ▽ More An established way to improve the transferability of black-box evasion attacks is to craft the adversarial examples on an ensemble-based surrogate to increase diversity. We argue that transferability is fundamentally related to uncertainty. Based on a state-of-the-art Bayesian Deep Learning technique, we propose a new method to efficiently build a surrogate by sampling approximately from the posterior distribution of neural network weights, which represents the belief about the value of each parameter. Our extensive experiments on ImageNet, CIFAR-10 and MNIST show that our approach improves the success rates of four state-of-the-art attacks significantly (up to 83.2 percentage points), in both intra-architecture and inter-architecture transferability. On ImageNet, our approach can reach 94% of success rate while reducing training computations from 11.6 to 2.4 exaflops, compared to an ensemble of independently trained DNNs. Our vanilla surrogate achieves 87.5% of the time higher transferability than three test-time techniques designed for this purpose. Our work demonstrates that the way to train a surrogate has been overlooked, although it is an important element of transfer-based attacks. We are, therefore, the first to review the effectiveness of several training methods in increasing transferability. We provide new directions to better understand the transferability phenomenon and offer a simple but strong baseline for future work. △ Less

Submitted 18 June, 2022; v1 submitted 10 November, 2020; originally announced November 2020.

Comments: Accepted at UAI 2022

arXiv:2008.03626 [pdf]

Directed hypergraph neural network

Authors: Loc Hoang Tran, Linh Hoang Tran

Abstract: To deal with irregular data structure, graph convolution neural networks have been developed by a lot of data scientists. However, data scientists just have concentrated primarily on develo** deep neural network method for un-directed graph. In this paper, we will present the novel neural network method for directed hypergraph. In the other words, we will develop not only the novel directed hype… ▽ More To deal with irregular data structure, graph convolution neural networks have been developed by a lot of data scientists. However, data scientists just have concentrated primarily on develo** deep neural network method for un-directed graph. In this paper, we will present the novel neural network method for directed hypergraph. In the other words, we will develop not only the novel directed hypergraph neural network method but also the novel directed hypergraph based semi-supervised learning method. These methods are employed to solve the node classification task. The two datasets that are used in the experiments are the cora and the citeseer datasets. Among the classic directed graph based semi-supervised learning method, the novel directed hypergraph based semi-supervised learning method, the novel directed hypergraph neural network method that are utilized to solve this node classification task, we recognize that the novel directed hypergraph neural network achieves the highest accuracies. △ Less

Submitted 3 September, 2022; v1 submitted 8 August, 2020; originally announced August 2020.

arXiv:2002.02655 [pdf, other]

The k-tied Normal Distribution: A Compact Parameterization of Gaussian Mean Field Posteriors in Bayesian Neural Networks

Authors: Jakub Swiatkowski, Kevin Roth, Bastiaan S. Veeling, Linh Tran, Joshua V. Dillon, Jasper Snoek, Stephan Mandt, Tim Salimans, Rodolphe Jenatton, Sebastian Nowozin

Abstract: Variational Bayesian Inference is a popular methodology for approximating posterior distributions over Bayesian neural network weights. Recent work develo** this class of methods has explored ever richer parameterizations of the approximate posterior in the hope of improving performance. In contrast, here we share a curious experimental finding that suggests instead restricting the variational d… ▽ More Variational Bayesian Inference is a popular methodology for approximating posterior distributions over Bayesian neural network weights. Recent work develo** this class of methods has explored ever richer parameterizations of the approximate posterior in the hope of improving performance. In contrast, here we share a curious experimental finding that suggests instead restricting the variational distribution to a more compact parameterization. For a variety of deep Bayesian neural networks trained using Gaussian mean-field variational inference, we find that the posterior standard deviations consistently exhibit strong low-rank structure after convergence. This means that by decomposing these variational parameters into a low-rank factorization, we can make our variational approximation more compact without decreasing the models' performance. Furthermore, we find that such factorized parameterizations improve the signal-to-noise ratio of stochastic gradient estimates of the variational lower bound, resulting in faster convergence. △ Less

Submitted 5 July, 2020; v1 submitted 7 February, 2020; originally announced February 2020.

arXiv:2002.02405 [pdf, other]

How Good is the Bayes Posterior in Deep Neural Networks Really?

Authors: Florian Wenzel, Kevin Roth, Bastiaan S. Veeling, Jakub Świątkowski, Linh Tran, Stephan Mandt, Jasper Snoek, Tim Salimans, Rodolphe Jenatton, Sebastian Nowozin

Abstract: During the past five years the Bayesian deep learning community has developed increasingly accurate and efficient approximate inference procedures that allow for Bayesian inference in deep neural networks. However, despite this algorithmic progress and the promise of improved uncertainty quantification and sample efficiency there are---as of early 2020---no publicized deployments of Bayesian neura… ▽ More During the past five years the Bayesian deep learning community has developed increasingly accurate and efficient approximate inference procedures that allow for Bayesian inference in deep neural networks. However, despite this algorithmic progress and the promise of improved uncertainty quantification and sample efficiency there are---as of early 2020---no publicized deployments of Bayesian neural networks in industrial practice. In this work we cast doubt on the current understanding of Bayes posteriors in popular deep neural networks: we demonstrate through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions compared to simpler methods including point estimates obtained from SGD. Furthermore, we demonstrate that predictive performance is improved significantly through the use of a "cold posterior" that overcounts evidence. Such cold posteriors sharply deviate from the Bayesian paradigm but are commonly used as heuristic in Bayesian deep learning papers. We put forward several hypotheses that could explain cold posteriors and evaluate the hypotheses through experiments. Our work questions the goal of accurate posterior approximations in Bayesian deep learning: If the true Bayes posterior is poor, what is the use of more accurate approximations? Instead, we argue that it is timely to focus on understanding the origin of the improved performance of cold posteriors. △ Less

Submitted 2 July, 2020; v1 submitted 6 February, 2020; originally announced February 2020.

Comments: Full version (main paper and appendix) of the ICML 2020 publication

arXiv:2001.04694 [pdf, other]

Hydra: Preserving Ensemble Diversity for Model Distillation

Authors: Linh Tran, Bastiaan S. Veeling, Kevin Roth, Jakub Swiatkowski, Joshua V. Dillon, Jasper Snoek, Stephan Mandt, Tim Salimans, Sebastian Nowozin, Rodolphe Jenatton

Abstract: Ensembles of models have been empirically shown to improve predictive performance and to yield robust measures of uncertainty. However, they are expensive in computation and memory. Therefore, recent research has focused on distilling ensembles into a single compact model, reducing the computational and memory burden of the ensemble while trying to preserve its predictive behavior. Most existing d… ▽ More Ensembles of models have been empirically shown to improve predictive performance and to yield robust measures of uncertainty. However, they are expensive in computation and memory. Therefore, recent research has focused on distilling ensembles into a single compact model, reducing the computational and memory burden of the ensemble while trying to preserve its predictive behavior. Most existing distillation formulations summarize the ensemble by capturing its average predictions. As a result, the diversity of the ensemble predictions, stemming from each member, is lost. Thus, the distilled model cannot provide a measure of uncertainty comparable to that of the original ensemble. To retain more faithfully the diversity of the ensemble, we propose a distillation method based on a single multi-headed neural network, which we refer to as Hydra. The shared body network learns a joint feature representation that enables each head to capture the predictive behavior of each ensemble member. We demonstrate that with a slight increase in parameter count, Hydra improves distillation performance on classification and regression settings while capturing the uncertainty behavior of the original ensemble over both in-domain and out-of-distribution tasks. △ Less

Submitted 19 March, 2021; v1 submitted 14 January, 2020; originally announced January 2020.

Comments: Accepted to ICML 2020 Workshop on Uncertainty and Robustness in Deep Learning

arXiv:1909.08964 [pdf]

To Detect Irregular Trade Behaviors In Stock Market By Using Graph Based Ranking Methods

Authors: Loc Tran, Linh Tran

Abstract: To detect the irregular trade behaviors in the stock market is the important problem in machine learning field. These irregular trade behaviors are obviously illegal. To detect these irregular trade behaviors in the stock market, data scientists normally employ the supervised learning techniques. In this paper, we employ the three graph Laplacian based semi-supervised ranking methods to solve the… ▽ More To detect the irregular trade behaviors in the stock market is the important problem in machine learning field. These irregular trade behaviors are obviously illegal. To detect these irregular trade behaviors in the stock market, data scientists normally employ the supervised learning techniques. In this paper, we employ the three graph Laplacian based semi-supervised ranking methods to solve the irregular trade behavior detection problem. Experimental results show that that the un-normalized and symmetric normalized graph Laplacian based semi-supervised ranking methods outperform the random walk Laplacian based semi-supervised ranking method. △ Less

Submitted 3 September, 2019; originally announced September 2019.

Comments: 11 pages

MSC Class: 68T10

arXiv:1909.01132 [pdf]

PageRank algorithm for Directed Hypergraph

Authors: Loc Tran, Tho Quan, An Mai

Abstract: During the last two decades, we easilly see that the World Wide Web's link structure is modeled as the directed graph. In this paper, we will model the World Wide Web's link structure as the directed hypergraph. Moreover, we will develop the PageRank algorithm for this directed hypergraph. Due to the lack of the World Wide Web directed hypergraph datasets, we will apply the PageRank algorithm to t… ▽ More During the last two decades, we easilly see that the World Wide Web's link structure is modeled as the directed graph. In this paper, we will model the World Wide Web's link structure as the directed hypergraph. Moreover, we will develop the PageRank algorithm for this directed hypergraph. Due to the lack of the World Wide Web directed hypergraph datasets, we will apply the PageRank algorithm to the metabolic network which is the directed hypergraph itself. The experiments show that our novel PageRank algorithm is successfully applied to this metabolic network. △ Less

Submitted 6 September, 2022; v1 submitted 29 August, 2019; originally announced September 2019.

MSC Class: 68T10

arXiv:1908.11708 [pdf]

Solve fraud detection problem by using graph based learning methods

Authors: Loc Tran, Tuan Tran, Linh Tran, An Mai

Abstract: The credit cards' fraud transactions detection is the important problem in machine learning field. To detect the credit cards's fraud transactions help reduce the significant loss of the credit cards' holders and the banks. To detect the credit cards' fraud transactions, data scientists normally employ the unsupervised learning techniques and supervised learning techniques. In this paper, we emplo… ▽ More The credit cards' fraud transactions detection is the important problem in machine learning field. To detect the credit cards's fraud transactions help reduce the significant loss of the credit cards' holders and the banks. To detect the credit cards' fraud transactions, data scientists normally employ the unsupervised learning techniques and supervised learning techniques. In this paper, we employ the graph p-Laplacian based semi-supervised learning methods combined with the undersampling techniques such as Cluster Centroids to solve the credit cards' fraud transactions detection problem. Experimental results show that the graph p-Laplacian semi-supervised learning methods outperform the current state of the art graph Laplacian based semi-supervised learning method (p=2). △ Less

Submitted 29 August, 2019; originally announced August 2019.

Comments: 9 pages. arXiv admin note: substantial text overlap with arXiv:1811.02986

MSC Class: 68T10

arXiv:1905.08037 [pdf, ps, other]

On Estimating Maximum Sum Rate of MIMO Systems with Successive Zero-Forcing Dirty Paper Coding and Per-antenna Power Constraint

Authors: Thuy M. Pham, Ronan Farrell, Le-Nam Tran

Abstract: In this paper, we study the sum rate maximization for successive zero-forcing dirty-paper coding (SZFDPC) with per-antenna power constraint (PAPC). Although SZFDPC is a low-complexity alternative to the optimal dirty paper coding (DPC), efficient algorithms to compute its sum rate are still open problems especially under practical PAPC. The existing solution to the considered problem is computatio… ▽ More In this paper, we study the sum rate maximization for successive zero-forcing dirty-paper coding (SZFDPC) with per-antenna power constraint (PAPC). Although SZFDPC is a low-complexity alternative to the optimal dirty paper coding (DPC), efficient algorithms to compute its sum rate are still open problems especially under practical PAPC. The existing solution to the considered problem is computationally inefficient due to employing high-complexity interior-point method. In this study, we propose two new low-complexity approaches to this important problem. More specifically, the first algorithm achieves the optimal solution by transforming the original problem in the broadcast channel into an equivalent problem in the multiple access channel, then the resulting problem is solved by alternating optimization together with successive convex approximation. We also derive a suboptimal solution based on machine learning to which simple linear regressions are applicable. The approaches are analyzed and validated extensively to demonstrate their superiors over the existing approach. △ Less

Submitted 14 May, 2019; originally announced May 2019.

Comments: 5 pages, 4 figures

arXiv:1904.13195 [pdf, other]

Test Selection for Deep Learning Systems

Authors: Wei Ma, Mike Papadakis, Anestis Tsakmalis, Maxime Cordy, Yves Le Traon

Abstract: Testing of deep learning models is challenging due to the excessive number and complexity of computations involved. As a result, test data selection is performed manually and in an ad hoc way. This raises the question of how we can automatically select candidate test data to test deep learning models. Recent research has focused on adapting test selection metrics from code-based software testing (… ▽ More Testing of deep learning models is challenging due to the excessive number and complexity of computations involved. As a result, test data selection is performed manually and in an ad hoc way. This raises the question of how we can automatically select candidate test data to test deep learning models. Recent research has focused on adapting test selection metrics from code-based software testing (such as coverage) to deep learning. However, deep learning models have different attributes from code such as spread of computations across the entire network reflecting training data properties, balance of neuron weights and redundancy (use of many more neurons than needed). Such differences make code-based metrics inappropriate to select data that can challenge the models (can trigger misclassification). We thus propose a set of test selection metrics based on the notion of model uncertainty (model confidence on specific inputs). Intuitively, the more uncertain we are about a candidate sample, the more likely it is that this sample triggers a misclassification. Similarly, the samples for which we are the most uncertain, are the most informative and should be used to improve the model by retraining. We evaluate these metrics on two widely-used image classification problems involving real and artificial (adversarial) data. We show that uncertainty-based metrics have a strong ability to select data that are misclassified and lead to major improvement in classification accuracy during retraining: up to 80% more gain than random selection and other state-of-the-art metrics on one dataset and up to 29% on the other. △ Less

Submitted 30 April, 2019; originally announced April 2019.

arXiv:1904.08496 [pdf]

Tensor Sparse PCA and Face Recognition: A Novel Approach

Authors: Loc Hoang Tran, Linh Hoang Tran

Abstract: Face recognition is the important field in machine learning and pattern recognition research area. It has a lot of applications in military, finance, public security, to name a few. In this paper, the combination of the tensor sparse PCA with the nearest-neighbor method (and with the kernel ridge regression method) will be proposed and applied to the face dataset. Experimental results show that th… ▽ More Face recognition is the important field in machine learning and pattern recognition research area. It has a lot of applications in military, finance, public security, to name a few. In this paper, the combination of the tensor sparse PCA with the nearest-neighbor method (and with the kernel ridge regression method) will be proposed and applied to the face dataset. Experimental results show that the combination of the tensor sparse PCA with any classification system does not always reach the best accuracy performance measures. However, the accuracy of the combination of the sparse PCA method and one specific classification system is always better than the accuracy of the combination of the PCA method and one specific classification system and is always better than the accuracy of the classification system itself. △ Less

Submitted 11 August, 2020; v1 submitted 11 April, 2019; originally announced April 2019.

Comments: It has some errors in the experimental section

MSC Class: 68T10

arXiv:1811.02986 [pdf]

Un-normalized hypergraph p-Laplacian based semi-supervised learning methods

Authors: Loc Hoang Tran, Linh Hoang Tran

Abstract: Most network-based machine learning methods assume that the labels of two adjacent samples in the network are likely to be the same. However, assuming the pairwise relationship between samples is not complete. The information a group of samples that shows very similar pattern and tends to have similar labels is missed. The natural way overcoming the information loss of the above assumption is to r… ▽ More Most network-based machine learning methods assume that the labels of two adjacent samples in the network are likely to be the same. However, assuming the pairwise relationship between samples is not complete. The information a group of samples that shows very similar pattern and tends to have similar labels is missed. The natural way overcoming the information loss of the above assumption is to represent the feature dataset of samples as the hypergraph. Thus, in this paper, we will present the un-normalized hypergraph p-Laplacian semi-supervised learning methods. These methods will be applied to the zoo dataset and the tiny version of 20 newsgroups dataset. Experiment results show that the accuracy performance measures of these un-normalized hypergraph p-Laplacian based semi-supervised learning methods are significantly greater than the accuracy performance measure of the un-normalized hypergraph Laplacian based semi-supervised learning method (the current state of the art method hypergraph Laplacian based semi-supervised learning method for classification problem with p=2). △ Less

Submitted 28 April, 2019; v1 submitted 5 November, 2018; originally announced November 2018.

Comments: 13 pages, 3 figures, 2 tables. arXiv admin note: text overlap with arXiv:1810.12743, arXiv:1212.0388

MSC Class: 68T10

arXiv:1810.12743 [pdf]

Hypergraph based semi-supervised learning algorithms applied to speech recognition problem: a novel approach

Authors: Loc Hoang Tran, Trang Hoang, Bui Hoang Nam Huynh

Abstract: Most network-based speech recognition methods are based on the assumption that the labels of two adjacent speech samples in the network are likely to be the same. However, assuming the pairwise relationship between speech samples is not complete. The information a group of speech samples that show very similar patterns and tend to have similar labels is missed. The natural way overcoming the infor… ▽ More Most network-based speech recognition methods are based on the assumption that the labels of two adjacent speech samples in the network are likely to be the same. However, assuming the pairwise relationship between speech samples is not complete. The information a group of speech samples that show very similar patterns and tend to have similar labels is missed. The natural way overcoming the information loss of the above assumption is to represent the feature data of speech samples as the hypergraph. Thus, in this paper, the three un-normalized, random walk, and symmetric normalized hypergraph Laplacian based semi-supervised learning methods applied to hypergraph constructed from the feature data of speech samples in order to predict the labels of speech samples are introduced. Experiment results show that the sensitivity performance measures of these three hypergraph Laplacian based semi-supervised learning methods are greater than the sensitivity performance measures of the Hidden Markov Model method (the current state of the art method applied to speech recognition problem) and graph based semi-supervised learning methods (i.e. the current state of the art network-based method for classification problems) applied to network created from the feature data of speech samples. △ Less

Submitted 28 October, 2018; originally announced October 2018.

Comments: 11 pages, 1 figure, 2 tables. arXiv admin note: substantial text overlap with arXiv:1212.0388

MSC Class: 05C85

arXiv:1810.03030 [pdf, other]

Robust variance estimation and inference for causal effect estimation

Authors: Linh Tran, Maya Petersen, Joshua Schwab, Mark J van der Laan

Abstract: We consider a longitudinal data structure consisting of baseline covariates, time-varying treatment variables, intermediate time-dependent covariates, and a possibly time dependent outcome. Previous studies have shown that estimating the variance of asymptotically linear estimators using empirical influence functions in this setting result in anti-conservative estimates with increasing magnitudes… ▽ More We consider a longitudinal data structure consisting of baseline covariates, time-varying treatment variables, intermediate time-dependent covariates, and a possibly time dependent outcome. Previous studies have shown that estimating the variance of asymptotically linear estimators using empirical influence functions in this setting result in anti-conservative estimates with increasing magnitudes of positivity violations, leading to poor coverage and uncontrolled Type I errors. In this paper, we present two alternative approaches of estimating the variance of these estimators: (i) a robust approach which directly targets the variance of the influence function as a counterfactual mean outcome, and (ii) a non-parametric bootstrap based approach that is theoretically valid and lowers the computational cost, thereby increasing the feasibility in non-parametric settings using complex machine learning algorithms. The performance of these approaches are compared to that of the empirical influence function in simulations across different levels of positivity violations and treatment effect sizes. △ Less

Submitted 6 October, 2018; originally announced October 2018.

Comments: 20 pages, 8 figures

arXiv:1809.01703 [pdf, other]

HyperML: A Boosting Metric Learning Approach in Hyperbolic Space for Recommender Systems

Authors: Lucas Vinh Tran, Yi Tay, Shuai Zhang, Gao Cong, Xiaoli Li

Abstract: This paper investigates the notion of learning user and item representations in non-Euclidean space. Specifically, we study the connection between metric learning in hyperbolic space and collaborative filtering by exploring Mobius gyrovector spaces where the formalism of the spaces could be utilized to generalize the most common Euclidean vector operations. Overall, this work aims to bridge the ga… ▽ More This paper investigates the notion of learning user and item representations in non-Euclidean space. Specifically, we study the connection between metric learning in hyperbolic space and collaborative filtering by exploring Mobius gyrovector spaces where the formalism of the spaces could be utilized to generalize the most common Euclidean vector operations. Overall, this work aims to bridge the gap between Euclidean and hyperbolic geometry in recommender systems through metric learning approach. We propose HyperML (Hyperbolic Metric Learning), a conceptually simple but highly effective model for boosting the performance. Via a series of extensive experiments, we show that our proposed HyperML not only outperforms their Euclidean counterparts, but also achieves state-of-the-art performance on multiple benchmark datasets, demonstrating the effectiveness of personalized recommendation in hyperbolic geometry. △ Less

Submitted 28 November, 2019; v1 submitted 5 September, 2018; originally announced September 2018.

Comments: Accepted at WSDM 2020

arXiv:1308.5712 [pdf, other]

The Generalized Mean Information Coefficient

Authors: Alexander Luedtke, Linh Tran

Abstract: Reshef & Reshef recently published a paper in which they present a method called the Maximal Information Coefficient (MIC) that can detect all forms of statistical dependence between pairs of variables as sample size goes to infinity. While this method has been praised by some, it has also been criticized for its lack of power in finite samples. We seek to modify MIC so that it has higher power in… ▽ More Reshef & Reshef recently published a paper in which they present a method called the Maximal Information Coefficient (MIC) that can detect all forms of statistical dependence between pairs of variables as sample size goes to infinity. While this method has been praised by some, it has also been criticized for its lack of power in finite samples. We seek to modify MIC so that it has higher power in detecting associations for limited sample sizes. Here we present the Generalized Mean Information Coefficient (GMIC), a generalization of MIC which incorporates a tuning parameter that can be used to modify the complexity of the association favored by the measure. We define GMIC and prove it maintains several key asymptotic properties of MIC. Its increased power over MIC is demonstrated using a simulation of eight different functional relationships at sixty different noise levels. The results are compared to the Pearson correlation, distance correlation, and MIC. Simulation results suggest that while generally GMIC has slightly lower power than the distance correlation measure, it achieves higher power than MIC for many forms of underlying association. For some functional relationships, GMIC surpasses all other statistics calculated. Preliminary results suggest choosing a moderate value of the tuning parameter for GMIC will yield a test that is robust across underlying relationships. GMIC is a promising new method that mitigates the power issues suffered by MIC, at the possible expense of equitability. Nonetheless, distance correlation was in our simulations more powerful for many forms of underlying relationships. At a minimum, this work motivates further consideration of maximal information-based nonparametric exploration (MINE) methods as statistical tests of independence. △ Less

Submitted 26 August, 2013; originally announced August 2013.

arXiv:1212.0388 [pdf]

Hypergraph and protein function prediction with gene expression data

Authors: Loc Tran

Abstract: Most network-based protein (or gene) function prediction methods are based on the assumption that the labels of two adjacent proteins in the network are likely to be the same. However, assuming the pairwise relationship between proteins or genes is not complete, the information a group of genes that show very similar patterns of expression and tend to have similar functions (i.e. the functional mo… ▽ More Most network-based protein (or gene) function prediction methods are based on the assumption that the labels of two adjacent proteins in the network are likely to be the same. However, assuming the pairwise relationship between proteins or genes is not complete, the information a group of genes that show very similar patterns of expression and tend to have similar functions (i.e. the functional modules) is missed. The natural way overcoming the information loss of the above assumption is to represent the gene expression data as the hypergraph. Thus, in this paper, the three un-normalized, random walk, and symmetric normalized hypergraph Laplacian based semi-supervised learning methods applied to hypergraph constructed from the gene expression data in order to predict the functions of yeast proteins are introduced. Experiment results show that the average accuracy performance measures of these three hypergraph Laplacian based semi-supervised learning methods are the same. However, their average accuracy performance measures of these three methods are much greater than the average accuracy performance measures of un-normalized graph Laplacian based semi-supervised learning method (i.e. the baseline method of this paper) applied to gene co-expression network created from the gene expression data. △ Less

Submitted 3 December, 2012; originally announced December 2012.

Comments: 12 pages, 1 figure. arXiv admin note: substantial text overlap with arXiv:1211.4289

ACM Class: G.2.2

arXiv:1211.4289 [pdf]

doi 10.5121/ijbb.2013.3202

Application of three graph Laplacian based semi-supervised learning methods to protein function prediction problem

Authors: Loc Tran

Abstract: Protein function prediction is the important problem in modern biology. In this paper, the un-normalized, symmetric normalized, and random walk graph Laplacian based semi-supervised learning methods will be applied to the integrated network combined from multiple networks to predict the functions of all yeast proteins in these multiple networks. These multiple networks are network created from Pfa… ▽ More Protein function prediction is the important problem in modern biology. In this paper, the un-normalized, symmetric normalized, and random walk graph Laplacian based semi-supervised learning methods will be applied to the integrated network combined from multiple networks to predict the functions of all yeast proteins in these multiple networks. These multiple networks are network created from Pfam domain structure, co-participation in a protein complex, protein-protein interaction network, genetic interaction network, and network created from cell cycle gene expression measurements. Multiple networks are combined with fixed weights instead of using convex optimization to determine the combination weights due to high time complexity of convex optimization method. This simple combination method will not affect the accuracy performance measures of the three semi-supervised learning methods. Experiment results show that the un-normalized and symmetric normalized graph Laplacian based methods perform slightly better than random walk graph Laplacian based method for integrated network. Moreover, the accuracy performance measures of these three semi-supervised learning methods for integrated network are much better than the best accuracy performance measures of these three methods for the individual network. △ Less

Submitted 11 July, 2013; v1 submitted 18 November, 2012; originally announced November 2012.

Comments: 16 pages, 9 tables

ACM Class: H.2.8

Showing 1–23 of 23 results for author: Tran, L