Customizing Graph Neural Networks using Path Reweighting

Authors: Jianpeng Chen, Yujing Wang, Ming Zeng, Zongyi Xiang, Bitan Hou, Yunhai Tong, Ole J. Mengshoel, Yazhou Ren

Abstract: Graph Neural Networks (GNNs) have been extensively used for mining graph-structured data with impressive performance. However, because these traditional GNNs do not distinguish among various downstream tasks, the embeddings they produce are not always effective. Intuitively, paths in a graph imply different semantics for different downstream tasks. Inspired by this, we design a novel GNN solution, namely Customized Graph Neural Network with Path Reweighting (CustomGNN for short). Specifically, the proposed CustomGNN can automatically learn the high-level semantics for specific downstream tasks to highlight semantically relevant paths as well as to filter out task-irrelevant noise in a graph. Furthermore, we empirically analyze the semantics learned by CustomGNN and demonstrate its ability to avoid the three inherent problems in traditional GNNs, i.e., over-smoothing, poor robustness, and overfitting. In experiments with the node classification task, CustomGNN achieves state-of-the-art accuracies on three standard graph datasets and four large graph datasets. The source code of the proposed CustomGNN is available at https://github.com/cjpcool/CustomGNN.

Submitted 12 March, 2024; v1 submitted 21 June, 2021; originally announced June 2021.

Comments: 25 pages with 14 figures

MSC Class: 68T07; 68T30; 68R99 ACM Class: I.2.0; I.2.4

arXiv:2007.04915 [pdf, other]

Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems

Authors: Tong Yu, Branislav Kveton, Zheng Wen, Ruiyi Zhang, Ole J. Mengshoel

Abstract: We propose a novel framework for structured bandits, which we call an influence diagram bandit. Our framework captures complex statistical dependencies between actions, latent variables, and observations; and thus unifies and extends many existing models, such as combinatorial semi-bandits, cascading bandits, and low-rank bandits. We develop novel online learning algorithms that learn to act efficiently in our models. The key idea is to track a structured posterior distribution of model parameters, either exactly or approximately. To act, we sample model parameters from their posterior and then use the structure of the influence diagram to find the most optimistic action under the sampled parameters. We empirically evaluate our algorithms in three structured bandit problems, and show that they perform as well as or better than problem-specific state-of-the-art baselines.

Submitted 9 July, 2020; originally announced July 2020.

arXiv:1911.09454 [pdf, other]

Customized Graph Embedding: Tailoring Embedding Vectors to different Applications

Authors: Bitan Hou, Yujing Wang, Ming Zeng, Shan Jiang, Ole J. Mengshoel, Yunhai Tong, Jing Bai

Abstract: A graph is a natural representation of data for a variety of real-world applications, such as knowledge graph mining, social network analysis and biological network comparison. For these applications, graph embedding is crucial as it provides vector representations of the graph. One limitation of existing graph embedding methods is that their embedding optimization procedures are disconnected from the target application. In this paper, we propose a novel approach, namely Customized Graph Embedding (CGE), to tackle this problem. The CGE algorithm learns customized vector representations of graph nodes by differentiating the importance of distinct graph paths automatically for a specific application. Extensive experiments were carried out on a diverse set of node classification datasets, which demonstrate the strong performance of CGE and provide deep insights into the model.

Submitted 23 January, 2020; v1 submitted 21 November, 2019; originally announced November 2019.

Comments: The first three authors contributed equally to this paper

arXiv:1811.02188 [pdf, ps, other]

Adaptive Stress Testing: Finding Likely Failure Events with Reinforcement Learning

Authors: Ritchie Lee, Ole J. Mengshoel, Anshu Saksena, Ryan Gardner, Daniel Genin, Joshua Silbermann, Michael Owen, Mykel J. Kochenderfer

Abstract: Finding the most likely path to a set of failure states is important to the analysis of safety-critical systems that operate over a sequence of time steps, such as aircraft collision avoidance systems and autonomous cars. In many applications such as autonomous driving, failures cannot be completely eliminated due to the complex stochastic environment in which the system operates. As a result, safety validation is not only concerned about whether a failure can occur, but also discovering which failures are most likely to occur. This article presents adaptive stress testing (AST), a framework for finding the most likely path to a failure event in simulation. We consider a general black box setting for partially observable and continuous-valued systems operating in an environment with stochastic disturbances. We formulate the problem as a Markov decision process and use reinforcement learning to optimize it. The approach is simulation-based and does not require internal knowledge of the system, making it suitable for black-box testing of large systems. We present formulations for fully observable and partially observable systems. In the latter case, we present a modified Monte Carlo tree search algorithm that only requires access to the pseudorandom number generator of the simulator to overcome partial observability. We also present an extension of the framework, called differential adaptive stress testing (DAST), that can find failures that occur in one system but not in another. This type of differential analysis is useful in applications such as regression testing, where we are concerned with finding areas of relative weakness compared to a baseline. We demonstrate the effectiveness of the approach on an aircraft collision avoidance application, where a prototype aircraft collision avoidance system is stress tested to find the most likely scenarios of near mid-air collision.

Submitted 4 December, 2020; v1 submitted 6 November, 2018; originally announced November 2018.

Comments: 36 pages, 17 figures, 5 tables

Journal ref: Journal of Artificial Intelligence Research (JAIR) 69 (2020) 1165-1201

arXiv:1810.04038 [pdf, other]

Understanding and Improving Recurrent Networks for Human Activity Recognition by Continuous Attention

Authors: Ming Zeng, Haoxiang Gao, Tong Yu, Ole J. Mengshoel, Helge Langseth, Ian Lane, Xiaobing Liu

Abstract: Deep neural networks, including recurrent networks, have been successfully applied to human activity recognition. Unfortunately, the final representation learned by recurrent networks might encode some noise (irrelevant signal components, unimportant sensor modalities, etc.). Besides, it is difficult to interpret the recurrent networks to gain insight into the models' behavior. To address these issues, we propose two attention models for human activity recognition: temporal attention and sensor attention. These two mechanisms adaptively focus on important signals and sensor modalities. To further improve the understandability and mean F1 score, we add continuity constraints, considering that continuous sensor signals are more robust than discrete ones. We evaluate the approaches on three datasets and obtain state-of-the-art results. Furthermore, qualitative analysis shows that the attention learned by the models agrees well with human intuition.

Submitted 7 October, 2018; originally announced October 2018.

Comments: 8 pages. Published in The International Symposium on Wearable Computers (ISWC) 2018

Journal ref: The International Symposium on Wearable Computers (ISWC) 2018

arXiv:1802.05421 [pdf, other]

CADDeLaG: Framework for distributed anomaly detection in large dense graph sequences

Authors: Aniruddha Basak, Kamalika Das, Ole J. Mengshoel

Abstract: Random walk based distance measures for graphs such as commute-time distance are useful in a variety of graph algorithms, such as clustering, anomaly detection, and creating low dimensional embeddings. Since such measures hinge on the spectral decomposition of the graph, the computation becomes a bottleneck for large graphs and does not scale easily to graphs that cannot be loaded in memory. Most existing graph mining libraries for large graphs either resort to sampling or exploit the sparsity structure of such graphs for spectral analysis. However, such methods do not work for dense graphs constructed for studying pairwise relationships among entities in a data set. Examples of such studies include analyzing pairwise locations in gridded climate data for discovering long distance climate phenomena. These graph representations are fully connected by construction and cannot be sparsified without loss of meaningful information. In this paper we describe CADDeLaG, a framework for scalable computation of commute-time distance based anomaly detection in large dense graphs without the need to load the entire graph in memory. The framework relies on Apache Spark's memory-centric cluster-computing infrastructure and consists of two building blocks: a decomposable algorithm for commute time distance computation and a distributed linear system solver. We illustrate the scalability of CADDeLaG and its dependency on various factors using both synthetic and real world data sets. We demonstrate the usefulness of CADDeLaG in identifying anomalies in a climate graph sequence that have been historically missed due to ad hoc graph sparsification, and on an election donation data set.

Submitted 15 February, 2018; originally announced February 2018.

arXiv:1801.07827 [pdf, other]

Semi-Supervised Convolutional Neural Networks for Human Activity Recognition

Authors: Ming Zeng, Tong Yu, Xiao Wang, Le T. Nguyen, Ole J. Mengshoel, Ian Lane

Abstract: Labeled data used for training activity recognition classifiers are usually limited in terms of size and diversity. Thus, the learned model may not generalize well when used in real-world use cases. Semi-supervised learning augments labeled examples with unlabeled examples, often resulting in improved performance. However, the semi-supervised methods studied in the activity recognition literature assume that feature engineering is already done. In this paper, we lift this assumption and present two semi-supervised methods based on convolutional neural networks (CNNs) to learn discriminative hidden features. Our semi-supervised CNNs learn from both labeled and unlabeled data while also performing feature learning on raw sensor data. In experiments on three real world datasets, we show that our CNNs outperform supervised methods and traditional semi-supervised learning methods by up to 18% in mean F1-score (Fm).

Submitted 22 January, 2018; originally announced January 2018.

Comments: Accepted by BigData2017

arXiv:1711.08493 [pdf, other]

Customized Nonlinear Bandits for Online Response Selection in Neural Conversation Models

Authors: Bing Liu, Tong Yu, Ian Lane, Ole J. Mengshoel

Abstract: Dialog response selection is an important step towards natural response generation in conversational agents. Existing work on neural conversational models mainly focuses on offline supervised learning using a large set of context-response pairs. In this paper, we focus on online learning of response selection in retrieval-based dialog systems. We propose a contextual multi-armed bandit model with a nonlinear reward function that uses distributed representation of text for online response selection. A bidirectional LSTM is used to produce the distributed representations of dialog context and responses, which serve as the input to a contextual bandit. In learning the bandit, we propose a customized Thompson sampling method that is applied to a polynomial feature space in approximating the reward. Experimental results on the Ubuntu Dialogue Corpus demonstrate significant performance gains of the proposed method over conventional linear contextual bandits. Moreover, we report encouraging response selection performance of the proposed neural bandit model using the Recall@k metric for a small set of online training samples.

Submitted 22 November, 2017; originally announced November 2017.

Comments: Accepted at AAAI 2018

arXiv:1709.07172 [pdf, ps, other]

SpectralLeader: Online Spectral Learning for Single Topic Models

Authors: Tong Yu, Branislav Kveton, Zheng Wen, Hung Bui, Ole J. Mengshoel

Abstract: We study the problem of learning a latent variable model from a stream of data. Latent variable models are popular in practice because they can explain observed data in terms of unobserved concepts. These models have been traditionally studied in the offline setting. In the online setting, on the other hand, the online EM is arguably the most popular algorithm for learning latent variable models. Although the online EM is computationally efficient, it typically converges to a local optimum. In this work, we develop a new online learning algorithm for latent variable models, which we call SpectralLeader. SpectralLeader always converges to the global optimum, and we derive a sublinear upper bound on its $n$-step regret in the bag-of-words model. In both synthetic and real-world experiments, we show that SpectralLeader performs similarly to or better than the online EM with tuned hyper-parameters.

Submitted 25 April, 2018; v1 submitted 21 September, 2017; originally announced September 2017.

Comments: 17 pages, 2 figures

arXiv:1708.09121 [pdf, other]

Interpretable Categorization of Heterogeneous Time Series Data

Authors: Ritchie Lee, Mykel J. Kochenderfer, Ole J. Mengshoel, Joshua Silbermann

Abstract: Understanding heterogeneous multivariate time series data is important in many applications ranging from smart homes to aviation. Learning models of heterogeneous multivariate time series that are also human-interpretable is challenging and not adequately addressed by the existing literature. We propose grammar-based decision trees (GBDTs) and an algorithm for learning them. GBDTs extend decision trees with a grammar framework. Logical expressions derived from a context-free grammar are used for branching in place of simple thresholds on attributes. The added expressivity enables support for a wide range of data types while retaining the interpretability of decision trees. In particular, when a grammar based on temporal logic is used, we show that GBDTs can be used for the interpretable classification of high-dimensional and heterogeneous time series data. Furthermore, we show how GBDTs can also be used for categorization, which is a combination of clustering and generating interpretable explanations for each cluster. We apply GBDTs to analyze the classic Australian Sign Language dataset as well as data on near mid-air collisions (NMACs). The NMAC data comes from aircraft simulations used in the development of the next-generation Airborne Collision Avoidance System (ACAS X).

Submitted 26 January, 2018; v1 submitted 30 August, 2017; originally announced August 2017.

Comments: 9 pages, 5 figures, 2 tables, SIAM International Conference on Data Mining (SDM) 2018

Showing 1–10 of 10 results for author: Mengshoel, O J