Search | arXiv e-print repository

Holographic Global Convolutional Networks for Long-Range Prediction Tasks in Malware Detection

Authors: Mohammad Mahmudul Alam, Edward Raff, Stella Biderman, Tim Oates, James Holt

Abstract: Malware detection is an interesting and valuable domain to work in because it has significant real-world impact and unique machine-learning challenges. We investigate existing long-range techniques and benchmarks and find that they're not very suitable in this problem area. In this paper, we introduce Holographic Global Convolutional Networks (HGConv) that utilize the properties of Holographic Red… ▽ More Malware detection is an interesting and valuable domain to work in because it has significant real-world impact and unique machine-learning challenges. We investigate existing long-range techniques and benchmarks and find that they're not very suitable in this problem area. In this paper, we introduce Holographic Global Convolutional Networks (HGConv) that utilize the properties of Holographic Reduced Representations (HRR) to encode and decode features from sequence elements. Unlike other global convolutional methods, our method does not require any intricate kernel computation or crafted kernel design. HGConv kernels are defined as simple parameters learned through backpropagation. The proposed method has achieved new SOTA results on Microsoft Malware Classification Challenge, Drebin, and EMBER malware benchmarks. With log-linear complexity in sequence length, the empirical results demonstrate substantially faster run-time by HGConv compared to other methods achieving far more efficient scaling even with sequence length $\geq 100,000$. △ Less

Submitted 23 March, 2024; originally announced March 2024.

Comments: To appear in Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS) 2024, Valencia, Spain

arXiv:2403.08208 [pdf, other]

Advancing Security in AI Systems: A Novel Approach to Detecting Backdoors in Deep Neural Networks

Authors: Khondoker Murad Hossain, Tim Oates

Abstract: In the rapidly evolving landscape of communication and network security, the increasing reliance on deep neural networks (DNNs) and cloud services for data processing presents a significant vulnerability: the potential for backdoors that can be exploited by malicious actors. Our approach leverages advanced tensor decomposition algorithms Independent Vector Analysis (IVA), Multiset Canonical Correl… ▽ More In the rapidly evolving landscape of communication and network security, the increasing reliance on deep neural networks (DNNs) and cloud services for data processing presents a significant vulnerability: the potential for backdoors that can be exploited by malicious actors. Our approach leverages advanced tensor decomposition algorithms Independent Vector Analysis (IVA), Multiset Canonical Correlation Analysis (MCCA), and Parallel Factor Analysis (PARAFAC2) to meticulously analyze the weights of pre-trained DNNs and distinguish between backdoored and clean models effectively. The key strengths of our method lie in its domain independence, adaptability to various network architectures, and ability to operate without access to the training data of the scrutinized models. This not only ensures versatility across different application scenarios but also addresses the challenge of identifying backdoors without prior knowledge of the specific triggers employed to alter network behavior. We have applied our detection pipeline to three distinct computer vision datasets, encompassing both image classification and object detection tasks. The results demonstrate a marked improvement in both accuracy and efficiency over existing backdoor detection methods. This advancement enhances the security of deep learning and AI in networked systems, providing essential cybersecurity against evolving threats in emerging technologies. △ Less

Submitted 12 March, 2024; originally announced March 2024.

Comments: 6 pages, Accepted at the International Conference on Communications 2024. arXiv admin note: text overlap with arXiv:2212.08121

arXiv:2401.05432 [pdf, other]

TEN-GUARD: Tensor Decomposition for Backdoor Attack Detection in Deep Neural Networks

Authors: Khondoker Murad Hossain, Tim Oates

Abstract: As deep neural networks and the datasets used to train them get larger, the default approach to integrating them into research and commercial projects is to download a pre-trained model and fine tune it. But these models can have uncertain provenance, opening up the possibility that they embed hidden malicious behavior such as trojans or backdoors, where small changes to an input (triggers) can ca… ▽ More As deep neural networks and the datasets used to train them get larger, the default approach to integrating them into research and commercial projects is to download a pre-trained model and fine tune it. But these models can have uncertain provenance, opening up the possibility that they embed hidden malicious behavior such as trojans or backdoors, where small changes to an input (triggers) can cause the model to produce incorrect outputs (e.g., to misclassify). This paper introduces a novel approach to backdoor detection that uses two tensor decomposition methods applied to network activations. This has a number of advantages relative to existing detection methods, including the ability to analyze multiple models at the same time, working across a wide variety of network architectures, making no assumptions about the nature of triggers used to alter network behavior, and being computationally efficient. We provide a detailed description of the detection pipeline along with results on models trained on the MNIST digit dataset, CIFAR-10 dataset, and two difficult datasets from NIST's TrojAI competition. These results show that our method detects backdoored networks more accurately and efficiently than current state-of-the-art methods. △ Less

Submitted 5 January, 2024; originally announced January 2024.

arXiv:2312.15310 [pdf, other]

Towards Generalization in Subitizing with Neuro-Symbolic Loss using Holographic Reduced Representations

Authors: Mohammad Mahmudul Alam, Edward Raff, Tim Oates

Abstract: While deep learning has enjoyed significant success in computer vision tasks over the past decade, many shortcomings still exist from a Cognitive Science (CogSci) perspective. In particular, the ability to subitize, i.e., quickly and accurately identify the small (less than 6) count of items, is not well learned by current Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) when usi… ▽ More While deep learning has enjoyed significant success in computer vision tasks over the past decade, many shortcomings still exist from a Cognitive Science (CogSci) perspective. In particular, the ability to subitize, i.e., quickly and accurately identify the small (less than 6) count of items, is not well learned by current Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) when using a standard cross-entropy (CE) loss. In this paper, we demonstrate that adapting tools used in CogSci research can improve the subitizing generalization of CNNs and ViTs by develo** an alternative loss function using Holographic Reduced Representations (HRRs). We investigate how this neuro-symbolic approach to learning affects the subitizing capability of CNNs and ViTs, and so we focus on specially crafted problems that isolate generalization to specific aspects of subitizing. Via saliency maps and out-of-distribution performance, we are able to empirically observe that the proposed HRR loss improves subitizing generalization though it does not completely solve the problem. In addition, we find that ViTs perform considerably worse compared to CNNs in most respects on subitizing, except on one axis where an HRR-based loss provides improvement. △ Less

Submitted 23 December, 2023; originally announced December 2023.

Comments: Accepted in 38th Annual AAAI Workshop on Neuro-Symbolic Learning and Reasoning in the Era of Large Language Models (NuCLeaR), 2024

arXiv:2312.01242 [pdf, other]

DDxT: Deep Generative Transformer Models for Differential Diagnosis

Authors: Mohammad Mahmudul Alam, Edward Raff, Tim Oates, Cynthia Matuszek

Abstract: Differential Diagnosis (DDx) is the process of identifying the most likely medical condition among the possible pathologies through the process of elimination based on evidence. An automated process that narrows a large set of pathologies down to the most likely pathologies will be of great importance. The primary prior works have relied on the Reinforcement Learning (RL) paradigm under the intuit… ▽ More Differential Diagnosis (DDx) is the process of identifying the most likely medical condition among the possible pathologies through the process of elimination based on evidence. An automated process that narrows a large set of pathologies down to the most likely pathologies will be of great importance. The primary prior works have relied on the Reinforcement Learning (RL) paradigm under the intuition that it aligns better with how physicians perform DDx. In this paper, we show that a generative approach trained with simpler supervised and self-supervised learning signals can achieve superior results on the current benchmark. The proposed Transformer-based generative network, named DDxT, autoregressively produces a set of possible pathologies, i.e., DDx, and predicts the actual pathology using a neural network. Experiments are performed using the DDXPlus dataset. In the case of DDx, the proposed network has achieved a mean accuracy of 99.82% and a mean F1 score of 0.9472. Additionally, mean accuracy reaches 99.98% with a mean F1 score of 0.9949 while predicting ground truth pathology. The proposed DDxT outperformed the previous RL-based approaches by a big margin. Overall, the automated Transformer-based DDx generative model has the potential to become a useful tool for a physician in times of urgency. △ Less

Submitted 2 December, 2023; originally announced December 2023.

Comments: Accepted at 1st Workshop on Deep Generative Models for Health at NeurIPS 2023

arXiv:2311.05596 [pdf, other]

LLM Augmented Hierarchical Agents

Authors: Bharat Prakash, Tim Oates, Tinoosh Mohsenin

Abstract: Solving long-horizon, temporally-extended tasks using Reinforcement Learning (RL) is challenging, compounded by the common practice of learning without prior knowledge (or tabula rasa learning). Humans can generate and execute plans with temporally-extended actions and quickly learn to perform new tasks because we almost never solve problems from scratch. We want autonomous agents to have this sam… ▽ More Solving long-horizon, temporally-extended tasks using Reinforcement Learning (RL) is challenging, compounded by the common practice of learning without prior knowledge (or tabula rasa learning). Humans can generate and execute plans with temporally-extended actions and quickly learn to perform new tasks because we almost never solve problems from scratch. We want autonomous agents to have this same ability. Recently, LLMs have been shown to encode a tremendous amount of knowledge about the world and to perform impressive in-context learning and reasoning. However, using LLMs to solve real world problems is hard because they are not grounded in the current task. In this paper we exploit the planning capabilities of LLMs while using RL to provide learning from the environment, resulting in a hierarchical agent that uses LLMs to solve long-horizon tasks. Instead of completely relying on LLMs, they guide a high-level policy, making learning significantly more sample efficient. This approach is evaluated in simulation environments such as MiniGrid, SkillHack, and Crafter, and on a real robot arm in block manipulation tasks. We show that agents trained using our approach outperform other baselines methods and, once trained, don't need access to LLMs during deployment. △ Less

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2306.16354 [pdf, ps, other]

cuSLINK: Single-linkage Agglomerative Clustering on the GPU

Authors: Corey J. Nolet, Divye Gala, Alex Fender, Mahesh Doijade, Joe Eaton, Edward Raff, John Zedlewski, Brad Rees, Tim Oates

Abstract: In this paper, we propose cuSLINK, a novel and state-of-the-art reformulation of the SLINK algorithm on the GPU which requires only $O(Nk)$ space and uses a parameter $k$ to trade off space and time. We also propose a set of novel and reusable building blocks that compose cuSLINK. These building blocks include highly optimized computational patterns for $k$-NN graph construction, spanning trees, a… ▽ More In this paper, we propose cuSLINK, a novel and state-of-the-art reformulation of the SLINK algorithm on the GPU which requires only $O(Nk)$ space and uses a parameter $k$ to trade off space and time. We also propose a set of novel and reusable building blocks that compose cuSLINK. These building blocks include highly optimized computational patterns for $k$-NN graph construction, spanning trees, and dendrogram cluster extraction. We show how we used our primitives to implement cuSLINK end-to-end on the GPU, further enabling a wide range of real-world data mining and machine learning applications that were once intractable. In addition to being a primary computational bottleneck in the popular HDBSCAN algorithm, the impact of our end-to-end cuSLINK algorithm spans a large range of important applications, including cluster analysis in social and computer networks, natural language processing, and computer vision. Users can obtain cuSLINK at https://docs.rapids.ai/api/cuml/latest/api/#agglomerative-clustering △ Less

Submitted 28 June, 2023; originally announced June 2023.

Comments: To appear in ECML PKDD 2023 by Springer Nature

arXiv:2305.19534 [pdf, other]

Recasting Self-Attention with Holographic Reduced Representations

Authors: Mohammad Mahmudul Alam, Edward Raff, Stella Biderman, Tim Oates, James Holt

Abstract: In recent years, self-attention has become the dominant paradigm for sequence modeling in a variety of domains. However, in domains with very long sequence lengths the $\mathcal{O}(T^2)$ memory and $\mathcal{O}(T^2 H)$ compute costs can make using transformers infeasible. Motivated by problems in malware detection, where sequence lengths of $T \geq 100,000$ are a roadblock to deep learning, we re-… ▽ More In recent years, self-attention has become the dominant paradigm for sequence modeling in a variety of domains. However, in domains with very long sequence lengths the $\mathcal{O}(T^2)$ memory and $\mathcal{O}(T^2 H)$ compute costs can make using transformers infeasible. Motivated by problems in malware detection, where sequence lengths of $T \geq 100,000$ are a roadblock to deep learning, we re-cast self-attention using the neuro-symbolic approach of Holographic Reduced Representations (HRR). In doing so we perform the same high-level strategy of the standard self-attention: a set of queries matching against a set of keys, and returning a weighted response of the values for each key. Implemented as a ``Hrrformer'' we obtain several benefits including $\mathcal{O}(T H \log H)$ time complexity, $\mathcal{O}(T H)$ space complexity, and convergence in $10\times$ fewer epochs. Nevertheless, the Hrrformer achieves near state-of-the-art accuracy on LRA benchmarks and we are able to learn with just a single layer. Combined, these benefits make our Hrrformer the first viable Transformer for such long malware classification sequences and up to $280\times$ faster to train on the Long Range Arena benchmark. Code is available at \url{https://github.com/NeuromorphicComputationResearchProgram/Hrrformer} △ Less

Submitted 30 May, 2023; originally announced May 2023.

Comments: To appear in Proceedings of the 40th International Conference on Machine Learning (ICML)

arXiv:2302.06134 [pdf, other]

RFC-Net: Learning High Resolution Global Features for Medical Image Segmentation on a Computational Budget

Authors: Sourajit Saha, Shaswati Saha, Md Osman Gani, Tim Oates, David Chapman

Abstract: Learning High-Resolution representations is essential for semantic segmentation. Convolutional neural network (CNN)architectures with downstream and upstream propagation flow are popular for segmentation in medical diagnosis. However, due to performing spatial downsampling and upsampling in multiple stages, information loss is inexorable. On the contrary, connecting layers densely on high spatial… ▽ More Learning High-Resolution representations is essential for semantic segmentation. Convolutional neural network (CNN)architectures with downstream and upstream propagation flow are popular for segmentation in medical diagnosis. However, due to performing spatial downsampling and upsampling in multiple stages, information loss is inexorable. On the contrary, connecting layers densely on high spatial resolution is computationally expensive. In this work, we devise a Loose Dense Connection Strategy to connect neurons in subsequent layers with reduced parameters. On top of that, using a m-way Tree structure for feature propagation we propose Receptive Field Chain Network (RFC-Net) that learns high resolution global features on a compressed computational space. Our experiments demonstrates that RFC-Net achieves state-of-the-art performance on Kvasir and CVC-ClinicDB benchmarks for Polyp segmentation. △ Less

Submitted 13 February, 2023; originally announced February 2023.

Comments: In Proceedings of AAAI Conference on Artificial Intelligence 2023

arXiv:2212.08121 [pdf, other]

Backdoor Attack Detection in Computer Vision by Applying Matrix Factorization on the Weights of Deep Networks

Authors: Khondoker Murad Hossain, Tim Oates

Abstract: The increasing importance of both deep neural networks (DNNs) and cloud services for training them means that bad actors have more incentive and opportunity to insert backdoors to alter the behavior of trained models. In this paper, we introduce a novel method for backdoor detection that extracts features from pre-trained DNN's weights using independent vector analysis (IVA) followed by a machine… ▽ More The increasing importance of both deep neural networks (DNNs) and cloud services for training them means that bad actors have more incentive and opportunity to insert backdoors to alter the behavior of trained models. In this paper, we introduce a novel method for backdoor detection that extracts features from pre-trained DNN's weights using independent vector analysis (IVA) followed by a machine learning classifier. In comparison to other detection techniques, this has a number of benefits, such as not requiring any training data, being applicable across domains, operating with a wide range of network architectures, not assuming the nature of the triggers used to change network behavior, and being highly scalable. We discuss the detection pipeline, and then demonstrate the results on two computer vision datasets regarding image classification and object detection. Our method outperforms the competing algorithms in terms of efficiency and is more accurate, hel** to ensure the safe application of deep learning and AI. △ Less

Submitted 15 December, 2022; originally announced December 2022.

Comments: 7 pages, 4 figures, 5 tables, AAAI Workshop on Safe AI 2023

arXiv:2211.13250 [pdf, other]

Lempel-Ziv Networks

Authors: Rebecca Saul, Mohammad Mahmudul Alam, John Hurwitz, Edward Raff, Tim Oates, James Holt

Abstract: Sequence processing has long been a central area of machine learning research. Recurrent neural nets have been successful in processing sequences for a number of tasks; however, they are known to be both ineffective and computationally expensive when applied to very long sequences. Compression-based methods have demonstrated more robustness when processing such sequences -- in particular, an appro… ▽ More Sequence processing has long been a central area of machine learning research. Recurrent neural nets have been successful in processing sequences for a number of tasks; however, they are known to be both ineffective and computationally expensive when applied to very long sequences. Compression-based methods have demonstrated more robustness when processing such sequences -- in particular, an approach pairing the Lempel-Ziv Jaccard Distance (LZJD) with the k-Nearest Neighbor algorithm has shown promise on long sequence problems (up to $T=200,000,000$ steps) involving malware classification. Unfortunately, use of LZJD is limited to discrete domains. To extend the benefits of LZJD to a continuous domain, we investigate the effectiveness of a deep-learning analog of the algorithm, the Lempel-Ziv Network. While we achieve successful proof of concept, we are unable to improve meaningfully on the performance of a standard LSTM across a variety of datasets and sequence processing tasks. In addition to presenting this negative result, our work highlights the problem of sub-par baseline tuning in newer research areas. △ Less

Submitted 23 November, 2022; originally announced November 2022.

Comments: I Can't Believe It's Not Better Workshop at NeurIPS 2022

arXiv:2210.08412 [pdf, other]

Towards an Interpretable Hierarchical Agent Framework using Semantic Goals

Authors: Bharat Prakash, Nicholas Waytowich, Tim Oates, Tinoosh Mohsenin

Abstract: Learning to solve long horizon temporally extended tasks with reinforcement learning has been a challenge for several years now. We believe that it is important to leverage both the hierarchical structure of complex tasks and to use expert supervision whenever possible to solve such tasks. This work introduces an interpretable hierarchical agent framework by combining planning and semantic goal di… ▽ More Learning to solve long horizon temporally extended tasks with reinforcement learning has been a challenge for several years now. We believe that it is important to leverage both the hierarchical structure of complex tasks and to use expert supervision whenever possible to solve such tasks. This work introduces an interpretable hierarchical agent framework by combining planning and semantic goal directed reinforcement learning. We assume access to certain spatial and haptic predicates and construct a simple and powerful semantic goal space. These semantic goal representations are more interpretable, making expert supervision and intervention easier. They also eliminate the need to write complex, dense reward functions thereby reducing human engineering effort. We evaluate our framework on a robotic block manipulation task and show that it performs better than other methods, including both sparse and dense reward functions. We also suggest some next steps and discuss how this framework makes interaction and collaboration with humans easier. △ Less

Submitted 15 October, 2022; originally announced October 2022.

Report number: AIHRI/2022/7590

arXiv:2206.05893 [pdf, other]

Deploying Convolutional Networks on Untrusted Platforms Using 2D Holographic Reduced Representations

Authors: Mohammad Mahmudul Alam, Edward Raff, Tim Oates, James Holt

Abstract: Due to the computational cost of running inference for a neural network, the need to deploy the inferential steps on a third party's compute environment or hardware is common. If the third party is not fully trusted, it is desirable to obfuscate the nature of the inputs and outputs, so that the third party can not easily determine what specific task is being performed. Provably secure protocols fo… ▽ More Due to the computational cost of running inference for a neural network, the need to deploy the inferential steps on a third party's compute environment or hardware is common. If the third party is not fully trusted, it is desirable to obfuscate the nature of the inputs and outputs, so that the third party can not easily determine what specific task is being performed. Provably secure protocols for leveraging an untrusted party exist but are too computational demanding to run in practice. We instead explore a different strategy of fast, heuristic security that we call Connectionist Symbolic Pseudo Secrets. By leveraging Holographic Reduced Representations (HRR), we create a neural network with a pseudo-encryption style defense that empirically shows robustness to attack, even under threat models that unrealistically favor the adversary. △ Less

Submitted 12 June, 2022; originally announced June 2022.

Comments: To appear in the Proceedings of the 39 th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022

arXiv:2111.04120 [pdf, other]

Automatic Goal Generation using Dynamical Distance Learning

Authors: Bharat Prakash, Nicholas Waytowich, Tinoosh Mohsenin, Tim Oates

Abstract: Reinforcement Learning (RL) agents can learn to solve complex sequential decision making tasks by interacting with the environment. However, sample efficiency remains a major challenge. In the field of multi-goal RL, where agents are required to reach multiple goals to solve complex tasks, improving sample efficiency can be especially challenging. On the other hand, humans or other biological agen… ▽ More Reinforcement Learning (RL) agents can learn to solve complex sequential decision making tasks by interacting with the environment. However, sample efficiency remains a major challenge. In the field of multi-goal RL, where agents are required to reach multiple goals to solve complex tasks, improving sample efficiency can be especially challenging. On the other hand, humans or other biological agents learn such tasks in a much more strategic way, following a curriculum where tasks are sampled with increasing difficulty level in order to make gradual and efficient learning progress. In this work, we propose a method for automatic goal generation using a dynamical distance function (DDF) in a self-supervised fashion. DDF is a function which predicts the dynamical distance between any two states within a markov decision process (MDP). With this, we generate a curriculum of goals at the appropriate difficulty level to facilitate efficient learning throughout the training process. We evaluate this approach on several goal-conditioned robotic manipulation and navigation tasks, and show improvements in sample efficiency over a baseline method which only uses random goal sampling. △ Less

Submitted 7 November, 2021; originally announced November 2021.

arXiv:2110.04649 [pdf, other]

Interactive Hierarchical Guidance using Language

Authors: Bharat Prakash, Nicholas Waytowich, Tim Oates, Tinoosh Mohsenin

Abstract: Reinforcement learning has been successful in many tasks ranging from robotic control, games, energy management etc. In complex real world environments with sparse rewards and long task horizons, sample efficiency is still a major challenge. Most complex tasks can be easily decomposed into high-level planning and low level control. Therefore, it is important to enable agents to leverage the hierar… ▽ More Reinforcement learning has been successful in many tasks ranging from robotic control, games, energy management etc. In complex real world environments with sparse rewards and long task horizons, sample efficiency is still a major challenge. Most complex tasks can be easily decomposed into high-level planning and low level control. Therefore, it is important to enable agents to leverage the hierarchical structure and decompose bigger tasks into multiple smaller sub-tasks. We introduce an approach where we use language to specify sub-tasks and a high-level planner issues language commands to a low level controller. The low-level controller executes the sub-tasks based on the language commands. Our experiments show that this method is able to solve complex long horizon planning tasks with limited human supervision. Using language has added benefit of interpretability and ability for expert humans to take over the high-level planning task and provide language commands if necessary. △ Less

Submitted 9 October, 2021; originally announced October 2021.

Comments: Presented at AI-HRI symposium as part of AAAI-FSS 2021 (arXiv:2109.10836)

Report number: AIHRI/2021/45

arXiv:2110.00078 [pdf, other]

Determining Standard Occupational Classification Codes from Job Descriptions in Immigration Petitions

Authors: Sourav Mukherjee, David Widmark, Vince DiMascio, Tim Oates

Abstract: Accurate specification of standard occupational classification (SOC) code is critical to the success of many U.S. work visa applications. Determination of correct SOC code relies on careful study of job requirements and comparison to definitions given by the U.S. Bureau of Labor Statistics, which is often a tedious activity. In this paper, we apply methods from natural language processing (NLP) to… ▽ More Accurate specification of standard occupational classification (SOC) code is critical to the success of many U.S. work visa applications. Determination of correct SOC code relies on careful study of job requirements and comparison to definitions given by the U.S. Bureau of Labor Statistics, which is often a tedious activity. In this paper, we apply methods from natural language processing (NLP) to computationally determine SOC code based on job description. We implement and empirically evaluate a broad variety of predictive models with respect to quality of prediction and training time, and identify models best suited for this task. △ Less

Submitted 30 September, 2021; originally announced October 2021.

Comments: To appear in ICDM 2021 workshop: MLLD-2021

arXiv:2109.02157 [pdf, other]

Learning with Holographic Reduced Representations

Authors: Ashwinkumar Ganesan, Hang Gao, Sunil Gandhi, Edward Raff, Tim Oates, James Holt, Mark McLean

Abstract: Holographic Reduced Representations (HRR) are a method for performing symbolic AI on top of real-valued vectors by associating each vector with an abstract concept, and providing mathematical operations to manipulate vectors as if they were classic symbolic objects. This method has seen little use outside of older symbolic AI work and cognitive science. Our goal is to revisit this approach to unde… ▽ More Holographic Reduced Representations (HRR) are a method for performing symbolic AI on top of real-valued vectors by associating each vector with an abstract concept, and providing mathematical operations to manipulate vectors as if they were classic symbolic objects. This method has seen little use outside of older symbolic AI work and cognitive science. Our goal is to revisit this approach to understand if it is viable for enabling a hybrid neural-symbolic approach to learning as a differentiable component of a deep learning architecture. HRRs today are not effective in a differentiable solution due to numerical instability, a problem we solve by introducing a projection step that forces the vectors to exist in a well behaved point in space. In doing so we improve the concept retrieval efficacy of HRRs by over $100\times$. Using multi-label classification we demonstrate how to leverage the symbolic HRR properties to develop an output layer and loss function that is able to learn effectively, and allows us to investigate some of the pros and cons of an HRR neuro-symbolic learning approach. Our code can be found at https://github.com/NeuromorphicComputationResearchProgram/Learning-with-Holographic-Reduced-Representations △ Less

Submitted 28 December, 2021; v1 submitted 5 September, 2021; originally announced September 2021.

Comments: To appear in the 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

arXiv:2107.00124 [pdf, other]

Learning a Reversible Embedding Map** using Bi-Directional Manifold Alignment

Authors: Ashwinkumar Ganesan, Francis Ferraro, Tim Oates

Abstract: We propose a Bi-Directional Manifold Alignment (BDMA) that learns a non-linear map** between two manifolds by explicitly training it to be bijective. We demonstrate BDMA by training a model for a pair of languages rather than individual, directed source and target combinations, reducing the number of models by 50%. We show that models trained with BDMA in the "forward" (source to target) directi… ▽ More We propose a Bi-Directional Manifold Alignment (BDMA) that learns a non-linear map** between two manifolds by explicitly training it to be bijective. We demonstrate BDMA by training a model for a pair of languages rather than individual, directed source and target combinations, reducing the number of models by 50%. We show that models trained with BDMA in the "forward" (source to target) direction can successfully map words in the "reverse" (target to source) direction, yielding equivalent (or better) performance to standard unidirectional translation models where the source and target language is flipped. We also show how BDMA reduces the overall size of the model. △ Less

Submitted 30 June, 2021; originally announced July 2021.

Journal ref: Findings of the Association for Computational Linguistics: ACL 2021

arXiv:2104.06357 [pdf, ps, other]

GPU Semiring Primitives for Sparse Neighborhood Methods

Authors: Corey J. Nolet, Divye Gala, Edward Raff, Joe Eaton, Brad Rees, John Zedlewski, Tim Oates

Abstract: High-performance primitives for mathematical operations on sparse vectors must deal with the challenges of skewed degree distributions and limits on memory consumption that are typically not issues in dense operations. We demonstrate that a sparse semiring primitive can be flexible enough to support a wide range of critical distance measures while maintaining performance and memory efficiency on t… ▽ More High-performance primitives for mathematical operations on sparse vectors must deal with the challenges of skewed degree distributions and limits on memory consumption that are typically not issues in dense operations. We demonstrate that a sparse semiring primitive can be flexible enough to support a wide range of critical distance measures while maintaining performance and memory efficiency on the GPU. We further show that this primitive is a foundational component for enabling many neighborhood-based information retrieval and machine learning algorithms to accept sparse input. To our knowledge, this is the first work aiming to unify the computation of several critical distance measures on the GPU under a single flexible design paradigm and we hope that it provides a good baseline for future research in this area. Our implementation is fully open source and publicly available as part of the RAFT library of GPU-accelerated machine learning primitives (https://github.com/rapidsai/raft). △ Less

Submitted 4 March, 2022; v1 submitted 13 April, 2021; originally announced April 2021.

arXiv:2010.01997 [pdf, other]

doi 10.1109/ICDMW51313.2020.00114

Immigration Document Classification and Automated Response Generation

Authors: Sourav Mukherjee, Tim Oates, Vince DiMascio, Huguens Jean, Rob Ares, David Widmark, Jaclyn Harder

Abstract: In this paper, we consider the problem of organizing supporting documents vital to U.S. work visa petitions, as well as responding to Requests For Evidence (RFE) issued by the U.S.~Citizenship and Immigration Services (USCIS). Typically, both processes require a significant amount of repetitive manual effort. To reduce the burden of mechanical work, we apply machine learning methods to automate th… ▽ More In this paper, we consider the problem of organizing supporting documents vital to U.S. work visa petitions, as well as responding to Requests For Evidence (RFE) issued by the U.S.~Citizenship and Immigration Services (USCIS). Typically, both processes require a significant amount of repetitive manual effort. To reduce the burden of mechanical work, we apply machine learning methods to automate these processes, with humans in the loop to review and edit output for submission. In particular, we use an ensemble of image and text classifiers to categorize supporting documents. We also use a text classifier to automatically identify the types of evidence being requested in an RFE, and used the identified types in conjunction with response templates and extracted fields to assemble draft responses. Empirical results suggest that our approach achieves considerable accuracy while significantly reducing processing time. △ Less

Submitted 29 September, 2020; originally announced October 2020.

Comments: To appear in ICDM 2020 workshop: MLLD-2020

Journal ref: 2020 International Conference on Data Mining Workshops (ICDMW), 2020, pp. 782-789

arXiv:2008.00325 [pdf, other]

Bringing UMAP Closer to the Speed of Light with GPU Acceleration

Authors: Corey J. Nolet, Victor Lafargue, Edward Raff, Thejaswi Nanditale, Tim Oates, John Zedlewski, Joshua Patterson

Abstract: The Uniform Manifold Approximation and Projection (UMAP) algorithm has become widely popular for its ease of use, quality of results, and support for exploratory, unsupervised, supervised, and semi-supervised learning. While many algorithms can be ported to a GPU in a simple and direct fashion, such efforts have resulted in inefficient and inaccurate versions of UMAP. We show a number of technique… ▽ More The Uniform Manifold Approximation and Projection (UMAP) algorithm has become widely popular for its ease of use, quality of results, and support for exploratory, unsupervised, supervised, and semi-supervised learning. While many algorithms can be ported to a GPU in a simple and direct fashion, such efforts have resulted in inefficient and inaccurate versions of UMAP. We show a number of techniques that can be used to make a faster and more faithful GPU version of UMAP, and obtain speedups of up to 100x in practice. Many of these design choices/lessons are general purpose and may inform the conversion of other graph and manifold learning algorithms to use GPUs. Our implementation has been made publicly available as part of the open source RAPIDS cuML library (https://github.com/rapidsai/cuml). △ Less

Submitted 29 March, 2021; v1 submitted 1 August, 2020; originally announced August 2020.

arXiv:2004.03734 [pdf, other]

Locality Preserving Loss: Neighbors that Live together, Align together

Authors: Ashwinkumar Ganesan, Francis Ferraro, Tim Oates

Abstract: We present a locality preserving loss (LPL) that improves the alignment between vector space embeddings while separating uncorrelated representations. Given two pretrained embedding manifolds, LPL optimizes a model to project an embedding and maintain its local neighborhood while aligning one manifold to another. This reduces the overall size of the dataset required to align the two in tasks such… ▽ More We present a locality preserving loss (LPL) that improves the alignment between vector space embeddings while separating uncorrelated representations. Given two pretrained embedding manifolds, LPL optimizes a model to project an embedding and maintain its local neighborhood while aligning one manifold to another. This reduces the overall size of the dataset required to align the two in tasks such as cross-lingual word alignment. We show that the LPL-based alignment between input vector spaces acts as a regularizer, leading to better and consistent accuracy than the baseline, especially when the size of the training set is small. We demonstrate the effectiveness of LPL optimized alignment on semantic text similarity (STS), natural language inference (SNLI), multi-genre language inference (MNLI) and cross-lingual word alignment(CLA) showing consistent improvements, finding up to 16% improvement over our baseline in lower resource settings. △ Less

Submitted 11 March, 2021; v1 submitted 7 April, 2020; originally announced April 2020.

ACM Class: I.2.7

Journal ref: Adapt-NLP 2021

arXiv:1910.04724 [pdf, other]

Using Neural Networks for Programming by Demonstration

Authors: Karan K. Budhraja, Hang Gao, Tim Oates

Abstract: Agent-based modeling is a paradigm of modeling dynamic systems of interacting agents that are individually governed by specified behavioral rules. Training a model of such agents to produce an emergent behavior by specification of the emergent (as opposed to agent) behavior is easier from a demonstration perspective. Without the involvement of manual behavior specification via code or reliance on… ▽ More Agent-based modeling is a paradigm of modeling dynamic systems of interacting agents that are individually governed by specified behavioral rules. Training a model of such agents to produce an emergent behavior by specification of the emergent (as opposed to agent) behavior is easier from a demonstration perspective. Without the involvement of manual behavior specification via code or reliance on a defined taxonomy of possible behaviors, the demonstrator specifies the desired emergent behavior of the system over time, and retrieves agent-level parameters required to execute that motion. A low time-complexity and data requirement favoring framework for reproducing emergent behavior, given an abstract demonstration, is discussed in [1], [2]. The existing framework does, however, observe an inherent limitation in scalability because of an exponentially growing search space (with the number of agent-level parameters). Our work addresses this limitation by pursuing a more scalable architecture with the use of neural networks. While the (proof-of-concept) architecture is not suitable for many evaluated domains because of its lack of representational capacity for that domain, it is more suitable than existing work for larger datasets for the Civil Violence agent-based model. △ Less

Submitted 10 October, 2019; originally announced October 2019.

arXiv:1910.04618 [pdf, ps, other]

Universal Adversarial Perturbation for Text Classification

Authors: Hang Gao, Tim Oates

Abstract: Given a state-of-the-art deep neural network text classifier, we show the existence of a universal and very small perturbation vector (in the embedding space) that causes natural text to be misclassified with high probability. Unlike images on which a single fixed-size adversarial perturbation can be found, text is of variable length, so we define the "universality" as "token-agnostic", where a si… ▽ More Given a state-of-the-art deep neural network text classifier, we show the existence of a universal and very small perturbation vector (in the embedding space) that causes natural text to be misclassified with high probability. Unlike images on which a single fixed-size adversarial perturbation can be found, text is of variable length, so we define the "universality" as "token-agnostic", where a single perturbation is applied to each token, resulting in different perturbations of flexible sizes at the sequence level. We propose an algorithm to compute universal adversarial perturbations, and show that the state-of-the-art deep neural networks are highly vulnerable to them, even though they keep the neighborhood of tokens mostly preserved. We also show how to use these adversarial perturbations to generate adversarial text samples. The surprising existence of universal "token-agnostic" adversarial perturbations may reveal important properties of a text classifier. △ Less

Submitted 10 October, 2019; originally announced October 2019.

arXiv:1909.13392 [pdf, other]

Learning from Observations Using a Single Video Demonstration and Human Feedback

Authors: Sunil Gandhi, Tim Oates, Tinoosh Mohsenin, Nicholas Waytowich

Abstract: In this paper, we present a method for learning from video demonstrations by using human feedback to construct a map** between the standard representation of the agent and the visual representation of the demonstration. In this way, we leverage the advantages of both these representations, i.e., we learn the policy using standard state representations, but are able to specify the expected behavi… ▽ More In this paper, we present a method for learning from video demonstrations by using human feedback to construct a map** between the standard representation of the agent and the visual representation of the demonstration. In this way, we leverage the advantages of both these representations, i.e., we learn the policy using standard state representations, but are able to specify the expected behavior using video demonstration. We train an autonomous agent using a single video demonstration and use human feedback (using numerical similarity rating) to map the standard representation to the visual representation with a neural network. We show the effectiveness of our method by teaching a hopper agent in the MuJoCo to perform a backflip using a single video demonstration generated in MuJoCo as well as from a real-world YouTube video of a person performing a backflip. Additionally, we show that our method can transfer to new tasks, such as hop**, with very little human feedback. △ Less

Submitted 29 September, 2019; originally announced September 2019.

arXiv:1909.05890 [pdf, other]

doi 10.1145/3306195.3306199

Determining the Scale of Impact from Denial-of-Service Attacks in Real Time Using Twitter

Authors: Chi Zhang, Bryan Wilkinson, Ashwinkumar Ganesan, Tim Oates

Abstract: Denial of Service (DoS) attacks are common in on-line and mobile services such as Twitter, Facebook and banking. As the scale and frequency of Distributed Denial of Service (DDoS) attacks increase, there is an urgent need for determining the impact of the attack. Two central challenges of the task are to get feedback from a large number of users and to get it in a timely manner. In this paper, we… ▽ More Denial of Service (DoS) attacks are common in on-line and mobile services such as Twitter, Facebook and banking. As the scale and frequency of Distributed Denial of Service (DDoS) attacks increase, there is an urgent need for determining the impact of the attack. Two central challenges of the task are to get feedback from a large number of users and to get it in a timely manner. In this paper, we present a weakly-supervised model that does not need annotated data to measure the impact of DoS issues by applying Latent Dirichlet Allocation and symmetric Kullback-Leibler divergence on tweets. There is a limitation to the weakly-supervised module. It assumes that the event detected in a time window is a DoS attack event. This will become less of a problem, when more non-attack events twitter got collected and become less likely to be identified as a new event. Another way to remove that limitation, an optional classification layer, trained on manually annotated DoS attack tweets, to filter out non-attack tweets can be used to increase precision at the expense of recall. Experimental results show that we can learn weakly-supervised models that can achieve comparable precision to supervised ones and can be generalized across entities in the same industry. △ Less

Submitted 12 September, 2019; originally announced September 2019.

Comments: DYnamic and Novel Advances in Machine Learning and Intelligent Cyber Security Workshop, December 3--4, 2018, San Juan, PR, USA

arXiv:1908.02947 [pdf, other]

Graph Node Embeddings using Domain-Aware Biased Random Walks

Authors: Sourav Mukherjee, Tim Oates, Ryan Wright

Abstract: The recent proliferation of publicly available graph-structured data has sparked an interest in machine learning algorithms for graph data. Since most traditional machine learning algorithms assume data to be tabular, embedding algorithms for map** graph data to real-valued vector spaces has become an active area of research. Existing graph embedding approaches are based purely on structural inf… ▽ More The recent proliferation of publicly available graph-structured data has sparked an interest in machine learning algorithms for graph data. Since most traditional machine learning algorithms assume data to be tabular, embedding algorithms for map** graph data to real-valued vector spaces has become an active area of research. Existing graph embedding approaches are based purely on structural information and ignore any semantic information from the underlying domain. In this paper, we demonstrate that semantic information can play a useful role in computing graph embeddings. Specifically, we present a framework for devising embedding strategies aware of domain-specific interpretations of graph nodes and edges, and use knowledge of downstream machine learning tasks to identify relevant graph substructures. Using two real-life domains, we show that our framework yields embeddings that are simple to implement and yet achieve equal or greater accuracy in machine learning tasks compared to domain independent approaches. △ Less

Submitted 8 August, 2019; originally announced August 2019.

arXiv:1905.00752 [pdf]

Hybrid Mortality Prediction using Multiple Source Systems

Authors: Isaac Mativo, Yelena Yesha, Michael Grasso, Tim Oates, Qian Zhu

Abstract: The use of artificial intelligence in clinical care to improve decision support systems is increasing. This is not surprising since, by its very nature, the practice of medicine consists of making decisions based on observations from different systems both inside and outside the human body. In this paper, we combine three general systems (ICU, diabetes, and comorbidities) and use them to make pati… ▽ More The use of artificial intelligence in clinical care to improve decision support systems is increasing. This is not surprising since, by its very nature, the practice of medicine consists of making decisions based on observations from different systems both inside and outside the human body. In this paper, we combine three general systems (ICU, diabetes, and comorbidities) and use them to make patient clinical predictions. We use an artificial intelligence approach to show that we can improve mortality prediction of hospitalized diabetic patients. We do this by utilizing a machine learning approach to select clinical input features that are more likely to predict mortality. We then use these features to create a hybrid mortality prediction model and compare our results to non-artificial intelligence models. For simplicity, we limit our input features to patient comorbidities and features derived from a well-known mortality measure, the Sequential Organ Failure Assessment (SOFA). △ Less

Submitted 17 April, 2019; originally announced May 2019.

arXiv:1903.12101 [pdf, other]

Extending Signature-based Intrusion Detection Systems WithBayesian Abductive Reasoning

Authors: Ashwinkumar Ganesan, Pooja Parameshwarappa, Akshay Peshave, Zhiyuan Chen, Tim Oates

Abstract: Evolving cybersecurity threats are a persistent challenge for systemadministrators and security experts as new malwares are continu-ally released. Attackers may look for vulnerabilities in commercialproducts or execute sophisticated reconnaissance campaigns tounderstand a targets network and gather information on securityproducts like firewalls and intrusion detection / prevention systems(network… ▽ More Evolving cybersecurity threats are a persistent challenge for systemadministrators and security experts as new malwares are continu-ally released. Attackers may look for vulnerabilities in commercialproducts or execute sophisticated reconnaissance campaigns tounderstand a targets network and gather information on securityproducts like firewalls and intrusion detection / prevention systems(network or host-based). Many new attacks tend to be modificationsof existing ones. In such a scenario, rule-based systems fail to detectthe attack, even though there are minor differences in conditions /attributes between rules to identify the new and existing attack. Todetect these differences the IDS must be able to isolate the subset ofconditions that are true and predict the likely conditions (differentfrom the original) that must be observed. In this paper, we proposeaprobabilistic abductive reasoningapproach that augments an exist-ing rule-based IDS (snort [29]) to detect these evolved attacks by (a)Predicting rule conditions that are likely to occur (based on existingrules) and (b) able to generate new snort rules when provided withseed rule (i.e. a starting rule) to reduce the burden on experts toconstantly update them. We demonstrate the effectiveness of theapproach by generating new rules from the snort 2012 rules set andtesting it on the MACCDC 2012 dataset [6]. △ Less

Submitted 28 March, 2019; originally announced March 2019.

arXiv:1903.10404 [pdf, other]

On the use of Deep Autoencoders for Efficient Embedded Reinforcement Learning

Authors: Bharat Prakash, Mark Horton, Nicholas R. Waytowich, William David Hairston, Tim Oates, Tinoosh Mohsenin

Abstract: In autonomous embedded systems, it is often vital to reduce the amount of actions taken in the real world and energy required to learn a policy. Training reinforcement learning agents from high dimensional image representations can be very expensive and time consuming. Autoencoders are deep neural network used to compress high dimensional data such as pixelated images into small latent representat… ▽ More In autonomous embedded systems, it is often vital to reduce the amount of actions taken in the real world and energy required to learn a policy. Training reinforcement learning agents from high dimensional image representations can be very expensive and time consuming. Autoencoders are deep neural network used to compress high dimensional data such as pixelated images into small latent representations. This compression model is vital to efficiently learn policies, especially when learning on embedded systems. We have implemented this model on the NVIDIA Jetson TX2 embedded GPU, and evaluated the power consumption, throughput, and energy consumption of the autoencoders for various CPU/GPU core combinations, frequencies, and model parameters. Additionally, we have shown the reconstructions generated by the autoencoder to analyze the quality of the generated compressed representation and also the performance of the reinforcement learning agent. Finally, we have presented an assessment of the viability of training these models on embedded systems and their usefulness in develo** autonomous policies. Using autoencoders, we were able to achieve 4-5 $\times$ improved performance compared to a baseline RL agent with a convolutional feature extractor, while using less than 2W of power. △ Less

Submitted 25 March, 2019; originally announced March 2019.

arXiv:1808.00116 [pdf, other]

Cognitive Techniques for Early Detection of Cybersecurity Events

Authors: Sandeep Narayanan, Ashwinkumar Ganesan, Karuna Joshi, Tim Oates, Anupam Joshi, Tim Finin

Abstract: The early detection of cybersecurity events such as attacks is challenging given the constantly evolving threat landscape. Even with advanced monitoring, sophisticated attackers can spend as many as 146 days in a system before being detected. This paper describes a novel, cognitive framework that assists a security analyst by exploiting the power of semantically rich knowledge representation and r… ▽ More The early detection of cybersecurity events such as attacks is challenging given the constantly evolving threat landscape. Even with advanced monitoring, sophisticated attackers can spend as many as 146 days in a system before being detected. This paper describes a novel, cognitive framework that assists a security analyst by exploiting the power of semantically rich knowledge representation and reasoning with machine learning techniques. Our Cognitive Cybersecurity system ingests information from textual sources, and various agents representing host and network-based sensors, and represents this information in a knowledge graph. This graph uses terms from an extended version of the Unified Cybersecurity Ontology. The system reasons over the knowledge graph to derive better actionable intelligence to security administrators, thus decreasing their cognitive load and increasing their confidence in the system. We have developed a proof of concept framework for our approach and demonstrate its capabilities using a custom-built ransomware instance that is similar to WannaCry. △ Less

Submitted 31 July, 2018; originally announced August 2018.

arXiv:1709.04305 [pdf, other]

Automated Cloud Provisioning on AWS using Deep Reinforcement Learning

Authors: Zhiguang Wang, Chul Gwon, Tim Oates, Adam Iezzi

Abstract: As the use of cloud computing continues to rise, controlling cost becomes increasingly important. Yet there is evidence that 30\% - 45\% of cloud spend is wasted. Existing tools for cloud provisioning typically rely on highly trained human experts to specify what to monitor, thresholds for triggering action, and actions. In this paper we explore the use of reinforcement learning (RL) to acquire po… ▽ More As the use of cloud computing continues to rise, controlling cost becomes increasingly important. Yet there is evidence that 30\% - 45\% of cloud spend is wasted. Existing tools for cloud provisioning typically rely on highly trained human experts to specify what to monitor, thresholds for triggering action, and actions. In this paper we explore the use of reinforcement learning (RL) to acquire policies to balance performance and spend, allowing humans to specify what they want as opposed to how to do it, minimizing the need for cloud expertise. Empirical results with tabular, deep, and dueling double deep Q-learning with the CloudSim simulator show the utility of RL and the relative merits of the approaches. We also demonstrate effective policy transfer learning from an extremely simple simulator to CloudSim, with the next step being transfer from CloudSim to an Amazon Web Services physical environment. △ Less

Submitted 19 September, 2017; v1 submitted 13 September, 2017; originally announced September 2017.

arXiv:1708.08430 [pdf]

Deep Belief Networks used on High Resolution Multichannel Electroencephalography Data for Seizure Detection

Authors: JT Turner, Adam Page, Tinoosh Mohsenin, Tim Oates

Abstract: Ubiquitous bio-sensing for personalized health monitoring is slowly becoming a reality with the increasing availability of small, diverse, robust, high fidelity sensors. This oncoming flood of data begs the question of how we will extract useful information from it. In this paper we explore the use of a variety of representations and machine learning algorithms applied to the task of seizure detec… ▽ More Ubiquitous bio-sensing for personalized health monitoring is slowly becoming a reality with the increasing availability of small, diverse, robust, high fidelity sensors. This oncoming flood of data begs the question of how we will extract useful information from it. In this paper we explore the use of a variety of representations and machine learning algorithms applied to the task of seizure detection in high resolution, multichannel EEG data. We explore classification accuracy, computational complexity and memory requirements with a view toward understanding which approaches are most suitable for such tasks as the number of people involved and the amount of data they produce grows to be quite large. In particular, we show that layered learning approaches such as Deep Belief Networks excel along these dimensions. △ Less

Submitted 28 August, 2017; originally announced August 2017.

Comments: Old draft of AAAI paper, AAAI Spring Symposium Series. 2014

arXiv:1707.09899 [pdf, other]

Fashioning with Networks: Neural Style Transfer to Design Clothes

Authors: Prutha Date, Ashwinkumar Ganesan, Tim Oates

Abstract: Convolutional Neural Networks have been highly successful in performing a host of computer vision tasks such as object recognition, object detection, image segmentation and texture synthesis. In 2015, Gatys et. al [7] show how the style of a painter can be extracted from an image of the painting and applied to another normal photograph, thus recreating the photo in the style of the painter. The me… ▽ More Convolutional Neural Networks have been highly successful in performing a host of computer vision tasks such as object recognition, object detection, image segmentation and texture synthesis. In 2015, Gatys et. al [7] show how the style of a painter can be extracted from an image of the painting and applied to another normal photograph, thus recreating the photo in the style of the painter. The method has been successfully applied to a wide range of images and has since spawned multiple applications and mobile apps. In this paper, the neural style transfer algorithm is applied to fashion so as to synthesize new custom clothes. We construct an approach to personalize and generate new custom clothes based on a users preference and by learning the users fashion choices from a limited set of clothes from their closet. The approach is evaluated by analyzing the generated images of clothes and how well they align with the users fashion style. △ Less

Submitted 31 July, 2017; originally announced July 2017.

Comments: ML4Fashion 2017

arXiv:1706.04215 [pdf, other]

Identifying Spatial Relations in Images using Convolutional Neural Networks

Authors: Mandar Haldekar, Ashwinkumar Ganesan, Tim Oates

Abstract: Traditional approaches to building a large scale knowledge graph have usually relied on extracting information (entities, their properties, and relations between them) from unstructured text (e.g. Dbpedia). Recent advances in Convolutional Neural Networks (CNN) allow us to shift our focus to learning entities and relations from images, as they build robust models that require little or no pre-proc… ▽ More Traditional approaches to building a large scale knowledge graph have usually relied on extracting information (entities, their properties, and relations between them) from unstructured text (e.g. Dbpedia). Recent advances in Convolutional Neural Networks (CNN) allow us to shift our focus to learning entities and relations from images, as they build robust models that require little or no pre-processing of the images. In this paper, we present an approach to identify and extract spatial relations (e.g., The girl is standing behind the table) from images using CNNs. Our research addresses two specific challenges: providing insight into how spatial relations are learned by the network and which parts of the image are used to predict these relations. We use the pre-trained network VGGNet to extract features from an image and train a Multi-layer Perceptron (MLP) on a set of synthetic images and the sun09 dataset to extract spatial relations. The MLP predicts spatial relations without a bounding box around the objects or the space in the image depicting the relation. To understand how the spatial relations are represented in the network, a heatmap is overlayed on the image to show the regions that are deemed important by the network. Also, we analyze the MLP to show the relationship between the activation of consistent groups of nodes and the prediction of a spatial relation. We show how the loss of these groups affects the networks ability to identify relations. △ Less

Submitted 13 June, 2017; originally announced June 2017.

arXiv:1611.06455 [pdf, other]

Time Series Classification from Scratch with Deep Neural Networks: A Strong Baseline

Authors: Zhiguang Wang, Weizhong Yan, Tim Oates

Abstract: We propose a simple but strong baseline for time series classification from scratch with deep neural networks. Our proposed baseline models are pure end-to-end without any heavy preprocessing on the raw data or feature crafting. The proposed Fully Convolutional Network (FCN) achieves premium performance to other state-of-the-art approaches and our exploration of the very deep neural networks with… ▽ More We propose a simple but strong baseline for time series classification from scratch with deep neural networks. Our proposed baseline models are pure end-to-end without any heavy preprocessing on the raw data or feature crafting. The proposed Fully Convolutional Network (FCN) achieves premium performance to other state-of-the-art approaches and our exploration of the very deep neural networks with the ResNet structure is also competitive. The global average pooling in our convolutional model enables the exploitation of the Class Activation Map (CAM) to find out the contributing region in the raw data for the specific labels. Our models provides a simple choice for the real world application and a good starting point for the future research. An overall analysis is provided to discuss the generalization capability of our models, learned features, network structures and the classification semantics. △ Less

Submitted 14 December, 2016; v1 submitted 19 November, 2016; originally announced November 2016.

arXiv:1610.05819 [pdf, other]

Finding Representative Points in Multivariate Data Using PCA

Authors: Ashwinkumar Ganesan, Tim Oates, Matt Schmill

Abstract: The idea of representation has been used in various fields of study from data analysis to political science. In this paper, we define representativeness and describe a method to isolate data points that can represent the entire data set. Also, we show how the minimum set of representative data points can be generated. We use data from GLOBE (a project to study the effects on Land Change based on a… ▽ More The idea of representation has been used in various fields of study from data analysis to political science. In this paper, we define representativeness and describe a method to isolate data points that can represent the entire data set. Also, we show how the minimum set of representative data points can be generated. We use data from GLOBE (a project to study the effects on Land Change based on a set of parameters that include temperature, forest cover, human population, atmospheric parameters and many other variables) to test & validate the algorithm. Principal Component Analysis (PCA) is used to reduce the dimensions of the multivariate data set, so that the representative points can be generated efficiently and its Representativeness has been compared against Random Sampling of points from the data set. △ Less

Submitted 18 October, 2016; originally announced October 2016.

arXiv:1608.02971 [pdf, other]

Neuroevolution-Based Inverse Reinforcement Learning

Authors: Karan K. Budhraja, Tim Oates

Abstract: The problem of Learning from Demonstration is targeted at learning to perform tasks based on observed examples. One approach to Learning from Demonstration is Inverse Reinforcement Learning, in which actions are observed to infer rewards. This work combines a feature based state evaluation approach to Inverse Reinforcement Learning with neuroevolution, a paradigm for modifying neural networks base… ▽ More The problem of Learning from Demonstration is targeted at learning to perform tasks based on observed examples. One approach to Learning from Demonstration is Inverse Reinforcement Learning, in which actions are observed to infer rewards. This work combines a feature based state evaluation approach to Inverse Reinforcement Learning with neuroevolution, a paradigm for modifying neural networks based on their performance on a given task. Neural networks are used to learn from a demonstrated expert policy and are evolved to generate a policy similar to the demonstration. The algorithm is discussed and evaluated against competitive feature-based Inverse Reinforcement Learning approaches. At the cost of execution time, neural networks allow for non-linear combinations of features in state evaluations. These valuations may correspond to state value or state reward. This results in better correspondence to observed examples as opposed to using linear combinations. This work also extends existing work on Bayesian Non-Parametric Feature Construction for Inverse Reinforcement Learning by using non-linear combinations of intermediate data to improve performance. The algorithm is observed to be specifically suitable for a linearly solvable non-deterministic Markov Decision Processes in which multiple rewards are sparsely scattered in state space. A conclusive performance hierarchy between evaluated algorithms is presented. △ Less

Submitted 9 August, 2016; originally announced August 2016.

Comments: 12 pages, 15 figures

arXiv:1510.03826 [pdf, other]

Adopting Robustness and Optimality in Fitting and Learning

Authors: Zhiguang Wang, Tim Oates, James Lo

Abstract: We generalized a modified exponentialized estimator by pushing the robust-optimal (RO) index $λ$ to $-\infty$ for achieving robustness to outliers by optimizing a quasi-Minimin function. The robustness is realized and controlled adaptively by the RO index without any predefined threshold. Optimality is guaranteed by expansion of the convexity region in the Hessian matrix to largely avoid local opt… ▽ More We generalized a modified exponentialized estimator by pushing the robust-optimal (RO) index $λ$ to $-\infty$ for achieving robustness to outliers by optimizing a quasi-Minimin function. The robustness is realized and controlled adaptively by the RO index without any predefined threshold. Optimality is guaranteed by expansion of the convexity region in the Hessian matrix to largely avoid local optima. Detailed quantitative analysis on both robustness and optimality are provided. The results of proposed experiments on fitting tasks for three noisy non-convex functions and the digits recognition task on the MNIST dataset consolidate the conclusions. △ Less

Submitted 18 October, 2023; v1 submitted 13 October, 2015; originally announced October 2015.

Comments: arXiv admin note: text overlap with arXiv:1506.02690

arXiv:1509.07481 [pdf, other]

Spatially Encoding Temporal Correlations to Classify Temporal Data Using Convolutional Neural Networks

Authors: Zhiguang Wang, Tim Oates

Abstract: We propose an off-line approach to explicitly encode temporal patterns spatially as different types of images, namely, Gramian Angular Fields and Markov Transition Fields. This enables the use of techniques from computer vision for feature learning and classification. We used Tiled Convolutional Neural Networks to learn high-level features from individual GAF, MTF, and GAF-MTF images on 12 benchma… ▽ More We propose an off-line approach to explicitly encode temporal patterns spatially as different types of images, namely, Gramian Angular Fields and Markov Transition Fields. This enables the use of techniques from computer vision for feature learning and classification. We used Tiled Convolutional Neural Networks to learn high-level features from individual GAF, MTF, and GAF-MTF images on 12 benchmark time series datasets and two real spatial-temporal trajectory datasets. The classification results of our approach are competitive with state-of-the-art approaches on both types of data. An analysis of the features and weights learned by the CNNs explains why the approach works. △ Less

Submitted 24 September, 2015; originally announced September 2015.

Comments: Submit to JCSS. Preliminary versions are appeared in AAAI 2015 workshop and IJCAI 2016 [arXiv:1506.00327]

arXiv:1506.02690 [pdf, other]

Adaptive Normalized Risk-Averting Training For Deep Neural Networks

Authors: Zhiguang Wang, Tim Oates, James Lo

Abstract: This paper proposes a set of new error criteria and learning approaches, Adaptive Normalized Risk-Averting Training (ANRAT), to attack the non-convex optimization problem in training deep neural networks (DNNs). Theoretically, we demonstrate its effectiveness on global and local convexity lower-bounded by the standard $L_p$-norm error. By analyzing the gradient on the convexity index $λ$, we expla… ▽ More This paper proposes a set of new error criteria and learning approaches, Adaptive Normalized Risk-Averting Training (ANRAT), to attack the non-convex optimization problem in training deep neural networks (DNNs). Theoretically, we demonstrate its effectiveness on global and local convexity lower-bounded by the standard $L_p$-norm error. By analyzing the gradient on the convexity index $λ$, we explain the reason why to learn $λ$ adaptively using gradient descent works. In practice, we show how this method improves training of deep neural networks to solve visual recognition tasks on the MNIST and CIFAR-10 datasets. Without using pretraining or other tricks, we obtain results comparable or superior to those reported in recent literature on the same tasks using standard ConvNets + MSE/cross entropy. Performance on deep/shallow multilayer perceptrons and Denoised Auto-encoders is also explored. ANRAT can be combined with other quasi-Newton training methods, innovative network variants, regularization techniques and other specific tricks in DNNs. Other than unsupervised pretraining, it provides a new perspective to address the non-convex optimization problem in DNNs. △ Less

Submitted 9 June, 2016; v1 submitted 8 June, 2015; originally announced June 2015.

Comments: AAAI 2016, 0.39%~0.4% ER on MNIST with single 32-32-256-10 ConvNets, code available at https://github.com/cauchyturing/ANRAE

arXiv:1506.00327 [pdf, other]

Imaging Time-Series to Improve Classification and Imputation

Authors: Zhiguang Wang, Tim Oates

Abstract: Inspired by recent successes of deep learning in computer vision, we propose a novel framework for encoding time series as different types of images, namely, Gramian Angular Summation/Difference Fields (GASF/GADF) and Markov Transition Fields (MTF). This enables the use of techniques from computer vision for time series classification and imputation. We used Tiled Convolutional Neural Networks (ti… ▽ More Inspired by recent successes of deep learning in computer vision, we propose a novel framework for encoding time series as different types of images, namely, Gramian Angular Summation/Difference Fields (GASF/GADF) and Markov Transition Fields (MTF). This enables the use of techniques from computer vision for time series classification and imputation. We used Tiled Convolutional Neural Networks (tiled CNNs) on 20 standard datasets to learn high-level features from the individual and compound GASF-GADF-MTF images. Our approaches achieve highly competitive results when compared to nine of the current best time series classification approaches. Inspired by the bijection property of GASF on 0/1 rescaled data, we train Denoised Auto-encoders (DA) on the GASF images of four standard and one synthesized compound dataset. The imputation MSE on test data is reduced by 12.18%-48.02% when compared to using the raw data. An analysis of the features and weights learned via tiled CNNs and DAs explains why the approaches work. △ Less

Submitted 31 May, 2015; originally announced June 2015.

Comments: Accepted by IJCAI-2015 ML track

arXiv:1412.6502

Detecting Epileptic Seizures from EEG Data using Neural Networks

Authors: Siddharth Pramod, Adam Page, Tinoosh Mohsenin, Tim Oates

Abstract: We explore the use of neural networks trained with dropout in predicting epileptic seizures from electroencephalographic data (scalp EEG). The input to the neural network is a 126 feature vector containing 9 features for each of the 14 EEG channels obtained over 1-second, non-overlap** windows. The models in our experiments achieved high sensitivity and specificity on patient records not used in… ▽ More We explore the use of neural networks trained with dropout in predicting epileptic seizures from electroencephalographic data (scalp EEG). The input to the neural network is a 126 feature vector containing 9 features for each of the 14 EEG channels obtained over 1-second, non-overlap** windows. The models in our experiments achieved high sensitivity and specificity on patient records not used in the training process. This is demonstrated using leave-one-out-cross-validation across patient records, where we hold out one patient's record as the test set and use all other patients' records for training; repeating this procedure for all patients in the database. △ Less

Submitted 3 February, 2019; v1 submitted 19 December, 2014; originally announced December 2014.

Comments: This paper has been withdrawn by the authors due to an error discovered in the experiments

arXiv:1307.6889 [pdf]

Contextualizing the global relevance of local land change observations

Authors: N. R. Magliocca, E. C. Ellis, T. Oates, M. Schmill

Abstract: To understand global changes in the Earth system, scientists must generalize globally from observations made locally and regionally. In land change science (LCS), local field-based observations are costly and time consuming, and generally obtained by researchers working at disparate local and regional case-study sites chosen for different reasons. As a result, global synthesis efforts in LCS tend… ▽ More To understand global changes in the Earth system, scientists must generalize globally from observations made locally and regionally. In land change science (LCS), local field-based observations are costly and time consuming, and generally obtained by researchers working at disparate local and regional case-study sites chosen for different reasons. As a result, global synthesis efforts in LCS tend to be based on non-statistical inferences subject to geographic biases stemming from data limitations and fragmentation. Thus, a fundamental challenge is the production of generalized knowledge that links evidence of the causes and consequences of local land change to global patterns and vice versa. The GLOBE system was designed to meet this challenge. GLOBE aims to transform global change science by enabling new scientific workflows based on statistically robust, globally relevant integration of local and regional observations using an online social-computational and geovisualization system. Consistent with the goals of Digital Earth, GLOBE has the capability to assess the global relevance of local case-study findings within the context of over 50 global biophysical, land-use, climate, and socio-economic datasets. We demonstrate the implementation of one such assessment - a representativeness analysis - with a recently published meta-study of changes in swidden agriculture in tropical forests. The analysis provides a standardized indicator to judge the global representativeness of the trends reported in the meta-study, and a geovisualization is presented that highlights areas for which sampling efforts can be reduced and those in need of further study. GLOBE will enable researchers and institutions to rapidly share, compare, and synthesize local and regional studies within the global context, as well as contributing to the larger goal of creating a Digital Earth. △ Less

Submitted 25 July, 2013; originally announced July 2013.

Comments: 5 pages, 4 figures, white paper

arXiv:1301.0567 [pdf]

The Thing That We Tried Didn't Work Very Well : Deictic Representation in Reinforcement Learning

Authors: Sarah Finney, Natalia Gardiol, Leslie Pack Kaelbling, Tim Oates

Abstract: Most reinforcement learning methods operate on propositional representations of the world state. Such representations are often intractably large and generalize poorly. Using a deictic representation is believed to be a viable alternative: they promise generalization while allowing the use of existing reinforcement-learning methods. Yet, there are few experiments on learning with deictic repres… ▽ More Most reinforcement learning methods operate on propositional representations of the world state. Such representations are often intractably large and generalize poorly. Using a deictic representation is believed to be a viable alternative: they promise generalization while allowing the use of existing reinforcement-learning methods. Yet, there are few experiments on learning with deictic representations reported in the literature. In this paper we explore the effectiveness of two forms of deictic representation and a naïve propositional representation in a simple blocks-world domain. We find, empirically, that the deictic representations actually worsen learning performance. We conclude with a discussion of possible causes of these results and strategies for more effective learning in domains with objects. △ Less

Submitted 12 December, 2012; originally announced January 2013.

Comments: Appears in Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI2002)

Report number: UAI-P-2002-PG-154-161

Showing 1–45 of 45 results for author: Oates, T