Search | arXiv e-print repository

Efficient Data Fusion using the Tsetlin Machine

Authors: Rupsa Saha, Vladimir I. Zadorozhny, Ole-Christoffer Granmo

Abstract: We propose a novel way of assessing and fusing noisy dynamic data using a Tsetlin Machine. Our approach consists in monitoring how explanations in form of logical clauses that a TM learns changes with possible noise in dynamic data. This way TM can recognize the noise by lowering weights of previously learned clauses, or reflect it in the form of new clauses. We also perform a comprehensive experi… ▽ More We propose a novel way of assessing and fusing noisy dynamic data using a Tsetlin Machine. Our approach consists in monitoring how explanations in form of logical clauses that a TM learns changes with possible noise in dynamic data. This way TM can recognize the noise by lowering weights of previously learned clauses, or reflect it in the form of new clauses. We also perform a comprehensive experimental study using notably different datasets that demonstrated high performance of the proposed approach. △ Less

Submitted 26 October, 2023; originally announced October 2023.

arXiv:2302.00128 [pdf, other]

TBAM: Towards An Agent-Based Model to Enrich Twitter Data

Authors: Usman Anjum, Vladimir Zadorozhny, Prashant Krishnamurthy

Abstract: Twitter (one example of microblogging) is widely being used by researchers to understand human behavior, specifically how people behave when a significant event occurs and how it changes user microblogging patterns. The changing microblogging behavior can reveal patterns that can help in detecting real-world events. However, the Twitter data that is available has limitations, such as, it is incomp… ▽ More Twitter (one example of microblogging) is widely being used by researchers to understand human behavior, specifically how people behave when a significant event occurs and how it changes user microblogging patterns. The changing microblogging behavior can reveal patterns that can help in detecting real-world events. However, the Twitter data that is available has limitations, such as, it is incomplete and noisy and the samples are irregular. In this paper we create a model, called Twitter Behavior Agent-Based Model (TBAM) to simulate Twitter pattern and behavior using Agent-Based Modeling (ABM). The generated data from ABM simulations can be used in place or to complement the real-world data toward improving the accuracy of event detection. We confirm the validity of our model by finding the cross-correlation between the real data collected from Twitter and the data generated using TBAM. △ Less

Submitted 31 January, 2023; originally announced February 2023.

Journal ref: 18th ISCRAM Conference Proceedings 2021

arXiv:2102.10952 [pdf, other]

A Relational Tsetlin Machine with Applications to Natural Language Understanding

Authors: Rupsa Saha, Ole-Christoffer Granmo, Vladimir I. Zadorozhny, Morten Goodwin

Abstract: TMs are a pattern recognition approach that uses finite state machines for learning and propositional logic to represent patterns. In addition to being natively interpretable, they have provided competitive accuracy for various tasks. In this paper, we increase the computing power of TMs by proposing a first-order logic-based framework with Herbrand semantics. The resulting TM is relational and ca… ▽ More TMs are a pattern recognition approach that uses finite state machines for learning and propositional logic to represent patterns. In addition to being natively interpretable, they have provided competitive accuracy for various tasks. In this paper, we increase the computing power of TMs by proposing a first-order logic-based framework with Herbrand semantics. The resulting TM is relational and can take advantage of logical structures appearing in natural language, to learn rules that represent how actions and consequences are related in the real world. The outcome is a logic program of Horn clauses, bringing in a structured view of unstructured data. In closed-domain question-answering, the first-order representation produces 10x more compact KBs, along with an increase in answering accuracy from 94.83% to 99.48%. The approach is further robust towards erroneous, missing, and superfluous information, distilling the aspects of a text that are important for real-world understanding. △ Less

Submitted 22 February, 2021; originally announced February 2021.

Comments: 14 pages, 3 figures, 7 tables, relational approach to TM in NLP

ACM Class: I.2.7; I.2.4

arXiv:2010.02097 [pdf, other]

FaNDS: Fake News Detection System Using Energy Flow

Authors: Jiawei Xu, Vladimir Zadorozhny, Danchen Zhang, John Grant

Abstract: Recently, the term "fake news" has been broadly and extensively utilized for disinformation, misinformation, hoaxes, propaganda, satire, rumors, click-bait, and junk news. It has become a serious problem around the world. We present a new system, FaNDS, that detects fake news efficiently. The system is based on several concepts used in some previous works but in a different context. There are two… ▽ More Recently, the term "fake news" has been broadly and extensively utilized for disinformation, misinformation, hoaxes, propaganda, satire, rumors, click-bait, and junk news. It has become a serious problem around the world. We present a new system, FaNDS, that detects fake news efficiently. The system is based on several concepts used in some previous works but in a different context. There are two main concepts: an Inconsistency Graph and Energy Flow. The Inconsistency Graph contains news items as nodes and inconsistent opinions between them for edges. Energy Flow assigns each node an initial energy and then some energy is propagated along the edges until the energy distribution on all nodes converges. To illustrate FaNDS we use the original data from the Fake News Challenge (FNC-1). First, the data has to be reconstructed in order to generate the Inconsistency Graph. The graph contains various subgraphs with well-defined shapes that represent different types of connections between the news items. Then the Energy Flow method is applied. The nodes with high energy are the candidates for being fake news. In our experiments, all these were indeed fake news as we checked each using several reliable web sites. We compared FaNDS to several other fake news detection methods and found it to be more sensitive in discovering fake news items. △ Less

Submitted 5 October, 2020; originally announced October 2020.

arXiv:1807.04415 [pdf]

doi 10.1109/IRI.2016.55

Process Discovery using Classification Tree Hidden Semi-Markov Model

Authors: Yihuang Kang, Vladimir Zadorozhny

Abstract: Various and ubiquitous information systems are being used in monitoring, exchanging, and collecting information. These systems are generating massive amount of event sequence logs that may help us understand underlying phenomenon. By analyzing these logs, we can learn process models that describe system procedures, predict the development of the system, or check whether the changes are expected. I… ▽ More Various and ubiquitous information systems are being used in monitoring, exchanging, and collecting information. These systems are generating massive amount of event sequence logs that may help us understand underlying phenomenon. By analyzing these logs, we can learn process models that describe system procedures, predict the development of the system, or check whether the changes are expected. In this paper, we consider a novel technique that models these sequences of events in temporal-probabilistic manners. Specifically, we propose a probabilistic process model that combines hidden semi-Markov model and classification trees learning. Our experimental result shows that the proposed approach can answer a kind of question-"what are the most frequent sequence of system dynamics relevant to a given sequence of observable events?". For example, "Given a series of medical treatments, what are the most relevant patients' health condition pattern changes at different times?" △ Less

Submitted 12 July, 2018; originally announced July 2018.

Comments: 2016 IEEE International Conference on Information Reuse and Integration

arXiv:1807.03387 [pdf]

doi 10.1007/s10115-015-0858-z

Process Monitoring Using Maximum Sequence Divergence

Authors: Yihuang Kang, Vladimir Zadorozhny

Abstract: Process Monitoring involves tracking a system's behaviors, evaluating the current state of the system, and discovering interesting events that require immediate actions. In this paper, we consider monitoring temporal system state sequences to help detect the changes of dynamic systems, check the divergence of the system development, and evaluate the significance of the deviation. We begin with dis… ▽ More Process Monitoring involves tracking a system's behaviors, evaluating the current state of the system, and discovering interesting events that require immediate actions. In this paper, we consider monitoring temporal system state sequences to help detect the changes of dynamic systems, check the divergence of the system development, and evaluate the significance of the deviation. We begin with discussions of data reduction, symbolic data representation, and the anomaly detection in temporal discrete sequences. Time-series representation methods are also discussed and used in this paper to discretize raw data into sequences of system states. Markov Chains and stationary state distributions are continuously generated from temporal sequences to represent snapshots of the system dynamics in different time frames. We use generalized Jensen-Shannon Divergence as the measure to monitor changes of the stationary symbol probability distributions and evaluate the significance of system deviations. We prove that the proposed approach is able to detect deviations of the systems we monitor and assess the deviation significance in probabilistic manner. △ Less

Submitted 9 July, 2018; originally announced July 2018.

Journal ref: Knowledge and Information Systems 48.1 (2016): 81-109

Showing 1–6 of 6 results for author: Zadorozhny, V