-
On the Directed Oberwolfach Problem with variable cycle lengths
Authors:
Elaheh Shabani,
Mateja Šajna
Abstract:
The Directed Oberwolfach Problem can be considered the directed version of the well-known Oberwolfach Problem, first mentioned by Ringel at a conference in Oberwolfach, Germany, in 1967. In this paper, we describe some new partial results on the Directed Oberwolfach Problem with variable cycle lengths. In particular, we show that the complete symmetric digraph $K_n^{*}$ admits a $(\vec{C}_2, \ldots, \vec{C}_2, \vec{C}_3)$-factorization for all $n \equiv 1, 3,$ or $7 \pmod{8}$. We also show that $K_n^{*}$ admits a $(\vec{C}_2, \vec{C}_{n-2})$-factorization for any integer $n \geq 5$.
Submitted 18 September, 2020;
originally announced September 2020.
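To make the notion of a factorization concrete, here is a small checker of our own (not code from the paper). It verifies that a list of 2-factors, each given as directed cycles written as vertex sequences, uses every arc of $K_n^{*}$ exactly once and has the required cycle lengths. The hand-built example is a $(\vec{C}_2, \vec{C}_3)$-factorization of $K_5^{*}$. Since $5 \not\equiv 1, 3, 7 \pmod{8}$, this case falls under the second result rather than the first. The construction, with digons $\{0, i\}$ and 3-cycles that are consistently oriented faces of a tetrahedron on $\{1, 2, 3, 4\}$, is our own illustration.

```python
from collections import Counter


def is_cycle_factorization(n, factors, expected_lengths):
    """Return True if `factors` is a factorization of K_n^* into 2-factors,
    each consisting of directed cycles with lengths `expected_lengths`."""
    arc_uses = Counter()
    for factor in factors:
        # A 2-factor must cover every vertex 0..n-1 exactly once.
        covered = sorted(v for cycle in factor for v in cycle)
        if covered != list(range(n)):
            return False
        if sorted(len(c) for c in factor) != sorted(expected_lengths):
            return False
        for cycle in factor:
            # Directed cycle v0 -> v1 -> ... -> v_{k-1} -> v0.
            for i, u in enumerate(cycle):
                arc_uses[(u, cycle[(i + 1) % len(cycle)])] += 1
    # K_n^* has exactly one arc in each direction between distinct vertices.
    target = Counter({(u, v): 1 for u in range(n) for v in range(n) if u != v})
    return arc_uses == target


# Digon {0, i} plus a directed 3-cycle on the remaining three vertices.
K5_FACTORS = [
    [[0, 1], [2, 3, 4]],
    [[0, 2], [1, 4, 3]],
    [[0, 3], [1, 2, 4]],
    [[0, 4], [1, 3, 2]],
]
print(is_cycle_factorization(5, K5_FACTORS, [2, 3]))  # True
```

Reversing any one of the 3-cycles would reuse arcs already covered by the other factors, so the checker would reject it.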
-
Detecting Pathogenic Social Media Accounts without Content or Network Structure
Authors:
Elham Shaabani,
Ruocheng Guo,
Paulo Shakarian
Abstract:
The spread of harmful misinformation in social media is a pressing problem. We refer to accounts that have the capability of spreading such information to viral proportions as "Pathogenic Social Media" accounts. These accounts include terrorist supporter accounts, water armies, and fake news writers. We introduce an unsupervised causality-based framework that also leverages label propagation. This approach identifies these users without using network structure, cascade path information, content, or user information. We show that our approach obtains higher precision (0.75) in identifying Pathogenic Social Media accounts than random (precision of 0.11) and existing bot detection (precision of 0.16) methods.
Submitted 4 May, 2019;
originally announced May 2019.
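The abstract says the framework leverages label propagation but does not say how the graph is built. The sketch below is a generic two-class label propagation in the style of Zhu and Ghahramani, not the paper's implementation. It assumes a hypothetical weighted graph whose edges encode similarity between accounts' causality features. Labeled accounts are clamped to their scores, and every other account repeatedly takes the weighted average of its neighbors' scores.

```python
def label_propagation(weights, seeds, iterations=100):
    """Generic iterative label propagation for two classes.

    weights: dict node -> dict neighbor -> nonnegative edge weight
             (hypothetical similarity graph; assumed symmetric).
    seeds:   dict labeled node -> score in [0, 1] (1 = pathogenic).
    Returns a dict node -> propagated score in [0, 1].
    """
    scores = {v: seeds.get(v, 0.5) for v in weights}  # unlabeled start at 0.5
    for _ in range(iterations):
        new_scores = {}
        for v, nbrs in weights.items():
            if v in seeds:
                new_scores[v] = seeds[v]  # clamp labeled nodes
                continue
            total = sum(nbrs.values())
            if total == 0:
                new_scores[v] = scores[v]  # isolated node keeps its score
            else:
                new_scores[v] = sum(w * scores[u] for u, w in nbrs.items()) / total
        scores = new_scores
    return scores


# Toy chain a - b - c - d - e with one labeled account at each end.
chain = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0, "e": 1.0},
    "e": {"d": 1.0},
}
scores = label_propagation(chain, {"a": 1.0, "e": 0.0})
print({v: round(s, 2) for v, s in scores.items()})
```

On this chain the scores converge to the harmonic solution, decreasing linearly from 1 at `a` to 0 at `e`. Thresholding the scores would then yield a PSM/non-PSM decision.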
-
An End-to-End Framework to Identify Pathogenic Social Media Accounts on Twitter
Authors:
Elham Shaabani,
Ashkan Sadeghi-Mobarakeh,
Hamidreza Alvari,
Paulo Shakarian
Abstract:
Pathogenic Social Media (PSM) accounts such as terrorist supporter accounts and fake news writers have the capability of spreading disinformation to viral proportions. Early detection of PSM accounts is crucial, as they are likely to be key users in making malicious information "viral". In this paper, we adopt the causal inference framework along with graph-based metrics in order to distinguish PSMs from normal users within a short time of their activity. We propose both supervised and semi-supervised approaches that do not take network information or content into account. Results on a real-world dataset from Twitter accentuate the advantage of our proposed frameworks. We show that our approach achieves a 0.28 improvement in F1 score over existing approaches, with a precision of 0.90 and an F1 score of 0.63.
Submitted 4 May, 2019;
originally announced May 2019.
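The reported numbers also pin down the recall that the abstract does not state. With the standard definition $F_1 = 2PR/(P+R)$, solving for recall gives $R = F_1 P / (2P - F_1)$. The arithmetic below assumes that definition and that "0.28 improvement" is an absolute difference. Because the inputs are rounded, the results are approximate.

```python
def recall_from_precision_and_f1(precision, f1):
    """Invert F1 = 2PR / (P + R) for R, given P and F1."""
    return f1 * precision / (2 * precision - f1)


recall = recall_from_precision_and_f1(0.90, 0.63)
baseline_f1 = 0.63 - 0.28  # assumes the improvement is an absolute difference
print(round(recall, 3), round(baseline_f1, 2))  # 0.485 0.35
```

So the reported F1 of 0.63 at a precision of 0.90 corresponds to a recall of roughly 0.48, and the best existing approach would have had an F1 of about 0.35.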
-
Less is More: Semi-Supervised Causal Inference for Detecting Pathogenic Users in Social Media
Authors:
Hamidreza Alvari,
Elham Shaabani,
Soumajyoti Sarkar,
Ghazaleh Beigi,
Paulo Shakarian
Abstract:
Recent years have witnessed a surge of manipulation of public opinion and political events by malicious social media actors. These users are referred to as "Pathogenic Social Media (PSM)" accounts. PSMs are key users in spreading misinformation in social media to viral proportions. These accounts can be controlled either by real users or by automated bots. Identification of PSMs is thus of utmost importance for social media authorities. The burden usually falls to automatic approaches that can identify these accounts and protect social media reputation. However, the lack of sufficient labeled examples for devising and training sophisticated approaches to combat these accounts is still one of the foremost challenges facing social media firms. In contrast, unlabeled data is abundant and cheap to obtain thanks to massive user-generated data. In this paper, we propose a semi-supervised causal inference PSM detection framework, SemiPsm, to compensate for the lack of labeled data. In particular, the proposed method leverages unlabeled data in the form of manifold regularization and relies only on cascade information. This is in contrast to existing approaches that use exhaustive feature engineering (e.g., profile information, network structure, etc.). Evidence from empirical experiments on a real-world ISIS-related dataset from Twitter suggests that utilizing unlabeled instances is promising for detecting PSMs.
Submitted 5 March, 2019;
originally announced March 2019.
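The abstract says SemiPsm exploits unlabeled data through manifold regularization but does not give its objective. As a generic illustration, and not SemiPsm itself, the sketch below uses the transductive Laplacian-regularized least-squares form. It minimizes $\sum_{i \in \mathcal{L}} (f_i - y_i)^2 + \gamma\, f^\top L f$ over per-account scores $f$, where $L = D - W$ is the Laplacian of a Gaussian-affinity graph over labeled and unlabeled accounts. Setting the gradient to zero gives the linear system $(J + \gamma L) f = J y$, with $J$ the diagonal indicator of labeled points. The feature vectors, `gamma`, and `sigma` are assumptions.

```python
import math


def gaussian_affinity(points, sigma=1.0):
    """Dense affinity matrix W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), W_ii = 0."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return W


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def laplacian_rls(points, labels, gamma=1.0, sigma=1.0):
    """Transductive manifold-regularized scores for every point.

    labels: dict point index -> +1 (PSM) or -1 (normal); other points are unlabeled.
    Solves (J + gamma * L) f = J y with L = D - W.
    """
    W = gaussian_affinity(points, sigma)
    n = len(points)
    A = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n
    for i in range(n):
        degree = sum(W[i])
        for j in range(n):
            A[i][j] = gamma * ((degree if i == j else 0.0) - W[i][j])
        if i in labels:
            A[i][i] += 1.0  # the J term: fit the given label
            rhs[i] = float(labels[i])
    return solve(A, rhs)


# Two well-separated groups of accounts, with one labeled account in each.
pts = [(0, 0), (0.5, 0), (0, 0.5), (0.5, 0.5),
       (5, 5), (5.5, 5), (5, 5.5), (5.5, 5.5)]
f = laplacian_rls(pts, {0: -1, 4: +1})
print([round(v, 2) for v in f])
```

Each unlabeled account inherits the sign of the labeled account on its part of the data manifold. That is the effect the abstract attributes to using unlabeled data.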
-
Anomaly Detection for an E-commerce Pricing System
Authors:
Jagdish Ramakrishnan,
Elham Shaabani,
Chao Li,
Mátyás A. Sustik
Abstract:
Online retailers execute a very large number of price updates when compared to brick-and-mortar stores. Even a few mis-priced items can have a significant business impact and result in a loss of customer trust. Early detection of anomalies in an automated real-time fashion is an important part of such a pricing system. In this paper, we describe unsupervised and supervised anomaly detection approaches we developed and deployed for a large-scale online pricing system at Walmart. Our system detects anomalies both in batch and real-time streaming settings, and the items flagged are reviewed and actioned based on priority and business impact. We found that having the right architecture design was critical to facilitate model performance at scale, and business impact and speed were important factors influencing model selection, parameter choice, and prioritization in a production environment for a large-scale system. We conducted analyses on the performance of various approaches on a test set using real-world retail data and fully deployed our approach into production. We found that our approach was able to detect the most important anomalies with high precision.
Submitted 1 June, 2019; v1 submitted 25 February, 2019;
originally announced February 2019.
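The abstract does not name the deployed models. As a hedged illustration of the simplest kind of unsupervised check a pricing pipeline might run on each update, the sketch below flags a new price whose robust z-score (median and MAD of the item's recent prices) exceeds a threshold. The function names, the history window, and the 3.5 cutoff (a common rule of thumb for modified z-scores) are our assumptions, not Walmart's system.

```python
import statistics


def robust_z_score(history, new_price):
    """Robust z-score of new_price relative to recent historical prices."""
    med = statistics.median(history)
    mad = statistics.median(abs(p - med) for p in history)
    if mad == 0:
        # Flat history: any change at all is maximally surprising.
        return 0.0 if new_price == med else float("inf")
    # 1.4826 rescales the MAD to the standard deviation under normality.
    return abs(new_price - med) / (1.4826 * mad)


def flag_price_update(history, new_price, threshold=3.5):
    """Hypothetical gate: True means route the update to review."""
    return robust_z_score(history, new_price) > threshold


recent = [19.99, 20.49, 19.79, 20.19, 19.99, 20.29]
print(flag_price_update(recent, 2.00))   # True: roughly a 90% price drop
print(flag_price_update(recent, 20.09))  # False: in line with history
```

The median and MAD are used instead of the mean and standard deviation so that an earlier mis-price in the history window does not mask the next one.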
-
Early Identification of Pathogenic Social Media Accounts
Authors:
Hamidreza Alvari,
Elham Shaabani,
Paulo Shakarian
Abstract:
Pathogenic Social Media (PSM) accounts such as terrorist supporters exploit large communities of supporters to conduct attacks on social media. Early detection of these accounts is crucial, as they are highly likely to be key users in making a harmful message "viral". In this paper, we make the first attempt at utilizing causal inference to identify PSMs within a short time frame around their activity. We propose a time-decay causality metric and incorporate it into a causal community detection-based algorithm. The proposed algorithm is applied to groups of accounts sharing similar causality features and is followed by a classification algorithm that classifies accounts as PSM or not. Unlike existing techniques that take significant time to collect information such as network, cascade path, or content, our scheme relies solely on the action logs of users. Results on a real-world dataset from Twitter demonstrate the effectiveness and efficiency of our approach. We achieved a precision of 0.84 for detecting PSMs based only on their first 10 days of activity; the misclassified accounts were then detected 10 days later.
Submitted 26 September, 2018; v1 submitted 25 September, 2018;
originally announced September 2018.
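The abstract names a time-decay causality metric without defining it. The sketch below shows only the general idea of time decay, under our own assumptions: a user's per-interval causality scores (computed elsewhere) are combined into an exponentially weighted average in which recent intervals count more. The exponential form and the `decay` parameter are hypothetical, not the paper's definition.

```python
import math


def time_decayed_score(interval_scores, decay=0.5):
    """Exponentially time-decayed average of per-interval scores.

    interval_scores: list of (interval_index, score) pairs; a larger index
    means a more recent interval. Hypothetical stand-in for the paper's
    time-decay causality metric.
    """
    latest = max(k for k, _ in interval_scores)
    weights = [math.exp(-decay * (latest - k)) for k, _ in interval_scores]
    weighted = sum(w * s for w, (_, s) in zip(weights, interval_scores))
    return weighted / sum(weights)


# Same scores, opposite timing: recent causal activity ranks higher.
rising = time_decayed_score([(0, 0.1), (1, 0.9)])
fading = time_decayed_score([(0, 0.9), (1, 0.1)])
print(rising > fading)  # True
```

With `decay=0` this reduces to a plain average. Larger values make the metric react faster to an account's most recent behavior, which matters when only a short window of activity is available.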
-
Toward Early and Order-of-Magnitude Cascade Prediction in Social Networks
Authors:
Ruocheng Guo,
Elham Shaabani,
Abhinav Bhatnagar,
Paulo Shakarian
Abstract:
When a piece of information (microblog, photograph, video, link, etc.) starts to spread in a social network, an important question arises: will it spread to viral proportions, where viral can be defined as an order-of-magnitude increase? However, several previous studies have established that cascade size and frequency are related through a power law, which leads to a severe imbalance in this classification problem. In this paper, we devise a suite of measurements based on structural diversity: the variety of social contexts (communities) in which individuals partaking in a given cascade engage. We demonstrate that these measures are able to distinguish viral from non-viral cascades, despite the severe imbalance of the data for this problem. Further, we leverage these measurements as features in a classification approach, successfully predicting microblogs that grow from 50 to 500 reposts with a precision of 0.69 and a recall of 0.52 for the viral class, despite this class comprising under 2% of samples. This significantly outperforms our baseline approach as well as the current state of the art. We also show that this approach performs well for identifying whether cascades observed for 60 minutes will grow to 500 reposts, and we demonstrate how to trade off between precision and recall.
Submitted 8 August, 2016;
originally announced August 2016.
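Structural diversity is described here as the variety of communities that a cascade's participants belong to. The exact measurement suite is not listed in the abstract. The sketch below computes two illustrative features of our own choosing: the number of distinct communities touched, and the Shannon entropy of the community distribution. It assumes a community assignment has already been produced by some community-detection step.

```python
import math
from collections import Counter


def structural_diversity(participants, community_of):
    """Illustrative community-based diversity features for one cascade.

    participants: users who have joined the cascade so far.
    community_of: dict user -> community id (assumed precomputed).
    Returns (number of distinct communities, entropy in bits).
    """
    counts = Counter(community_of[u] for u in participants)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return len(counts), entropy


communities = {"u1": "A", "u2": "A", "u3": "B", "u4": "B", "u5": "C", "u6": "D"}
print(structural_diversity(["u1", "u2", "u3", "u4"], communities))  # (2, 1.0)
```

These values, computed at the 50-repost mark, could serve as features for a viral/non-viral classifier.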
-
MIST: Missing Person Intelligence Synthesis Toolkit
Authors:
Elham Shaabani,
Hamidreza Alvari,
Paulo Shakarian,
J. E. Kelly Snyder
Abstract:
Each day, approximately 500 missing persons cases occur that go unsolved/unresolved in the United States. The non-profit organization known as the Find Me Group (FMG), led by former law enforcement professionals, is dedicated to solving or resolving these cases. This paper introduces the Missing Person Intelligence Synthesis Toolkit (MIST), which leverages a data-driven variant of geospatial abductive inference. This system takes search locations provided by a group of experts and rank-orders them by the probability assigned to each area, which is based on the prior performance of the experts taken as a group. We evaluated our approach against the current practices employed by the Find Me Group and found that it significantly reduces the search area, leading to a reduction of 31 square miles over the 24 cases we examined in our experiments. Currently, we are using MIST to aid the Find Me Group in an active missing person case.
Submitted 29 August, 2016; v1 submitted 28 July, 2016;
originally announced July 2016.
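MIST itself uses a data-driven variant of geospatial abductive inference, which the abstract does not detail. The sketch below is a deliberately simpler stand-in that only illustrates the ranking idea. Each expert-proposed location is scored by a vote weighted by that expert's hypothetical historical hit rate. The scores are then normalized so they can be read as rough probabilities and sorted. The inputs and field meanings are assumptions.

```python
from collections import defaultdict


def rank_locations(proposals, hit_rate):
    """Rank proposed search locations by a performance-weighted expert vote.

    proposals: dict expert -> set of proposed locations (e.g. grid-cell ids).
    hit_rate:  dict expert -> fraction of past cases in which that expert's
               locations contained the person (hypothetical measure).
    Returns (location, normalized score) pairs, most likely first.
    """
    raw = defaultdict(float)
    for expert, locations in proposals.items():
        for loc in locations:
            raw[loc] += hit_rate[expert]
    total = sum(raw.values()) or 1.0  # avoid dividing by zero
    ranked = sorted(raw.items(), key=lambda item: (-item[1], item[0]))
    return [(loc, score / total) for loc, score in ranked]


proposals = {"e1": {"A", "B"}, "e2": {"B"}, "e3": {"C"}}
hit_rate = {"e1": 0.5, "e2": 0.3, "e3": 0.2}
print([loc for loc, _ in rank_locations(proposals, hit_rate)])  # ['B', 'A', 'C']
```

Searching cells in this order, and stopping once the cumulative score passes a chosen level, is one way such a ranking can shrink the total area searched.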
-
Early Identification of Violent Criminal Gang Members
Authors:
Elham Shaabani,
Ashkan Aleali,
Paulo Shakarian,
John Bertetto
Abstract:
Gang violence is a major problem in the United States, accounting for a large fraction of homicides and other violent crime. In this paper, we study the problem of early identification of violent gang members. Our approach relies on modified centrality measures that take into account additional data on the individuals in the social network of co-arrestees; together with other arrest metadata, these measures provide a rich set of features for a classification algorithm. We show that our approach obtains high precision and recall (0.89 and 0.78, respectively) in the case where the entire network is known, and that it outperforms current law-enforcement approaches in the case where the network is discovered over time by virtue of new arrests, mimicking real-world law-enforcement operations. Operational issues are also discussed, as we are preparing to leverage this method in an operational environment.
Submitted 17 August, 2015;
originally announced August 2015.
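The abstract describes centrality measures modified to account for data on a person's co-arrestees, without giving formulas. The sketch below is a hypothetical example of that flavor: plain degree in the co-arrestee network alongside the count and fraction of neighbors with a prior violent arrest. Such features could be passed, together with arrest metadata, to any classifier. The feature choice and input format are our assumptions.

```python
def neighborhood_features(adjacency, violent):
    """Hypothetical attribute-aware centrality features per person.

    adjacency: dict person -> set of co-arrestees (undirected network).
    violent:   set of people with a prior violent-crime arrest.
    Returns dict person -> (degree, violent_neighbors, violent_fraction).
    """
    features = {}
    for person, nbrs in adjacency.items():
        degree = len(nbrs)
        violent_nbrs = len(nbrs & violent)
        fraction = violent_nbrs / degree if degree else 0.0
        features[person] = (degree, violent_nbrs, fraction)
    return features


network = {"p1": {"p2", "p3"}, "p2": {"p1"}, "p3": {"p1", "p4"}, "p4": {"p3"}}
print(neighborhood_features(network, violent={"p2", "p4"})["p1"])  # (2, 1, 0.5)
```

In the evolving-network setting the abstract mentions, these features would simply be recomputed as new arrests add nodes and edges.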
-
Toward Order-of-Magnitude Cascade Prediction
Authors:
Ruocheng Guo,
Elham Shaabani,
Abhinav Bhatnagar,
Paulo Shakarian
Abstract:
When a piece of information (microblog, photograph, video, link, etc.) starts to spread in a social network, an important question arises: will it spread to "viral" proportions, where "viral" is defined as an order-of-magnitude increase? However, several previous studies have established that cascade size and frequency are related through a power law, which leads to a severe imbalance in this classification problem. In this paper, we devise a suite of measurements based on "structural diversity": the variety of social contexts (communities) in which individuals partaking in a given cascade engage. We demonstrate that these measures are able to distinguish viral from non-viral cascades, despite the severe imbalance of the data for this problem. Further, we leverage these measurements as features in a classification approach, successfully predicting microblogs that grow from 50 to 500 reposts with a precision of 0.69 and a recall of 0.52 for the viral class, despite this class comprising under 2% of samples. This significantly outperforms our baseline approach as well as the current state of the art. Our work also demonstrates how we can trade off between precision and recall.
Submitted 13 August, 2015;
originally announced August 2015.
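The precision/recall trade-off mentioned above comes from moving the decision threshold on a classifier's viral-class score. The sketch below uses hypothetical scores and labels rather than the paper's data. It computes precision and recall for the positive ("viral") class at a given threshold. The convention of reporting a precision of 1.0 when nothing is predicted positive is our choice.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall for the positive ("viral") class at a threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum((not p) and y for p, y in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention: no positives
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical classifier scores and true labels (True = went viral).
scores = [0.9, 0.8, 0.6, 0.5, 0.3, 0.2]
labels = [True, True, False, True, False, False]
for t in (0.7, 0.4):
    print(t, precision_recall_at(scores, labels, t))
```

Raising the threshold from 0.4 to 0.7 lifts precision from 0.75 to 1.0 but drops recall from 1.0 to about 0.67. Under heavy class imbalance, this choice largely determines how many false alarms accompany each correctly predicted viral cascade.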