-
GeomVerse: A Systematic Evaluation of Large Models for Geometric Reasoning
Authors:
Mehran Kazemi,
Hamidreza Alvari,
Ankit Anand,
Jialin Wu,
Xi Chen,
Radu Soricut
Abstract:
Large language models have shown impressive results for multi-hop mathematical reasoning when the input question is only textual. Many mathematical reasoning problems, however, contain both text and image. With the ever-increasing adoption of vision language models (VLMs), understanding their reasoning abilities for such problems is crucial. In this paper, we evaluate the reasoning capabilities of VLMs along various axes through the lens of geometry problems. We procedurally create a synthetic dataset of geometry questions with controllable difficulty levels along multiple axes, thus enabling a systematic evaluation. The empirical results obtained using our benchmark for state-of-the-art VLMs indicate that these models are not as capable in subjects like geometry (and, by generalization, other topics requiring similar reasoning) as suggested by previous benchmarks. This is made especially clear by the construction of our benchmark at various depth levels, since solving higher-depth problems requires long chains of reasoning rather than additional memorized knowledge. We release the dataset for further research in this area.
Submitted 19 December, 2023;
originally announced December 2023.
-
Mitigating Bias in Online Microfinance Platforms: A Case Study on Kiva.org
Authors:
Soumajyoti Sarkar,
Hamidreza Alvari
Abstract:
Over the last couple of decades in the lending industry, financial disintermediation has occurred on a global scale. Traditionally, even for a small supply of funds, banks would act as the conduit between the funds and the borrowers. It has now become possible to overcome some of the obstacles associated with such supply of funds with the advent of online platforms like Kiva, Prosper, and LendingClub. Kiva, for example, works with Micro Finance Institutions (MFIs) in developing countries to build Internet profiles of borrowers with a brief biography, loan requested, loan term, and purpose. Kiva, in particular, allows lenders to fund projects in different sectors through group or individual funding. Traditional research studies have investigated various factors behind lender preferences purely from the perspective of loan attributes, and only recently have some cross-country cultural preferences been investigated. In this paper, we investigate lender perceptions of economic factors of the borrower countries in relation to their preferences towards loans associated with different sectors. We find that the influence from economic factors and loan attributes can have substantially different roles to play for different sectors in achieving faster funding. We formally investigate and quantify the hidden biases prevalent in different loan sectors using recent tools from causal inference and regression models that rely on Bayesian variable selection methods. We then extend these models to incorporate fairness constraints based on our empirical analysis and find that such models can still achieve nearly comparable results with respect to baseline regression models.
Submitted 19 June, 2020;
originally announced June 2020.
-
A Feature-Driven Approach for Identifying Pathogenic Social Media Accounts
Authors:
Hamidreza Alvari,
Ghazaleh Beigi,
Soumajyoti Sarkar,
Scott W. Ruston,
Steven R. Corman,
Hasan Davulcu,
Paulo Shakarian
Abstract:
Over the past few years, we have observed different media outlets' attempts to shift public opinion by framing information to support a narrative that facilitates their goals. Malicious users referred to as "pathogenic social media" (PSM) accounts are more likely to amplify this phenomenon by spreading misinformation to viral proportions. Understanding the spread of misinformation from an account-level perspective is thus a pressing problem. In this work, we aim to present a feature-driven approach to detect PSM accounts in social media. Inspired by the literature, we set out to assess PSMs from three broad perspectives: (1) user-related information (e.g., user activity, profile characteristics), (2) source-related information (i.e., information linked via URLs shared by users) and (3) content-related information (e.g., tweet characteristics). For the user-related information, we investigate malicious signals using causality analysis (i.e., whether a user is frequently a cause of viral cascades) and profile characteristics (e.g., number of followers, etc.). For the source-related information, we explore various malicious properties linked to URLs (e.g., URL address, content of the associated website, etc.). Finally, for the content-related information, we examine attributes (e.g., number of hashtags, suspicious hashtags, etc.) from tweets posted by users. Experiments on real-world Twitter data from different countries demonstrate the effectiveness of the proposed approach in identifying PSM users.
Submitted 13 January, 2020;
originally announced January 2020.
-
Privacy-Aware Recommendation with Private-Attribute Protection using Adversarial Learning
Authors:
Ghazaleh Beigi,
Ahmadreza Mosallanezhad,
Ruocheng Guo,
Hamidreza Alvari,
Alexander Nou,
Huan Liu
Abstract:
Recommendation is one of the critical applications that helps users find information relevant to their interests. However, a malicious attacker can infer users' private information via recommendations. Prior work obfuscates user-item data before sharing it with the recommendation system. This approach does not explicitly address the quality of recommendation while performing data obfuscation. Moreover, it cannot protect users against private-attribute inference attacks based on recommendations. This work is the first attempt to build a Recommendation with Attribute Protection (RAP) model which simultaneously recommends relevant items and counters private-attribute inference attacks. The key idea of our approach is to formulate this problem as an adversarial learning problem with two main components: the private-attribute inference attacker and the Bayesian personalized recommender. The attacker seeks to infer users' private-attribute information according to their item lists and recommendations. The recommender aims to extract users' interests while employing the attacker to regularize the recommendation process. Experiments show that the proposed model both preserves the quality of the recommendation service and protects users against private-attribute inference attacks.
Submitted 22 November, 2019;
originally announced November 2019.
-
An End-to-End Framework to Identify Pathogenic Social Media Accounts on Twitter
Authors:
Elham Shaabani,
Ashkan Sadeghi-Mobarakeh,
Hamidreza Alvari,
Paulo Shakarian
Abstract:
Pathogenic Social Media (PSM) accounts such as terrorist supporter accounts and fake news writers have the capability of spreading disinformation to viral proportions. Early detection of PSM accounts is crucial as they are likely to be key users in making malicious information "viral". In this paper, we adopt the causal inference framework along with graph-based metrics in order to distinguish PSMs from normal users within a short time of their activities. We propose both supervised and semi-supervised approaches without taking the network information and content into account. Results on a real-world dataset from Twitter accentuate the advantage of our proposed frameworks. We show that our approach achieves a 0.28 improvement in F1 score over existing approaches, with a precision of 0.90 and an F1 score of 0.63.
Submitted 4 May, 2019;
originally announced May 2019.
-
Understanding Information Flow in Cascades Using Network Motifs
Authors:
Soumajyoti Sarkar,
Hamidreza Alvari,
Paulo Shakarian
Abstract:
A growing set of applications considers the process of network formation by using subgraphs as a tool for generating the network topology. One of the pressing research challenges is thus to be able to use these subgraphs to understand the network topology of information cascades, which ultimately paves the way to theorize about how information spreads over time. In this paper, we make the first attempt at using network motifs to understand whether or not they can be used as generative elements for the diffusion network organization during different phases of the cascade lifecycle. In doing so, we propose a motif percolation-based algorithm that uses network motifs to measure the extent to which they can represent the temporal cascade network organization. We compare two phases of the cascade lifecycle from the perspective of diffusion: the phase of steep growth and the phase of inhibition prior to its saturation. Our experiments on a set of cascades from the Weibo platform and with 5-node motifs demonstrate that there are only a few specific motif patterns with triads that are able to characterize the spreading process and hence the network organization during the inhibition region better than during the phase of high growth. In contrast, we do not find compelling results for the phase of steep growth.
Submitted 8 April, 2019;
originally announced April 2019.
-
Less is More: Semi-Supervised Causal Inference for Detecting Pathogenic Users in Social Media
Authors:
Hamidreza Alvari,
Elham Shaabani,
Soumajyoti Sarkar,
Ghazaleh Beigi,
Paulo Shakarian
Abstract:
Recent years have witnessed a surge of manipulation of public opinion and political events by malicious social media actors. These users are referred to as "Pathogenic Social Media (PSM)" accounts. PSMs are key users in spreading misinformation in social media to viral proportions. These accounts can be either controlled by real users or automated bots. Identification of PSMs is thus of utmost importance for social media authorities. The burden usually falls to automatic approaches that can identify these accounts and protect social media reputation. However, lack of sufficient labeled examples for devising and training sophisticated approaches to combat these accounts is still one of the foremost challenges facing social media firms. In contrast, unlabeled data is abundant and cheap to obtain thanks to massive user-generated data. In this paper, we propose a semi-supervised causal inference PSM detection framework, SemiPsm, to compensate for the lack of labeled data. In particular, the proposed method leverages unlabeled data in the form of manifold regularization and only relies on cascade information. This is in contrast to the existing approaches that use exhaustive feature engineering (e.g., profile information, network structure, etc.). Evidence from empirical experiments on a real-world ISIS-related dataset from Twitter suggests promising results of utilizing unlabeled instances for detecting PSMs.
Submitted 5 March, 2019;
originally announced March 2019.
-
Leveraging Motifs to Model the Temporal Dynamics of Diffusion Networks
Authors:
Soumajyoti Sarkar,
Hamidreza Alvari,
Paulo Shakarian
Abstract:
Information diffusion mechanisms based on social influence models are mainly studied using likelihood of adoption when active neighbors expose a user to a message. The problem arises primarily from the fact that, for the most part, this explicit information of who-exposed-whom among a group of active neighbors in a social network, before a susceptible node is infected, is not available. In this paper, we attempt to understand the diffusion process through information cascades by studying the temporal network structure of the cascades. In doing so, we accommodate the effect of exposures from active neighbors of a node through a network pruning technique that leverages network motifs to identify potential infectors responsible for exposures from among those active neighbors. We attempt to evaluate the effectiveness of the components used in modeling cascade dynamics and especially whether the additional effect of the exposure information is useful. Following this model, we develop an inference algorithm, namely InferCut, that uses parameters learned from the model and the exposure information to predict the actual parent node of each potentially susceptible user in a given cascade. Empirical evaluation on a real-world dataset from the Weibo social network demonstrates the significance of incorporating exposure information in recovering the exact parents of the exposed users at the early stages of the diffusion process.
Submitted 22 March, 2020; v1 submitted 27 February, 2019;
originally announced February 2019.
-
Hawkes Process for Understanding the Influence of Pathogenic Social Media Accounts
Authors:
Hamidreza Alvari,
Paulo Shakarian
Abstract:
Over the past years, political events and public opinion on the Web have been allegedly manipulated by accounts dedicated to spreading disinformation and performing malicious activities on social media. These accounts, hereafter referred to as "Pathogenic Social Media (PSM)" accounts, are often controlled by terrorist supporters, water armies or fake news writers and hence can pose threats to social media and the general public. Understanding and analyzing PSMs could help social media firms devise sophisticated and automated techniques that could be deployed to stop them from reaching their audience and consequently reduce their threat. In this paper, we leverage the well-known statistical technique "Hawkes Process" to quantify the influence of PSM accounts on the dissemination of malicious information on social media platforms. Our findings on a real-world ISIS-related dataset from Twitter indicate that PSMs are significantly different from regular users in making a message viral. Specifically, we observed that PSMs do not usually post URLs from mainstream news sources. Instead, their tweets usually have a large impact on the audience if they contain URLs from Facebook and alternative news outlets. In contrast, tweets posted by regular users receive nearly equal impressions regardless of the posted URLs and their sources. Our findings can further shed light on understanding and detecting PSM accounts.
Submitted 5 February, 2019;
originally announced February 2019.
-
Detection of Violent Extremists in Social Media
Authors:
Hamidreza Alvari,
Soumajyoti Sarkar,
Paulo Shakarian
Abstract:
The ease of use of the Internet has enabled violent extremists such as the Islamic State of Iraq and Syria (ISIS) to easily reach a large audience, build personal relationships and increase recruitment. Social media platforms rely primarily on the reports they receive from their own users to mitigate the problem. Despite the efforts of social media in suspending many accounts, this solution is not guaranteed to be effective, because not all extremists are caught this way, or they can simply return with another account or migrate to other social networks. In this paper, we design an automatic detection scheme that, using as few as three groups of information related to usernames, profiles, and textual content of users, determines whether or not a given username belongs to an extremist user. We first demonstrate that extremists are inclined to adopt usernames that are similar to the ones that like-minded users have adopted in the past. We then propose a detection framework that deploys features which are highly indicative of potential online extremism. Results on a real-world ISIS-related dataset from Twitter demonstrate the effectiveness of the methodology in identifying extremist users.
Submitted 5 February, 2019;
originally announced February 2019.
-
Early Identification of Pathogenic Social Media Accounts
Authors:
Hamidreza Alvari,
Elham Shaabani,
Paulo Shakarian
Abstract:
Pathogenic Social Media (PSM) accounts such as terrorist supporters exploit large communities of supporters for conducting attacks on social media. Early detection of these accounts is crucial as they are highly likely to be key users in making a harmful message "viral". In this paper, we make the first attempt at utilizing causal inference to identify PSMs within a short time frame around their activity. We propose a time-decay causality metric and incorporate it into a causal community detection-based algorithm. The proposed algorithm is applied to groups of accounts sharing similar causality features and is followed by a classification algorithm to classify accounts as PSM or not. Unlike existing techniques that take significant time to collect information such as network, cascade path, or content, our scheme relies solely on the action log of users. Results on a real-world dataset from Twitter demonstrate the effectiveness and efficiency of our approach. We achieved a precision of 0.84 for detecting PSMs based only on their first 10 days of activity; the misclassified accounts were then detected 10 days later.
Submitted 26 September, 2018; v1 submitted 25 September, 2018;
originally announced September 2018.
-
Causal Inference for Early Detection of Pathogenic Social Media Accounts
Authors:
Hamidreza Alvari,
Paulo Shakarian
Abstract:
Pathogenic social media (PSM) accounts such as terrorist supporters exploit communities of supporters for conducting attacks on social media. Early detection of PSM accounts is crucial as they are likely to be key users in making a harmful message "viral". This paper overviews my recent doctoral work on utilizing causal inference to identify PSM accounts within a short time frame around their activity. The proposed scheme (1) assigns time-decay causality scores to users, (2) applies a community detection-based algorithm to groups of users sharing similar causality scores, and finally (3) deploys a classification algorithm to classify accounts. Unlike existing techniques that require network structure, cascade path, or content, our scheme relies solely on the action log of users.
Submitted 3 August, 2018; v1 submitted 26 June, 2018;
originally announced June 2018.
-
Strongly Hierarchical Factorization Machines and ANOVA Kernel Regression
Authors:
Ruocheng Guo,
Hamidreza Alvari,
Paulo Shakarian
Abstract:
High-order parametric models that include terms for feature interactions are applied to various data mining tasks, where ground truth depends on interactions of features. However, with sparse data, the high-dimensional parameters for feature interactions often face three issues: expensive computation, difficulty in parameter estimation and lack of structure. Previous work has proposed approaches which can partially resolve the three issues. In particular, models with factorized parameters (e.g. Factorization Machines) and sparse learning algorithms (e.g. FTRL-Proximal) can tackle the first two issues but fail to address the third. Regarding unstructured parameters, constraints or complicated regularization terms are applied such that hierarchical structures can be imposed. However, these methods make the optimization problem more challenging. In this work, we propose Strongly Hierarchical Factorization Machines and ANOVA kernel regression, where all three issues can be addressed without making the optimization problem more difficult. Experimental results show the proposed models significantly outperform the state-of-the-art in two data mining tasks: cold-start user response time prediction and stock volatility prediction.
Submitted 5 January, 2018; v1 submitted 25 December, 2017;
originally announced December 2017.
-
Semi-Supervised Learning for Detecting Human Trafficking
Authors:
Hamidreza Alvari,
Paulo Shakarian,
J. E. Kelly Snyder
Abstract:
Human trafficking is one of the most atrocious crimes and among the challenging problems facing law enforcement, one which demands attention of global magnitude. In this study, we leverage textual data from the website "Backpage", used for classified advertisements, to discern potential patterns of human trafficking activities which manifest online and identify advertisements of high interest to law enforcement. Due to the lack of ground truth, we rely on a human analyst from law enforcement for hand-labeling a small portion of the crawled data. We extend the existing Laplacian SVM and present S3VM-R, by adding a regularization term to exploit exogenous information embedded in our feature space in favor of the task at hand. We train the proposed method using labeled and unlabeled data and evaluate it on a fraction of the unlabeled data, herein referred to as unseen data, with our expert's further verification. Results from comparisons between our method and other semi-supervised and supervised approaches on the labeled data demonstrate that our learner is effective in identifying advertisements of high interest to law enforcement.
Submitted 30 May, 2017;
originally announced May 2017.
-
Exploiting Consistency Theory for Modeling Twitter Hashtag Adoption
Authors:
Hamidreza Alvari
Abstract:
Twitter, a microblogging service, has evolved into a powerful communication platform with millions of active users who generate an immense volume of microposts on a daily basis. To facilitate effective categorization and easy search, users adopt hashtags, keywords or phrases preceded by the hash (#) character. Successful prediction of the spread and propagation of information in the form of trending topics or hashtags in Twitter could help real-time identification of new trends and thus improve marketing efforts. Social theories such as consistency theory suggest that people prefer harmony or consistency in their thoughts. In Twitter, for example, users are more likely to adopt the same trending hashtag multiple times before it eventually dies. In this paper, we propose a low-rank weighted matrix factorization approach to model trending hashtag adoption in Twitter based on consistency theory. In particular, we first cast the problem of modeling trending hashtag adoption as an optimization problem, then integrate consistency theory into it as a regularization term and finally leverage widely used matrix factorization to solve the optimization. Empirical experiments demonstrate that our method outperforms other baselines in predicting whether a specific trending hashtag will be used by users in the future.
Submitted 30 May, 2017;
originally announced May 2017.
-
Twitter Hashtag Recommendation using Matrix Factorization
Authors:
Hamidreza Alvari
Abstract:
Twitter, one of the biggest and most popular microblogging Websites, has evolved into a powerful communication platform which allows millions of active users to generate a huge volume of microposts and queries on a daily basis. To accommodate effective categorization and easy search, users are allowed to make use of hashtags, keywords or phrases prefixed by the hash character, to categorize and summarize their posts. However, valid hashtags are not restricted and thus are created in a free and heterogeneous style, increasing the difficulty of the task of tweet categorization. In this paper, we propose a low-rank weighted matrix factorization based method to recommend hashtags to users solely based on their hashtag usage history and independently of their tweets' contents. We confirm using a two-sample t-test that users are more likely to adopt new hashtags similar to the ones they have previously adopted. In particular, we formulate the problem of hashtag recommendation as an optimization problem and incorporate a hashtag correlation weight matrix into it to account for the similarity between different hashtags. We finally leverage widely used matrix factorization from recommender systems to solve the optimization problem by capturing the latent factors of users and hashtags. Empirical experiments demonstrate that our method is capable of properly recommending hashtags.
Submitted 30 May, 2017;
originally announced May 2017.
-
Identifying Community Structures in Dynamic Networks
Authors:
Hamidreza Alvari,
Alireza Hajibagheri,
Gita Sukthankar,
Kiran Lakkaraju
Abstract:
Most real-world social networks are inherently dynamic, composed of communities that are constantly changing in membership. To track these evolving communities, we need dynamic community detection techniques. This article evaluates the performance of a set of game theoretic approaches for identifying communities in dynamic networks. Our method, D-GT (Dynamic Game Theoretic community detection), models each network node as a rational agent who periodically plays a community membership game with its neighbors. During game play, nodes seek to maximize their local utility by joining or leaving the communities of network neighbors. The community structure emerges after the game reaches a Nash equilibrium. Compared to the benchmark community detection methods, D-GT more accurately predicts the number of communities and finds community assignments with a higher normalized mutual information, while retaining a good modularity.
Submitted 11 September, 2016; v1 submitted 8 September, 2016;
originally announced September 2016.
-
A Non-Parametric Learning Approach to Identify Online Human Trafficking
Authors:
Hamidreza Alvari,
Paulo Shakarian,
J. E. Kelly Snyder
Abstract:
Human trafficking is among the most challenging law enforcement problems, demanding a persistent fight from all over the globe. In this study, we leverage readily available data from the website "Backpage", used for classified advertisements, to discern potential patterns of human trafficking activities which manifest online and identify the most likely trafficking-related advertisements. Due to the lack of ground truth, we rely on two human analysts, one a human trafficking survivor and one from law enforcement, for hand-labeling a small portion of the crawled data. We then present a semi-supervised learning approach that is trained on the available labeled and unlabeled data and evaluated on unseen data with further verification by experts.
Submitted 1 August, 2016; v1 submitted 29 July, 2016;
originally announced July 2016.
-
MIST: Missing Person Intelligence Synthesis Toolkit
Authors:
Elham Shaabani,
Hamidreza Alvari,
Paulo Shakarian,
J. E. Kelly Snyder
Abstract:
Each day, approximately 500 missing persons cases occur that go unsolved/unresolved in the United States. The non-profit organization known as the Find Me Group (FMG), led by former law enforcement professionals, is dedicated to solving or resolving these cases. This paper introduces the Missing Person Intelligence Synthesis Toolkit (MIST), which leverages a data-driven variant of geospatial abductive inference. This system takes search locations provided by a group of experts and rank-orders them based on the probability assigned to areas according to the prior performance of the experts taken as a group. We evaluate our approach against the current practices employed by the Find Me Group and find that it significantly reduces the search area, leading to a reduction of 31 square miles over the 24 cases we examined in our experiments. Currently, we are using MIST to aid the Find Me Group in an active missing person case.
Submitted 29 August, 2016; v1 submitted 28 July, 2016;
originally announced July 2016.