Search | arXiv e-print repository

doi 10.1145/3201064.3201103

A Quality Type-aware Annotated Corpus and Lexicon for Harassment Research

Authors: Mohammadreza Rezvan, Saeedeh Shekarpour, Lakshika Balasuriya, Krishnaprasad Thirunarayan, Valerie Shalin, Amit Sheth

Abstract: Having a quality annotated corpus is essential, especially for applied research. Despite the recent focus of the Web science community on cyberbullying research, the community still does not have standard benchmarks. In this paper, we publish, first, a quality annotated corpus and, second, an offensive-words lexicon capturing different types of harassment: (i) sexual harassment, (ii) racial harassment, (iii) appearance-related harassment, (iv) intellectual harassment, and (v) political harassment. We crawled data from Twitter using our offensive lexicon. We then relied on human judges to annotate the collected tweets w.r.t. the contextual types, because the use of offensive words is not sufficient to reliably detect harassment. Our corpus consists of 25,000 annotated tweets in five contextual types. We are pleased to share this novel annotated corpus and the lexicon with the research community. Instructions for acquiring the corpus have been published on the Git repository.

Submitted 23 May, 2018; v1 submitted 26 February, 2018; originally announced February 2018.

arXiv:1802.06305 [pdf, other]

doi 10.1016/j.dcan.2017.10.002

Machine learning for Internet of Things data analysis: A survey

Authors: Mohammad Saeid Mahdavinejad, Mohammadreza Rezvan, Mohammadamin Barekatain, Peyman Adibi, Payam Barnaghi, Amit P. Sheth

Abstract: Rapid developments in hardware, software, and communication technologies have allowed the emergence of Internet-connected sensory devices that provide observation and data measurement from the physical world. By 2020, it is estimated that the total number of Internet-connected devices being used will be between 25 and 50 billion. As the numbers grow and technologies become more mature, the volume of data published will increase. Internet-connected devices technology, referred to as Internet of Things (IoT), continues to extend the current Internet by providing connectivity and interaction between the physical and cyber worlds. In addition to increased volume, the IoT generates Big Data characterized by velocity in terms of time and location dependency, with a variety of multiple modalities and varying data quality. Intelligent processing and analysis of this Big Data is the key to developing smart IoT applications. This article assesses the different machine learning methods that deal with the challenges in IoT data by considering smart cities as the main use case. The key contribution of this study is the presentation of a taxonomy of machine learning algorithms explaining how different techniques are applied to the data in order to extract higher level information. The potential and challenges of machine learning for IoT data analytics will also be discussed. A use case of applying Support Vector Machine (SVM) on Aarhus Smart City traffic data is presented for a more detailed exploration.

Submitted 17 February, 2018; originally announced February 2018.

Comments: Digital Communications and Networks (2017)

arXiv:1801.00356 [pdf]

How will the Internet of Things enable Augmented Personalized Health?

Authors: Amit Sheth, Utkarshani Jaimini, Hong Yung Yip

Abstract: Internet-of-Things (IoT) is profoundly redefining the way we create, consume, and share information. Health aficionados and citizens are increasingly using IoT technologies to track their sleep, food intake, activity, vital body signals, and other physiological observations. This is complemented by IoT systems that continuously collect health-related data from the environment and inside the living quarters. Together, these have created an opportunity for a new generation of healthcare solutions. However, interpreting data to understand an individual's health is challenging. It is usually necessary to look at that individual's clinical record and behavioral information, as well as social and environmental information affecting that individual. Interpreting how well a patient is doing also requires looking at the patient's adherence to respective health objectives, application of relevant clinical knowledge, and the desired outcomes. We resort to the vision of Augmented Personalized Healthcare (APH) to exploit the extensive variety of relevant data and medical knowledge using Artificial Intelligence (AI) techniques to extend and enhance human health, and to present various stages of augmented health management strategies: self-monitoring, self-appraisal, self-management, intervention, and disease progress tracking and prediction. kHealth technology, a specific incarnation of APH, and its application to Asthma and other diseases are used to provide illustrations and discuss alternatives for technology-assisted health management.
Several prominent efforts involving IoT and patient-generated health data (PGHD) with respect to converting multimodal data into actionable information (big data to smart data) are also identified. The roles of three components in an evidence-based semantic perception approach (Contextualization, Abstraction, and Personalization) are discussed.

Submitted 31 December, 2017; originally announced January 2018.

arXiv:1710.05429 [pdf, other]

Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media

Authors: Amir Hossein Yazdavar, Hussein S. Al-Olimat, Monireh Ebrahimi, Goonmeet Bajaj, Tanvi Banerjee, Krishnaprasad Thirunarayan, Jyotishman Pathak, Amit Sheth

Abstract: With the rise of social media, millions of people are routinely expressing their moods, feelings, and daily struggles with mental health issues on social media platforms like Twitter. Unlike traditional observational cohort studies conducted through questionnaires and self-reported surveys, we explore the reliable detection of clinical depression from tweets obtained unobtrusively. Based on the analysis of tweets crawled from users with self-reported depressive symptoms in their Twitter profiles, we demonstrate the potential for detecting clinical depression symptoms which emulate the PHQ-9 questionnaire clinicians use today. Our study uses a semi-supervised statistical model to evaluate how the duration of these symptoms and their expression on Twitter (in terms of word usage patterns and topical preferences) align with the medical findings reported via the PHQ-9. Our proactive and automatic screening tool is able to identify clinical depressive symptoms with an accuracy of 68% and precision of 72%.

Submitted 15 October, 2017; originally announced October 2017.

Comments: 8 pages, Advances in Social Networks Analysis and Mining (ASONAM), 2017 IEEE/ACM International Conference

arXiv:1710.02514 [pdf]

On the Challenges of Sentiment Analysis for Dynamic Events

Authors: Monireh Ebrahimi, Amir Hossein Yazdavar, Amit Sheth

Abstract: With the proliferation of social media over the last decade, determining people's attitude with respect to a specific topic, document, interaction or events has fueled research interest in natural language processing and introduced a new channel called sentiment and emotion analysis. For instance, businesses routinely look to develop systems to automatically understand their customer conversations by identifying the relevant content to enhance marketing their products and managing their reputations. Previous efforts to assess people's sentiment on Twitter have suggested that Twitter may be a valuable resource for studying political sentiment and that it reflects the offline political landscape. According to a Pew Research Center report, in January 2016, 44 percent of US adults stated having learned about the presidential election through social media. Furthermore, 24 percent reported use of social media posts of the two candidates as a source of news and information, which is more than the 15 percent who have used both candidates' websites or emails combined. The first presidential debate between Trump and Hillary was the most tweeted debate ever with 17.1 million tweets.

Submitted 6 October, 2017; originally announced October 2017.

Comments: 9 pages, 2 figures, IEEE Intelligent Systems 2017

arXiv:1708.03105 [pdf, other]

Location Name Extraction from Targeted Text Streams using Gazetteer-based Statistical Language Models

Authors: Hussein S. Al-Olimat, Krishnaprasad Thirunarayan, Valerie Shalin, Amit Sheth

Abstract: Extracting location names from informal and unstructured social media data requires the identification of referent boundaries and partitioning compound names. Variability, particularly systematic variability in location names (Carroll, 1983), challenges the identification task. Some of this variability can be anticipated as operations within a statistical language model, in this case drawn from gazetteers such as OpenStreetMap (OSM), Geonames, and DBpedia. This permits evaluation of an observed n-gram in Twitter targeted text as a legitimate location name variant from the same location-context. Using n-gram statistics and location-related dictionaries, our Location Name Extraction tool (LNEx) handles abbreviations and automatically filters and augments the location names in gazetteers (handling name contractions and auxiliary contents) to help detect the boundaries of multi-word location names and thereby delimit them in texts. We evaluated our approach on 4,500 event-specific tweets from three targeted streams to compare the performance of LNEx against that of ten state-of-the-art taggers that rely on standard semantic, syntactic and/or orthographic features. LNEx improved the average F-Score by 33-179%, outperforming all taggers. Further, LNEx is capable of stream processing.

Submitted 26 April, 2020; v1 submitted 10 August, 2017; originally announced August 2017.

Comments: https://www.aclweb.org/anthology/C18-1169.pdf

MSC Class: 68T50 ACM Class: I.2.7

Journal ref: In The 27th International Conference on Computational Linguistics (COLING 2018)

arXiv:1707.08470 [pdf, other]

doi 10.1007/978-3-319-34129-3_8

Implicit Entity Linking in Tweets

Authors: Sujan Perera, Pablo N. Mendes, Adarsh Alex, Amit Sheth, Krishnaprasad Thirunarayan

Abstract: Over the years, Twitter has become one of the largest communication platforms providing key data to various applications such as brand monitoring and trend detection, among others. Entity linking is one of the major tasks in natural language understanding from tweets and it associates entity mentions in text to corresponding entries in knowledge bases in order to provide unambiguous interpretation and additional context. State-of-the-art techniques have focused on linking explicitly mentioned entities in tweets with reasonable success. However, we argue that in addition to explicit mentions (e.g., "The movie Gravity was more expensive than the mars orbiter mission"), entities (the movie Gravity) can also be mentioned implicitly (e.g., "This new space movie is crazy. you must watch it!"). This paper introduces the problem of implicit entity linking in tweets. We propose an approach that models the entities by exploiting their factual and contextual knowledge. We demonstrate how to use these models to perform implicit entity linking on a ground truth dataset with 397 tweets from two domains, namely, Movie and Book. Specifically, we show: 1) the importance of linking implicit entities and its value addition to the standard entity linking task, and 2) the importance of exploiting contextual knowledge associated with an entity for linking their implicit mentions. We also make the ground truth dataset publicly available to foster the research in this new research area.

Submitted 26 July, 2017; originally announced July 2017.

Comments: This paper was accepted at the Extended Semantic Web Conference 2016 as a full research track paper

arXiv:1707.05308 [pdf, other]

doi 10.1145/3106426.3109448

Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples

Authors: Amit Sheth, Sujan Perera, Sanjaya Wijeratne, Krishnaprasad Thirunarayan

Abstract: Machine Learning has been a big success story during the AI resurgence. One particular stand out success relates to learning from a massive amount of data. In spite of early assertions of the unreasonable effectiveness of data, there is increasing recognition for utilizing knowledge whenever it is available or can be created purposefully. In this paper, we discuss the indispensable role of knowledge for deeper understanding of content where (i) large amounts of training data are unavailable, (ii) the objects to be recognized are complex, (e.g., implicit entities and highly subjective content), and (iii) applications need to use complementary or related data in multiple modalities/media. What brings us to the cusp of rapid progress is our ability to (a) create relevant and reliable knowledge and (b) carefully exploit knowledge to enhance ML/NLP techniques. Using diverse examples, we seek to foretell unprecedented progress in our ability for deeper understanding and exploitation of multimodal data and continued incorporation of knowledge in learning techniques.

Submitted 14 July, 2017; originally announced July 2017.

Comments: Pre-print of the paper accepted at 2017 IEEE/WIC/ACM International Conference on Web Intelligence (WI). arXiv admin note: substantial text overlap with arXiv:1610.07708

arXiv:1707.04653 [pdf, other]

doi 10.1145/3106426.3106490

A Semantics-Based Measure of Emoji Similarity

Authors: Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran

Abstract: Emoji have grown to become one of the most important forms of communication on the web. With their widespread use, measuring the similarity of emoji has become an important problem for contemporary text processing since it lies at the heart of sentiment analysis, search, and interface design tasks. This paper presents a comprehensive analysis of the semantic similarity of emoji through embedding models that are learned over machine-readable emoji meanings in the EmojiNet knowledge base. Using emoji descriptions, emoji sense labels and emoji sense definitions, and with different training corpora obtained from Twitter and Google News, we develop and test multiple embedding models to measure emoji similarity. To evaluate our work, we create a new dataset called EmoSim508, which assigns human-annotated semantic similarity scores to a set of 508 carefully selected emoji pairs. After validation with EmoSim508, we present a real-world use-case of our emoji embedding models using a sentiment analysis task and show that our models outperform the previous best-performing emoji embedding model on this task. The EmoSim508 dataset and our emoji embedding models are publicly released with this paper and can be downloaded from http://emojinet.knoesis.org/.

Submitted 14 July, 2017; originally announced July 2017.

Comments: This paper is accepted at Web Intelligence 2017 as a full paper, In 2017 IEEE/WIC/ACM International Conference on Web Intelligence (WI). Leipzig, Germany: ACM, 2017
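A minimal sketch of how a similarity score between two emoji could be read off an embedding model of the kind the abstract above describes. The abstract does not state which similarity function is used; cosine similarity is assumed here because it is a common choice for embeddings. The emoji names and three-dimensional vectors are hypothetical placeholders, not values from EmojiNet or EmoSim508.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings; a real model would have hundreds of dimensions.
emoji_vectors = {
    "face_with_tears_of_joy": [0.9, 0.1, 0.3],
    "grinning_face": [0.8, 0.2, 0.4],
    "umbrella": [0.1, 0.9, 0.2],
}

sim_close = cosine_similarity(
    emoji_vectors["face_with_tears_of_joy"], emoji_vectors["grinning_face"]
)
sim_far = cosine_similarity(
    emoji_vectors["face_with_tears_of_joy"], emoji_vectors["umbrella"]
)
print(f"joy vs. grinning: {sim_close:.3f}")
print(f"joy vs. umbrella: {sim_far:.3f}")
```

Scores computed this way for each emoji pair could then be compared against human-annotated similarity ratings, which is the role EmoSim508 plays in the paper.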

arXiv:1707.04652 [pdf, other]

EmojiNet: An Open Service and API for Emoji Sense Discovery

Authors: Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran

Abstract: This paper presents the release of EmojiNet, the largest machine-readable emoji sense inventory that links Unicode emoji representations to their English meanings extracted from the Web. EmojiNet is a dataset consisting of: (i) 12,904 sense labels over 2,389 emoji, which were extracted from the web and linked to machine-readable sense definitions seen in BabelNet, (ii) context words associated with each emoji sense, which are inferred through word embedding models trained over Google News corpus and a Twitter message corpus for each emoji sense definition, and (iii) recognizing discrepancies in the presentation of emoji on different platforms, specification of the most likely platform-based emoji sense for a selected set of emoji. The dataset is hosted as an open service with a REST API and is available at http://emojinet.knoesis.org/. The development of this dataset, evaluation of its quality, and its applications including emoji sense disambiguation and emoji sense similarity are discussed.

Submitted 14 July, 2017; originally announced July 2017.

Comments: This paper was published at ICWSM 2017 as a full paper, Proc. of the 11th International AAAI Conference on Web and Social Media (ICWSM 2017). Montreal, Canada. 2017

arXiv:1701.07490 [pdf]

What Are People Tweeting about Zika? An Exploratory Study Concerning Symptoms, Treatment, Transmission, and Prevention

Authors: Michele Miller, Dr. Tanvi Banerjee, RoopTeja Muppalla, Dr. William Romine, Dr. Amit Sheth

Abstract: The purpose of this study was to do a dataset distribution analysis, a classification performance analysis, and a topical analysis concerning what people are tweeting about four disease characteristics: symptoms, transmission, prevention, and treatment. A combination of natural language processing and machine learning techniques was used to determine what people are tweeting about Zika. Specifically, a two-stage classifier system was built to find relevant tweets on Zika, and then categorize these into the four disease categories. Tweets in each disease category were then examined using latent Dirichlet allocation (LDA) to determine the five main tweet topics for each disease characteristic. Results: 1,234,605 tweets were collected. Tweets by males and females were similar (28% and 23% respectively). The classifier performed well on the training and test data for relevancy (F=0.87 and 0.99 respectively) and disease characteristics (F=0.79 and 0.90 respectively). Five topics for each category were found and discussed with a focus on the symptoms category. Through this process, we demonstrate how misinformation can be discovered so that public health officials can respond to the tweets with misinformation.

Submitted 17 January, 2017; originally announced January 2017.

arXiv:1701.05724 [pdf, other]

Logical Inferences with Contexts of RDF Triples

Authors: Vinh Nguyen, Amit Sheth

Abstract: Logical inference, an integral feature of the Semantic Web, is the process of deriving new triples by applying entailment rules on knowledge bases. The entailment rules are determined by the model-theoretic semantics. Incorporating context of an RDF triple (e.g., provenance, time, and location) into the inferencing process requires the formal semantics to be capable of describing the context of RDF triples also in the form of triples, or in other words, RDF contextual triples about triples. The formal semantics should also provide the rules that could entail new contextual triples about triples. In this paper, we propose the first inferencing mechanism that allows context of RDF triples, represented in the form of RDF triples about triples, to be the first-class citizens in the model-theoretic semantics and in the logical rules. Our inference mechanism is well-formalized with all new concepts being captured in the model-theoretic semantics. This formal semantics also allows us to derive a new set of entailment rules that could entail new contextual triples about triples. To demonstrate the feasibility and the scalability of the proposed mechanism, we implement a new tool in which we transform the existing knowledge bases to our representation of RDF triples about triples and provide the option for this tool to compute the inferred triples for the proposed rules. We evaluate the computation of the proposed rules on a large scale using various real-world knowledge bases such as Bio2RDF NCBI Genes and DBpedia.
The results show that the computation of the inferred triples can be highly scalable. On average, one billion inferred triples add 5-6 minutes to the overall transformation process. NCBI Genes, with 20 billion triples in total, took only 232 minutes for the transformation of 12 billion triples and added 42 minutes for inferring 8 billion triples to the overall process.

Submitted 20 January, 2017; originally announced January 2017.
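The throughput figures quoted in the abstract above can be cross-checked with simple arithmetic. The short sketch below uses only the NCBI Genes numbers stated there (12 billion triples transformed in 232 minutes, 8 billion triples inferred in 42 additional minutes); the variable names are illustrative.

```python
# Figures quoted in the abstract for Bio2RDF NCBI Genes.
transformed_triples_billion = 12
transform_minutes = 232
inferred_triples_billion = 8
inference_minutes = 42

# Extra minutes spent per billion inferred triples.
minutes_per_billion_inferred = inference_minutes / inferred_triples_billion

# Total triples and total running time of the combined process.
total_triples_billion = transformed_triples_billion + inferred_triples_billion
total_minutes = transform_minutes + inference_minutes

# Fraction of the total running time attributable to inference.
inference_share = inference_minutes / total_minutes

print(f"minutes per billion inferred triples: {minutes_per_billion_inferred:.2f}")
print(f"total triples (billions): {total_triples_billion}")
print(f"total minutes: {total_minutes}")
print(f"inference share of running time: {inference_share:.1%}")
```

The per-billion figure falls inside the "5-6 minutes" range given in the abstract, and the two triple counts sum to the stated total of 20 billion.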

arXiv:1701.05625 [pdf, other]

CEVO: Comprehensive EVent Ontology Enhancing Cognitive Annotation

Authors: Saeedeh Shekarpour, Faisal Alshargi, Valerie Shalin, Krishnaprasad Thirunarayan, Amit P. Sheth

Abstract: While the general analysis of named entities has received substantial research attention on unstructured as well as structured data, the analysis of relations among named entities has received limited focus. In fact, a review of the literature revealed a deficiency in research on the abstract conceptualization required to organize relations. We believe that such an abstract conceptualization can benefit various communities and applications such as natural language processing, information extraction, machine learning, and ontology engineering. In this paper, we present Comprehensive EVent Ontology (CEVO), built on Levin's conceptual hierarchy of English verbs that categorizes verbs with shared meaning and syntactic behavior. We present the fundamental concepts and requirements for this ontology. Furthermore, we present three use cases employing the CEVO ontology on annotation tasks: (i) annotating relations in plain text, (ii) annotating ontological properties, and (iii) linking textual relations to ontological properties. These use cases demonstrate the benefits of using CEVO for annotation: (i) annotating English verbs from an abstract conceptualization, (ii) playing the role of an upper ontology for organizing ontological properties, and (iii) facilitating the annotation of text relations using any underlying vocabulary. This resource is available at https://shekarpour.github.io/cevo.io/ using https://w3id.org/cevo namespace.

Submitted 3 October, 2018; v1 submitted 19 January, 2017; originally announced January 2017.

arXiv:1610.09516 [pdf, other]

Finding Street Gang Members on Twitter

Authors: Lakshika Balasuriya, Sanjaya Wijeratne, Derek Doran, Amit Sheth

Abstract: Most street gang members use Twitter to intimidate others, to present outrageous images and statements to the world, and to share recent illegal activities. Their tweets may thus be useful to law enforcement agencies to discover clues about recent crimes or to anticipate ones that may occur. Finding these posts, however, requires a method to discover gang member Twitter profiles. This is a challenging task since gang members represent a very small population of the 320 million Twitter users. This paper studies the problem of automatically finding gang members on Twitter. It outlines a process to curate one of the largest sets of verifiable gang member profiles that have ever been studied. A review of these profiles establishes differences in the language, images, YouTube links, and emojis gang members use compared to the rest of the Twitter population. Features from this review are used to train a series of supervised classifiers. Our classifier achieves a promising F1 score with a low false positive rate.

Submitted 29 October, 2016; originally announced October 2016.

Comments: 8 pages, 9 figures, 2 tables, Published as a full paper at 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2016)

Journal ref: The 2016 IEEE/ACM Int. Conf. on Advances in Social Networks Analysis and Mining. vol. 8, pp. 685-692. San Francisco, CA, USA (2016)

arXiv:1610.08597 [pdf, other]

Word Embeddings to Enhance Twitter Gang Member Profile Identification

Authors: Sanjaya Wijeratne, Lakshika Balasuriya, Derek Doran, Amit Sheth

Abstract: Gang affiliates have joined the masses who use social media to share thoughts and actions publicly. Interestingly, they use this public medium to express recent illegal actions, to intimidate others, and to share outrageous images and statements. Agencies able to unearth these profiles may thus be able to anticipate, stop, or hasten the investigation of gang-related crimes. This paper investigates the use of word embeddings to help identify gang members on Twitter. Building on our previous work, we generate word embeddings that translate what Twitter users post in their profile descriptions, tweets, profile images, and linked YouTube content to a real vector format amenable for machine learning classification. Our experimental results show that pre-trained word embeddings can boost the accuracy of supervised learning algorithms trained over gang members' social media posts.

Submitted 26 October, 2016; originally announced October 2016.

Comments: 7 pages, 1 figure, 2 tables, Published at IJCAI Workshop on Semantic Machine Learning (SML 2016)

Journal ref: IJCAI Workshop on Semantic Machine Learning (SML 2016). pp. 18-24. CEUR-WS, New York City, NY (07 2016)

arXiv:1610.07710 [pdf, other]

EmojiNet: Building a Machine Readable Sense Inventory for Emoji

Authors: Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran

Abstract: Emoji are a contemporary and extremely popular way to enhance electronic communication. Without rigid semantics attached to them, emoji symbols take on different meanings based on the context of a message. Thus, like the word sense disambiguation task in natural language processing, machines also need to disambiguate the meaning or sense of an emoji. In a first step toward achieving this goal, this paper presents EmojiNet, the first machine readable sense inventory for emoji. EmojiNet is a resource enabling systems to link emoji with their context-specific meaning. It is automatically constructed by integrating multiple emoji resources with BabelNet, which is the most comprehensive multilingual sense inventory available to date. The paper discusses its construction, evaluates the automatic resource creation process, and presents a use case where EmojiNet disambiguates emoji usage in tweets. EmojiNet is available online for use at http://emojinet.knoesis.org.

Submitted 24 October, 2016; originally announced October 2016.

Comments: 15 pages, 4 figures, 3 tables, Accepted to publish at the 8th International Conference on Social Informatics (SocInfo 2016) as a full research track paper

ACM Class: I.2.7

arXiv:1610.07708

Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples

Authors: Amit Sheth, Sujan Perera, Sanjaya Wijeratne

Abstract: Machine Learning has been a big success story during the AI resurgence. One particular standout success relates to unsupervised learning from a massive amount of data, albeit much of it relates to one modality/type of data at a time. In spite of early assertions of the unreasonable effectiveness of data, there is increasing recognition of utilizing knowledge whenever it is available or can be created purposefully. In this paper, we focus on discussing the indispensable role of knowledge for deeper understanding of complex text and multimodal data in situations where (i) large amounts of training data (labeled/unlabeled) are not available or labor intensive to create, (ii) the objects (particularly text) to be recognized are complex (i.e., beyond simple entity-person/location/organization names), such as implicit entities and highly subjective content, and (iii) applications need to use complementary or related data in multiple modalities/media. What brings us to the cusp of rapid progress is our ability to (a) create knowledge, varying from comprehensive or cross domain to domain or application specific, and (b) carefully exploit the knowledge to further empower or extend the applications of ML/NLP techniques. Using the early results in several diverse situations - both in data types and applications - we seek to foretell unprecedented progress in our ability for deeper understanding and exploitation of multimodal data.

Submitted 22 January, 2019; v1 submitted 24 October, 2016; originally announced October 2016.

Comments: There is a new version of this paper with new authors uploaded as arXiv:1707.05308, so this is an invalid entry

ACM Class: I.2

arXiv:1609.09014 [pdf, other]

SWoTSuite: A Development Framework for Prototyping Cross-domain Semantic Web of Things Applications

Authors: Pankesh Patel, Amelie Gyrard, Dhavalkumar Thakker, Amit Sheth, Martin Serrano

Abstract: Semantic Web of Things (SWoT) applications focus on providing a wide-scale interoperability that allows the sharing of IoT devices across domains and the reusing of available knowledge on the web. However, the application development is difficult because developers have to do various tasks such as designing an application, annotating IoT data, interpreting data, and combining application domains. To address the above challenges, this paper demonstrates SWoTSuite, a toolkit for prototyping SWoT applications. It hides the use of semantic web technologies as much as possible to avoid the burden of designing SWoT applications that involves designing ontologies, annotating sensor data, and using reasoning mechanisms to enrich data. Taking inspiration from sharing and reuse approaches, SWoTSuite reuses data and vocabularies. It leverages existing technologies to build applications. We take a hello world naturopathy application as an example and demonstrate an application development process using SWoTSuite. The demo video is available at URL: http://tinyurl.com/zs9flrt.

Submitted 28 September, 2016; originally announced September 2016.

Comments: 8 pages

arXiv:1606.07988 [pdf, other]

Building the Web of Knowledge with Smart IoT Applications (Extended Version)

Authors: Amelie Gyrard, Pankesh Patel, Amit Sheth, Martin Serrano

Abstract: The Internet of Things (IoT) is experiencing fast adoption in society, from industrial to home applications. The number of sensors and connected devices deployed on the Internet is changing our perspective and the way we understand the world. The development and generation of IoT applications is just starting and they will modify our physical and virtual lives, from how we remotely control appliances at home to how we deal with insurance companies in order to start insurance schemes via smart cards. This massive deployment of IoT devices represents a tremendous economic impact and at the same time offers multiple opportunities. However, the potential of IoT is underexploited and day by day this gap between devices and useful applications is getting bigger. Additionally, the physical and cyber worlds are largely disconnected, requiring a lot of manual effort to integrate, find, and use information in a meaningful way. To build a connection between the physical and the virtual, we need a knowledge framework that allows bilateral understanding: devices producing data, information systems managing the data, and applications transforming information into meaningful knowledge. The first column in this series, in the previous issue of this magazine, titled "Internet of Things to Smart IoT Through Semantic, Cognitive, and Perceptual Computing," reviews IoT growth and potential that have energized research and technology development, centered on aspects of Artificial Intelligence to build future intelligent systems. This column steps back and demonstrates the benefits of using semantic web technologies to get meaningful knowledge from sensor data to design smart systems.

Submitted 25 June, 2016; originally announced June 2016.

Comments: 7 pages, 3 figures

arXiv:1606.00480 [pdf, other]

A Formal Graph Model for RDF and Its Implementation

Authors: Vinh Nguyen, Jyoti Leeka, Olivier Bodenreider, Amit Sheth

Abstract: Formalizing an RDF abstract graph model to be compatible with the RDF formal semantics has remained one of the foundational problems in the Semantic Web. In this paper, we propose a new formal graph model for RDF datasets. This model allows us to express the current model-theoretic semantics in the form of a graph. We also propose the concepts of resource path and triple path as well as an algorithm for traversing the new graph. We demonstrate the feasibility of this graph model through two implementations: one is a new graph engine called GraphKE, and the other is extended from RDF-3X to show that existing systems can also benefit from this model. In order to evaluate the empirical aspect of our graph model, we choose the shortest path algorithm and implement it in the GraphKE and the RDF-3X. Our experiments on both engines for finding the shortest paths in the YAGO2S-SP dataset give decent performance in terms of execution time. The empirical results show that our graph model with well-defined semantics can be effectively implemented.

Submitted 1 June, 2016; originally announced June 2016.

arXiv:1510.05963 [pdf]

Semantic, Cognitive, and Perceptual Computing: Advances toward Computing for Human Experience

Authors: Amit Sheth, Pramod Anantharam, Cory Henson

Abstract: The World Wide Web continues to evolve and serve as the infrastructure for carrying massive amounts of multimodal and multisensory observations. These observations capture various situations pertinent to people's needs and interests along with all their idiosyncrasies. To support human-centered computing that empowers people in making better and timely decisions, we look towards computation that is inspired by human perception and cognition. Toward this goal, we discuss computing paradigms of semantic computing, cognitive computing, and an emerging aspect of computing, which we call perceptual computing. In our view, these offer a continuum to make the most out of vast, growing, and diverse data pertinent to human needs and interests. We propose details of perceptual computing characterized by interpretation and exploration operations comparable to the interleaving of bottom and top brain processing. This article consists of two parts. First we describe semantic computing, cognitive computing, and perceptual computing to lay out distinctions while acknowledging their complementary capabilities. We then provide a conceptual overview of the newest of these three paradigms--perceptual computing. For further insights, we focus on an application scenario of asthma management converting massive, heterogeneous and multimodal (big) data into actionable information or smart data.

Submitted 20 October, 2015; originally announced October 2015.

Comments: 13 pages, 4 Figures, IEEE Computer

arXiv:1509.04513 [pdf, ps, other]

On Reasoning with RDF Statements about Statements using Singleton Property Triples

Authors: Vinh Nguyen, Olivier Bodenreider, Krishnaprasad Thirunarayan, Gang Fu, Evan Bolton, Núria Queralt Rosinach, Laura I. Furlong, Michel Dumontier, Amit Sheth

Abstract: The Singleton Property (SP) approach has been proposed for representing and querying metadata about RDF triples such as provenance, time, location, and evidence. In this approach, one singleton property is created to uniquely represent a relationship in a particular context, and in general, generates a large property hierarchy in the schema. It has become the subject of important questions from Semantic Web practitioners. Can an existing reasoner recognize the singleton property triples? And how? If the singleton property triples describe a data triple, then how can a reasoner infer this data triple from the singleton property triples? Or would the large property hierarchy affect the reasoners in some way? We address these questions in this paper and present our study about the reasoning aspects of the singleton properties. We propose a simple mechanism to enable existing reasoners to recognize the singleton property triples, as well as to infer the data triples described by the singleton property triples. We evaluate the effect of the singleton property triples in the reasoning processes by comparing the performance on RDF datasets with and without singleton properties. Our evaluation uses as benchmark the LUBM datasets and the LUBM-SP datasets derived from LUBM with temporal information added through singleton properties.

Submitted 15 September, 2015; originally announced September 2015.

arXiv:1509.02822 [pdf, other]

Exposing Provenance Metadata Using Different RDF Models

Authors: Gang Fu, Evan Bolton, Núria Queralt Rosinach, Laura I. Furlong, Vinh Nguyen, Amit Sheth, Olivier Bodenreider, Michel Dumontier

Abstract: A standard model for exposing structured provenance metadata of scientific assertions on the Semantic Web would increase interoperability, discoverability, reliability, as well as reproducibility for scientific discourse and evidence-based knowledge discovery. Several Resource Description Framework (RDF) models have been proposed to track provenance. However, provenance metadata may not only be verbose, but also significantly redundant. Therefore, an appropriate RDF provenance model should be efficient for publishing, querying, and reasoning over Linked Data. In the present work, we have collected millions of pairwise relations between chemicals, genes, and diseases from multiple data sources, and demonstrated the extent of redundancy of provenance information in the life science domain. We also evaluated the suitability of several RDF provenance models for this crowdsourced data set, including the N-ary model, the Singleton Property model, and the Nanopublication model. We examined query performance against three commonly used large RDF stores, including Virtuoso, Stardog, and Blazegraph. Our experiments demonstrate that query performance depends on both the RDF store and the RDF provenance model.

Submitted 9 September, 2015; originally announced September 2015.

arXiv:1503.02086 [pdf]

Gender-Based Violence in 140 Characters or Fewer: A #BigData Case Study of Twitter

Authors: Hemant Purohit, Tanvi Banerjee, Andrew Hampton, Valerie L. Shalin, Nayanesh Bhandutia, Amit P. Sheth

Abstract: Public institutions are increasingly reliant on data from social media sites to measure public attitude and provide timely public engagement. Such reliance includes the exploration of public views on important social issues such as gender-based violence (GBV). In this study, we examine big (social) data consisting of nearly fourteen million tweets collected from Twitter over a period of ten months to analyze public opinion regarding GBV, highlighting the nature of tweeting practices by geographical location and gender. We demonstrate the utility of Computational Social Science to mine insight from the corpus while accounting for the influence of both transient events and sociocultural factors. We reveal public awareness regarding GBV tolerance and suggest opportunities for intervention and the measurement of intervention effectiveness assisting both governmental and non-governmental organizations in policy development.

Submitted 29 June, 2015; v1 submitted 6 March, 2015; originally announced March 2015.

ACM Class: H.1.2; J.4

arXiv:1503.00760 [pdf]

On Using Synthetic Social Media Stimuli in an Emergency Preparedness Functional Exercise

Authors: Andrew Hampton, Shreyansh Bhatt, Alan Smith, Jeremy Brunn, Hemant Purohit, Valerie L. Shalin, John M. Flach, Amit P. Sheth

Abstract: This paper details the creation and use of a massive (over 32,000 messages) artificially constructed 'Twitter' microblog stream for a regional emergency preparedness functional exercise. By combining microblog conversion, manual production, and a control set, we created a web based information stream providing valid, misleading, and irrelevant information to public information officers (PIOs) representing hospitals, fire departments, the local Red Cross, and city and county government officials. PIOs searched, monitored, and (through conventional channels) verified potentially actionable information that could then be redistributed through a personalized screen name. Our case study of a key PIO reveals several capabilities that social media can support, including event detection, the distribution of information between functions within the emergency response community, and the distribution of messages to the public. We suggest that training as well as information filtering tools are necessary to realize the potential of social media in both emergencies and exercises.

Submitted 2 March, 2015; originally announced March 2015.

Comments: 18 pages

arXiv:1411.3761 [pdf, other]

A Hybrid Approach to Finding Relevant Social Media Content for Complex Domain Specific Information Needs

Authors: Delroy Cameron, Amit Sheth, Nishita Jaykumar, Krishnaprasad Thirunarayan, Gaurish Anand, Gary A. Smith

Abstract: While contemporary semantic search systems offer to improve classical keyword-based search, they are not always adequate for complex domain specific information needs. The domain of prescription drug abuse, for example, requires knowledge of both ontological concepts and 'intelligible constructs' not typically modeled in ontologies. These intelligible constructs convey essential information that includes notions of intensity, frequency, interval, dosage and sentiments, which could be important to the holistic needs of the information seeker. We present a hybrid approach to domain specific information retrieval (or knowledge-aware search) that integrates ontology-driven query interpretation with synonym-based query expansion and domain specific rules, to facilitate search in social media. Our framework is based on a context-free grammar (CFG) that defines the query language of constructs interpretable by the search system. The grammar provides two levels of semantic interpretation: 1) a top-level CFG that facilitates retrieval of diverse textual patterns, which belong to broad templates and 2) a low-level CFG that enables interpretation of certain specific expressions that belong to such patterns. These low-level expressions occur as concepts from four different categories of data: 1) ontological concepts, 2) concepts in lexicons (such as emotions and sentiments), 3) concepts in lexicons with only partial ontology representation, called lexico-ontology concepts (such as side effects and routes of administration (ROA)), and 4) domain specific expressions (such as date, time, interval, frequency and dosage) derived solely through rules. Our approach is embodied in a novel Semantic Web platform called PREDOSE developed for prescription drug abuse epidemiology. Keywords: Knowledge-Aware Search, Ontology, Semantic Search, Background Knowledge, Context-Free Grammar

Submitted 13 November, 2014; originally announced November 2014.

Comments: Accepted for publication: Journal of Web Semantics, Elsevier

ACM Class: H.3.3

arXiv:1410.4977 [pdf]

Semantic Gateway as a Service architecture for IoT Interoperability

Authors: Pratikkumar Desai, Amit Sheth, Pramod Anantharam

Abstract: The Internet of Things (IoT) is set to occupy a substantial component of the future Internet. The IoT connects sensors and devices that record physical observations to applications and services of the Internet. As a successor to technologies such as RFID and Wireless Sensor Networks (WSN), the IoT has stumbled into vertical silos of proprietary systems, providing little or no interoperability with similar systems. As the IoT represents the future state of the Internet, an intelligent and scalable architecture is required to provide connectivity between these silos, enabling discovery of physical sensors and interpretation of messages between things. This paper proposes a gateway and Semantic Web enabled IoT architecture to provide interoperability between systems using established communication and data standards. The Semantic Gateway as Service (SGS) allows translation between messaging protocols such as XMPP, CoAP and MQTT via a multi-protocol proxy architecture. Utilization of broadly accepted specifications such as W3C's Semantic Sensor Network (SSN) ontology for semantic annotations of sensor data provides semantic interoperability between messages and supports semantic reasoning to obtain higher-level actionable knowledge from low-level sensor data.

Submitted 18 October, 2014; originally announced October 2014.

Comments: 16 pages

arXiv:1212.0141 [pdf, other]

On the Role of Social Identity and Cohesion in Characterizing Online Social Communities

Authors: Hemant Purohit, Yiye Ruan, David Fuhry, Srinivasan Parthasarathy, Amit Sheth

Abstract: Two prevailing theories for explaining social group or community structure are cohesion and identity. The social cohesion approach posits that social groups arise out of an aggregation of individuals that have mutual interpersonal attraction as they share common characteristics. These characteristics can range from common interests to kinship ties and from social values to ethnic backgrounds. In contrast, the social identity approach posits that an individual is likely to join a group based on an intrinsic self-evaluation at a cognitive or perceptual level. In other words, group members typically share an awareness of a common category membership. In this work we seek to understand the role of these two contrasting theories in explaining the behavior and stability of social communities in Twitter. A specific focal point of our work is to understand the role of these theories in disparate contexts ranging from disaster response to socio-political activism. We extract social identity and social cohesion features-of-interest for large scale datasets of five real-world events and examine the effectiveness of such features in capturing behavioral characteristics and the stability of groups. We also propose a novel measure of social group sustainability based on the divergence in group discussion. Our main findings are: 1) Sharing of social identities (especially physical location) among group members has a positive impact on group sustainability, 2) Structural cohesion (represented by high group density and low average shortest path length) is a strong indicator of group sustainability, and 3) Event characteristics play a role in shaping group sustainability, as social groups in transient events behave differently from groups in events that last longer.

Submitted 1 December, 2012; originally announced December 2012.

ACM Class: H.5.3; J.4

arXiv:1210.0595 [pdf]

From Questions to Effective Answers: On the Utility of Knowledge-Driven Querying Systems for Life Sciences Data

Authors: Amir H. Asiaee, Prashant Doshi, Todd Minning, Satya Sahoo, Priti Parikh, Amit Sheth, Rick L. Tarleton

Abstract: We compare two distinct approaches for querying data in the context of the life sciences. The first approach utilizes conventional databases to store the data and intuitive form-based interfaces to facilitate easy querying of the data. These interfaces could be seen as implementing a set of "pre-canned" queries commonly used by the life science researchers that we study. The second approach is based on semantic Web technologies and is knowledge (model) driven. It utilizes a large OWL ontology and the same datasets as before, but associated as RDF instances of the ontology concepts. An intuitive interface is provided that allows the formulation of RDF triples-based queries. Both these approaches are being used in parallel by a team of cell biologists in their daily research activities, with the objective of gradually replacing the conventional approach with the knowledge-driven one. This provides us with a valuable opportunity to compare and qualitatively evaluate the two approaches. We describe several benefits of the knowledge-driven approach in comparison to the traditional way of accessing data, and highlight a few limitations as well. We believe that our analysis not only explicitly highlights the specific benefits and limitations of semantic Web technologies in our context but also contributes toward effective ways of translating a question in a researcher's mind into precise computational queries with the intent of obtaining effective answers from the data. While researchers often assume the benefits of semantic Web technologies, we explicitly illustrate these in practice.

Submitted 1 October, 2012; originally announced October 2012.

Showing 101–129 of 129 results for author: Sheth, A