
ADSumm: Annotated Ground-truth Summary Datasets for Disaster Tweet Summarization

Authors: Piyush Kumar Garg, Roshni Chakraborty, Sourav Kumar Dandapat

Abstract: Online social media platforms, such as Twitter, provide valuable information during disaster events. Existing disaster tweet summarization approaches provide a summary of these events to help government agencies, humanitarian organizations, etc., ensure effective disaster response. In the literature, there are two types of approaches for disaster summarization, namely, supervised and unsupervised approaches. Although supervised approaches are typically more effective, they require a sizable number of disaster event summaries for training and testing. However, few disaster summary datasets are available for training and evaluation. This motivates us to add more datasets to make supervised learning approaches more effective. In this paper, we present ADSumm, which adds annotated ground-truth summaries for eight disaster events, covering both natural and man-made disasters in seven different countries. Our experimental analysis shows that the newly added datasets improve the performance of supervised summarization approaches by 8-28% in terms of ROUGE-N F1-score. Moreover, in the newly annotated datasets, we have added a category label for each input tweet, which helps ensure good coverage of the different categories in the summary. Additionally, we have added two other features, a relevance label and key-phrases, which indicate the quality of a tweet and explain why the tweet is included in the summary, respectively.
For ground-truth summary creation, we describe in detail the annotation procedure we adopted, which has not been described in the existing literature. Experimental analysis shows that the ground-truth summaries are of high quality in terms of Coverage, Relevance, and Diversity.
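ROUGE-N F1, the metric reported throughout these abstracts, compares n-gram overlap between a system summary and a reference summary. The sketch below is a minimal illustration, assuming whitespace tokenization, lowercasing, no stemming, and a single reference; the official ROUGE toolkit applies further preprocessing, so its scores can differ. The function names are illustrative and not taken from any of the papers.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate, reference, n=1):
    """ROUGE-N F1 of a system summary against a single reference summary.

    Overlap is clipped: an n-gram is matched at most as many times as it
    occurs in both texts (Counter & keeps the minimum count).
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())  # share of system n-grams that match
    recall = overlap / sum(ref.values())      # share of reference n-grams recovered
    return 2 * precision * recall / (precision + recall)


system = "bridge collapsed near the river"
reference = "the bridge near the river collapsed"
print(round(rouge_n_f1(system, reference, n=1), 3))  # 0.909
print(round(rouge_n_f1(system, reference, n=2), 3))  # 0.444
```

The example shows why both N values are usually reported: the unigram score ignores word order, while the bigram score drops because the system summary reorders the words.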

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2405.06541 [pdf, other]

ATSumm: Auxiliary information enhanced approach for abstractive disaster Tweet Summarization with sparse training data

Authors: Piyush Kumar Garg, Roshni Chakraborty, Sourav Kumar Dandapat

Abstract: The abundance of situational information on Twitter makes it challenging for users to manually discern vital and relevant information during disasters. A concise and human-interpretable overview of this information helps decision-makers implement efficient and quick disaster response. Existing abstractive summarization approaches can be categorized as sentence-based or key-phrase-based approaches. This paper focuses on the sentence-based approach, which is typically implemented in the literature as a two-phase procedure. The initial phase, known as the extractive phase, identifies the most relevant tweets. The subsequent phase, referred to as the abstractive phase, generates a more human-interpretable summary. In this study, we adopt the methodology from prior research for the extractive phase. For the abstractive phase, most existing approaches employ deep learning-based frameworks, which are either pre-trained or trained from scratch. However, both kinds of framework need substantial training data to reach an appropriate level of performance, and such data is not readily available. This work presents an Abstractive Tweet Summarizer (ATSumm) that effectively addresses data sparsity by using auxiliary information. We introduce the Auxiliary Pointer Generator Network (AuxPGN) model, which uses a unique attention mechanism called Key-phrase attention.
This attention mechanism incorporates auxiliary information in the form of key-phrases and their corresponding importance scores from the input tweets. We evaluate the proposed approach by comparing it with 10 state-of-the-art approaches across 13 disaster datasets. The evaluation results indicate that ATSumm outperforms state-of-the-art approaches, with improvements of 4-80% in ROUGE-N F1-score.
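As its name suggests, AuxPGN belongs to the pointer-generator family, in which the decoder mixes a vocabulary distribution with a copy distribution derived from attention over the source tokens. The sketch below shows that standard mixture with NumPy. The `keyphrase_weighted_attention` step, which simply scales attention by a key-phrase importance score, is a hypothetical stand-in for illustration only; the abstract does not specify how Key-phrase attention is computed.

```python
import numpy as np


def keyphrase_weighted_attention(attn, importance):
    """HYPOTHETICAL re-weighting (not the paper's Key-phrase attention):
    scale attention on each source token by (1 + its key-phrase importance)
    and renormalise so the weights again sum to 1."""
    weighted = attn * (1.0 + importance)
    return weighted / weighted.sum()


def final_distribution(p_gen, p_vocab, attn, src_ids):
    """Standard pointer-generator mixture (See et al., 2017):
    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (attention mass on source positions holding w)."""
    p_final = p_gen * p_vocab                          # generation part (a new array)
    np.add.at(p_final, src_ids, (1.0 - p_gen) * attn)  # copy part; accumulates repeated ids
    return p_final


p_vocab = np.array([0.1, 0.2, 0.3, 0.2, 0.2])  # decoder's distribution over a 5-word vocabulary
attn = np.array([0.5, 0.25, 0.25])             # attention over 3 source tokens
importance = np.array([0.0, 1.0, 0.0])         # hypothetical: token 1 is part of a key-phrase
src_ids = np.array([2, 4, 2])                  # vocabulary id of each source token

a = keyphrase_weighted_attention(attn, importance)
p = final_distribution(0.6, p_vocab, a, src_ids)
print(a)               # [0.4 0.4 0.2]
print(np.round(p, 2))  # [0.06 0.12 0.42 0.12 0.28]
```

`np.add.at` is used instead of fancy-index assignment because a word can occur at several source positions, and its attention mass must be summed rather than overwritten.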

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2401.06810 [pdf, other]

TONE: A 3-Tiered ONtology for Emotion analysis

Authors: Srishti Gupta, Piyush Kumar Garg, Sourav Kumar Dandapat

Abstract: Emotions play an important role in many fields, including psychology, medicine, mental health, and computer science, and categorizing them has proven extremely useful for distinguishing one emotion from another. Emotions can be classified using two methods: (1) supervised methods, whose effectiveness depends strongly on the size and domain of the collected data, so that a categorization learned from data in one domain may not work well in another; and (2) unsupervised methods, which rely on either domain expertise or an existing knowledge base of emotion types. Although the second approach provides a suitable, generic, and cost-effective categorization of emotions, the literature lacks a publicly available knowledge base that can be applied directly to any emotion categorization task. This motivates us to create a knowledge base that can be used for emotion classification across domains; ontologies are often used for this purpose. In this study, we present TONE, an emotion-based ontology that creates an emotional hierarchy based on Dr. Gerrod Parrott's grouping of emotions. In addition to developing the ontology, we introduce a semi-automated vocabulary construction process that generates a detailed collection of terms for emotions at each tier of the hierarchy. We also demonstrate automated methods for establishing three types of dependencies that link different emotions. Our human and automatic evaluations demonstrate the ontology's quality.
Furthermore, we describe three distinct use cases that demonstrate the applicability of our ontology.
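A three-tier hierarchy of this kind can be stored as a nested mapping and queried by term. The sketch below uses a handful of terms from Parrott's published primary/secondary/tertiary grouping and a plain whitespace lookup; it illustrates the data structure only and does not reproduce TONE's vocabulary construction or its dependency-detection methods. `EMOTION_TREE`, `build_index`, and `emotion_paths` are illustrative names.

```python
# Small illustrative subset of Parrott's three-tier emotion tree
# (primary -> secondary -> tertiary). TONE's actual vocabulary is much larger.
EMOTION_TREE = {
    "joy": {"cheerfulness": ["amusement", "bliss", "glee"]},
    "sadness": {"suffering": ["agony", "anguish", "hurt"]},
    "fear": {"horror": ["alarm", "fright", "panic", "terror"]},
}


def build_index(tree):
    """Map each term to its (primary, secondary, tertiary) path; None marks unused tiers."""
    index = {}
    for primary, secondaries in tree.items():
        index[primary] = (primary, None, None)
        for secondary, tertiaries in secondaries.items():
            index[secondary] = (primary, secondary, None)
            for term in tertiaries:
                index[term] = (primary, secondary, term)
    return index


def emotion_paths(text, index):
    """Return the hierarchy path of every known emotion term in a whitespace-tokenised text."""
    return [index[tok] for tok in text.lower().split() if tok in index]


index = build_index(EMOTION_TREE)
print(emotion_paths("Residents felt panic and anguish", index))
# [('fear', 'horror', 'panic'), ('sadness', 'suffering', 'anguish')]
```

Flattening the tree into a term-to-path index makes each lookup a single dictionary access, and any tier (for example, only the primary emotion) can be read off the returned path.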

Submitted 10 January, 2024; originally announced January 2024.

arXiv:2305.11592 [pdf, ps, other]

IKDSumm: Incorporating Key-phrases into BERT for extractive Disaster Tweet Summarization

Authors: Piyush Kumar Garg, Roshni Chakraborty, Srishti Gupta, Sourav Kumar Dandapat

Abstract: Online social media platforms, such as Twitter, are among the most valuable sources of information during disaster events. Therefore, humanitarian organizations, government agencies, and volunteers rely on a summary of this information, i.e., tweets, for effective disaster management. Although there are several existing supervised and unsupervised approaches for automated tweet summarization, these approaches either require extensive labeled data or do not incorporate domain-specific knowledge of disasters. Additionally, the most recent approaches to disaster summarization have proposed BERT-based models to enhance summary quality. To further improve performance, we use domain-specific knowledge, without any human effort, to understand the importance (salience) of a tweet, which aids summary creation and improves summary quality. In this paper, we propose a disaster-specific tweet summarization framework, IKDSumm, which first identifies the crucial information in each disaster-related tweet through its key-phrases. We identify these key-phrases by utilizing domain knowledge of disasters (from an existing ontology) without any human intervention. We then use these key-phrases to automatically generate a summary of the tweets. Therefore, given tweets related to a disaster, IKDSumm fulfills the key summarization objectives, namely information coverage, relevance, and diversity, without any human intervention.
We evaluate IKDSumm against 8 state-of-the-art techniques on 12 disaster datasets. The evaluation results show that IKDSumm outperforms existing techniques by approximately 2-79% in terms of ROUGE-N F1-score.
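To illustrate what identifying key-phrases from domain knowledge can look like, the sketch below performs a greedy longest-match lookup of phrases from a small ontology vocabulary and counts how many information classes a tweet touches. The vocabulary, the class names, and the `salience` score are hypothetical; IKDSumm's actual key-phrase identification and its BERT-based scoring are not reproduced here.

```python
# HYPOTHETICAL ontology fragment: phrase -> information class (not from the paper).
ONTOLOGY = {
    "death toll": "casualties",
    "injured": "casualties",
    "rescue team": "rescue",
    "shelter": "relief",
    "power outage": "infrastructure",
}


def extract_keyphrases(tweet, vocab, max_len=3):
    """Greedy longest-match lookup of ontology phrases in a whitespace-tokenised tweet."""
    tokens = tweet.lower().split()
    found, i = [], 0
    while i < len(tokens):
        # try the longest candidate phrase starting at position i first
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in vocab:
                found.append(phrase)
                i += n
                break
        else:  # no phrase starts here; move on one token
            i += 1
    return found


def salience(tweet, vocab):
    """Toy salience: number of distinct ontology classes the tweet's key-phrases touch."""
    return len({vocab[p] for p in extract_keyphrases(tweet, vocab)})


tweet = "Death toll rises as rescue team reaches shelter"
print(extract_keyphrases(tweet, ONTOLOGY))  # ['death toll', 'rescue team', 'shelter']
print(salience(tweet, ONTOLOGY))            # 3
```

Longest-match-first prevents a multi-word phrase such as "death toll" from being split into shorter, less specific matches. Punctuation handling is omitted for brevity.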

Submitted 19 May, 2023; originally announced May 2023.

arXiv:2305.11536 [pdf, other]

PORTRAIT: a hybrid aPproach tO cReate extractive ground-TRuth summAry for dIsaster evenT

Authors: Piyush Kumar Garg, Roshni Chakraborty, Sourav Kumar Dandapat

Abstract: Disaster summarization approaches provide an overview of the important information posted on social media platforms, such as Twitter, during disaster events. However, the type of information posted varies significantly across disasters, depending on factors such as location, type, and severity. Verifying the effectiveness of disaster summarization approaches remains difficult because few datasets with ground-truth summaries cover a broad spectrum of disasters. Existing approaches for ground-truth summary generation (ground-truth for extractive summarization) rely on the wisdom and intuition of the annotators: annotators are given the complete set of input tweets and select a subset of them for the summary. This process requires immense human effort and significant time. Additionally, this intuition-based selection of tweets may lead to high variance in the summaries produced by different annotators. To handle these challenges, we propose a hybrid (semi-automated) approach, PORTRAIT, which partly automates the ground-truth summary generation procedure. This approach reduces the annotators' effort and time while ensuring the quality of the resulting ground-truth summary. We validate the effectiveness of PORTRAIT on 5 disaster events through quantitative and qualitative comparisons of ground-truth summaries generated by existing intuitive approaches, a semi-automated approach, and PORTRAIT.
We prepare and release the ground-truth summaries for 5 disaster events, covering both natural and man-made disasters in 4 different countries. Finally, we study the performance of various state-of-the-art summarization approaches on the ground-truth summaries generated by PORTRAIT using ROUGE-N F1-scores.
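The abstract notes that intuition-based selection can produce high variance across annotators. One simple way to quantify that variance, assumed here for illustration rather than taken from the paper, is the mean pairwise Jaccard similarity between the sets of tweets each annotator selects.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of selected tweet IDs."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections agree trivially
    return len(a & b) / len(a | b)


def mean_pairwise_agreement(selections):
    """Average Jaccard over all annotator pairs (needs at least two annotators).
    Values near 0 indicate high variance between annotators' summaries."""
    pairs = list(combinations(selections, 2))
    return sum(jaccard(x, y) for x, y in pairs) / len(pairs)


# Hypothetical tweet IDs chosen by three annotators for the same event.
annotators = [{1, 2, 3, 4}, {2, 3, 5, 6}, {1, 2, 3, 7}]
print(round(mean_pairwise_agreement(annotators), 3))  # 0.422
```

Here every annotator agrees on tweets 2 and 3, yet the overall agreement stays below 0.5 because each adds different tweets, which is the kind of spread a semi-automated procedure aims to reduce.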

Submitted 19 May, 2023; originally announced May 2023.

arXiv:2203.01188 [pdf, ps, other]

EnDSUM: Entropy and Diversity based Disaster Tweet Summarization

Authors: Piyush Kumar Garg, Roshni Chakraborty, Sourav Kumar Dandapat

Abstract: The large volume of information shared on Twitter during disaster events is used by government agencies and humanitarian organizations to ensure quick crisis response and provide situational updates. However, the sheer number of tweets posted makes manual identification of the relevant tweets impossible. To address this information overload, a summary of all the tweets that highlights the important aspects of the disaster must be generated automatically. In this paper, we propose an entropy and diversity based summarizer, termed EnDSUM, specifically for disaster tweet summarization. Our comprehensive analysis on 6 datasets shows the effectiveness of EnDSUM and also highlights scope for its further improvement.
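The abstract does not define how EnDSUM combines entropy and diversity. As a hedged illustration of the general idea only, the sketch below ranks tweets by the Shannon entropy of their word distribution (a proxy for informativeness) and greedily skips tweets that overlap too much with those already chosen (a proxy for diversity). Both scoring choices and the `max_overlap` threshold are assumptions, not EnDSUM's definitions.

```python
import math
from collections import Counter


def word_entropy(tweet):
    """Shannon entropy (bits) of the tweet's word-frequency distribution."""
    counts = Counter(tweet.lower().split())
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def word_overlap(a, b):
    """Jaccard similarity of the word sets of two non-empty tweets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)


def select(tweets, k, max_overlap=0.5):
    """Greedy selection: highest-entropy tweets first, skipping any tweet whose
    overlap with an already chosen tweet exceeds max_overlap (diversity)."""
    chosen = []
    for t in sorted(tweets, key=word_entropy, reverse=True):
        if all(word_overlap(t, c) <= max_overlap for c in chosen):
            chosen.append(t)
        if len(chosen) == k:
            break
    return chosen


tweets = [
    "bridge collapsed in the city center",
    "bridge collapsed in the city",
    "shelters open at the stadium today",
    "help help help",
]
print(select(tweets, k=3))
# ['bridge collapsed in the city center', 'shelters open at the stadium today', 'help help help']
```

The second tweet ranks above "help help help" on entropy but is skipped as a near-duplicate of the first, so the diversity check, not the entropy ranking alone, decides the final summary.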

Submitted 2 March, 2022; originally announced March 2022.

arXiv:2201.07472 [pdf, other]

Detecting Stance in Tweets : A Signed Network based Approach

Authors: Roshni Chakraborty, Maitry Bhavsar, Sourav Kumar Dandapat, Joydeep Chandra

Abstract: Identifying user stance related to a political event has several applications, such as determining individual stance, shaping public opinion, and identifying the popularity of government measures. The huge volume of political discussions on social media platforms, such as Twitter, provides opportunities to develop automated mechanisms that identify individual stance and subsequently scale to a large number of users. However, issues such as short texts and huge variance in tweet vocabulary make this task enormously difficult. Existing stance detection algorithms require either event-specific training data or annotated Twitter handles and are therefore difficult to adapt to new events. In this paper, we propose a signed network based framework that uses external information sources, such as news articles, to create a signed network of relevant entities with respect to a news event, and subsequently uses this network to detect the stance of any tweet towards the event. Validation on 5,000 tweets related to 10 events indicates that the proposed approach achieves an increase of over 6.5% in average F1 score compared to existing stance detection approaches.
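One way to picture how a signed entity network can yield a stance is structural balance: an entity opposed to an ally of the event is treated as opposed to the event. The sketch below propagates signs from an event node by breadth-first search and scores a tweet by the sentiment it expresses toward each entity. The entities, edges, sentiment inputs, and propagation rule are all hypothetical simplifications; the abstract does not describe how the framework builds the network from news articles or scores tweets.

```python
from collections import deque


def entity_signs(edges, event):
    """Sign of every entity relative to the event node.

    edges: (u, v, sign) triples with sign in {+1, -1}, treated as undirected.
    Each entity's sign is the product of edge signs on its BFS path from the
    event (structural-balance style: an opponent of an ally is an opponent).
    In an unbalanced network different paths can disagree; BFS keeps the first.
    """
    adj = {}
    for u, v, s in edges:
        adj.setdefault(u, []).append((v, s))
        adj.setdefault(v, []).append((u, s))
    signs = {event: 1}
    queue = deque([event])
    while queue:
        node = queue.popleft()
        for nbr, s in adj.get(node, []):
            if nbr not in signs:
                signs[nbr] = signs[node] * s
                queue.append(nbr)
    return signs


def tweet_stance(mentions, signs):
    """mentions: {entity: sentiment (+1 or -1)} expressed in one tweet.
    Returns +1 (in favour of the event), -1 (against) or 0 (undetermined)."""
    score = sum(sent * signs[e] for e, sent in mentions.items() if e in signs)
    return (score > 0) - (score < 0)


# Hypothetical entity network built around a policy announcement.
edges = [("policy", "minister", +1), ("minister", "opposition", -1), ("opposition", "protest", +1)]
signs = entity_signs(edges, "policy")
print(tweet_stance({"opposition": +1}, signs))  # -1: praising the opposition
print(tweet_stance({"protest": -1}, signs))     # 1: criticising the protest
```

The appeal of this design is that a tweet never has to mention the event itself: its stance is inferred from how it treats entities whose relationship to the event is already known.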

Submitted 19 January, 2022; originally announced January 2022.

arXiv:2201.06545 [pdf, ps, other]

OntoDSumm : Ontology based Tweet Summarization for Disaster Events

Authors: Piyush Kumar Garg, Roshni Chakraborty, Sourav Kumar Dandapat

Abstract: The huge popularity of social media platforms like Twitter attracts a large fraction of users to share real-time information and short situational messages during disasters. Government organizations, agencies, and volunteers need a summary of these tweets for efficient and quick disaster response. However, the huge influx of tweets makes it difficult to manually obtain a precise overview of ongoing events. To handle this challenge, several tweet summarization approaches have been proposed. In most of the existing literature, tweet summarization is broken into a two-step process: the first step categorizes tweets, and the second step chooses representative tweets from each category. Both supervised and unsupervised approaches exist in the literature for the first step. Supervised approaches require a huge amount of labelled data, which is costly and time-consuming to obtain. On the other hand, unsupervised approaches cannot cluster tweets properly due to overlapping keywords, vocabulary size, lack of understanding of semantic meaning, etc. For the second step, existing approaches apply ranking methods that are very generic and fail to compute the proper importance of a tweet with respect to a disaster. Both problems can be handled far better with proper domain knowledge. In this paper, we exploit existing domain knowledge, by means of an ontology, in both steps and propose a novel disaster summarization method, OntoDSumm.
We evaluate this proposed method against 4 state-of-the-art methods using 10 disaster datasets. Evaluation results reveal that OntoDSumm outperforms existing methods by approximately 2-66% in terms of ROUGE-1 F1 score.
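The two-step process described in this abstract (categorize tweets, then pick representatives from each category) can be sketched with a keyword-based ontology fragment. The categories, keywords, and keyword-hit ranking below are hypothetical placeholders; OntoDSumm's actual ontology and ranking method are not given in the abstract.

```python
from collections import defaultdict

# HYPOTHETICAL ontology fragment: category -> indicative keywords.
CATEGORIES = {
    "casualties": {"dead", "killed", "injured", "missing"},
    "infrastructure": {"bridge", "road", "power", "collapsed"},
    "relief": {"shelter", "food", "donate", "volunteers"},
}


def hits(tweet, category):
    """Number of the category's keywords that occur in the tweet."""
    return len(set(tweet.lower().split()) & CATEGORIES[category])


def categorize(tweet):
    """Step 1: the category with the most keyword hits, or None if there are none."""
    best = max(CATEGORIES, key=lambda c: hits(tweet, c))
    return best if hits(tweet, best) > 0 else None


def summarize(tweets, per_category=1):
    """Step 2: within each category, keep the tweets with the most keyword hits."""
    groups = defaultdict(list)
    for t in tweets:
        category = categorize(t)
        if category is not None:
            groups[category].append(t)
    return {
        c: sorted(members, key=lambda t: hits(t, c), reverse=True)[:per_category]
        for c, members in groups.items()
    }


tweets = [
    "2 dead after quake",
    "5 dead and 12 injured after quake",
    "bridge collapsed and power cut downtown",
    "volunteers bring food to the shelter",
]
print(summarize(tweets))
```

With `per_category=1`, the casualty category keeps only the more detailed "5 dead and 12 injured" tweet, and every category is represented once, which is the coverage property the two-step design is meant to guarantee.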

Submitted 19 November, 2022; v1 submitted 17 January, 2022; originally announced January 2022.

ACM Class: H.0

arXiv:2102.08661 [pdf, ps, other]

doi 10.1109/TCSS.2019.2943238

A Large-Scale Study of the Twitter Follower Network to Characterize the Spread of Prescription Drug Abuse Tweets

Authors: Ryan Sequeira, Avijit Gayen, Niloy Ganguly, Sourav Kumar Dandapat, Joydeep Chandra

Abstract: In this article, we perform a large-scale study of the Twitter follower network, involving around 0.42 million users who justify prescription drug abuse (DA), to characterize how DA tweets spread across the network. Our observations reveal a very large giant component containing 99% of these users, with dense local connectivity that facilitates the spread of such messages. We further identify active cascades over the network and observe that cascades of DA tweets spread over long distances through the engagement of several closely connected groups of users. Moreover, our observations reveal a collective phenomenon in which a large set of active fringe nodes (with few followers and followees) and a small set of well-connected nonfringe nodes work together toward such spread, potentially complicating the process of arresting such cascades. Furthermore, we find that user engagement with the drugs most mentioned on Twitter, such as Vicodin, Percocet, and OxyContin, is instantaneous. On the other hand, for less-mentioned drugs, such as Lortab, the engagement probability increases with increasing exposure to such tweets, indicating that drug abusers engaged on Twitter remain vulnerable to adopting newer drugs, which aggravates the problem further.
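The giant-component figure reported in this abstract is the share of users that fall in the largest connected component of the follower network. The sketch below computes that share by breadth-first search, assuming the component is weakly connected (edge direction ignored); the abstract does not say which notion of connectivity was used, and the toy graph is made up.

```python
from collections import defaultdict, deque


def giant_component_fraction(nodes, edges):
    """Fraction of `nodes` in the largest weakly connected component.

    edges: (follower, followee) pairs among `nodes`; direction is ignored,
    which is what "weakly connected" means for a directed follower graph.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, largest = set(), 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        size, queue = 0, deque([start])
        while queue:  # breadth-first search over one component
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        largest = max(largest, size)
    return largest / len(nodes)


# Toy follower graph: users 1-3 are linked, 4-5 form a separate pair, 6 is isolated.
users = [1, 2, 3, 4, 5, 6]
follows = [(1, 2), (3, 2), (4, 5)]
print(giant_component_fraction(users, follows))  # 0.5
```

Each node is visited once, so the cost is linear in the number of users plus follower edges, which matters at the scale of hundreds of thousands of users.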

Submitted 17 February, 2021; originally announced February 2021.

Comments: 13 pages, 9 figures, and accepted by IEEE Transactions on Computational Social Systems

Journal ref: IEEE Transactions on Computational Social Systems, vol. 6, no. 6, pp. 1232-1244, Dec. 2019

arXiv:1901.09334 [pdf]

Predicting Tomorrow's Headline using Today's Twitter Deliberations

Authors: Roshni Chakraborty, Abhijeet Kharat, Apalak Khatua, Sourav Kumar Dandapat, Joydeep Chandra

Abstract: Predicting the popularity of news articles is a challenging task. Existing literature has mostly focused on article content and polarity to predict popularity. However, existing research has not considered users' preferences towards a particular article. Understanding users' preferences is an important aspect of predicting the popularity of news articles. Hence, we use social media data from the Twitter platform to address this research gap. Our proposed model considers both users' involvement and users' reactions towards an article to predict its popularity. In short, we predict tomorrow's headline by probing today's Twitter discussion. We consider 300 political news articles from the New York Post, and our proposed approach outperforms other baseline models.

Submitted 27 January, 2019; originally announced January 2019.

Comments: This paper was accepted at the CIKM Workshop on News Recommendation and Analytics (INRA), 2018, Turin, Italy

Showing 1–10 of 10 results for author: Dandapat, S K