-
Can GPT-4 Help Detect Quit Vaping Intentions? An Exploration of Automatic Data Annotation Approach
Authors:
Sai Krishna Revanth Vuruma,
Dezhi Wu,
Saborny Sen Gupta,
Lucas Aust,
Valerie Lookingbill,
Wyatt Bellamy,
Yang Ren,
Erin Kasson,
Li-Shiun Chen,
Patricia Cavazos-Rehg,
Dian Hu,
Ming Huang
Abstract:
In recent years, the United States has witnessed a significant surge in the popularity of vaping or e-cigarette use, leading to a notable rise in cases of e-cigarette and vaping use-associated lung injury (EVALI) that caused hospitalizations and fatalities during the EVALI outbreak in 2019, highlighting the urgency to comprehend vaping behaviors and develop effective strategies for cessation. Due to the ubiquity of social media platforms, over 4.7 billion users worldwide use them for connectivity, communications, news, and entertainment with a significant portion of the discourse related to health, thereby establishing social media data as an invaluable organic data resource for public health research. In this study, we extracted a sample dataset from one vaping sub-community on Reddit to analyze users' quit-vaping intentions. Leveraging OpenAI's latest large language model GPT-4 for sentence-level quit-vaping intention detection, this study compares the outcomes of this model against layman and clinical expert annotations. Using different prompting strategies such as zero-shot, one-shot, few-shot and chain-of-thought prompting, we developed 8 prompts with varying levels of detail to explain the task to GPT-4 and also evaluated the performance of the strategies against each other. These preliminary findings emphasize the potential of GPT-4 in social media data analysis, especially in identifying users' subtle intentions that may elude human detection.
Submitted 28 June, 2024;
originally announced July 2024.
-
Utilizing Large Language Models to Identify Reddit Users Considering Vaping Cessation for Digital Interventions
Authors:
Sai Krishna Revanth Vuruma,
Dezhi Wu,
Saborny Sen Gupta,
Lucas Aust,
Valerie Lookingbill,
Caleb Henry,
Yang Ren,
Erin Kasson,
Li-Shiun Chen,
Patricia Cavazos-Rehg,
Dian Hu,
Ming Huang
Abstract:
The widespread adoption of social media platforms globally not only enhances users' connectivity and communication but also emerges as a vital channel for the dissemination of health-related information, thereby establishing social media data as an invaluable organic data resource for public health research. The surge in popularity of vaping or e-cigarette use in the United States and other countries has caused an outbreak of e-cigarette and vaping use-associated lung injury (EVALI), leading to hospitalizations and fatalities in 2019, highlighting the urgency to comprehend vaping behaviors and develop effective strategies for cessation. In this study, we extracted a sample dataset from one vaping sub-community on Reddit to analyze users' quit-vaping intentions. Leveraging large language models including both the latest GPT-4 and traditional BERT-based language models for sentence-level quit-vaping intention prediction tasks, this study compares the outcomes of these models against human annotations. Notably, when compared to human evaluators, the GPT-4 model demonstrates superior consistency in adhering to annotation guidelines and processes, showcasing advanced capabilities to detect nuanced user quit-vaping intentions that human evaluators might overlook. These preliminary findings emphasize the potential of GPT-4 in enhancing the accuracy and reliability of social media data analysis, especially in identifying users' subtle intentions that may elude human detection.
Submitted 25 April, 2024;
originally announced April 2024.
-
Mapping Violence: Developing an Extensive Framework to Build a Bangla Sectarian Expression Dataset from Social Media Interactions
Authors:
Nazia Tasnim,
Sujan Sen Gupta,
Md. Istiak Hossain Shihab,
Fatiha Islam Juee,
Arunima Tahsin,
Pritom Ghum,
Kanij Fatema,
Marshia Haque,
Wasema Farzana,
Prionti Nasir,
Ashique KhudaBukhsh,
Farig Sadeque,
Asif Sushmit
Abstract:
Communal violence in online forums has become extremely prevalent in South Asia, where many communities of different cultures coexist and share resources. These societies exhibit a phenomenon characterized by strong bonds within their own groups and animosity towards others, leading to conflicts that frequently escalate into violent confrontations. To address this issue, we have developed the first comprehensive framework for the automatic detection of communal violence markers in online Bangla content, accompanying the largest collection (13K raw sentences) of social media interactions that fall under the definition of four major violence classes and their 16 coarse expressions. Our workflow introduces a 7-step expert annotation process incorporating insights from social scientists, linguists, and psychologists. By presenting data statistics and benchmarking performance using this dataset, we have determined that, aside from the category of Non-communal violence, Religio-communal violence is particularly pervasive in Bangla text. Moreover, we have substantiated the effectiveness of fine-tuning language models in identifying violent comments by conducting preliminary benchmarking on the state-of-the-art Bangla deep learning model.
Submitted 17 April, 2024;
originally announced April 2024.
-
Towards Safer Smart Contracts: A Sequence Learning Approach to Detecting Security Threats
Authors:
Wesley Joon-Wie Tann,
Xing Jie Han,
Sourav Sen Gupta,
Yew-Soon Ong
Abstract:
Symbolic analysis of security exploits in smart contracts has proven valuable for analyzing predefined vulnerability properties. While some symbolic tools perform complex analysis steps, they require a predetermined invocation depth to search vulnerable execution paths, and the search time increases with depth. The number of contracts on blockchains like Ethereum has increased 176-fold since December 2015. If these symbolic tools fail to analyze the increasingly large number of contracts in time, entire classes of exploits could cause irrevocable damage. In this paper, we aim to have safer smart contracts against emerging threats. We propose the approach of sequential learning of smart contract weaknesses using machine learning, specifically long short-term memory (LSTM), that allows us to detect new attack trends relatively quickly, leading to safer smart contracts. Our experimental studies on 620,000 smart contracts prove that our model can easily scale to analyze a massive number of contracts; that is, the LSTM maintains near constant analysis time as contracts increase in complexity. In addition, our approach achieves $99\%$ test accuracy and correctly analyzes contracts that were false positive (FP) errors made by a symbolic tool.
Submitted 7 June, 2019; v1 submitted 15 November, 2018;
originally announced November 2018.
-
Robust toll pricing: A novel approach
Authors:
Trivikram Dokka,
Alain B. Zemkoho,
Sonali Sen Gupta,
Fabrice T. Nobibon
Abstract:
We study a robust toll pricing problem where toll setters and users have different levels of information when making their decisions. Toll setters do not have full information on the costs of the network and rely on historical information when determining toll rates, whereas users decide on the path to use from origin to destination knowing toll rates and having, in addition, more accurate traffic data. Toll setters often also face constraints on price experimentation, which means less opportunity for price revision. Motivated by this, we propose a novel robust pricing methodology for fixing prices in which we take a non-adversarial view of nature, unlike the existing robust approaches. We show that our non-adversarial robustness results in less conservative pricing decisions compared to the traditional adversarial-nature setting. We start by considering a single origin-destination parallel network in this new robust setting and formulate the robust toll pricing problem as a distributionally robust optimization problem, for which we develop an exact algorithm based on a mixed-integer programming formulation and a heuristic based on a two-point support distribution. We further extend our formulations to more general networks and show how our algorithms can be adapted for the general networks. Finally, we illustrate the usefulness of our approach by means of numerical experiments both on randomly generated networks and on the data recorded on the road network of the city of Chicago.
Submitted 5 December, 2017;
originally announced December 2017.
-
Generalization of a few results in Integer Partitions
Authors:
Manosij Ghosh Dastidar,
Sourav Sen Gupta
Abstract:
In this paper, we generalize a few important results in Integer Partitions; namely the results known as Stanley's theorem and Elder's theorem, and the congruence results proposed by Ramanujan for the partition function. We generalize the results of Stanley and Elder from a fixed integer to an array of subsequent integers, and propose an analogue of Ramanujan's congruence relations for the 'number of parts' function instead of the partition function. We also deduce the generating function for the 'number of parts', and relate the technical results with their graphical interpretations through a novel use of Ferrers diagrams.
Submitted 31 October, 2011;
originally announced November 2011.
-
Extension of Stanley's Theorem for Partitions
Authors:
Manosij Ghosh Dastidar,
Sourav Sen Gupta
Abstract:
In this paper we present an extension of Stanley's theorem related to partitions of positive integers. Stanley's theorem states a relation between "the sum of the numbers of distinct members in the partitions of a positive integer $n$" and "the total number of 1's that occur in the partitions of $n$". Our generalization states a similar relation between "the sum of the numbers of distinct members in the partitions of $n$" and the total number of 2's or 3's or any general $k$ that occur in the partitions of $n$ and the subsequent integers. We also apply this result to obtain an array of interesting corollaries, including alternate proofs and analogues of some of the very well-known results in the theory of partitions. We extend Ramanujan's results on congruence behavior of the 'number of partitions' function $p(n)$ to get analogous results for the 'number of occurrences of an element $k$ in partitions of $n$'. Moreover, we present an alternate proof of Ramanujan's results in this paper.
Submitted 26 December, 2010; v1 submitted 20 July, 2010;
originally announced July 2010.
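Stanley's theorem, as stated in the abstract above, can be checked numerically for small $n$. The following is a minimal sketch and is not taken from the paper: the function names `partitions`, `total_ones`, and `total_distinct_parts` are hypothetical, and the brute-force enumeration is only practical for small integers.

```python
from typing import Iterator, List, Optional


def partitions(n: int, max_part: Optional[int] = None) -> Iterator[List[int]]:
    """Yield every partition of n as a non-increasing list of positive parts."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield []
        return
    # Choose the largest part first, then partition the remainder
    # using parts no larger than that choice.
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest


def total_ones(n: int) -> int:
    """Total number of 1's occurring across all partitions of n."""
    return sum(p.count(1) for p in partitions(n))


def total_distinct_parts(n: int) -> int:
    """Sum, over all partitions of n, of the number of distinct parts."""
    return sum(len(set(p)) for p in partitions(n))


if __name__ == "__main__":
    # Stanley's theorem says the two totals agree for every positive n.
    for n in range(1, 13):
        assert total_ones(n) == total_distinct_parts(n)
        print(n, total_ones(n))
```

For $n = 4$ the partitions are 4, 3+1, 2+2, 2+1+1, and 1+1+1+1; they contain 0+1+0+2+4 = 7 ones and 1+2+1+2+1 = 7 distinct parts in total, so both functions return 7.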