Comparing Pre-trained Human Language Models: Is it Better with Human Context as Groups, Individual Traits, or Both?

Authors: Nikita Soni, Niranjan Balasubramanian, H. Andrew Schwartz, Dirk Hovy

Abstract: Incorporating human context into language models is the next frontier for human-centered natural language processing. Currently, two pre-training methods exist: group-wise attributes (e.g., over-45-year-olds) or individual traits. Group attributes are coarse -- not all 45-year-olds write the same way -- while modeling individual traits allows for a more personalized representation, but requires more complex modeling and data. So far, it is unclear which pre-training approach benefits what tasks. We compare pre-training models with human context via 1) group attributes, 2) individual users, and 3) a combined approach on 5 user- and document-level tasks. We find that pre-training with both group and individual features significantly improves the two user-level regression tasks like age estimation and personality assessment. Pre-training on individual users significantly improves the three document-level classification tasks like stance and topic detection. It even does well for downstream tasks without historical user data. Our results suggest both approaches have specific use cases, opening new avenues for human-centered language modeling.

Submitted 26 March, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

arXiv:2312.07751 [pdf, other]

Large Human Language Models: A Need and the Challenges

Authors: Nikita Soni, H. Andrew Schwartz, João Sedoc, Niranjan Balasubramanian

Abstract: As research in human-centered NLP advances, there is a growing recognition of the importance of incorporating human and social factors into NLP models. At the same time, our NLP systems have become heavily reliant on LLMs, most of which do not model authors. To build NLP systems that can truly understand human language, we must better integrate human contexts into LLMs. This brings to the fore a range of design considerations and challenges in terms of what human aspects to capture, how to represent them, and what modeling strategies to pursue. To address these, we advocate for three positions toward creating large human language models (LHLMs) using concepts from psychological and behavioral sciences: First, LM training should include the human context. Second, LHLMs should recognize that people are more than their group(s). Third, LHLMs should be able to account for the dynamic and temporally-dependent nature of the human context. We refer to relevant advances and present open challenges that need to be addressed and their possible solutions in realizing these goals.

Submitted 9 May, 2024; v1 submitted 8 November, 2023; originally announced December 2023.

arXiv:2307.16112 [pdf, other]

doi 10.1145/3586183.3606827

Augmented Math: Authoring AR-Based Explorable Explanations by Augmenting Static Math Textbooks

Authors: Neil Chulpongsatorn, Mille Skovhus Lunding, Nishan Soni, Ryo Suzuki

Abstract: We introduce Augmented Math, a machine learning-based approach to authoring AR explorable explanations by augmenting static math textbooks without programming. To augment a static document, our system first extracts mathematical formulas and figures from a given document using optical character recognition (OCR) and computer vision. By binding and manipulating these extracted contents, the user can see the interactive animation overlaid onto the document through mobile AR interfaces. This empowers non-technical users, such as teachers or students, to transform existing math textbooks and handouts into on-demand and personalized explorable explanations. To design our system, we first analyzed existing explorable math explanations to identify common design strategies. Based on the findings, we developed a set of augmentation techniques that can be automatically generated based on the extracted content, which are 1) dynamic values, 2) interactive figures, 3) relationship highlights, 4) concrete examples, and 5) step-by-step hints. To evaluate our system, we conduct two user studies: preliminary user testing and expert interviews. The study results confirm that our system allows more engaging experiences for learning math concepts.

Submitted 29 July, 2023; originally announced July 2023.

Comments: UIST 2023

arXiv:2302.12952 [pdf]

Robust language-based mental health assessments in time and space through social media

Authors: Siddharth Mangalik, Johannes C. Eichstaedt, Salvatore Giorgi, Jihu Mun, Farhan Ahmed, Gilvir Gill, Adithya V. Ganesan, Shashanka Subrahmanya, Nikita Soni, Sean A. P. Clouston, H. Andrew Schwartz

Abstract: Compared to physical health, population mental health measurement in the U.S. is very coarse-grained. Currently, in the largest population surveys, such as those carried out by the Centers for Disease Control or Gallup, mental health is only broadly captured through "mentally unhealthy days" or "sadness", and limited to relatively infrequent state or metropolitan estimates. Through the large scale analysis of social media data, robust estimation of population mental health is feasible at much higher resolutions, up to weekly estimates for counties. In the present work, we validate a pipeline that uses a sample of 1.2 billion Tweets from 2 million geo-located users to estimate mental health changes for the two leading mental health conditions, depression and anxiety. We find moderate to large associations between the language-based mental health assessments and survey scores from Gallup for multiple levels of granularity, down to the county-week (fixed effects $\beta = .25$ to $1.58$; $p < .001$). Language-based assessment allows for the cost-effective and scalable monitoring of population mental health at weekly time scales. Such spatially fine-grained time series are well suited to monitor effects of societal events and policies as well as enable quasi-experimental study designs in population health and other disciplines. Beyond mental health in the U.S., this method generalizes to a broad set of psychological outcomes and allows for community measurement in under-resourced settings where no traditional survey measures - but social media data - are available.

Submitted 24 February, 2023; originally announced February 2023.

Comments: 9 pages, 7 figures, pre-print

ACM Class: J.4; I.2.7

arXiv:2209.11282 [pdf]

Automated detection of Alzheimer disease using MRI images and deep neural networks- A review

Authors: Narotam Singh, Patteshwari. D, Neha Soni, Amita Kapoor

Abstract: Early detection of Alzheimer disease is crucial for deploying interventions and slowing the disease progression. A lot of machine learning and deep learning algorithms have been explored in the past decade with the aim of building an automated detection for Alzheimer. Advancements in data augmentation techniques and advanced deep learning architectures have opened up new frontiers in this field, and research is moving at a rapid speed. Hence, the purpose of this survey is to provide an overview of recent research on deep learning models for Alzheimer disease diagnosis. In addition to categorizing the numerous data sources, neural network architectures, and commonly used assessment measures, we also classify implementation and reproducibility. Our objective is to assist interested researchers in keeping up with the newest developments and in reproducing earlier investigations as benchmarks. In addition, we also indicate future research directions for this topic.

Submitted 22 September, 2022; originally announced September 2022.

Comments: 22 Pages, 5 Figures, 7 Tables

arXiv:2205.05128 [pdf, other]

doi 10.18653/v1/2022.findings-acl.52

Human Language Modeling

Authors: Nikita Soni, Matthew Matero, Niranjan Balasubramanian, H. Andrew Schwartz

Abstract: Natural language is generated by people, yet traditional language modeling views words or documents as if generated independently. Here, we propose human language modeling (HuLM), a hierarchical extension to the language modeling problem whereby a human-level exists to connect sequences of documents (e.g. social media messages) and capture the notion that human language is moderated by changing human states. We introduce HaRT, a large-scale transformer model for the HuLM task, pre-trained on approximately 100,000 social media users, and demonstrate its effectiveness in terms of both language modeling (perplexity) for social media and fine-tuning for 4 downstream tasks spanning document- and user-levels: stance detection, sentiment classification, age estimation, and personality assessment. Results on all tasks meet or surpass the current state-of-the-art.

Submitted 10 May, 2022; originally announced May 2022.

arXiv:2109.08113 [pdf, other]

MeLT: Message-Level Transformer with Masked Document Representations as Pre-Training for Stance Detection

Authors: Matthew Matero, Nikita Soni, Niranjan Balasubramanian, H. Andrew Schwartz

Abstract: Much of natural language processing is focused on leveraging large capacity language models, typically trained over single messages with a task of predicting one or more tokens. However, modeling human language at higher-levels of context (i.e., sequences of messages) is under-explored. In stance detection and other social media tasks where the goal is to predict an attribute of a message, we have contextual data that is loosely semantically connected by authorship. Here, we introduce Message-Level Transformer (MeLT) -- a hierarchical message-encoder pre-trained over Twitter and applied to the task of stance prediction. We focus on stance prediction as a task benefiting from knowing the context of the message (i.e., the sequence of previous messages). The model is trained using a variant of masked-language modeling; where instead of predicting tokens, it seeks to generate an entire masked (aggregated) message vector via reconstruction loss. We find that applying this pre-trained masked message-level transformer to the downstream task of stance detection achieves F1 performance of 67%.

Submitted 1 November, 2021; v1 submitted 16 September, 2021; originally announced September 2021.

arXiv:2107.02314 [pdf, other]

The RSNA-ASNR-MICCAI BraTS 2021 Benchmark on Brain Tumor Segmentation and Radiogenomic Classification

Authors: Ujjwal Baid, Satyam Ghodasara, Suyash Mohan, Michel Bilello, Evan Calabrese, Errol Colak, Keyvan Farahani, Jayashree Kalpathy-Cramer, Felipe C. Kitamura, Sarthak Pati, Luciano M. Prevedello, Jeffrey D. Rudie, Chiharu Sako, Russell T. Shinohara, Timothy Bergquist, Rong Chai, James Eddy, Julia Elliott, Walter Reade, Thomas Schaffter, Thomas Yu, Jiaxin Zheng, Ahmed W. Moawad, Luiz Otavio Coelho, Olivia McDonnell, et al. (78 additional authors not shown)

Abstract: The BraTS 2021 challenge celebrates its 10th anniversary and is jointly organized by the Radiological Society of North America (RSNA), the American Society of Neuroradiology (ASNR), and the Medical Image Computing and Computer Assisted Interventions (MICCAI) society. Since its inception, BraTS has been focusing on being a common benchmarking venue for brain glioma segmentation algorithms, with well-curated multi-institutional multi-parametric magnetic resonance imaging (mpMRI) data. Gliomas are the most common primary malignancies of the central nervous system, with varying degrees of aggressiveness and prognosis. The RSNA-ASNR-MICCAI BraTS 2021 challenge targets the evaluation of computational algorithms assessing the same tumor compartmentalization, as well as the underlying tumor's molecular characterization, in pre-operative baseline mpMRI data from 2,040 patients. Specifically, the two tasks that BraTS 2021 focuses on are: a) the segmentation of the histologically distinct brain tumor sub-regions, and b) the classification of the tumor's O[6]-methylguanine-DNA methyltransferase (MGMT) promoter methylation status. The performance evaluation of all participating algorithms in BraTS 2021 will be conducted through the Sage Bionetworks Synapse platform (Task 1) and Kaggle (Task 2), concluding in distributing to the top ranked participants monetary awards of $60,000 collectively.

Submitted 12 September, 2021; v1 submitted 5 July, 2021; originally announced July 2021.

Comments: 19 pages, 2 figures, 1 table

arXiv:2106.03843 [pdf, other]

Equivariant Graph Neural Networks for 3D Macromolecular Structure

Authors: Bowen Jing, Stephan Eismann, Pratham N. Soni, Ron O. Dror

Abstract: Representing and reasoning about 3D structures of macromolecules is emerging as a distinct challenge in machine learning. Here, we extend recent work on geometric vector perceptrons and apply equivariant graph neural networks to a wide range of tasks from structural biology. Our method outperforms all reference architectures on three out of eight tasks in the ATOM3D benchmark, is tied for first on two others, and is competitive with equivariant networks using higher-order representations and spherical harmonic convolutions. In addition, we demonstrate that transfer learning can further improve performance on certain downstream tasks. Code is available at https://github.com/drorlab/gvp-pytorch.

Submitted 13 July, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

Comments: WCB @ ICML 2021 + link to code

arXiv:2012.11206 [pdf]

Edge Computing in Transportation: Security Issues and Challenges

Authors: Nikheel Soni, Reza Malekian, Arnav Thakur

Abstract: As the amount of data that needs to be processed in real-time due to recent application developments increases, a new computing paradigm is required. Edge computing resolves this issue by offloading computing resources required by intelligent transportation systems such as the Internet of Vehicles from the cloud closer to the end devices to improve performance; however, it is susceptible to security issues that make the transportation systems vulnerable to attackers. In addition to this, there are security issues in transportation technologies that impact the edge computing paradigm as well. This paper presents some of the main security issues and challenges that are present in edge computing, which are Distributed Denial of Service attacks, side channel attacks, malware injection attacks and authentication and authorization attacks, how these impact intelligent transportation systems and research being done to help realize and mitigate these issues.

Submitted 21 December, 2020; originally announced December 2020.

arXiv:2006.00876 [pdf]

Algorithms for Computing in Fog Systems: principles, algorithms, and Challenges

Authors: Nikheel Soni, Reza Malekian, Dijana Capeska Bogatinoska

Abstract: Fog computing is an architecture that is used to distribute resources such as computing, storage, and memory closer to end-user to improve applications and service deployment. The idea behind fog computing is to improve cloud computing and IoT infrastructures by reducing compute power, network bandwidth, and latency as well as storage requirements. This paper presents an overview of what fog computing is, related concepts, algorithms that are present to improve fog computing infrastructure as well as challenges that exist. This paper shows that there is a great advantage of using fog computing to support cloud and IoT systems.

Submitted 1 June, 2020; originally announced June 2020.

Comments: 43rd International Convention on Information, Communication and Electronic Technology, Opatija, Croatia

arXiv:1905.02092 [pdf]

Impact of Artificial Intelligence on Businesses: from Research, Innovation, Market Deployment to Future Shifts in Business Models

Authors: Neha Soni, Enakshi Khular Sharma, Narotam Singh, Amita Kapoor

Abstract: The fast pace of artificial intelligence (AI) and automation is propelling strategists to reshape their business models. This is fostering the integration of AI in the business processes but the consequences of this adoption are underexplored and need attention. This paper focuses on the overall impact of AI on businesses - from research, innovation, market deployment to future shifts in business models. To assess this overall impact, we design a three-dimensional research model, based upon the Neo-Schumpeterian economics and its three forces viz. innovation, knowledge, and entrepreneurship. The first dimension deals with research and innovation in AI. In the second dimension, we explore the influence of AI on the global market and the strategic objectives of the businesses and finally, the third dimension examines how AI is shaping business contexts. Additionally, the paper explores AI implications on actors and its dark sides.

Submitted 3 May, 2019; originally announced May 2019.

Comments: 38 pages, 10 figures, 3 tables. A part of this work has been presented in DIGITS 2018

Showing 1–12 of 12 results for author: Soni, N