Search | arXiv e-print repository

Use of a Structured Knowledge Base Enhances Metadata Curation by Large Language Models

Authors: Sowmya S. Sundaram, Benjamin Solomon, Avani Khatri, Anisha Laumas, Purvesh Khatri, Mark A. Musen

Abstract: Metadata play a crucial role in ensuring the findability, accessibility, interoperability, and reusability of datasets. This paper investigates the potential of large language models (LLMs), specifically GPT-4, to improve adherence to metadata standards. We conducted experiments on 200 random data records describing human samples relating to lung cancer from the NCBI BioSample repository, evaluating GPT-4's ability to suggest edits for adherence to metadata standards. We computed the adherence accuracy of field name-field value pairs through a peer review process, and we observed a marginal average improvement in adherence to the standard data dictionary from 79% to 80% (p<0.01). We then prompted GPT-4 with domain information in the form of the textual descriptions of CEDAR templates and recorded a significant improvement to 97% from 79% (p<0.01). These results indicate that, while LLMs may not be able to correct legacy metadata to ensure satisfactory adherence to standards when unaided, they do show promise for use in automated metadata curation when integrated with a structured knowledge base.

Submitted 17 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

arXiv:2209.06321 [pdf, other]

Alexa, Let's Work Together: Introducing the First Alexa Prize TaskBot Challenge on Conversational Task Assistance

Authors: Anna Gottardi, Osman Ipek, Giuseppe Castellucci, Shui Hu, Lavina Vaz, Yao Lu, Anju Khatri, Anjali Chadha, Desheng Zhang, Sattvik Sahai, Prerna Dwivedi, Hangjie Shi, Lucy Hu, Andy Huang, Luke Dai, Bofei Yang, Varun Somani, Pankaj Rajan, Ron Rezac, Michael Johnston, Savanna Stiff, Leslie Ball, David Carmel, Yang Liu, Dilek Hakkani-Tur, et al. (5 additional authors not shown)

Abstract: Since its inception in 2016, the Alexa Prize program has enabled hundreds of university students to explore and compete to develop conversational agents through the SocialBot Grand Challenge. The goal of the challenge is to build agents capable of conversing coherently and engagingly with humans on popular topics for 20 minutes, while achieving an average rating of at least 4.0/5.0. However, as conversational agents attempt to assist users with increasingly complex tasks, new conversational AI techniques and evaluation platforms are needed. The Alexa Prize TaskBot challenge, established in 2021, builds on the success of the SocialBot challenge by introducing the requirements of interactively assisting humans with real-world Cooking and Do-It-Yourself tasks, while making use of both voice and visual modalities. This challenge requires the TaskBots to identify and understand the user's need, identify and integrate task and domain knowledge into the interaction, and develop new ways of engaging the user without distracting them from the task at hand, among other challenges. This paper provides an overview of the TaskBot challenge, describes the infrastructure support provided to the teams with the CoBot Toolkit, and summarizes the approaches the participating teams took to overcome the research challenges. Finally, it analyzes the performance of the competing TaskBots during the first year of the competition.

Submitted 13 September, 2022; originally announced September 2022.

Comments: 14 pages, Proceedings of Alexa Prize Taskbot (Alexa Prize 2021)

ACM Class: I.2.7; J.0; H.5.1; H.5.2

arXiv:2006.11512 [pdf]

Sarcasm Detection in Tweets with BERT and GloVe Embeddings

Authors: Akshay Khatri, Pranav P, Dr. Anand Kumar M

Abstract: Sarcasm is a form of communication in which the person states the opposite of what he actually means. It is ambiguous in nature. In this paper, we propose using machine learning techniques with BERT and GloVe embeddings to detect sarcasm in tweets. The dataset is preprocessed before extracting the embeddings. The proposed model also uses the context to which the user is reacting along with his actual response.

Submitted 20 June, 2020; originally announced June 2020.

Comments: 5 pages. Submitted to ACL 2020 conference

arXiv:1605.00398 [pdf]

Dynamic Address Allocation Algorithm for Mobile Ad hoc Networks

Authors: Akshay Khatri, Sankalp Kolhe, Nupur Giri

Abstract: A Mobile Ad hoc network (MANET) consists of nodes which use multi-hop communication to establish connection between nodes. Traditional infrastructure based systems use a centralized architecture for address allocation. However, this is not possible in Ad hoc networks due to their dynamic structure. Many schemes have been proposed to solve this problem, but most of them use network-wide broadcasts to ensure the availability of a new address. This becomes extremely difficult as network size grows. In this paper, we propose an address allocation algorithm which avoids network-wide broadcasts to allocate address to a new node. Moreover, the algorithm allocates addresses dynamically such that the network maintains an "IP resembles topology" state. In such a state, routing becomes easier and the overall overhead in communication is reduced. This algorithm is particularly useful for routing protocols which use topology information to route messages in the network. Our solution is designed with scalability in mind such that the cost of address assignment to a new node is independent of the number of nodes in the network.

Submitted 2 May, 2016; originally announced May 2016.

Showing 1–4 of 4 results for author: Khatri, A