Search | arXiv e-print repository

SemEval-2024 Task 1: Semantic Textual Relatedness for African and Asian Languages

Authors: Nedjma Ousidhoum, Shamsuddeen Hassan Muhammad, Mohamed Abdalla, Idris Abdulmumin, Ibrahim Said Ahmad, Sanchit Ahuja, Alham Fikri Aji, Vladimir Araujo, Meriem Beloucif, Christine De Kock, Oumaima Hourrane, Manish Shrivastava, Thamar Solorio, Nirmal Surange, Krishnapriya Vishnubhotla, Seid Muhie Yimam, Saif M. Mohammad

Abstract: We present the first shared task on Semantic Textual Relatedness (STR). While earlier shared tasks primarily focused on semantic similarity, we instead investigate the broader phenomenon of semantic relatedness across 14 languages: Afrikaans, Algerian Arabic, Amharic, English, Hausa, Hindi, Indonesian, Kinyarwanda, Marathi, Moroccan Arabic, Modern Standard Arabic, Punjabi, Spanish, and Telugu. These languages originate from five distinct language families and are predominantly spoken in Africa and Asia -- regions characterised by the relatively limited availability of NLP resources. Each instance in the datasets is a sentence pair associated with a score that represents the degree of semantic textual relatedness between the two sentences. Participating systems were asked to rank sentence pairs by their closeness in meaning (i.e., their degree of semantic relatedness) in the 14 languages in three main tracks: (a) supervised, (b) unsupervised, and (c) crosslingual. The task attracted 163 participants. We received 70 submissions in total (across all tasks) from 51 different teams, and 38 system description papers. We report on the best-performing systems as well as the most common and the most effective approaches for the three different tracks.

Submitted 17 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

Comments: SemEval 2024 Task Description Paper. arXiv admin note: text overlap with arXiv:2402.08638
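Since systems are judged on how well they rank sentence pairs by relatedness, a rank correlation between predicted and gold scores is the natural yardstick. The sketch below assumes Spearman's rank correlation as that metric (the abstract above does not name it) and uses a hypothetical word-overlap function, `dice_overlap`, as a stand-in for an unsupervised system; neither is presented as the task's official code.

```python
def dice_overlap(s1: str, s2: str) -> float:
    """Hypothetical unsupervised baseline: Dice coefficient over lowercase word sets."""
    a, b = set(s1.lower().split()), set(s2.lower().split())
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # i and j are 0-based positions
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Illustrative pairs and gold scores (made up, not SemRel data).
pairs = [
    ("the cat sat on the mat", "a cat sat on a mat"),
    ("the cat sat on the mat", "stock prices fell today"),
    ("I love this song", "this song is great"),
]
gold = [0.9, 0.1, 0.7]
pred = [dice_overlap(a, b) for a, b in pairs]
print(f"Spearman rho = {spearman(pred, gold):.3f}")
```

Only the ordering of the predicted scores matters to a rank metric, so a system need not calibrate its outputs to the gold scale.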

arXiv:2403.14651 [pdf, other]

DOSA: A Dataset of Social Artifacts from Different Indian Geographical Subcultures

Authors: Agrima Seth, Sanchit Ahuja, Kalika Bali, Sunayana Sitaram

Abstract: Generative models are increasingly being used in various applications, such as text generation, commonsense reasoning, and question-answering. To be effective globally, these models must be aware of and account for local socio-cultural contexts, making it necessary to have benchmarks to evaluate the models for their cultural familiarity. Since the training data for LLMs is web-based and the Web is limited in its representation of information, it does not capture knowledge present within communities that are not on the Web. Thus, these models exacerbate the inequities, semantic misalignment, and stereotypes from the Web. There has been a growing call for community-centered participatory research methods in NLP. In this work, we respond to this call by using participatory research methods to introduce DOSA, the first community-generated Dataset of 615 Social Artifacts, by engaging with 260 participants from 19 different Indian geographic subcultures. We use a gamified framework that relies on collective sensemaking to collect the names and descriptions of these artifacts such that the descriptions semantically align with the shared sensibilities of the individuals from those cultures. Next, we benchmark four popular LLMs and find that they show significant variation across regional sub-cultures in their ability to infer the artifacts.

Submitted 23 February, 2024; originally announced March 2024.

arXiv:2402.08638 [pdf, other]

SemRel2024: A Collection of Semantic Textual Relatedness Datasets for 13 Languages

Authors: Nedjma Ousidhoum, Shamsuddeen Hassan Muhammad, Mohamed Abdalla, Idris Abdulmumin, Ibrahim Said Ahmad, Sanchit Ahuja, Alham Fikri Aji, Vladimir Araujo, Abinew Ali Ayele, Pavan Baswani, Meriem Beloucif, Chris Biemann, Sofia Bourhim, Christine De Kock, Genet Shanko Dekebo, Oumaima Hourrane, Gopichand Kanumolu, Lokesh Madasu, Samuel Rutunda, Manish Shrivastava, Thamar Solorio, Nirmal Surange, Hailegnaw Getaneh Tilaye, Krishnapriya Vishnubhotla, Genta Winata , et al. (2 additional authors not shown)

Abstract: Exploring and quantifying semantic relatedness is central to representing language and holds significant implications across various NLP tasks. While earlier NLP research primarily focused on semantic similarity, often within the English language context, we instead investigate the broader phenomenon of semantic relatedness. In this paper, we present SemRel, a new semantic relatedness dataset collection annotated by native speakers across 13 languages: Afrikaans, Algerian Arabic, Amharic, English, Hausa, Hindi, Indonesian, Kinyarwanda, Marathi, Moroccan Arabic, Modern Standard Arabic, Spanish, and Telugu. These languages originate from five distinct language families and are predominantly spoken in Africa and Asia -- regions characterised by a relatively limited availability of NLP resources. Each instance in the SemRel datasets is a sentence pair associated with a score that represents the degree of semantic textual relatedness between the two sentences. The scores are obtained using a comparative annotation framework. We describe the data collection and annotation processes, challenges when building the datasets, baseline experiments, and their impact and utility in NLP.

Submitted 31 May, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

Comments: Accepted to the Findings of ACL 2024
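The abstract says only that scores come from "a comparative annotation framework". A widely used framework of this kind is Best-Worst Scaling (BWS): annotators see a small tuple of items and pick the most and least related, and each item's score is the share of times it was chosen best minus the share of times it was chosen worst. The sketch below assumes BWS with simple counting; the function names and the [0, 1] rescaling are illustrative assumptions, not a description of the SemRel pipeline.

```python
from collections import Counter


def bws_scores(annotations):
    """Counting-based Best-Worst Scaling (assumed framework, see lead-in).

    annotations: list of (items, best_item, worst_item), where `items` is the
    tuple of hashable ids (e.g. sentence-pair ids) shown to the annotator.
    Returns score(item) = (#best - #worst) / #appearances, which lies in [-1, 1].
    """
    best, worst, seen = Counter(), Counter(), Counter()
    for items, b, w in annotations:
        seen.update(items)  # count every appearance of every item
        best[b] += 1
        worst[w] += 1
    return {item: (best[item] - worst[item]) / seen[item] for item in seen}


def rescale01(scores):
    """Map scores linearly from [-1, 1] onto [0, 1]."""
    return {k: (v + 1) / 2 for k, v in scores.items()}


# Two hypothetical 4-tuple annotations over sentence pairs p1..p4.
ann = [
    (("p1", "p2", "p3", "p4"), "p1", "p4"),
    (("p1", "p2", "p3", "p4"), "p1", "p3"),
]
print(rescale01(bws_scores(ann)))
```

Counting needs no model fitting, and because each judgement is relative, it sidesteps annotators' differing use of an absolute rating scale.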

arXiv:2401.06413 [pdf]

Why Doesn't Microsoft Let Me Sleep? How Automaticity of Windows Updates Impacts User Autonomy

Authors: Sanju Ahuja, Ridhi Jain, Jyoti Kumar

Abstract: 'Automating the user away' has been designated as a dark pattern in literature for performing tasks without user consent or confirmation. However, limited studies have been reported on how users experience the sense of autonomy when digital systems fully or partially bypass consent. More research is required to understand what makes automaticity a threat to autonomy. To address this gap, a qualitative interview study with 10 users was conducted to investigate the user experience of Microsoft Windows updates. It was found that ten design features of Windows updates impact the autonomy experience. For each design feature, the contextual factors which influence its impact on autonomy were also noted. The findings of this paper can help designers understand the ethical concerns posed by automaticity in design and identify measures to mitigate these concerns.

Submitted 12 January, 2024; originally announced January 2024.

Comments: 6 pages, 2 figures

arXiv:2311.07463 [pdf, other]

MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks

Authors: Sanchit Ahuja, Divyanshu Aggarwal, Varun Gumma, Ishaan Watts, Ashutosh Sathe, Millicent Ochieng, Rishav Hada, Prachi Jain, Maxamed Axmed, Kalika Bali, Sunayana Sitaram

Abstract: There has been a surge in LLM evaluation research to understand LLM capabilities and limitations. However, much of this research has been confined to English, leaving LLM building and evaluation for non-English languages relatively unexplored. Several new LLMs have been introduced recently, necessitating their evaluation on non-English languages. This study aims to perform a thorough evaluation of the non-English capabilities of SoTA LLMs (GPT-3.5-Turbo, GPT-4, PaLM2, Gemini-Pro, Mistral, Llama2, and Gemma) by comparing them on the same set of multilingual datasets. Our benchmark comprises 22 datasets covering 83 languages, including low-resource African languages. We also include two multimodal datasets in the benchmark and compare the performance of LLaVA models, GPT-4-Vision and Gemini-Pro-Vision. Our experiments show that larger models such as GPT-4, Gemini-Pro and PaLM2 outperform smaller models on various tasks, notably on low-resource languages, with GPT-4 outperforming PaLM2 and Gemini-Pro on more datasets. We also perform a study on data contamination and find that several models are likely to be contaminated with multilingual evaluation benchmarks, necessitating approaches to detect and handle contamination while assessing the multilingual performance of LLMs.

Submitted 2 April, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

Comments: 40 pages, 35 figures and 34 tables

arXiv:2305.10528 [pdf, other]

Scalable and Safe Remediation of Defective Actions in Self-Learning Conversational Systems

Authors: Sarthak Ahuja, Mohammad Kachuee, Fateme Sheikholeslami, Weiqing Liu, Jaeyoung Do

Abstract: Off-policy reinforcement learning has been a driving force behind state-of-the-art conversational AIs, leading to more natural human-agent interactions and improving user satisfaction for goal-oriented agents. However, in large-scale commercial settings, it is often challenging to balance policy improvements and experience continuity across the broad spectrum of applications handled by such systems. In the literature, off-policy evaluation and guard-railing on aggregate statistics have been commonly used to address this problem. In this paper, we propose a method for curating and leveraging high-precision samples sourced from historical regression incident reports to validate, safeguard, and improve policies prior to online deployment. We conducted extensive experiments using data from a real-world conversational system and actual regression incidents. The proposed method is currently deployed in our production system to protect customers against broken experiences and enable long-term policy improvements.

Submitted 17 May, 2023; originally announced May 2023.

Comments: Accepted at ACL 2023 Industry Track

arXiv:2305.07961 [pdf, other]

Leveraging Large Language Models in Conversational Recommender Systems

Authors: Luke Friedman, Sameer Ahuja, David Allen, Zhenning Tan, Hakim Sidahmed, Changbo Long, Jun Xie, Gabriel Schubiner, Ajay Patel, Harsh Lara, Brian Chu, Zexi Chen, Manoj Tiwari

Abstract: A Conversational Recommender System (CRS) offers increased transparency and control to users by enabling them to engage with the system through a real-time multi-turn dialogue. Recently, Large Language Models (LLMs) have exhibited an unprecedented ability to converse naturally and incorporate world knowledge and common-sense reasoning into language understanding, unlocking the potential of this paradigm. However, effectively leveraging LLMs within a CRS introduces new technical challenges, including properly understanding and controlling a complex conversation and retrieving from external sources of information. These issues are exacerbated by a large, evolving item corpus and a lack of conversational data for training. In this paper, we provide a roadmap for building an end-to-end large-scale CRS using LLMs. In particular, we propose new implementations for user preference understanding, flexible dialogue management and explainable recommendations as part of an integrated architecture powered by LLMs. For improved personalization, we describe how an LLM can consume interpretable natural language user profiles and use them to modulate session-level context. To overcome conversational data limitations in the absence of an existing production CRS, we propose techniques for building a controllable LLM-based user simulator to generate synthetic conversations. As a proof of concept we introduce RecLLM, a large-scale CRS for YouTube videos built on LaMDA, and demonstrate its fluency and diverse functionality through some illustrative example conversations.

Submitted 16 May, 2023; v1 submitted 13 May, 2023; originally announced May 2023.

arXiv:2204.07135 [pdf, other]

Scalable and Robust Self-Learning for Skill Routing in Large-Scale Conversational AI Systems

Authors: Mohammad Kachuee, Jinseok Nam, Sarthak Ahuja, Jin-Myung Won, Sungjin Lee

Abstract: Skill routing is an important component in large-scale conversational systems. In contrast to traditional rule-based skill routing, state-of-the-art systems use a model-based approach to enable natural conversations. To provide the supervision signal required to train such models, ideas such as human annotation, replication of a rule-based system, relabeling based on user paraphrases, and bandit-based learning have been suggested. However, these approaches (a) do not scale in terms of the number of skills and skill on-boarding, (b) require very costly expert annotation or rule design, and (c) introduce risks in the user experience with each model update. In this paper, we present a scalable self-learning approach to explore routing alternatives without causing abrupt policy changes that break the user experience, learn from the user interaction, and incrementally improve the routing via frequent model refreshes. To enable such robust frequent model updates, we suggest a simple and effective approach that ensures controlled policy updates for individual domains, followed by an off-policy evaluation for making deployment decisions without any need for lengthy A/B experimentation. We conduct various offline and online A/B experiments on a commercial large-scale conversational system to demonstrate the effectiveness of the proposed method in real-world production settings.

Submitted 14 April, 2022; originally announced April 2022.

Comments: NAACL 2022
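The abstract relies on off-policy evaluation to decide on deployment without A/B tests but does not name an estimator. As a hedged illustration, the sketch below implements inverse propensity scoring (IPS) and its self-normalized variant (SNIPS), two standard estimators, assuming the production (logging) policy records the probability it assigned to each chosen skill. The `LoggedEvent` record and the policy functions are hypothetical and are not the authors' method.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LoggedEvent:
    context: str       # hypothetical: the utterance (or its features)
    action: str        # skill chosen by the logging (production) policy
    propensity: float  # probability the logging policy gave to `action`
    reward: float      # observed feedback, e.g. 1.0 if the user was satisfied


# target_prob(context, action) -> probability under the candidate policy
Policy = Callable[[str, str], float]


def ips_estimate(events: List[LoggedEvent], target_prob: Policy) -> float:
    """IPS: mean of reward * pi_new(a|x) / pi_log(a|x) over logged events."""
    total = 0.0
    for e in events:
        weight = target_prob(e.context, e.action) / e.propensity
        total += weight * e.reward
    return total / len(events)


def snips_estimate(events: List[LoggedEvent], target_prob: Policy) -> float:
    """Self-normalized IPS: divide by the sum of weights (lower variance, small bias)."""
    weights = [target_prob(e.context, e.action) / e.propensity for e in events]
    total_w = sum(weights)
    if total_w == 0:
        return 0.0
    return sum(w * e.reward for w, e in zip(weights, events)) / total_w


# Toy log: the production policy picked each skill half the time.
log = [
    LoggedEvent("play music", "MusicSkill", 0.5, 1.0),
    LoggedEvent("play music", "VideoSkill", 0.5, 0.0),
]
always_music: Policy = lambda x, a: 1.0 if a == "MusicSkill" else 0.0
print(ips_estimate(log, always_music), snips_estimate(log, always_music))
```

Both estimators are unbiased or consistent only where the candidate policy's actions were sometimes taken by the logging policy (non-zero propensity), which is one reason for keeping policy updates controlled.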

arXiv:2110.12246 [pdf, other]

Parametric Variational Linear Units (PVLUs) in Deep Convolutional Networks

Authors: Aarush Gupta, Shikhar Ahuja

Abstract: The Rectified Linear Unit is currently a state-of-the-art activation function in deep convolutional neural networks. To combat ReLU's dying neuron problem, we propose the Parametric Variational Linear Unit (PVLU), which adds a sinusoidal function with trainable coefficients to ReLU. Along with introducing nonlinearity and non-zero gradients across the entire real domain, PVLU acts as a mechanism of fine-tuning when implemented in the context of transfer learning. On a simple, non-transfer sequential CNN, PVLU substitution allowed for relative error decreases of 16.3% and 11.3% (without and with data augmentation) on CIFAR-100. PVLU is also tested on transfer learning models. The VGG-16 and VGG-19 models experience relative error reductions of 9.5% and 10.7% on CIFAR-10, respectively, after the substitution of ReLU with PVLU. When training on Gaussian-filtered CIFAR-10 images, similar improvements are noted for the VGG models. Most notably, fine-tuning using PVLU allows for relative error reductions up to and exceeding 10% for near state-of-the-art residual neural network architectures on the CIFAR datasets.

Submitted 16 December, 2021; v1 submitted 23 October, 2021; originally announced October 2021.

Comments: Both authors contributed equally to this research
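The abstract describes PVLU as ReLU plus a sinusoid with trainable coefficients. The sketch below assumes the specific form f(x) = max(0, x) + alpha * sin(beta * x); that exact parameterization is an assumption for illustration, not taken from the paper. It also gives the partial derivatives a training loop would need for x, alpha and beta.

```python
import math


def pvlu(x: float, alpha: float, beta: float) -> float:
    """Assumed PVLU form: ReLU(x) + alpha * sin(beta * x)."""
    return max(0.0, x) + alpha * math.sin(beta * x)


def pvlu_grads(x: float, alpha: float, beta: float):
    """Partial derivatives of the assumed form w.r.t. x, alpha and beta."""
    relu_grad = 1.0 if x > 0 else 0.0  # subgradient 0 chosen at x == 0
    d_dx = relu_grad + alpha * beta * math.cos(beta * x)
    d_dalpha = math.sin(beta * x)
    d_dbeta = alpha * x * math.cos(beta * x)
    return d_dx, d_dalpha, d_dbeta


# For negative inputs plain ReLU passes back a zero gradient ("dying" units);
# under the assumed form the gradient is alpha*beta*cos(beta*x) instead.
for x in (-2.0, -0.5, 0.5, 2.0):
    print(x, pvlu(x, alpha=0.1, beta=1.0), pvlu_grads(x, alpha=0.1, beta=1.0)[0])
```

Under this form, the input gradient for x < 0 vanishes only at isolated points where cos(beta * x) = 0 (given non-zero alpha and beta), and setting alpha = 0 recovers plain ReLU, which is one way to read the abstract's "fine-tuning" remark when swapping PVLU into a pretrained network.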

arXiv:1712.03724 [pdf, other]

Cogniculture: Towards a Better Human-Machine Co-evolution

Authors: Rakesh R Pimplikar, Kushal Mukherjee, Gyana Parija, Harit Vishwakarma, Ramasuri Narayanam, Sarthak Ahuja, Rohith D Vallam, Ritwik Chaudhuri, Joydeep Mondal

Abstract: Research in Artificial Intelligence is breaking technology barriers every day. New algorithms and high-performance computing are making possible things we could previously only have imagined. Though advances in AI are making life easier for human beings day by day, there is a constant fear that AI-based systems will pose a threat to humanity. People in the AI community hold a diverse set of opinions regarding the pros and cons of AI mimicking human behavior. Instead of worrying about AI advancements, we propose a novel idea of cognitive agents, including both humans and machines, living together in a complex adaptive ecosystem, collaborating on human computation for producing essential social goods while promoting sustenance, survival and evolution of the agents' life cycle. We highlight several research challenges and technology barriers in achieving this goal. We propose a governance mechanism around this ecosystem to ensure ethical behaviors of all cognitive agents. Along with a novel set of use-cases of Cogniculture, we discuss the road map ahead for this journey.

Submitted 11 December, 2017; originally announced December 2017.

arXiv:1509.07075 [pdf, other]

doi 10.1002/rob.21616

3D Scan Registration using Curvelet Features in Planetary Environments

Authors: Siddhant Ahuja, Peter Iles, Steven L. Waslander

Abstract: Topographic mapping in planetary environments relies on accurate 3D scan registration methods. However, most global registration algorithms relying on features such as FPFH and Harris-3D show poor alignment accuracy in these settings due to the poor structure of the Mars-like terrain and the variable-resolution, occluded, sparse range data, which is hard to register without some a priori knowledge of the environment. In this paper, we propose an alternative approach to 3D scan registration using the curvelet transform, which performs multi-resolution geometric analysis to obtain a set of coefficients indexed by scale (coarsest to finest), angle and spatial position. Features are detected in the curvelet domain to take advantage of the directional selectivity of the transform. A descriptor is computed for each feature by calculating the 3D spatial histogram of the image gradients, and nearest-neighbor-based matching is used to calculate the feature correspondences. Correspondence rejection using Random Sample Consensus identifies inliers, and a locally optimal Singular Value Decomposition-based estimation of the rigid-body transformation aligns the laser scans given the re-projected correspondences in the metric space. Experimental results on a publicly available dataset from a planetary analogue indoor facility, as well as simulated and real-world scans from Neptec Design Group's IVIGMS 3D laser rangefinder at the outdoor CSA Mars yard, demonstrate improved performance over existing methods in the challenging sparse Mars-like terrain.

Submitted 23 September, 2015; originally announced September 2015.

Comments: 27 pages in Journal of Field Robotics, 2015
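The last two stages of the pipeline above, RANSAC correspondence rejection followed by SVD-based rigid-body estimation, can be sketched with NumPy. The SVD step is the standard least-squares solution (Arun et al. / Kabsch) with a reflection guard. The RANSAC loop uses 3-point samples, and a single refit on all inliers stands in for the paper's "locally optimal" refinement. The iteration count and inlier threshold are illustrative assumptions, and the curvelet features and descriptors are not reproduced.

```python
import numpy as np


def rigid_transform_svd(src, dst):
    """Least-squares R, t such that dst ≈ src @ R.T + t (Arun/Kabsch SVD method).

    src, dst: (N, 3) arrays of corresponding points, N >= 3, not collinear.
    """
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if the naive solution would be a reflection (det = -1).
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t


def ransac_rigid(src, dst, iters=200, thresh=0.05, seed=0):
    """Reject outlier correspondences with RANSAC, then refit R, t on the inliers."""
    rng = np.random.default_rng(seed)
    n = len(src)
    best_inliers = np.zeros(n, dtype=bool)
    for _ in range(iters):
        idx = rng.choice(n, size=3, replace=False)  # minimal sample for a 3D rigid fit
        R, t = rigid_transform_svd(src[idx], dst[idx])
        resid = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = resid < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Final least-squares fit on the consensus set.
    R, t = rigid_transform_svd(src[best_inliers], dst[best_inliers])
    return R, t, best_inliers
```

In use, `src` and `dst` would hold the 3D positions of matched features from two scans. Only three correspondences per sample are needed because a rigid transform in 3D has six degrees of freedom.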

arXiv:1206.3667 [pdf]

Information Retrieval in Intelligent Systems: Current Scenario & Issues

Authors: Sudhir Ahuja, Mr. Rinkaj Goyal

Abstract: Web space is a huge repository of data, and a great deal of new information is added to it every day. The more information there is, the greater the demand for tools to access it. Answering users' queries about online information intelligently is one of the great challenges in information retrieval in intelligent systems. In this paper, we start with a brief introduction to information retrieval and intelligent systems and explain how Swoogle, the semantic search engine, uses its algorithms and techniques to search for the desired content on the web. We then continue with clustering, the technique used to group similar things together, and discuss the machine learning technique called Self-organizing maps [6] or SOM, a data visualization technique that reduces the dimensions of data through the use of self-organizing neural networks. We then discuss how SOM is used, by following the steps of its algorithm, to visualize the contents of the data in the form of maps. Thus, websites or machines can be used to retrieve exactly the information users want from them.

Submitted 16 June, 2012; originally announced June 2012.
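The SOM mentioned above reduces high-dimensional data to positions on a 2-D grid. Below is a minimal sketch of the standard online Kohonen training loop in NumPy: find the best-matching unit, then pull it and its grid neighbours toward the sample with a Gaussian neighbourhood. The grid size, linear decay schedules and initialization from data samples are illustrative assumptions, not details from the paper.

```python
import numpy as np


def train_som(data, rows=5, cols=5, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal online Self-Organizing Map (Kohonen) training loop."""
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    # Initialise unit weights by sampling data points (one common choice).
    weights = data[rng.integers(0, n, size=rows * cols)].astype(float)
    # 2-D grid coordinates of each unit, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    total_steps = epochs * n
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(n)]:
            frac = step / total_steps
            lr = lr0 * (1 - frac)               # linearly decaying learning rate
            sigma = sigma0 * (1 - frac) + 1e-3  # shrinking neighbourhood radius
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
            dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-dist2 / (2 * sigma ** 2))  # Gaussian neighbourhood on the grid
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights.reshape(rows, cols, dim)


def map_to_grid(weights, x):
    """Return the (row, col) of the unit closest to x, i.e. its 2-D map position."""
    _, cols, dim = weights.shape
    flat = weights.reshape(-1, dim)
    idx = int(np.argmin(((flat - x) ** 2).sum(axis=1)))
    return divmod(idx, cols)
```

After training, `map_to_grid` turns each high-dimensional item into a cell on the map. Items that land in nearby cells are similar, which is what makes the map usable as a visualization.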

arXiv:1005.4264 [pdf]

Bio-Authentication based Secure Transmission System using Steganography

Authors: Najme Zehra, Mansi Sharma, Somya Ahuja, Shubha Bansal

Abstract: Biometrics deals with identity verification of an individual by using certain physiological or behavioral features associated with a person. Biometric identification systems using fingerprint patterns are called AFIS (Automatic Fingerprint Identification System). In this paper, a composite method for fingerprint recognition is considered, using a combination of the Fast Fourier Transform (FFT) and Sobel filters to improve a poor-quality fingerprint image. Steganography hides messages inside other messages in such a way that an "adversary" would not even know a secret message were present. The objective of our paper is to make a bio-secure system. In this paper, bio-authentication has been implemented in terms of fingerprint recognition, and the second part of the paper is an interactive steganographic system that hides the user's data in one of two ways: creating a song list or hiding the data in an image.

Submitted 24 May, 2010; originally announced May 2010.

Comments: IEEE Publication format, International Journal of Computer Science and Information Security, IJCSIS, Vol. 8 No. 1, April 2010, USA. ISSN 1947-5500, http://sites.google.com/site/ijcsis/
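The abstract does not say how data is hidden in an image. A common generic technique is least-significant-bit (LSB) substitution: each message bit overwrites the lowest bit of one pixel byte, so no pixel value changes by more than 1. The sketch below assumes raw 8-bit pixel bytes and a hypothetical 4-byte length header. It illustrates LSB embedding in general, not the paper's system.

```python
def embed_lsb(cover: bytes, message: bytes) -> bytearray:
    """Hide `message` in the least-significant bits of the cover bytes.

    A 32-bit big-endian length header (an assumption of this sketch) is
    embedded first so the extractor knows how many bytes to read back.
    """
    payload = len(message).to_bytes(4, "big") + message
    # Most-significant bit first for every payload byte.
    bits = [(byte >> (7 - i)) & 1 for byte in payload for i in range(8)]
    if len(bits) > len(cover):
        raise ValueError("cover too small for message")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # overwrite only the lowest bit
    return stego


def extract_lsb(stego: bytes) -> bytes:
    """Inverse of embed_lsb: read the length header, then the message bits."""
    def read_bytes(start_bit: int, count: int) -> bytes:
        out = bytearray()
        for b in range(count):
            value = 0
            for i in range(8):
                value = (value << 1) | (stego[start_bit + 8 * b + i] & 1)
            out.append(value)
        return bytes(out)

    length = int.from_bytes(read_bytes(0, 4), "big")
    return read_bytes(32, length)


# Stand-in for grayscale pixel data; a real system would read an image file.
pixels = bytes(range(256)) * 2
hidden = embed_lsb(pixels, b"secret")
print(extract_lsb(hidden))
```

Because only the lowest bit of each byte changes, the altered image is visually indistinguishable from the cover. Plain LSB embedding offers no secrecy against someone who knows to look, so it is usually paired with encryption.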

Showing 1–13 of 13 results for author: Ahuja, S