Search | arXiv e-print repository

DaVinci at SemEval-2024 Task 9: Few-shot prompting GPT-3.5 for Unconventional Reasoning

Authors: Suyash Vardhan Mathur, Akshett Rai Jindal, Manish Shrivastava

Abstract: While significant work has been done in the field of NLP on vertical thinking, which involves primarily logical thinking, little work has been done towards lateral thinking, which involves looking at problems from an unconventional perspective and defying existing conceptions and notions. Towards this direction, SemEval 2024 introduces the task of BRAINTEASER, which involves two types of questions -- Sentence Puzzles and Word Puzzles that defy conventional common-sense reasoning and constraints. In this paper, we tackle both types of questions using few-shot prompting on GPT-3.5 and gain insights regarding the difference in the nature of the two types. Our prompting strategy placed us 26th on the leaderboard for the Sentence Puzzle and 15th on the Word Puzzle task.

Submitted 19 May, 2024; originally announced May 2024.

arXiv:2404.02088 [pdf, other]

LastResort at SemEval-2024 Task 3: Exploring Multimodal Emotion Cause Pair Extraction as Sequence Labelling Task

Authors: Suyash Vardhan Mathur, Akshett Rai Jindal, Hardik Mittal, Manish Shrivastava

Abstract: Conversation is the most natural form of human communication, where each utterance can range over a variety of possible emotions. While significant work has been done towards the detection of emotions in text, relatively little work has been done towards finding the cause of the said emotions, especially in multimodal settings. SemEval 2024 introduces the task of Multimodal Emotion Cause Analysis in Conversations, which aims to extract emotions reflected in individual utterances in a conversation involving multiple modalities (textual, audio, and visual modalities) along with the corresponding utterances that were the cause for the emotion. In this paper, we propose models that tackle this task as an utterance labeling and a sequence labeling problem and perform a comparative study of these models, involving baselines using different encoders, using BiLSTM for adding contextual information of the conversation, and finally adding a CRF layer to try to model the inter-dependencies between adjacent utterances more effectively. In the official leaderboard for the task, our architecture was ranked 8th, achieving an F1-score of 0.1759 on the leaderboard.

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2303.03975 [pdf, other]

GATE: A Challenge Set for Gender-Ambiguous Translation Examples

Authors: Spencer Rarrick, Ranjita Naik, Varun Mathur, Sundar Poudel, Vishal Chowdhary

Abstract: Although recent years have brought significant progress in improving translation of unambiguously gendered sentences, translation of ambiguously gendered input remains relatively unexplored. When source gender is ambiguous, machine translation models typically default to stereotypical gender roles, perpetuating harmful bias. Recent work has led to the development of "gender rewriters" that generate alternative gender translations on such ambiguous inputs, but such systems are plagued by poor linguistic coverage. To encourage better performance on this task we present and release GATE, a linguistically diverse corpus of gender-ambiguous source sentences along with multiple alternative target language translations. We also provide tools for evaluation and system analysis when using GATE and use them to evaluate our translation rewriter system.

Submitted 7 March, 2023; originally announced March 2023.

arXiv:2206.15469 [pdf, other]

Watch and Match: Supercharging Imitation with Regularized Optimal Transport

Authors: Siddhant Haldar, Vaibhav Mathur, Denis Yarats, Lerrel Pinto

Abstract: Imitation learning holds tremendous promise in learning policies efficiently for complex decision making problems. Current state-of-the-art algorithms often use inverse reinforcement learning (IRL), where given a set of expert demonstrations, an agent alternatively infers a reward function and the associated optimal policy. However, such IRL approaches often require substantial online interactions for complex control problems. In this work, we present Regularized Optimal Transport (ROT), a new imitation learning algorithm that builds on recent advances in optimal transport based trajectory-matching. Our key technical insight is that adaptively combining trajectory-matching rewards with behavior cloning can significantly accelerate imitation even with only a few demonstrations. Our experiments on 20 visual control tasks across the DeepMind Control Suite, the OpenAI Robotics Suite, and the Meta-World Benchmark demonstrate an average of 7.8X faster imitation to reach 90% of expert performance compared to prior state-of-the-art methods. On real-world robotic manipulation, with just one demonstration and an hour of online training, ROT achieves an average success rate of 90.1% across 14 tasks.

Submitted 20 February, 2023; v1 submitted 30 June, 2022; originally announced June 2022.

Comments: Code and robot videos are available on https://rot-robot.github.io/

arXiv:2110.01188 [pdf, other]

LawSum: A weakly supervised approach for Indian Legal Document Summarization

Authors: Vedant Parikh, Vidit Mathur, Parth Mehta, Namita Mittal, Prasenjit Majumder

Abstract: Unlike the courts in western countries, public records of Indian judiciary are completely unstructured and noisy. No large scale publicly available annotated datasets of Indian legal documents exist till date. This limits the scope for legal analytics research. In this work, we propose a new dataset consisting of over 10,000 judgements delivered by the supreme court of India and their corresponding hand written summaries. The proposed dataset is pre-processed by normalising common legal abbreviations, handling spelling variations in named entities, handling bad punctuations and accurate sentence tokenization. Each sentence is tagged with their rhetorical roles. We also annotate each judgement with several attributes like date, names of the plaintiffs, defendants and the people representing them, judges who delivered the judgement, acts/statutes that are cited and the most common citations used to refer the judgement. Further, we propose an automatic labelling technique for identifying sentences which have summary worthy information. We demonstrate that this auto labeled data can be used effectively to train a weakly supervised sentence extractor with high accuracy. Some possible applications of this dataset besides legal document summarization can be in retrieval, citation analysis and prediction of decisions by a particular judge.

Submitted 23 October, 2021; v1 submitted 4 October, 2021; originally announced October 2021.

arXiv:2005.12378 [pdf, ps, other]

Artificial Intelligence for Global Health: Learning From a Decade of Digital Transformation in Health Care

Authors: Varoon Mathur, Saptarshi Purkayastha, Judy Wawira Gichoya

Abstract: The health needs of those living in resource-limited settings are a vastly overlooked and understudied area in the intersection of machine learning (ML) and health care. While the use of ML in health care is more recently popularized over the last few years from the advancement of deep learning, low-and-middle income countries (LMICs) have already been undergoing a digital transformation of their own in health care over the last decade, leapfrogging milestones due to the adoption of mobile health (mHealth). With the introduction of new technologies, it is common to start afresh with a top-down approach, and implement these technologies in isolation, leading to lack of use and a waste of resources. In this paper, we outline the necessary considerations both from the perspective of current gaps in research, as well as from the lived experiences of health care professionals in resource-limited settings. We also outline briefly several key components of successful implementation and deployment of technologies within health systems in LMICs, including technical and cultural considerations in the development process relevant to the building of machine learning solutions. We then draw on these experiences to address where key opportunities for impact exist in resource-limited settings, and where AI/ML can provide the most benefit.

Submitted 27 May, 2020; v1 submitted 20 May, 2020; originally announced May 2020.

Comments: Accepted Paper at ICLR 2020 Workshop on Practical ML for Developing Countries

arXiv:1912.00323 [pdf, other]

HCA-DBSCAN: HyperCube Accelerated Density Based Spatial Clustering for Applications with Noise

Authors: Vinayak Mathur, Jinesh Mehta, Sanjay Singh

Abstract: Density-based clustering has found numerous applications across various domains. The Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is capable of finding clusters of varied shapes that are not linearly separable, and at the same time it is not sensitive to outliers in the data. Combined with the fact that the number of clusters in the data is not required a priori, this makes DBSCAN really powerful. Slower performance (O(n^2)) limits its applications. In this work, we present a new clustering algorithm, the HyperCube Accelerated DBSCAN (HCA-DBSCAN), which uses a combination of distance-based aggregation by overlaying the data with customized grids. We use representative points to reduce the number of comparisons that need to be computed. Experimental results show that the proposed algorithm achieves a significant run time speedup of up to 58.27% when compared to other improvements that try to reduce the time complexity of the DBSCAN algorithm.

Submitted 1 December, 2019; originally announced December 2019.

Comments: 9 pages, Sets and Partitions workshop at NeurIPS 2019

arXiv:1904.06722 [pdf, other]

doi 10.1145/2984511.2984542

Boomerang: Rebounding the Consequences of Reputation Feedback on Crowdsourcing Platforms

Authors: Snehalkumar, S. Gaikwad, Durim Morina, Adam Ginzberg, Catherine Mullings, Shirish Goyal, Dilrukshi Gamage, Christopher Diemert, Mathias Burton, Sharon Zhou, Mark Whiting, Karolina Ziulkoski, Alipta Ballav, Aaron Gilbee, Senadhipathige S. Niranga, Vibhor Sehgal, Jasmine Lin, Leonardy Kristianto, Angela Richmond-Fuller, Jeff Regino, Nalin Chhibber, Dinesh Majeti, Sachin Sharma, Kamila Mananova, Dinesh Dhakal, et al. (13 additional authors not shown)

Abstract: Paid crowdsourcing platforms suffer from low-quality work and unfair rejections, but paradoxically, most workers and requesters have high reputation scores. These inflated scores, which make high-quality work and workers difficult to find, stem from social pressure to avoid giving negative feedback. We introduce Boomerang, a reputation system for crowdsourcing that elicits more accurate feedback by rebounding the consequences of feedback directly back onto the person who gave it. With Boomerang, requesters find that their highly-rated workers gain earliest access to their future tasks, and workers find tasks from their highly-rated requesters at the top of their task feed. Field experiments verify that Boomerang causes both workers and requesters to provide feedback that is more closely aligned with their private opinions. Inspired by a game-theoretic notion of incentive-compatibility, Boomerang opens opportunities for interaction design to incentivize honest reporting over strategic dishonesty.

Submitted 14 April, 2019; originally announced April 2019.

ACM Class: H.5.3; H.1.2; J.4; K.4.4; K.4.3

Journal ref: Proceedings of the 29th Annual Symposium on User Interface Software and Technology, 2016

arXiv:1903.09639 [pdf, other]

Understanding Childhood Vulnerability in The City of Surrey

Authors: Cody Griffith, Varoon Mathur, Catherine Lin, Kevin Zhu

Abstract: Understanding the community conditions that best support universal access and improved childhood outcomes allows ultimately to improve decision-making in the areas of planning and investment across the early stages of childhood development. Here we describe two different data-driven approaches to visualizing the lived experiences of children throughout the City of Surrey, combining data derived from both public and private sources. In one approach, we find specifically that the Early Development Instrument measuring childhood vulnerabilities across varying domains can be used to cluster neighborhoods, and that census variables can help explain similarities between neighborhoods within these clusters. In our second approach, we use program registration data from the City of Surrey's Community and Recreation Services Division. We also find a critical age of entry and exit for each program related to early childhood development and beyond, and find that certain neighborhoods and recreational programs have larger retention rates than others. This report details the journey of using data to tell the story of these neighborhoods, and provides a lens to which community initiatives can be strategically crafted through their use.

Submitted 25 March, 2019; originally announced March 2019.

arXiv:1901.01815 [pdf, ps, other]

Literature Review: Smart Contract Semantics

Authors: Varun Mathur

Abstract: This review presents and evaluates various formalisms for the purpose of modelling the semantics of financial derivatives contracts. The formalism proposed by Lee is selected as the best candidate among those initially reviewed. Further examination and evaluation of this formalism is done.

Submitted 22 December, 2018; originally announced January 2019.

arXiv:1811.09878 [pdf, ps, other]

Hydra: A Peer to Peer Distributed Training & Data Collection Framework

Authors: Vaibhav Mathur, Karanbir Chahal

Abstract: The world needs diverse and unbiased data to train deep learning models. Currently data comes from a variety of sources that are unmoderated to a large extent. The outcomes of training neural networks with unverified data yields biased models with various strains of homophobia, sexism and racism. Another trend observed in the world of deep learning is the rise of distributed training. Although cloud companies provide high performance compute for training models in the form of GPU's connected with a low latency network, using these services comes at a high cost. We propose Hydra, a system that seeks to solve both of these problems in a novel manner by proposing a decentralized distributed framework which utilizes the substantial amount of idle compute of everyday electronic devices like smartphones and desktop computers for training and data collection purposes. Hydra couples a specialized distributed training framework on a network of these low powered devices with a reward scheme that incentivizes users to provide high quality data to unleash the compute capability on this training framework. Such a system has the ability to capture data from a wide variety of diverse sources which has been an issue in the current scenario of deep learning. Hydra brings in several new innovations in training on low powered devices including a fault tolerant version of the All Reduce algorithm.
Furthermore we introduce a reinforcement learning policy to decide the size of training jobs on different machines on a heterogeneous cluster of devices with varying network latencies for Synchronous SGD. The novel thing about such a network is the ability of each machine to shut down and resume training capabilities at any point of time without restarting the overall training. To enable such an asynchronous behaviour we propose a communication framework inspired by the Bittorrent protocol and the Kademlia DHT.

Submitted 24 November, 2018; originally announced November 2018.

Comments: 10 pages. arXiv admin note: text overlap with arXiv:1611.01578 by other authors

arXiv:1804.03257 [pdf, other]

Efficient Graph-based Word Sense Induction by Distributional Inclusion Vector Embeddings

Authors: Haw-Shiuan Chang, Amol Agrawal, Ananya Ganesh, Anirudha Desai, Vinayak Mathur, Alfred Hough, Andrew McCallum

Abstract: Word sense induction (WSI), which addresses polysemy by unsupervised discovery of multiple word senses, resolves ambiguities for downstream NLP tasks and also makes word representations more interpretable. This paper proposes an accurate and efficient graph-based method for WSI that builds a global non-negative vector embedding basis (which are interpretable like topics) and clusters the basis indexes in the ego network of each polysemous word. By adopting distributional inclusion vector embeddings as our basis formation model, we avoid the expensive step of nearest neighbor search that plagues other graph-based methods without sacrificing the quality of sense clusters. Experiments on three datasets show that our proposed method produces similar or better sense clusters and embeddings compared with previous state-of-the-art methods while being significantly more efficient.

Submitted 29 May, 2018; v1 submitted 9 April, 2018; originally announced April 2018.

Comments: TextGraphs 2018: the Workshop on Graph-based Methods for Natural Language Processing

arXiv:1803.08419 [pdf, other]

The Rapidly Changing Landscape of Conversational Agents

Authors: Vinayak Mathur, Arpit Singh

Abstract: Conversational agents have become ubiquitous, ranging from goal-oriented systems for helping with reservations to chit-chat models found in modern virtual assistants. In this survey paper, we explore this fascinating field. We look at some of the pioneering work that defined the field and gradually move to the current state-of-the-art models. We look at statistical, neural, generative adversarial network based and reinforcement learning based approaches and how they evolved. Along the way we discuss various challenges that the field faces, lack of context in utterances, not having a good quantitative metric to compare models, lack of trust in agents because they do not have a consistent persona etc. We structure this paper in a way that answers these pertinent questions and discusses competing approaches to solve them.

Submitted 24 March, 2018; v1 submitted 22 March, 2018; originally announced March 2018.

Comments: 14 pages, 7 figures. arXiv admin note: text overlap with arXiv:1704.07130, arXiv:1507.04808, arXiv:1603.06155, arXiv:1611.06997, arXiv:1704.08966 by other authors

arXiv:cs/0611087 [pdf, ps, other]

A Combined LIFO-Priority Scheme for Overload Control of E-commerce Web Servers

Authors: Naresh Singhmar, Vipul Mathur, Varsha Apte, D. Manjunath

Abstract: E-commerce Web-servers often face overload conditions during which revenue-generating requests may be dropped or abandoned due to an increase in the browsing requests. In this paper we present a simple, yet effective, mechanism for overload control of E-commerce Web-servers. We develop an E-commerce workload model that separates the browsing requests from revenue-generating transaction requests. During overload, we apply LIFO discipline in the browsing queues and use a dynamic priority model to service them. The transaction queues are given absolute priority over the browsing queues. This is called the LIFO-Pri scheduling discipline. Experimental results show that LIFO-Pri dramatically improves the overall Web-server throughput while also increasing the completion rate of revenue-generating requests. The Web-server was able to operate at nearly 60% of its maximum capacity even when offered load was 1.5 times its capacity. Further, when compared to a single queue FIFO system, there was a seven-fold increase in the number of completed revenue-generating requests during overload.

Submitted 17 November, 2006; originally announced November 2006.

Comments: 10 pages, 8 figures, presented at the International Infrastructure Survivability Workshop (affiliated with the 25th IEEE International Real-Time Systems Symposium), Lisbon, Portugal, December 2004

Showing 1–14 of 14 results for author: Mathur, V