Search | arXiv e-print repository

Alexa, Let's Work Together: Introducing the First Alexa Prize TaskBot Challenge on Conversational Task Assistance

Authors: Anna Gottardi, Osman Ipek, Giuseppe Castellucci, Shui Hu, Lavina Vaz, Yao Lu, Anju Khatri, Anjali Chadha, Desheng Zhang, Sattvik Sahai, Prerna Dwivedi, Hangjie Shi, Lucy Hu, Andy Huang, Luke Dai, Bofei Yang, Varun Somani, Pankaj Rajan, Ron Rezac, Michael Johnston, Savanna Stiff, Leslie Ball, David Carmel, Yang Liu, Dilek Hakkani-Tur , et al. (5 additional authors not shown)

Abstract: Since its inception in 2016, the Alexa Prize program has enabled hundreds of university students to explore and compete to develop conversational agents through the SocialBot Grand Challenge. The goal of the challenge is to build agents capable of conversing coherently and engagingly with humans on popular topics for 20 minutes, while achieving an average rating of at least 4.0/5.0. However, as co… ▽ More Since its inception in 2016, the Alexa Prize program has enabled hundreds of university students to explore and compete to develop conversational agents through the SocialBot Grand Challenge. The goal of the challenge is to build agents capable of conversing coherently and engagingly with humans on popular topics for 20 minutes, while achieving an average rating of at least 4.0/5.0. However, as conversational agents attempt to assist users with increasingly complex tasks, new conversational AI techniques and evaluation platforms are needed. The Alexa Prize TaskBot challenge, established in 2021, builds on the success of the SocialBot challenge by introducing the requirements of interactively assisting humans with real-world Cooking and Do-It-Yourself tasks, while making use of both voice and visual modalities. This challenge requires the TaskBots to identify and understand the user's need, identify and integrate task and domain knowledge into the interaction, and develop new ways of engaging the user without distracting them from the task at hand, among other challenges. This paper provides an overview of the TaskBot challenge, describes the infrastructure support provided to the teams with the CoBot Toolkit, and summarizes the approaches the participating teams took to overcome the research challenges. Finally, it analyzes the performance of the competing TaskBots during the first year of the competition. △ Less

Submitted 13 September, 2022; originally announced September 2022.

Comments: 14 pages, Proceedings of Alexa Prize Taskbot (Alexa Prize 2021)

ACM Class: I.2.7; J.0; H.5.1; H.5.2

arXiv:2206.00167 [pdf, other]

Understanding How People Rate Their Conversations

Authors: Alexandros Papangelis, Nicole Chartier, Pankaj Rajan, Julia Hirschberg, Dilek Hakkani-Tur

Abstract: User ratings play a significant role in spoken dialogue systems. Typically, such ratings tend to be averaged across all users and then utilized as feedback to improve the system or personalize its behavior. While this method can be useful to understand broad, general issues with the system and its behavior, it does not take into account differences between users that affect their ratings. In this… ▽ More User ratings play a significant role in spoken dialogue systems. Typically, such ratings tend to be averaged across all users and then utilized as feedback to improve the system or personalize its behavior. While this method can be useful to understand broad, general issues with the system and its behavior, it does not take into account differences between users that affect their ratings. In this work, we conduct a study to better understand how people rate their interactions with conversational agents. One macro-level characteristic that has been shown to correlate with how people perceive their inter-personal communication is personality. We specifically focus on agreeableness and extraversion as variables that may explain variation in ratings and therefore provide a more meaningful signal for training or personalization. In order to elicit those personality traits during an interaction with a conversational agent, we designed and validated a fictional story, grounded in prior work in psychology. We then implemented the story into an experimental conversational agent that allowed users to opt-in to hearing the story. Our results suggest that for human-conversational agent interactions, extraversion may play a role in user ratings, but more data is needed to determine if the relationship is significant. Agreeableness, on the other hand, plays a statistically significant role in conversation ratings: users who are more agreeable are more likely to provide a higher rating for their interaction. In addition, we found that users who opted to hear the story were, in general, more likely to rate their conversational experience higher than those who did not. △ Less

Submitted 31 May, 2022; originally announced June 2022.

Comments: Published at IWSDS 2021

arXiv:2203.00763 [pdf, other]

Multi-Sentence Knowledge Selection in Open-Domain Dialogue

Authors: Mihail Eric, Nicole Chartier, Behnam Hedayatnia, Karthik Gopalakrishnan, Pankaj Rajan, Yang Liu, Dilek Hakkani-Tur

Abstract: Incorporating external knowledge sources effectively in conversations is a longstanding problem in open-domain dialogue research. The existing literature on open-domain knowledge selection is limited and makes certain brittle assumptions on knowledge sources to simplify the overall task (Dinan et al., 2019), such as the existence of a single relevant knowledge sentence per context. In this work, w… ▽ More Incorporating external knowledge sources effectively in conversations is a longstanding problem in open-domain dialogue research. The existing literature on open-domain knowledge selection is limited and makes certain brittle assumptions on knowledge sources to simplify the overall task (Dinan et al., 2019), such as the existence of a single relevant knowledge sentence per context. In this work, we evaluate the existing state of open-domain conversation knowledge selection, showing where the existing methodologies regarding data and evaluation are flawed. We then improve on them by proposing a new framework for collecting relevant knowledge, and create an augmented dataset based on the Wizard of Wikipedia (WOW) corpus, which we call WOW++. WOW++ averages 8 relevant knowledge sentences per dialogue context, embracing the inherent ambiguity of open-domain dialogue knowledge selection. We then benchmark various knowledge ranking algorithms on this augmented dataset with both intrinsic evaluation and extrinsic measures of response quality, showing that neural rerankers that use WOW++ can outperform rankers trained on standard datasets. △ Less

Submitted 4 October, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

Comments: Accepted at INLG 2021. 11 pages, 5 tables, 8 figures

arXiv:2111.10897 [pdf, other]

Health Monitoring of Industrial machines using Scene-Aware Threshold Selection

Authors: Arshdeep Singh, Raju Arvind, Padmanabhan Rajan

Abstract: This paper presents an autoencoder based unsupervised approach to identify anomaly in an industrial machine using sounds produced by the machine. The proposed framework is trained using log-melspectrogram representations of the sound signal. In classification, our hypothesis is that the reconstruction error computed for an abnormal machine is larger than that of the a normal machine, since only no… ▽ More This paper presents an autoencoder based unsupervised approach to identify anomaly in an industrial machine using sounds produced by the machine. The proposed framework is trained using log-melspectrogram representations of the sound signal. In classification, our hypothesis is that the reconstruction error computed for an abnormal machine is larger than that of the a normal machine, since only normal machine sounds are being used to train the autoencoder. A threshold is chosen to discriminate between normal and abnormal machines. However, the threshold changes as surrounding conditions vary. To select an appropriate threshold irrespective of the surrounding, we propose a scene classification framework, which can classify the underlying surrounding. Hence, the threshold can be selected adaptively irrespective of the surrounding. The experiment evaluation is performed on MIMII dataset for industrial machines namely fan, pump, valve and slide rail. Our experiment analysis shows that utilizing adaptive threshold, the performance improves significantly as that obtained using the fixed threshold computed for a given surrounding only. △ Less

Submitted 21 November, 2021; originally announced November 2021.

Comments: 5 pages, 4 figures, 1 Table

arXiv:2007.01833 [pdf, other]

PsychFM: Predicting your next gamble

Authors: Prakash Rajan, Krishna P. Miyapuram

Abstract: There is a sudden surge to model human behavior due to its vast and diverse applications which includes modeling public policies, economic behavior and consumer behavior. Most of the human behavior itself can be modeled into a choice prediction problem. Prospect theory is a theoretical model that tries to explain the anomalies in choice prediction. These theories perform well in terms of explainin… ▽ More There is a sudden surge to model human behavior due to its vast and diverse applications which includes modeling public policies, economic behavior and consumer behavior. Most of the human behavior itself can be modeled into a choice prediction problem. Prospect theory is a theoretical model that tries to explain the anomalies in choice prediction. These theories perform well in terms of explaining the anomalies but they lack precision. Since the behavior is person dependent, there is a need to build a model that predicts choices on a per-person basis. Looking on at the average persons choice may not necessarily throw light on a particular person's choice. Modeling the gambling problem on a per person basis will help in recommendation systems and related areas. A novel hybrid model namely psychological factorisation machine ( PsychFM ) has been proposed that involves concepts from machine learning as well as psychological theories. It outperforms the popular existing models namely random forest and factorisation machines for the benchmark dataset CPC-18. Finally,the efficacy of the proposed hybrid model has been verified by comparing with the existing models. △ Less

Submitted 3 July, 2020; originally announced July 2020.

Comments: To be published in International Joint Conference on Neural Networks (IJCNN) 2020 conference

arXiv:1903.10713 [pdf, other]

doi 10.1121/1.5118245

Multiscale CNN based Deep Metric Learning for Bioacoustic Classification: Overcoming Training Data Scarcity Using Dynamic Triplet Loss

Authors: Anshul Thakur, Daksh Thapar, Padmanabhan Rajan, Aditya Nigam

Abstract: This paper proposes multiscale convolutional neural network (CNN)-based deep metric learning for bioacoustic classification, under low training data conditions. The proposed CNN is characterized by the utilization of four different filter sizes at each level to analyze input feature maps. This multiscale nature helps in describing different bioacoustic events effectively: smaller filters help in l… ▽ More This paper proposes multiscale convolutional neural network (CNN)-based deep metric learning for bioacoustic classification, under low training data conditions. The proposed CNN is characterized by the utilization of four different filter sizes at each level to analyze input feature maps. This multiscale nature helps in describing different bioacoustic events effectively: smaller filters help in learning the finer details of bioacoustic events, whereas, larger filters help in analyzing a larger context leading to global details. A dynamic triplet loss is employed in the proposed CNN architecture to learn a transformation from the input space to the embedding space, where classification is performed. The triplet loss helps in learning this transformation by analyzing three examples, referred to as triplets, at a time where intra-class distance is minimized while maximizing the inter-class separation by a dynamically increasing margin. The number of possible triplets increases cubically with the dataset size, making triplet loss more suitable than the softmax cross-entropy loss in low training data conditions. Experiments on three different publicly available datasets show that the proposed framework performs better than existing bioacoustic classification frameworks. Experimental results also confirm the superiority of the triplet loss over the cross-entropy loss in low training data conditions △ Less

Submitted 27 March, 2019; v1 submitted 26 March, 2019; originally announced March 2019.

Comments: Under Review at JASA. Primitive version of paper. We are still working on getting better performances out of the comparative methods

arXiv:1902.09765 [pdf, other]

Directional Embedding Based Semi-supervised Framework For Bird Vocalization Segmentation

Authors: Anshul Thakur, Padmanabhan Rajan

Abstract: This paper proposes a data-efficient, semi-supervised, two-pass framework for segmenting bird vocalizations. The framework utilizes a binary classification model to categorize frames of an input audio recording into the background or bird vocalization. The first pass of the framework automatically generates training labels from the input recording itself, while model training and classification is… ▽ More This paper proposes a data-efficient, semi-supervised, two-pass framework for segmenting bird vocalizations. The framework utilizes a binary classification model to categorize frames of an input audio recording into the background or bird vocalization. The first pass of the framework automatically generates training labels from the input recording itself, while model training and classification is done during the second pass. The proposed framework utilizes a reference directional model for obtaining a feature representation called directional embeddings (DE). This reference directional model acts as an acoustic model for bird vocalizations and is obtained using the mixtures of Von-Mises Fisher distribution (moVMF). The proposed DE space only contains information about bird vocalizations, while no information about the background disturbances is reflected. The framework employs supervised information only for obtaining the reference directional model and avoids the background modeling. Hence, it can be regarded as semi-supervised in nature. The proposed framework is tested on approximately 79000 vocalizations of seven different bird species. The performance of the framework is also analyzed in the presence of noise at different SNRs. Experimental results convey that the proposed framework performs better than the existing bird vocalization segmentation methods. △ Less

Submitted 26 February, 2019; originally announced February 2019.

Comments: Accepted for publication in Applied Acoustics

arXiv:1902.02498 [pdf, other]

Conv-codes: Audio Hashing For Bird Species Classification

Authors: Anshul Thakur, Pulkit Sharma, Vinayak Abrol, Padmanabhan Rajan

Abstract: In this work, we propose a supervised, convex representation based audio hashing framework for bird species classification. The proposed framework utilizes archetypal analysis, a matrix factorization technique, to obtain convex-sparse representations of a bird vocalization. These convex representations are hashed using Bloom filters with non-cryptographic hash functions to obtain compact binary co… ▽ More In this work, we propose a supervised, convex representation based audio hashing framework for bird species classification. The proposed framework utilizes archetypal analysis, a matrix factorization technique, to obtain convex-sparse representations of a bird vocalization. These convex representations are hashed using Bloom filters with non-cryptographic hash functions to obtain compact binary codes, designated as conv-codes. The conv-codes extracted from the training examples are clustered using class-specific k-medoids clustering with Jaccard coefficient as the similarity metric. A hash table is populated using the cluster centers as keys while hash values/slots are pointers to the species identification information. During testing, the hash table is searched to find the species information corresponding to a cluster center that exhibits maximum similarity with the test conv-code. Hence, the proposed framework classifies a bird vocalization in the conv-code space and requires no explicit classifier or reconstruction error calculations. Apart from that, based on min-hash and direct addressing, we also propose a variant of the proposed framework that provides faster and effective classification. The performances of both these frameworks are compared with existing bird species classification frameworks on the audio recordings of 50 different bird species. △ Less

Submitted 7 February, 2019; originally announced February 2019.

Comments: Accepted for presentation at ICASSP 2019

arXiv:1412.3367 [pdf]

Experimenting with Request Assignment Simulator (RAS)

Authors: R. Arokia Paul Rajan, F. Sagayaraj Francis

Abstract: There is no existence of dedicated simulators on the Internet that studies the impact of load balancing principles of the cloud architectures. Request Assignment Simulator (RAS) is a customizable, visual tool that helps to understand the request assignment to the resources based on the load balancing principles. We have designed this simulator to fit into Infrastructure as a Service (IaaS) cloud m… ▽ More There is no existence of dedicated simulators on the Internet that studies the impact of load balancing principles of the cloud architectures. Request Assignment Simulator (RAS) is a customizable, visual tool that helps to understand the request assignment to the resources based on the load balancing principles. We have designed this simulator to fit into Infrastructure as a Service (IaaS) cloud model. In this paper, we present a working manual useful for the conduct of experiment with RAS. The objective of this paper is to instill the user to understand the pertinent parameters in the cloud, their metrics, load balancing principles, and their impact on the performance. △ Less

Submitted 10 December, 2014; originally announced December 2014.

Comments: November 2014, IJCSE

arXiv:1407.4446 [pdf, other]

Probabilistic Group Testing under Sum Observations: A Parallelizable 2-Approximation for Entropy Loss

Authors: Weidong Han, Purnima Rajan, Peter I. Frazier, Bruno M. Jedynak

Abstract: We consider the problem of group testing with sum observations and noiseless answers, in which we aim to locate multiple objects by querying the number of objects in each of a sequence of chosen sets. We study a probabilistic setting with entropy loss, in which we assume a joint Bayesian prior density on the locations of the objects and seek to choose the sets queried to minimize the expected entr… ▽ More We consider the problem of group testing with sum observations and noiseless answers, in which we aim to locate multiple objects by querying the number of objects in each of a sequence of chosen sets. We study a probabilistic setting with entropy loss, in which we assume a joint Bayesian prior density on the locations of the objects and seek to choose the sets queried to minimize the expected entropy of the Bayesian posterior distribution after a fixed number of questions. We present a new non-adaptive policy, called the dyadic policy, show it is optimal among non-adaptive policies, and is within a factor of two of optimal among adaptive policies. This policy is quick to compute, its nonadaptive nature makes it easy to parallelize, and our bounds show it performs well even when compared with adaptive policies. We also study an adaptive greedy policy, which maximizes the one-step expected reduction in entropy, and show that it performs at least as well as the dyadic policy, offering greater query efficiency but reduced parallelism. Numerical experiments demonstrate that both procedures outperform a divide-and-conquer benchmark policy from the literature, called sequential bifurcation, and show how these procedures may be applied in a stylized computer vision problem. △ Less

Submitted 22 September, 2015; v1 submitted 16 July, 2014; originally announced July 2014.

arXiv:1308.1471 [pdf]

Application of Inventory Management Principles for Efficient Data Placement in Storage Networks

Authors: R. Arokia Paul Rajan, F. Sagayaraj Francis

Abstract: The principles and strategies found in material management are comparable and analogue with the data management. This paper concentrates on the conversion of product inventory management principles into data inventory management principles. Efforts were made to enumerate various impacting parameters that would be appropriate to consider if any data inventory model could be plotted. The principles and strategies found in material management are comparable and analogue with the data management. This paper concentrates on the conversion of product inventory management principles into data inventory management principles. Efforts were made to enumerate various impacting parameters that would be appropriate to consider if any data inventory model could be plotted. △ Less

Submitted 7 August, 2013; originally announced August 2013.

Comments: IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 6, No 2, November 2012

arXiv:1308.1303 [pdf]

Evolution of Cloud Storage as Cloud Computing Infrastructure Service

Authors: Arokia Paul Rajan, Shanmugapriyaa

Abstract: Enterprises are driving towards less cost, more availability, agility, managed risk - all of which is accelerated towards Cloud Computing. Cloud is not a particular product, but a way of delivering IT services that are consumable on demand, elastic to scale up and down as needed, and follow a pay-for-usage model. Out of the three common types of cloud computing service models, Infrastructure as a… ▽ More Enterprises are driving towards less cost, more availability, agility, managed risk - all of which is accelerated towards Cloud Computing. Cloud is not a particular product, but a way of delivering IT services that are consumable on demand, elastic to scale up and down as needed, and follow a pay-for-usage model. Out of the three common types of cloud computing service models, Infrastructure as a Service (IaaS) is a service model that provides servers, computing power, network bandwidth and Storage capacity, as a service to their subscribers. Cloud can relate to many things but without the fundamental storage pieces, which is provided as a service namely Cloud Storage, none of the other applications is possible. This paper introduces Cloud Storage, which covers the key technologies in cloud computing and Cloud Storage, management insights about cloud computing, different types of cloud services, driving forces of cloud computing and cloud storage, advantages and challenges of cloud storage and concludes by pinpointing few challenges to be addressed by the cloud storage providers. △ Less

Submitted 5 August, 2013; originally announced August 2013.

Showing 1–12 of 12 results for author: Rajan, P