-
Segment, Shuffle, and Stitch: A Simple Mechanism for Improving Time-Series Representations
Authors:
Shivam Grover,
Amin Jalali,
Ali Etemad
Abstract:
Existing approaches for learning representations of time-series keep the temporal arrangement of the time-steps intact with the presumption that the original order is the most optimal for learning. However, non-adjacent sections of real-world time-series may have strong dependencies. Accordingly we raise the question: Is there an alternative arrangement for time-series which could enable more effective representation learning? To address this, we propose a simple plug-and-play mechanism called Segment, Shuffle, and Stitch (S3) designed to improve time-series representation learning of existing models. S3 works by creating non-overlapping segments from the original sequence and shuffling them in a learned manner that is the most optimal for the task at hand. It then re-attaches the shuffled segments back together and performs a learned weighted sum with the original input to capture both the newly shuffled sequence along with the original sequence. S3 is modular and can be stacked to create various degrees of granularity, and can be added to many forms of neural architectures including CNNs or Transformers with negligible computation overhead. Through extensive experiments on several datasets and state-of-the-art baselines, we show that incorporating S3 results in significant improvements for the tasks of time-series classification and forecasting, improving performance on certain datasets by up to 68%. We also show that S3 makes the learning more stable with a smoother training loss curve and loss landscape compared to the original baseline. The code is available at https://github.com/shivam-grover/S3-TimeSeries .
Submitted 30 May, 2024;
originally announced May 2024.
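Illustration: the segment, shuffle, stitch, and weighted-sum steps described in the abstract above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `segment_shuffle_stitch`, the fixed `order` permutation, and the constant weights `w_shuffled` and `w_original` are hypothetical stand-ins for quantities the abstract says are learned.

```python
import numpy as np

def segment_shuffle_stitch(x, num_segments, order, w_shuffled=0.5, w_original=0.5):
    """Illustrative S3-style step with a fixed (not learned) segment order."""
    x = np.asarray(x, dtype=float)
    if x.shape[0] % num_segments != 0:
        raise ValueError("sequence length must be divisible by num_segments")
    if sorted(order) != list(range(num_segments)):
        raise ValueError("order must be a permutation of the segment indices")
    # Segment: split the time axis into equal, non-overlapping pieces.
    segments = np.split(x, num_segments, axis=0)
    # Shuffle and stitch: concatenate the pieces in the given order.
    stitched = np.concatenate([segments[i] for i in order], axis=0)
    # Weighted sum of the rearranged sequence and the original input.
    # Assumption: the weights are constants here; the abstract says they are learned.
    return w_shuffled * stitched + w_original * x

series = np.arange(1.0, 7.0)  # toy series: 1, 2, 3, 4, 5, 6
mixed = segment_shuffle_stitch(series, num_segments=3, order=[2, 0, 1])
print(mixed)
```

In a trainable version the order and the two weights would be parameters optimized with the rest of the network; the repository linked in the abstract is the reference for how that is actually done.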
-
Navigating Hallucinations for Reasoning of Unintentional Activities
Authors:
Shresth Grover,
Vibhav Vineet,
Yogesh S Rawat
Abstract:
In this work we present a novel task of understanding unintentional human activities in videos. We formalize this problem as a reasoning task under a zero-shot scenario, where given a video of an unintentional activity we want to know why it transitioned from intentional to unintentional. We first evaluate the effectiveness of current state-of-the-art Large Multimodal Models on this reasoning task and observe that they suffer from hallucination. We further propose a novel prompting technique, termed Dream of Thoughts (DoT), which allows the model to navigate through hallucinated thoughts to achieve better reasoning. To evaluate the performance on this task, we also introduce three different specialized metrics designed to quantify the models' reasoning capability. We perform our experiments on two different datasets, OOPs and UCF-Crimes, and our findings show that the DoT prompting technique is able to outperform standard prompting, while minimizing hallucinations.
Submitted 3 March, 2024; v1 submitted 29 February, 2024;
originally announced February 2024.
-
A Domain-Independent Agent Architecture for Adaptive Operation in Evolving Open Worlds
Authors:
Shiwali Mohan,
Wiktor Piotrowski,
Roni Stern,
Sachin Grover,
Sookyung Kim,
Jacob Le,
Johan De Kleer
Abstract:
Model-based reasoning agents are ill-equipped to act in novel situations in which their model of the environment no longer sufficiently represents the world. We propose HYDRA - a framework for designing model-based agents operating in mixed discrete-continuous worlds, that can autonomously detect when the environment has evolved from its canonical setup, understand how it has evolved, and adapt the agents' models to perform effectively. HYDRA is based upon PDDL+, a rich modeling language for planning in mixed, discrete-continuous environments. It augments the planning module with visual reasoning, task selection, and action execution modules for closed-loop interaction with complex environments. HYDRA implements a novel meta-reasoning process that enables the agent to monitor its own behavior from a variety of aspects. The process employs a diverse set of computational methods to maintain expectations about the agent's own behavior in an environment. Divergences from those expectations are useful in detecting when the environment has evolved and identifying opportunities to adapt the underlying models. HYDRA builds upon ideas from diagnosis and repair and uses a heuristics-guided search over model changes such that they become competent in novel conditions. The HYDRA framework has been used to implement novelty-aware agents for three diverse domains - CartPole++ (a higher dimension variant of a classic control problem), Science Birds (an IJCAI competition problem), and PogoStick (a specific problem domain in Minecraft). We report empirical observations from these domains to demonstrate the efficacy of various components in the novelty meta-reasoning process.
Submitted 9 June, 2023;
originally announced June 2023.
-
Heuristic Search For Physics-Based Problems: Angry Birds in PDDL+
Authors:
Wiktor Piotrowski,
Yoni Sher,
Sachin Grover,
Roni Stern,
Shiwali Mohan
Abstract:
This paper studies how a domain-independent planner and combinatorial search can be employed to play Angry Birds, a well-established AI challenge problem. To model the game, we use PDDL+, a planning language for mixed discrete/continuous domains that supports durative processes and exogenous events. The paper describes the model and identifies key design decisions that reduce the problem complexity. In addition, we propose several domain-specific enhancements including heuristics and a search technique similar to preferred operators. Together, they alleviate the complexity of combinatorial search. We evaluate our approach by comparing its performance with dedicated domain-specific solvers on a range of Angry Birds levels. The results show that our performance is on par with these domain-specific approaches in most levels, even without using our domain-specific search enhancements.
Submitted 29 March, 2023;
originally announced March 2023.
-
DeepCuts: Single-Shot Interpretability based Pruning for BERT
Authors:
Jasdeep Singh Grover,
Bhavesh Gawri,
Ruskin Raj Manku
Abstract:
As language models have grown in parameters and layers, it has become much harder to train and infer with them on single GPUs. This is severely restricting the availability of large language models such as GPT-3, BERT-Large, and many others. A common technique to solve this problem is pruning the network architecture by removing transformer heads, fully-connected weights, and other modules. The main challenge is to discern the important parameters from the less important ones. Our goal is to find strong metrics for identifying such parameters. We thus propose two strategies: Cam-Cut based on the GradCAM interpretations, and Smooth-Cut based on the SmoothGrad, for calculating the importance scores. Through this work, we show that our scoring functions are able to assign more relevant task-based scores to the network parameters, and thus both our pruning approaches significantly outperform the standard weight and gradient-based strategies, especially at higher compression ratios in BERT-based models. We also analyze our pruning masks and find them to be significantly different from the ones obtained using standard metrics.
Submitted 27 December, 2022;
originally announced December 2022.
-
Control Barrier Functions-based Semi-Definite Programs (CBF-SDPs): Robust Safe Control For Dynamic Systems with Relative Degree Two Safety Indices
Authors:
Jaskaran Singh Grover,
Changliu Liu,
Katia Sycara
Abstract:
In this draft article, we consider the problem of achieving safe control of a dynamic system for which the safety index (or, loosely, the control barrier function) has relative degree equal to two. We consider parameter-affine nonlinear dynamic systems and assume that the parametric uncertainty is uniform and either known a priori or updated online through an estimator/parameter adaptation law. Under this uncertainty, the usual CBF-QP safe control approach takes the form of a robust optimization problem. Both the right-hand side and the left-hand side of the inequality constraints depend on the unknown parameter. With the given representation of uncertainty, the CBF-QP safe control ends up being a convex semi-infinite problem. Using two different philosophies, one based on weak duality and another based on the lossless S-procedure, we arrive at identical SDP formulations of this robust CBF-QP problem. Thus we show that the problem of computing safe controls with known parametric uncertainty can be posed as a tractable convex problem and be solved online. (This is work in progress.)
Submitted 25 August, 2022;
originally announced August 2022.
-
Constant factor approximations for Lower and Upper bounded Clusterings
Authors:
Neelima Gupta,
Sapna Grover,
Rajni Dabas
Abstract:
Clustering is one of the most fundamental problems in Machine Learning. Researchers in the field often require a lower bound on the size of the clusters to maintain anonymity and an upper bound for ease of analysis. Specifying an optimal cluster size is a problem often faced by scientists. In this paper, we present a framework to obtain constant factor approximations for some prominent clustering objectives, with lower and upper bounds on cluster size. This enables scientists to give an approximate cluster size by specifying the lower and the upper bounds for it. Our results preserve the lower bounds but may violate the upper bound a little. We study the problems when either of the bounds is uniform. We apply our framework to give the first constant factor approximations for the lower and upper bounded $k$-median problem (LUkM) and its generalization, the $k$-facility location problem (LUkFL), with $β+1$ factor violation in upper bounds, where $β$ is the violation of upper bounds in solutions of the upper bounded $k$-median and $k$-facility location problems respectively. We also present a result on the lower and upper bounded $k$-center problem (LUkC) with uniform upper bounds and its generalization, the lower and (uniform) upper bounded $k$-supplier problem (LUkS). The approach also gives a result on the lower and upper bounded facility location problem (LUFL), improving upon the upper bound violation of $5/2$ due to Gupta et al. We also reduce the violation in upper bounds for a special case when the gap between the lower and upper bounds is not too small.
Submitted 26 March, 2022;
originally announced March 2022.
-
GREED: A Neural Framework for Learning Graph Distance Functions
Authors:
Rishabh Ranjan,
Siddharth Grover,
Sourav Medya,
Venkatesan Chakaravarthy,
Yogish Sabharwal,
Sayan Ranu
Abstract:
Among various distance functions for graphs, graph and subgraph edit distances (GED and SED respectively) are two of the most popular and expressive measures. Unfortunately, exact computations for both are NP-hard. To overcome this computational bottleneck, neural approaches to learn and predict edit distance in polynomial time have received much interest. While considerable progress has been made, there exist limitations that need to be addressed. First, the efficacy of an approximate distance function lies not only in its approximation accuracy, but also in the preservation of its properties. To elaborate, although GED is a metric, its neural approximations do not provide such a guarantee. This prohibits their usage in higher order tasks that rely on metric distance functions, such as clustering or indexing. Second, several existing frameworks for GED do not extend to SED due to SED being asymmetric. In this work, we design a novel siamese graph neural network called GREED, which through a carefully crafted inductive bias, learns GED and SED in a property-preserving manner. Through extensive experiments across 10 real graph datasets containing up to 7 million edges, we establish that GREED is not only more accurate than the state of the art, but also up to 3 orders of magnitude faster. Even more significantly, due to preserving the triangle inequality, the generated embeddings are indexable and consequently, even in a CPU-only environment, GREED is up to 50 times faster than GPU-powered baselines for graph / subgraph retrieval.
Submitted 21 April, 2023; v1 submitted 24 December, 2021;
originally announced December 2021.
-
Automated Speech Scoring System Under The Lens: Evaluating and interpreting the linguistic cues for language proficiency
Authors:
Pakhi Bamdev,
Manraj Singh Grover,
Yaman Kumar Singla,
Payman Vafaee,
Mika Hama,
Rajiv Ratn Shah
Abstract:
English proficiency assessments have become a necessary metric for filtering and selecting prospective candidates for both academia and industry. With the rise in demand for such assessments, it has become increasingly necessary to have automated, human-interpretable results to prevent inconsistencies and ensure meaningful feedback to second language learners. Feature-based classical approaches have been more interpretable in understanding what the scoring model learns. Therefore, in this work, we utilize classical machine learning models to formulate a speech scoring task as both a classification and a regression problem, followed by a thorough study to interpret the relation between the linguistic cues and the English proficiency level of the speaker. First, we extract linguistic features under five categories (fluency, pronunciation, content, grammar and vocabulary, and acoustic) and train models to grade responses. In comparison, we find that the regression-based models perform on par with or better than the classification approach. Second, we perform ablation studies to understand the impact of each feature and feature category on the performance of proficiency grading. Further, to understand individual feature contributions, we present the importance of top features on the best performing algorithm for the grading task. Third, we make use of Partial Dependence Plots and Shapley values to explore feature importance and conclude that the best performing trained model learns the underlying rubrics used for grading the dataset used in this study.
Submitted 30 November, 2021;
originally announced November 2021.
-
SPINE: Soft Piecewise Interpretable Neural Equations
Authors:
Jasdeep Singh Grover,
Harsh Minesh Domadia,
Raj Anant Tapase,
Grishma Sharma
Abstract:
ReLU fully connected networks are ubiquitous but uninterpretable because they fit piecewise linear functions emerging from multi-layered structures and complex interactions of model weights. This paper takes a novel approach to piecewise fits by using set operations on individual pieces (parts). This is done by approximating canonical normal forms and using the resultant as a model. This gives special advantages like (a) strong correspondence of parameters to pieces of the fit function (High Interpretability); (b) ability to fit any combination of continuous functions as pieces of the piecewise function (Ease of Design); (c) ability to add new non-linearities in a targeted region of the domain (Targeted Learning); (d) simplicity of an equation which avoids layering. It can also be expressed in the general max-min representation of piecewise linear functions, which gives theoretical ease and credibility. This architecture is tested on simulated regression and classification tasks and benchmark datasets including UCI datasets, MNIST, FMNIST, and CIFAR-10. Its performance is on par with fully connected architectures. It can find a variety of applications where fully connected layers must be replaced by interpretable layers.
Submitted 20 November, 2021;
originally announced November 2021.
-
Pipeline for 3D reconstruction of the human body from AR/VR headset mounted egocentric cameras
Authors:
Shivam Grover,
Kshitij Sidana,
Vanita Jain
Abstract:
In this paper, we propose a novel pipeline for the 3D reconstruction of the full body from egocentric viewpoints. 3D reconstruction of the human body from egocentric viewpoints is a challenging task as the view is skewed and the body parts farther from the cameras are occluded. One such example is the view from cameras installed below VR headsets. To achieve this task, we first make use of conditional GANs to translate the egocentric views to full-body third-person views. This increases the comprehensibility of the image and caters to occlusions. The generated third-person view is then sent through the 3D reconstruction module, which generates a 3D mesh of the body. We also train a network that can take the third-person full-body view of the subject and generate the texture maps to apply to the mesh. The generated mesh has fairly realistic body proportions and is fully rigged, allowing for further applications such as real-time animation and pose transfer in games. This approach can be key to a new domain of mobile human telepresence.
Submitted 9 November, 2021;
originally announced November 2021.
-
Characterizing Human Explanation Strategies to Inform the Design of Explainable AI for Building Damage Assessment
Authors:
Donghoon Shin,
Sachin Grover,
Kenneth Holstein,
Adam Perer
Abstract:
Explainable AI (XAI) is a promising means of supporting human-AI collaborations for high-stakes visual detection tasks, such as damage detection from satellite imagery, as fully-automated approaches are unlikely to be perfectly safe and reliable. However, most existing XAI techniques are not informed by an understanding of humans' task-specific needs for explanations. Thus, we took a first step toward understanding what forms of XAI humans require in damage detection tasks. We conducted an online crowdsourced study to understand how people explain their own assessments when evaluating the severity of building damage based on satellite imagery. Through the study with 60 crowdworkers, we surfaced six major strategies that humans utilize to explain their visual damage assessments. We present implications of our findings for the design of XAI methods for such visual detection contexts, and discuss opportunities for future research.
Submitted 4 November, 2021;
originally announced November 2021.
-
COVID-19 India Dataset: Parsing COVID-19 Data in Daily Health Bulletins from States in India
Authors:
Mayank Agarwal,
Tathagata Chakraborti,
Sachin Grover,
Arunima Chaudhary
Abstract:
While India has been one of the hotspots of COVID-19, data about the pandemic from the country has proved to be largely inaccessible at scale. Much of the data exists in unstructured form on the web, and limited aspects of such data are available through public APIs maintained manually through volunteer effort. This has proved to be difficult both in terms of ease of access to detailed data and with regards to the maintenance of manual data-keeping over time. This paper reports on our effort at automating the extraction of such data from public health bulletins with the help of a combination of classical PDF parsers and state-of-the-art machine learning techniques. In this paper, we will describe the automated data-extraction technique, the nature of the generated data, and exciting avenues of ongoing work.
Submitted 6 December, 2021; v1 submitted 27 September, 2021;
originally announced October 2021.
-
From Pivots to Graphs: Augmented CycleDensity as a Generalization to One Time InverseConsultation
Authors:
Shashwat Goel,
Kunwar Shaanjeet Singh Grover
Abstract:
This paper describes an approach used to generate new translations using raw bilingual dictionaries as part of the 4th Task Inference Across Dictionaries (TIAD 2021) shared task. We propose Augmented Cycle Density (ACD) as a framework that combines insights from two state-of-the-art methods that require neither sense information nor parallel corpora: Cycle Density (CD) and One Time Inverse Consultation (OTIC). The task results show that across 3 unseen language pairs, ACD's predictions have more than double (74%) the coverage of OTIC at almost the same precision (76%). ACD combines CD's scalability (leveraging rich multilingual graphs for better predictions) and OTIC's data efficiency (producing good results with the minimum possible resource of one pivot language).
Submitted 27 August, 2021;
originally announced August 2021.
-
First Approximation for Uniform Lower and Upper Bounded Facility Location Problem avoiding violation in Lower Bounds
Authors:
Sapna Grover,
Neelima Gupta,
Rajni Dabas
Abstract:
With growing emphasis on e-commerce marketplace platforms where we have a central platform mediating between the seller and the buyer, it becomes important to keep a check on the availability and profitability of the central store. A store serving too few clients can be unprofitable, and a store getting too many orders can lead to bad service to the customers, which can be detrimental for the business. In this paper, we study the facility location problem (FL) with upper and lower bounds on the number of clients an open facility serves. Constant factor approximations are known for the restricted variants of the problem with only the upper bounds or only the lower bounds. The only work that deals with bounds on both sides violates both the bounds [8]. In this paper, we present the first (constant factor) approximation for the problem violating the upper bound by a factor of (5/2) without violating the lower bounds when both the lower and the upper bounds are uniform. We first give a tri-criteria (constant factor) approximation violating both the upper and the lower bounds and then get rid of the violation in lower bounds by transforming the problem instance to an instance of the capacitated facility location problem.
Submitted 25 June, 2021; v1 submitted 21 June, 2021;
originally announced June 2021.
-
Classifying CELESTE as NP Complete
Authors:
Zeeshan Ahmed,
Alapan Chaudhuri,
Kunwar Shaanjeet Singh Grover,
Ashwin Rao,
Kushagra Garg,
Pulak Malhotra
Abstract:
We analyze the computational complexity of the video game "CELESTE" and prove that solving a generalized level in it is NP-Complete. Further, we also show how, upon introducing a small change in the game mechanics (adding a new game entity), we can make it PSPACE-complete.
Submitted 1 December, 2022; v1 submitted 14 December, 2020;
originally announced December 2020.
-
Model Elicitation through Direct Questioning
Authors:
Sachin Grover,
David Smith,
Subbarao Kambhampati
Abstract:
The future will be replete with scenarios where humans and robots will be working together in complex environments. Teammates interact, and the robot's interaction has to be about getting useful information about the human's (teammate's) model. There are many challenges before a robot can interact, such as incorporating the structural differences in the human's model, ensuring simpler responses, etc. In this paper, we investigate how a robot can interact to localize the human model from a set of models. We show how to generate questions to refine the robot's understanding of the teammate's model. We evaluate the method in various planning domains. The evaluation shows that these questions can be generated offline, and can help refine the model through simple answers.
Submitted 24 November, 2020;
originally announced November 2020.
-
Parameter Identification for Multirobot Systems Using Optimization Based Controllers (Extended Version)
Authors:
Jaskaran Singh Grover,
Changliu Liu,
Katia Sycara
Abstract:
This paper considers the problem of parameter identification for a multirobot system. We wish to understand when it is feasible for an adversarial observer to reverse-engineer the parameters of tasks being performed by a team of robots by simply observing their positions. We address this question by using the concept of persistency of excitation from system identification. Each robot in the team uses optimization-based controllers for mediating between task satisfaction and collision avoidance. These controllers exhibit an implicit dependence on the task's parameters, which poses a hurdle for deriving necessary conditions for parameter identification, since such conditions usually require an explicit relation. We address this bottleneck by using duality theory and SVD of active collision avoidance constraints and derive an explicit relation between each robot's task parameters and its control inputs. This allows us to derive the main necessary conditions for successful identification, which agree with our intuition. We demonstrate the importance of these conditions through numerical simulations by using (a) an adaptive observer and (b) an unscented Kalman filter for goal estimation in various geometric settings. These simulations show that under circumstances where parameter inference is supposed to be infeasible per our conditions, both these estimators fail, and likewise when it is feasible, both converge to the true parameters. Videos of these results are available at https://bit.ly/3kQYj5J.
Submitted 29 September, 2020;
originally announced September 2020.
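The abstract above relies on persistency of excitation. As a rough illustration of that general concept only (not the paper's conditions or controllers), the following sketch checks a discrete-time, two-dimensional regressor signal: it is treated as persistently exciting if, over every sliding window, the Gram matrix of the samples has its smallest eigenvalue bounded below by a positive constant. The function names, the window length, and the threshold `alpha` are all hypothetical choices made for this example.

```python
import math

def gram_matrix(regressors):
    """Sum of outer products phi * phi^T for a list of 2-D regressor samples."""
    a = b = d = 0.0
    for x, y in regressors:
        a += x * x
        b += x * y
        d += y * y
    return a, b, d  # entries of the symmetric 2x2 matrix [[a, b], [b, d]]

def min_eigenvalue(a, b, d):
    """Smallest eigenvalue of the symmetric matrix [[a, b], [b, d]]."""
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return mean - radius

def is_persistently_exciting(regressors, window, alpha):
    """True if every sliding window's Gram matrix has min eigenvalue >= alpha."""
    for start in range(len(regressors) - window + 1):
        a, b, d = gram_matrix(regressors[start:start + window])
        if min_eigenvalue(a, b, d) < alpha:
            return False
    return True

# A regressor that keeps rotating explores every direction in the plane.
rotating = [(math.cos(0.3 * k), math.sin(0.3 * k)) for k in range(200)]
# A regressor stuck on a single line never excites the orthogonal direction.
fixed = [(1.0, 0.5)] * 200

print(is_persistently_exciting(rotating, window=50, alpha=1.0))
print(is_persistently_exciting(fixed, window=50, alpha=1.0))
```

In this toy setting the rotating signal passes the check and the fixed one does not, which mirrors the intuition that parameters can only be identified along directions the observed signal actually explores.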
-
audino: A Modern Annotation Tool for Audio and Speech
Authors:
Manraj Singh Grover,
Pakhi Bamdev,
Ratin Kumar Brala,
Yaman Kumar,
Mika Hama,
Rajiv Ratn Shah
Abstract:
In this paper, we introduce a collaborative and modern annotation tool for audio and speech: audino. The tool allows annotators to define and describe temporal segmentation in audio files. These segments can be labelled and transcribed easily using a dynamically generated form. An admin can centrally control user roles and project assignment through the admin dashboard. The dashboard also enables describing labels and their values. The annotations can easily be exported in JSON format for further analysis. The tool allows audio data and their corresponding annotations to be uploaded and assigned to a user through a key-based API. The flexibility available in the annotation tool enables annotation for Speech Scoring, Voice Activity Detection (VAD), Speaker Diarisation, Speaker Identification, Speech Recognition, Emotion Recognition tasks and more. The MIT open source license allows it to be used for academic and commercial projects.
Submitted 28 November, 2021; v1 submitted 9 June, 2020;
originally announced June 2020.
-
Multi-modal Automated Speech Scoring using Attention Fusion
Authors:
Manraj Singh Grover,
Yaman Kumar,
Sumit Sarin,
Payman Vafaee,
Mika Hama,
Rajiv Ratn Shah
Abstract:
In this study, we propose a novel multi-modal end-to-end neural approach for automated assessment of non-native English speakers' spontaneous speech using attention fusion. The pipeline employs Bi-directional Recurrent Convolutional Neural Networks and Bi-directional Long Short-Term Memory Neural Networks to encode acoustic and lexical cues from spectrograms and transcriptions, respectively. Attention fusion is performed on these learned predictive features to learn complex interactions between different modalities before final scoring. We compare our model with strong baselines and find combined attention to both lexical and acoustic cues significantly improves the overall performance of the system. Further, we present a qualitative and quantitative analysis of our model.
Submitted 28 November, 2021; v1 submitted 17 May, 2020;
originally announced May 2020.
-
Differentiable Set Operations for Algebraic Expressions
Authors:
Jasdeep Singh Grover
Abstract:
Basic principles of set theory have been applied in the context of probability and binary computation. Applying the same principles to inequalities is less common but can be extremely beneficial in a variety of fields. This paper formulates a novel approach to directly apply set operations on inequalities to produce resultant inequalities with differentiable boundaries. The suggested approach takes inequalities of the form Ei: fi(x1,x2,..,xn) and an expression of set operations in terms of Ei, such as (E1 and E2) or E3, or in any standard form such as the Conjunctive Normal Form (CNF), and produces an inequality F(x1,x2,..,xn)<=1 which represents the resulting bounded region and has a differentiable boundary. To ensure differentiability of the solution, a trade-off between representation accuracy and curvature at borders (especially corners) is made. A set of parameters is introduced which can be fine-tuned to improve the accuracy of this approach. Applications of the suggested approach are also discussed, ranging from computer graphics to modern machine learning systems to fascinating demonstrations for educational purposes (the current use). A Python script to parse such expressions is also provided.
Submitted 21 December, 2019;
originally announced December 2019.
-
Universal EEG Encoder for Learning Diverse Intelligent Tasks
Authors:
Baani Leen Kaur Jolly,
Palash Aggrawal,
Surabhi S Nath,
Viresh Gupta,
Manraj Singh Grover,
Rajiv Ratn Shah
Abstract:
Brain Computer Interfaces (BCI) have become very popular, with Electroencephalography (EEG) being one of the most commonly used signal acquisition techniques. A major challenge in BCI studies is the individualistic analysis required for each task. Thus, task-specific feature extraction and classification are performed, which fail to generalize to other tasks with similar time-series EEG input data. To this end, we design a GRU-based universal deep encoding architecture to extract meaningful features from publicly available datasets for five diverse EEG-based classification tasks. Our network can generate task- and format-independent data representations and outperforms the state-of-the-art EEGNet architecture on most experiments. We also compare our results with CNN-based and autoencoder networks, in turn performing local, spatial, temporal and unsupervised analysis on the data.
Submitted 26 November, 2019;
originally announced November 2019.
-
Text2FaceGAN: Face Generation from Fine Grained Textual Descriptions
Authors:
Osaid Rehman Nasir,
Shailesh Kumar Jha,
Manraj Singh Grover,
Yi Yu,
Ajit Kumar,
Rajiv Ratn Shah
Abstract:
Powerful generative adversarial networks (GAN) have been developed to automatically synthesize realistic images from text. However, most existing tasks are limited to generating simple images such as flowers from captions. In this work, we extend this problem to the less addressed domain of face generation from fine-grained textual descriptions of faces, e.g., "A person has curly hair, oval face, and mustache". We are motivated by the potential of automated face generation to impact and assist critical tasks such as criminal face reconstruction. Since current datasets for the task are either very small or do not contain captions, we generate captions for images in the CelebA dataset by creating an algorithm to automatically convert a list of attributes to a set of captions. We then model the highly multi-modal problem of text-to-face generation as learning the conditional distribution of faces (conditioned on text) in the same latent space. We utilize the current state-of-the-art GAN (DC-GAN with GAN-CLS loss) for learning conditional multi-modality. The presence of more fine-grained details and the variable length of the captions make the problem easier for a user but more difficult to handle compared to other text-to-image tasks. We flip the labels for real and fake images and add noise in the discriminator. Generated images for diverse textual descriptions show promising results. Finally, we show that the widely used Inception Score is not a good metric for evaluating the performance of generative models used for synthesizing faces from text.
Submitted 26 November, 2019;
originally announced November 2019.
-
A Hitchhiker's Guide On Distributed Training of Deep Neural Networks
Authors:
Karanbir Chahal,
Manraj Singh Grover,
Kuntal Dey
Abstract:
Deep learning has led to tremendous advancements in the field of Artificial Intelligence. One caveat, however, is the substantial amount of compute needed to train these deep learning models. Training on a benchmark dataset like ImageNet on a single machine with a modern GPU can take up to a week; distributing training across multiple machines has been observed to bring this time down drastically. Recent work has brought ImageNet training time down to as little as 4 minutes by using a cluster of 2048 GPUs. This paper surveys the various algorithms and techniques used to distribute training and presents the current state of the art for a modern distributed training framework. More specifically, we explore the synchronous and asynchronous variants of distributed Stochastic Gradient Descent, various All Reduce gradient aggregation strategies, and best practices for obtaining higher throughput and lower latency over a cluster, such as mixed precision training, large batch training and gradient compression.
Submitted 28 October, 2018;
originally announced October 2018.
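As a small illustration of the synchronous variant of distributed SGD mentioned in the abstract above, the sketch below simulates two workers inside a single Python process. Each worker computes a gradient on its own data shard, the gradients are averaged (a stand-in for an All-Reduce step), and every model replica applies the same update. This is a toy under stated assumptions, not code from the survey: there is no real communication, and the function names, the one-parameter linear model, the learning rate, and the data are all hypothetical.

```python
def local_gradient(w, shard):
    """Gradient of 0.5 * (w*x - y)^2, averaged over one worker's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(values):
    """Stand-in for an All-Reduce: every worker receives the mean of all values."""
    mean = sum(values) / len(values)
    return [mean] * len(values)

def synchronous_sgd(shards, lr=0.05, steps=200):
    """Each step: compute local gradients, average them, apply the same update everywhere."""
    weights = [0.0] * len(shards)  # one model replica per simulated worker
    for _ in range(steps):
        grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
        averaged = all_reduce_mean(grads)
        weights = [w - lr * g for w, g in zip(weights, averaged)]
    return weights

data = [(float(x), 3.0 * x) for x in range(1, 9)]  # samples of y = 3x
shards = [data[0:4], data[4:8]]  # two equal-sized shards, one per worker
replicas = synchronous_sgd(shards)
print(replicas)
```

Because every replica starts from the same weight and applies the same averaged gradient, the replicas stay identical after each step; with equal-sized shards the averaged gradient also equals the full-batch gradient, which is the property synchronous data-parallel training relies on.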
-
Plan Explanations as Model Reconciliation -- An Empirical Study
Authors:
Tathagata Chakraborti,
Sarath Sreedharan,
Sachin Grover,
Subbarao Kambhampati
Abstract:
Recent work in explanation generation for decision making agents has looked at how unexplained behavior of autonomous systems can be understood in terms of differences in the model of the system and the human's understanding of the same, and how the explanation process as a result of this mismatch can be then seen as a process of reconciliation of these models. Existing algorithms in such settings, while having been built on contrastive, selective and social properties of explanations as studied extensively in the psychology literature, have not, to the best of our knowledge, been evaluated in settings with actual humans in the loop. As such, the applicability of such explanations to human-AI and human-robot interactions remains suspect. In this paper, we set out to evaluate these explanation generation algorithms in a series of studies in a mock search and rescue scenario with an internal semi-autonomous robot and an external human commander. We demonstrate to what extent the properties of these algorithms hold as they are evaluated by humans, and how the dynamics of trust between the human and the robot evolve during the process of these interactions.
Submitted 3 February, 2018;
originally announced February 2018.
-
Texture Synthesis with Recurrent Variational Auto-Encoder
Authors:
Rohan Chandra,
Sachin Grover,
Kyungjun Lee,
Moustafa Meshry,
Ahmed Taha
Abstract:
We propose a recurrent variational auto-encoder for texture synthesis. A novel loss function, FLTBNK, is used for training the texture synthesizer. It is a rotation-invariant and partially color-invariant loss function. Unlike L2 loss, FLTBNK explicitly models the correlation of color intensity between pixels. Our texture synthesizer generates neighboring tiles to expand a sample texture and is evaluated using various texture patterns from the Describable Textures Dataset (DTD). We perform both quantitative and qualitative experiments with various loss functions to evaluate the performance of our proposed loss function (FLTBNK); a mini human-subject study is used for the qualitative evaluation.
Submitted 23 December, 2017;
originally announced December 2017.
-
Constant factor Approximation Algorithms for Uniform Hard Capacitated Facility Location Problems: Natural LP is not too bad
Authors:
Sapna Grover,
Neelima Gupta,
Samir Khuller,
Aditya Pancholi
Abstract:
In this paper, we give the first constant factor approximation for the capacitated knapsack median problem (CKM) with hard uniform capacities, violating the budget only by an additive factor of $f_{max}$, where $f_{max}$ is the maximum cost of a facility opened by the optimal, and violating capacities by a $(2+ε)$ factor. The natural LP for the problem is known to have an unbounded integrality gap when either of the two constraints is allowed to be violated by a factor less than $2$. Thus, we present a result which is very close to the best achievable from the natural LP. To the best of our knowledge, the problem has not been studied earlier.
For the capacitated facility location problem with uniform capacities, a constant factor approximation algorithm is presented that violates the capacities slightly, by a factor of $(1 + ε)$. Though constant factor results are known for the problem without violating the capacities, the result is interesting as it is obtained by rounding the solution to the natural LP, which is known to have an unbounded integrality gap without violating the capacities. Thus, we achieve the best possible from the natural LP for the problem. The result shows that the natural LP is not too bad.
Finally, we raise some issues with the proofs of the results presented in \cite{capkmByrkaFRS2013} for the capacitated $k$-facility location problem (C$k$FLP). \cite{capkmByrkaFRS2013} presents an $O(1/ε^2)$ approximation violating the capacities by a factor of $(2 + ε)$ using dependent rounding. We first fix these issues using our techniques. Also, it can be argued that (deterministic) pipage rounding cannot be used to open the facilities instead of dependent rounding. Our techniques for CKM provide a constant factor approximation for C$k$FLP violating the capacities by $(2 + ε)$.
Submitted 23 March, 2022; v1 submitted 26 June, 2016;
originally announced June 2016.