Search | arXiv e-print repository

A Precise Characterization of SGD Stability Using Loss Surface Geometry

Authors: Gregory Dexter, Borja Ocejo, Sathiya Keerthi, Aman Gupta, Ayan Acharya, Rajiv Khanna

Abstract: Stochastic Gradient Descent (SGD) stands as a cornerstone optimization algorithm with proven real-world empirical successes but relatively limited theoretical understanding. Recent research has illuminated a key factor contributing to its practical efficacy: the implicit regularization it instigates. Several studies have investigated the linear stability property of SGD in the vicinity of a stationary point as a predictive proxy for sharpness and generalization error in overparameterized neural networks (Wu et al., 2022; Jastrzebski et al., 2019; Cohen et al., 2021). In this paper, we delve deeper into the relationship between linear stability and sharpness. More specifically, we meticulously delineate the necessary and sufficient conditions for linear stability, contingent on hyperparameters of SGD and the sharpness at the optimum. Towards this end, we introduce a novel coherence measure of the loss Hessian that encapsulates pertinent geometric properties of the loss function that are relevant to the linear stability of SGD. It enables us to provide a simplified sufficient condition for identifying linear instability at an optimum. Notably, compared to previous works, our analysis relies on significantly milder assumptions and is applicable for a broader class of loss functions than known before, encompassing not only mean-squared error but also cross-entropy loss.

Submitted 22 January, 2024; originally announced January 2024.

Comments: To appear at ICLR 2024

arXiv:2312.10049 [pdf]

Knowledge Graph Reasoning Based on Attention GCN

Authors: Meera Gupta, Ravi Khanna, Divya Choudhary, Nandini Rao

Abstract: We propose a novel technique to enhance Knowledge Graph Reasoning by combining Graph Convolution Neural Network (GCN) with the Attention Mechanism. This approach utilizes the Attention Mechanism to examine the relationships between entities and their neighboring nodes, which helps to develop detailed feature vectors for each entity. The GCN uses shared parameters to effectively represent the characteristics of adjacent entities. We first learn the similarity of entities for node representation learning. By integrating the attributes of the entities and their interactions, this method generates extensive implicit feature vectors for each entity, improving performance in tasks including entity classification and link prediction, outperforming traditional neural network models. To conclude, this work provides crucial methodological support for a range of applications, such as search engines, question-answering systems, recommendation systems, and data integration tasks.

Submitted 27 January, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

arXiv:2310.00488 [pdf, other]

On Memorization and Privacy Risks of Sharpness Aware Minimization

Authors: Young In Kim, Pratiksha Agrawal, Johannes O. Royset, Rajiv Khanna

Abstract: In many recent works, there is an increased focus on designing algorithms that seek flatter optima for neural network loss optimization as there is empirical evidence that it leads to better generalization performance in many datasets. In this work, we dissect these performance gains through the lens of data memorization in overparameterized models. We define a new metric that helps us identify the specific data points on which algorithms seeking flatter optima do better than vanilla SGD. We find that the generalization gains achieved by Sharpness Aware Minimization (SAM) are particularly pronounced for atypical data points, which necessitate memorization. This insight helps us unearth higher privacy risks associated with SAM, which we verify through exhaustive empirical evaluations. Finally, we propose mitigation strategies to achieve a more desirable accuracy vs privacy tradeoff.

Submitted 3 January, 2024; v1 submitted 30 September, 2023; originally announced October 2023.

arXiv:2308.05221 [pdf, other]

Alexa, play with robot: Introducing the First Alexa Prize SimBot Challenge on Embodied AI

Authors: Hangjie Shi, Leslie Ball, Govind Thattai, Desheng Zhang, Lucy Hu, Qiaozi Gao, Suhaila Shakiah, Xiaofeng Gao, Aishwarya Padmakumar, Bofei Yang, Cadence Chung, Dinakar Guthy, Gaurav Sukhatme, Karthika Arumugam, Matthew Wen, Osman Ipek, Patrick Lange, Rohan Khanna, Shreyas Pansare, Vasu Sharma, Chao Zhang, Cris Flagg, Daniel Pressel, Lavina Vaz, Luke Dai, et al. (17 additional authors not shown)

Abstract: The Alexa Prize program has empowered numerous university students to explore, experiment, and showcase their talents in building conversational agents through challenges like the SocialBot Grand Challenge and the TaskBot Challenge. As conversational agents increasingly appear in multimodal and embodied contexts, it is important to explore the affordances of conversational interaction augmented with computer vision and physical embodiment. This paper describes the SimBot Challenge, a new challenge in which university teams compete to build robot assistants that complete tasks in a simulated physical environment. This paper provides an overview of the SimBot Challenge, which included both online and offline challenge phases. We describe the infrastructure and support provided to the teams including Alexa Arena, the simulated environment, and the ML toolkit provided to teams to accelerate their building of vision and language models. We summarize the approaches the participating teams took to overcome research challenges and extract key lessons learned. Finally, we provide analysis of the performance of the competing SimBots during the competition.

Submitted 9 August, 2023; originally announced August 2023.

arXiv:2307.02501 [pdf, ps, other]

Generalization Guarantees via Algorithm-dependent Rademacher Complexity

Authors: Sarah Sachs, Tim van Erven, Liam Hodgkinson, Rajiv Khanna, Umut Simsekli

Abstract: Algorithm- and data-dependent generalization bounds are required to explain the generalization behavior of modern machine learning algorithms. In this context, there exist information-theoretic generalization bounds that involve (various forms of) mutual information, as well as bounds based on hypothesis set stability. We propose a conceptually related, but technically distinct complexity measure to control generalization error, which is the empirical Rademacher complexity of an algorithm- and data-dependent hypothesis class. Combining standard properties of Rademacher complexity with the convenient structure of this class, we are able to (i) obtain novel bounds based on the finite fractal dimension, which (a) extend previous fractal dimension-type bounds from continuous to finite hypothesis classes, and (b) avoid a mutual information term that was required in prior work; (ii) greatly simplify the proof of a recent dimension-independent generalization bound for stochastic gradient descent; and (iii) easily recover results for VC classes and compression schemes, similar to approaches based on conditional mutual information.

Submitted 4 July, 2023; originally announced July 2023.

arXiv:2303.14284 [pdf, ps, other]

Feature Space Sketching for Logistic Regression

Authors: Gregory Dexter, Rajiv Khanna, Jawad Raheel, Petros Drineas

Abstract: We present novel bounds for coreset construction, feature selection, and dimensionality reduction for logistic regression. All three approaches can be thought of as sketching the logistic regression inputs. On the coreset construction front, we resolve open problems from prior work and present novel bounds for the complexity of coreset construction methods. On the feature selection and dimensionality reduction front, we initiate the study of forward error bounds for logistic regression. Our bounds are tight up to constant factors and our forward error bounds can be extended to Generalized Linear Models.

Submitted 24 March, 2023; originally announced March 2023.

arXiv:2303.01586 [pdf, other]

Alexa Arena: A User-Centric Interactive Platform for Embodied AI

Authors: Qiaozi Gao, Govind Thattai, Suhaila Shakiah, Xiaofeng Gao, Shreyas Pansare, Vasu Sharma, Gaurav Sukhatme, Hangjie Shi, Bofei Yang, Desheng Zheng, Lucy Hu, Karthika Arumugam, Shui Hu, Matthew Wen, Dinakar Guthy, Cadence Chung, Rohan Khanna, Osman Ipek, Leslie Ball, Kate Bland, Heather Rocker, Yadunandana Rao, Michael Johnston, Reza Ghanadan, Arindam Mandal, et al. (2 additional authors not shown)

Abstract: We introduce Alexa Arena, a user-centric simulation platform for Embodied AI (EAI) research. Alexa Arena provides a variety of multi-room layouts and interactable objects, for the creation of human-robot interaction (HRI) missions. With user-friendly graphics and control mechanisms, Alexa Arena supports the development of gamified robotic tasks readily accessible to general human users, thus opening a new venue for high-efficiency HRI data collection and EAI system evaluation. Along with the platform, we introduce a dialog-enabled instruction-following benchmark and provide baseline results for it. We make Alexa Arena publicly available to facilitate research in building generalizable and assistive embodied agents.

Submitted 7 June, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

arXiv:2302.09693 [pdf, other]

mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization

Authors: Kayhan Behdin, Qingquan Song, Aman Gupta, Sathiya Keerthi, Ayan Acharya, Borja Ocejo, Gregory Dexter, Rajiv Khanna, David Durfee, Rahul Mazumder

Abstract: Modern deep learning models are over-parameterized, where different optima can result in widely varying generalization performance. The Sharpness-Aware Minimization (SAM) technique modifies the fundamental loss function that steers gradient descent methods toward flatter minima, which are believed to exhibit enhanced generalization prowess. Our study delves into a specific variant of SAM known as micro-batch SAM (mSAM). This variation involves aggregating updates derived from adversarial perturbations across multiple shards (micro-batches) of a mini-batch during training. We extend a recently developed and well-studied general framework for flatness analysis to theoretically show that SAM achieves flatter minima than SGD, and mSAM achieves even flatter minima than SAM. We provide a thorough empirical evaluation of various image classification and natural language processing tasks to substantiate this theoretical advancement. We also show that contrary to previous work, mSAM can be implemented in a flexible and parallelizable manner without significantly increasing computational costs. Our implementation of mSAM yields superior generalization performance across a wide range of tasks compared to SAM, further supporting our theoretical framework.

Submitted 30 September, 2023; v1 submitted 19 February, 2023; originally announced February 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2212.04343

arXiv:2202.13718 [pdf, other]

Fast Feature Selection with Fairness Constraints

Authors: Francesco Quinzan, Rajiv Khanna, Moshik Hershcovitch, Sarel Cohen, Daniel G. Waddington, Tobias Friedrich, Michael W. Mahoney

Abstract: We study the fundamental problem of selecting optimal features for model construction. This problem is computationally challenging on large datasets, even with the use of greedy algorithm variants. To address this challenge, we extend the adaptive query model, recently proposed for the greedy forward selection for submodular functions, to the faster paradigm of Orthogonal Matching Pursuit for non-submodular functions. The proposed algorithm achieves exponentially fast parallel run time in the adaptive query model, scaling much better than prior work. Furthermore, our extension allows the use of downward-closed constraints, which can be used to encode certain fairness criteria into the feature selection process. We prove strong approximation guarantees for the algorithm based on standard assumptions. These guarantees are applicable to many parametric models, including Generalized Linear Models. Finally, we demonstrate empirically that the proposed algorithm competes favorably with state-of-the-art techniques for feature selection, on real-world and synthetic datasets.

Submitted 3 February, 2023; v1 submitted 28 February, 2022; originally announced February 2022.

arXiv:2109.13978 [pdf, other]

Identifying Reasoning Flaws in Planning-Based RL Using Tree Explanations

Authors: Kin-Ho Lam, Zhengxian Lin, Jed Irvine, Jonathan Dodge, Zeyad T Shureih, Roli Khanna, Minsuk Kahng, Alan Fern

Abstract: Enabling humans to identify potential flaws in an agent's decision making is an important Explainable AI application. We consider identifying such flaws in a planning-based deep reinforcement learning (RL) agent for a complex real-time strategy game. In particular, the agent makes decisions via tree search using a learned model and evaluation function over interpretable states and actions. This gives the potential for humans to identify flaws at the level of reasoning steps in the tree, even if the entire reasoning process is too complex to understand. However, it is unclear whether humans will be able to identify such flaws due to the size and complexity of trees. We describe a user interface and case study, where a small group of AI experts and developers attempt to identify reasoning flaws due to inaccurate agent learning. Overall, the interface allowed the group to identify a number of significant flaws of varying types, demonstrating the promise of this approach.

Submitted 28 September, 2021; originally announced September 2021.

arXiv:2108.00781 [pdf, other]

Generalization Bounds using Lower Tail Exponents in Stochastic Optimizers

Authors: Liam Hodgkinson, Umut Şimşekli, Rajiv Khanna, Michael W. Mahoney

Abstract: Despite the ubiquitous use of stochastic optimization algorithms in machine learning, the precise impact of these algorithms and their dynamics on generalization performance in realistic non-convex settings is still poorly understood. While recent work has revealed connections between generalization and heavy-tailed behavior in stochastic optimization, this work mainly relied on continuous-time approximations; and a rigorous treatment for the original discrete-time iterations is yet to be performed. To bridge this gap, we present novel bounds linking generalization to the lower tail exponent of the transition kernel associated with the optimizer around a local minimum, in both discrete- and continuous-time settings. To achieve this, we first prove a data- and algorithm-dependent generalization bound in terms of the celebrated Fernique-Talagrand functional applied to the trajectory of the optimizer. Then, we specialize this result by exploiting the Markovian structure of stochastic optimizers, and derive bounds in terms of their (data-dependent) transition kernels. We support our theory with empirical results from a variety of neural networks, showing correlations between generalization error and lower tail exponents.

Submitted 11 July, 2022; v1 submitted 2 August, 2021; originally announced August 2021.

Comments: 22 pages, 6 figures

arXiv:2105.07320 [pdf, other]

LocalNewton: Reducing Communication Bottleneck for Distributed Learning

Authors: Vipul Gupta, Avishek Ghosh, Michal Derezinski, Rajiv Khanna, Kannan Ramchandran, Michael Mahoney

Abstract: To address the communication bottleneck problem in distributed optimization within a master-worker framework, we propose LocalNewton, a distributed second-order algorithm with local averaging. In LocalNewton, the worker machines update their model in every iteration by finding a suitable second-order descent direction using only the data and model stored in their own local memory. We let the workers run multiple such iterations locally and communicate the models to the master node only once every few (say L) iterations. LocalNewton is highly practical since it requires only one hyperparameter, the number L of local iterations. We use novel matrix concentration-based techniques to obtain theoretical guarantees for LocalNewton, and we validate them with detailed empirical evaluation. To enhance practicability, we devise an adaptive scheme to choose L, and we show that this reduces the number of local iterations in worker machines between two model synchronizations as the training proceeds, successively refining the model quality at the master. Via extensive experiments using several real-world datasets with AWS Lambda workers and an AWS EC2 master, we show that LocalNewton requires fewer than 60% of the communication rounds (between master and workers) and less than 40% of the end-to-end running time, compared to state-of-the-art algorithms, to reach the same training loss.

Submitted 15 May, 2021; originally announced May 2021.

Comments: To be published in Uncertainty in Artificial Intelligence (UAI) 2021

arXiv:2101.12446 [pdf, other]

doi 10.1016/j.artint.2021.103455

Counterfactual State Explanations for Reinforcement Learning Agents via Generative Deep Learning

Authors: Matthew L. Olson, Roli Khanna, Lawrence Neal, Fuxin Li, Weng-Keen Wong

Abstract: Counterfactual explanations, which deal with "why not?" scenarios, can provide insightful explanations to an AI agent's behavior. In this work, we focus on generating counterfactual explanations for deep reinforcement learning (RL) agents which operate in visual input environments like Atari. We introduce counterfactual state explanations, a novel example-based approach to counterfactual explanations based on generative deep learning. Specifically, a counterfactual state illustrates what minimal change is needed to an Atari game image such that the agent chooses a different action. We also evaluate the effectiveness of counterfactual states on human participants who are not machine learning experts. Our first user study investigates if humans can discern if the counterfactual state explanations are produced by the actual game or produced by a generative deep learning approach. Our second user study investigates if counterfactual state explanations can help non-expert participants identify a flawed agent; we compare against a baseline approach based on a nearest neighbor explanation which uses images from the actual game. Our results indicate that counterfactual state explanations have sufficient fidelity to the actual game images to enable non-experts to more effectively identify a flawed RL agent compared to the nearest neighbor baseline and to having no explanation at all.

Submitted 29 January, 2021; originally announced January 2021.

Comments: Full source code available at https://github.com/mattolson93/counterfactual-state-explanations

Journal ref: Artificial Intelligence, 2021, 103455, ISSN 0004-3702

arXiv:2007.05869 [pdf, other]

Adversarially-Trained Deep Nets Transfer Better: Illustration on Image Classification

Authors: Francisco Utrera, Evan Kravitz, N. Benjamin Erichson, Rajiv Khanna, Michael W. Mahoney

Abstract: Transfer learning has emerged as a powerful methodology for adapting pre-trained deep neural networks on image recognition tasks to new domains. This process consists of taking a neural network pre-trained on a large feature-rich source dataset, freezing the early layers that encode essential generic image properties, and then fine-tuning the last few layers in order to capture specific information related to the target situation. This approach is particularly useful when only limited or weakly labeled data are available for the new task. In this work, we demonstrate that adversarially-trained models transfer better than non-adversarially-trained models, especially if only limited data are available for the new domain task. Further, we observe that adversarial training biases the learnt representations to retaining shapes, as opposed to textures, which impacts the transferability of the source models. Finally, through the lens of influence functions, we discover that transferred adversarially-trained models contain more human-identifiable semantic information, which explains -- at least partly -- why adversarially-trained models transfer better.

Submitted 23 April, 2021; v1 submitted 11 July, 2020; originally announced July 2020.

Comments: Published as a conference paper at ICLR 2021

arXiv:2007.05086 [pdf, other]

Boundary thickness and robustness in learning models

Authors: Yaoqing Yang, Rajiv Khanna, Yaodong Yu, Amir Gholami, Kurt Keutzer, Joseph E. Gonzalez, Kannan Ramchandran, Michael W. Mahoney

Abstract: Robustness of machine learning models to various adversarial and non-adversarial corruptions continues to be of interest. In this paper, we introduce the notion of the boundary thickness of a classifier, and we describe its connection with and usefulness for model robustness. Thick decision boundaries lead to improved performance, while thin decision boundaries lead to overfitting (e.g., measured by the robust generalization gap between training and testing) and lower robustness. We show that a thicker boundary helps improve robustness against adversarial examples (e.g., improving the robust test accuracy of adversarial training) as well as so-called out-of-distribution (OOD) transforms, and we show that many commonly-used regularization and data augmentation procedures can increase boundary thickness. On the theoretical side, we establish that maximizing boundary thickness during training is akin to the so-called mixup training. Using these observations, we show that noise-augmentation on mixup training further increases boundary thickness, thereby combating vulnerability to various forms of adversarial attacks and OOD transforms. We can also show that the performance improvement in several lines of recent work happens in conjunction with a thicker boundary.

Submitted 12 January, 2021; v1 submitted 9 July, 2020; originally announced July 2020.

Journal ref: NeurIPS 2020

arXiv:2007.00715 [pdf, other]

Bayesian Coresets: Revisiting the Nonconvex Optimization Perspective

Authors: Jacky Y. Zhang, Rajiv Khanna, Anastasios Kyrillidis, Oluwasanmi Koyejo

Abstract: Bayesian coresets have emerged as a promising approach for implementing scalable Bayesian inference. The Bayesian coreset problem involves selecting a (weighted) subset of the data samples, such that the posterior inference using the selected subset closely approximates the posterior inference using the full dataset. This manuscript revisits Bayesian coresets through the lens of sparsity constrained optimization. Leveraging recent advances in accelerated optimization methods, we propose and analyze a novel algorithm for coreset selection. We provide explicit convergence rate guarantees and present an empirical evaluation on a variety of benchmark datasets to highlight our proposed algorithm's superior performance compared to state-of-the-art on speed and accuracy.

Submitted 25 February, 2021; v1 submitted 1 July, 2020; originally announced July 2020.

Comments: AISTATS 2021 (Oral)

arXiv:2005.00792 [pdf, other]

ForecastQA: A Question Answering Challenge for Event Forecasting with Temporal Text Data

Authors: Woojeong Jin, Rahul Khanna, Suji Kim, Dong-Ho Lee, Fred Morstatter, Aram Galstyan, Xiang Ren

Abstract: Event forecasting is a challenging, yet important task, as humans seek to constantly plan for the future. Existing automated forecasting studies rely mostly on structured data, such as time-series or event-based knowledge graphs, to help predict future events. In this work, we aim to formulate a task, construct a dataset, and provide benchmarks for developing methods for event forecasting with large volumes of unstructured text data. To simulate the forecasting scenario on temporal news documents, we formulate the problem as a restricted-domain, multiple-choice, question-answering (QA) task. Unlike existing QA tasks, our task limits accessible information, and thus a model has to make a forecasting judgement. To showcase the usefulness of this task formulation, we introduce ForecastQA, a question-answering dataset consisting of 10,392 event forecasting questions, which have been collected and verified via crowdsourcing efforts. We present our experiments on ForecastQA using BERT-based models and find that our best model achieves 60.1% accuracy on the dataset, which still lags behind human performance by about 19%. We hope ForecastQA will support future research efforts in bridging this gap.

Submitted 7 June, 2021; v1 submitted 2 May, 2020; originally announced May 2020.

Comments: Accepted to ACL 2021. Project page: https://inklab.usc.edu/ForecastQA/

arXiv:2005.00782 [pdf, other]

RICA: Evaluating Robust Inference Capabilities Based on Commonsense Axioms

Authors: Pei Zhou, Rahul Khanna, Seyeon Lee, Bill Yuchen Lin, Daniel Ho, Jay Pujara, Xiang Ren

Abstract: Pre-trained language models (PTLMs) have achieved impressive performance on commonsense inference benchmarks, but their ability to employ commonsense to make robust inferences, which is crucial for effective communications with humans, is debated. In the pursuit of advancing fluid human-AI communication, we propose a new challenge, RICA: Robust Inference capability based on Commonsense Axioms, that evaluates robust commonsense inference despite textual perturbations. To generate data for this challenge, we develop a systematic and scalable procedure using commonsense knowledge bases and probe PTLMs across two different evaluation settings. Extensive experiments on our generated probe sets with more than 10k statements show that PTLMs perform no better than random guessing on the zero-shot setting, are heavily impacted by statistical biases, and are not robust to perturbation attacks. We also find that fine-tuning on similar statements offers limited gains, as PTLMs still fail to generalize to unseen inferences. Our new large-scale benchmark exposes a significant gap between PTLMs and human-level language understanding and offers a new challenge for PTLMs to demonstrate commonsense.

Submitted 9 September, 2021; v1 submitted 2 May, 2020; originally announced May 2020.

Comments: Accepted in EMNLP 2021 main conference. 20 pages, 8 figures

arXiv:2005.00683 [pdf, other]

Birds have four legs?! NumerSense: Probing Numerical Commonsense Knowledge of Pre-trained Language Models

Authors: Bill Yuchen Lin, Seyeon Lee, Rahul Khanna, Xiang Ren

Abstract: Recent works show that pre-trained language models (PTLMs), such as BERT, possess certain commonsense and factual knowledge. They suggest that it is promising to use PTLMs as "neural knowledge bases" via predicting masked words. Surprisingly, we find that this may not work for numerical commonsense knowledge (e.g., a bird usually has two legs). In this paper, we investigate whether and to what extent we can induce numerical commonsense knowledge from PTLMs as well as the robustness of this process. To study this, we introduce a novel probing task with a diagnostic dataset, NumerSense, containing 13.6k masked-word-prediction probes (10.5k for fine-tuning and 3.1k for testing). Our analysis reveals that: (1) BERT and its stronger variant RoBERTa perform poorly on the diagnostic dataset prior to any fine-tuning; (2) fine-tuning with distant supervision brings some improvement; (3) the best supervised model still performs poorly as compared to human performance (54.06% vs 96.3% in accuracy).

Submitted 17 September, 2020; v1 submitted 1 May, 2020; originally announced May 2020.

Comments: To appear in Proceedings of EMNLP 2020. Project page: http://inklab.usc.edu/NumerSense/

arXiv:2004.07499 [pdf, other]

LEAN-LIFE: A Label-Efficient Annotation Framework Towards Learning from Explanation

Authors: Dong-Ho Lee, Rahul Khanna, Bill Yuchen Lin, Jamin Chen, Seyeon Lee, Qinyuan Ye, Elizabeth Boschee, Leonardo Neves, Xiang Ren

Abstract: Successfully training a deep neural network demands a huge corpus of labeled data. However, each label only provides limited information to learn from and collecting the requisite number of labels involves massive human effort. In this work, we introduce LEAN-LIFE, a web-based, Label-Efficient AnnotatioN framework for sequence labeling and classification tasks, with an easy-to-use UI that not only allows an annotator to provide the needed labels for a task, but also enables LearnIng From Explanations for each labeling decision. Such explanations enable us to generate useful additional labeled data from unlabeled instances, bolstering the pool of available training data. On three popular NLP tasks (named entity recognition, relation extraction, sentiment analysis), we find that using this enhanced supervision allows our models to surpass competitive baseline F1 scores by more than 5-10 percentage points, while using 2X fewer labeled instances. Our framework is the first to utilize this enhanced supervision technique and does so for three important tasks -- thus providing improved annotation recommendations to users and an ability to build datasets of (data, label, explanation) triples instead of the regular (data, label) pair.

Submitted 16 April, 2020; originally announced April 2020.

Comments: Accepted to the ACL 2020 (demo). The first two authors contributed equally. Project page: http://inklab.usc.edu/leanlife/

arXiv:2004.05712 [pdf, other]

doi 10.1145/3318464.3386135

A1: A Distributed In-Memory Graph Database

Authors: Chiranjeeb Buragohain, Knut Magne Risvik, Paul Brett, Miguel Castro, Wonhee Cho, Joshua Cowhig, Nikolas Gloy, Karthik Kalyanaraman, Richendra Khanna, John Pao, Matthew Renzelmann, Alex Shamis, Timothy Tan, Shuheng Zheng

Abstract: A1 is an in-memory distributed database used by the Bing search engine to support complex queries over structured data. The key enablers for A1 are availability of cheap DRAM and high speed RDMA (Remote Direct Memory Access) networking in commodity hardware. A1 uses FaRM as its underlying storage layer and builds the graph abstraction and query engine on top. The combination of in-memory storage and RDMA access requires rethinking how data is allocated, organized and queried in a large distributed system. A single A1 cluster can store tens of billions of vertices and edges and support a throughput of 350+ million vertex reads per second with end to end query latency in single digit milliseconds. In this paper we describe the A1 data model, RDMA optimized data structures and query execution.

Submitted 12 April, 2020; originally announced April 2020.

arXiv:2002.09073 [pdf, other]

Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nyström method

Authors: Michał Dereziński, Rajiv Khanna, Michael W. Mahoney

Abstract: The Column Subset Selection Problem (CSSP) and the Nyström method are among the leading tools for constructing small low-rank approximations of large datasets in machine learning and scientific computing. A fundamental question in this area is: how well can a data subset of size k compete with the best rank k approximation? We develop techniques which exploit spectral properties of the data matrix to obtain improved approximation guarantees which go beyond the standard worst-case analysis. Our approach leads to significantly better bounds for datasets with known rates of singular value decay, e.g., polynomial or exponential decay. Our analysis also reveals an intriguing phenomenon: the approximation factor as a function of k may exhibit multiple peaks and valleys, which we call a multiple-descent curve. A lower bound we establish shows that this behavior is not an artifact of our analysis, but rather it is an inherent property of the CSSP and Nyström tasks. Finally, using the example of a radial basis function (RBF) kernel, we show that both our improved bounds and the multiple-descent curve can be observed on real datasets simply by varying the RBF parameter.

Submitted 18 December, 2020; v1 submitted 20 February, 2020; originally announced February 2020.

Comments: Minor typo corrections and clarifications; slight change in the title; moved part of the related work and background discussion to the appendix

arXiv:2001.06976 [pdf, ps, other]

doi 10.1007/978-981-15-1611-5_13

The quotient Unimodular Vector group is nilpotent

Authors: Reema Khanna, Selby Jose, Sampat Sharma, Ravi A. Rao

Abstract: Jose-Rao introduced and studied the Special Unimodular Vector group $SUm_r(R)$ and $EUm_r(R)$, its Elementary Unimodular Vector subgroup. They proved that for $r \geq 2$, $EUm_r(R)$ is a normal subgroup of $SUm_r(R)$. The Jose-Rao theorem says that the quotient Unimodular Vector group, $SUm_r(R)/EUm_r(R)$, for $r \geq 2$, is a subgroup of the orthogonal quotient group $SO_{2(r+1)}(R)/EO_{2(r + 1)}(R)$. The latter group is known to be nilpotent by the work of Hazrat-Vavilov, following methods of A. Bak; and so is the former. In this article we give a direct proof, following ideas of A. Bak, to show that the quotient Unimodular Vector group is nilpotent of class $\leq d = \dim(R)$. We also use the Quillen-Suslin theory, inspired by A. Bak's method, to prove that if $R = A[X]$, with $A$ a local ring, then the quotient Unimodular Vector group is abelian.

Submitted 20 January, 2020; originally announced January 2020.

Journal ref: Leavitt Path algebra and Classical K-Theory, (2020) 225-240. Indian statistical institute series. Springer, Singapore

arXiv:1911.03098 [pdf, other]

doi 10.1109/MRA.2020.3012492

Building an Aerial-Ground Robotics System for Precision Farming: An Adaptable Solution

Authors: Alberto Pretto, Stéphanie Aravecchia, Wolfram Burgard, Nived Chebrolu, Christian Dornhege, Tillmann Falck, Freya Fleckenstein, Alessandra Fontenla, Marco Imperoli, Raghav Khanna, Frank Liebisch, Philipp Lottes, Andres Milioto, Daniele Nardi, Sandro Nardi, Johannes Pfeifer, Marija Popović, Ciro Potena, Cédric Pradalier, Elisa Rothacker-Feder, Inkyu Sa, Alexander Schaefer, Roland Siegwart, Cyrill Stachniss, Achim Walter, et al. (3 additional authors not shown)

Abstract: The application of autonomous robots in agriculture is gaining increasing popularity thanks to the high impact it may have on food security, sustainability, resource use efficiency, reduction of chemical treatments, and the optimization of human effort and yield. With this vision, the Flourish research project aimed to develop an adaptable robotic solution for precision farming that combines the aerial survey capabilities of small autonomous unmanned aerial vehicles (UAVs) with targeted intervention performed by multi-purpose unmanned ground vehicles (UGVs). This paper presents an overview of the scientific and technological advances and outcomes obtained in the project. We introduce multi-spectral perception algorithms and aerial and ground-based systems developed for monitoring crop density, weed pressure, crop nitrogen nutrition status, and to accurately classify and locate weeds. We then introduce the navigation and mapping systems tailored to our robots in the agricultural environment, as well as the modules for collaborative mapping. We finally present the ground intervention hardware, software solutions, and interfaces we implemented and tested in different field conditions and with different crops. We describe a real use case in which a UAV collaborates with a UGV to monitor the field and to perform selective spraying without human intervention.

Submitted 7 June, 2022; v1 submitted 8 November, 2019; originally announced November 2019.

Comments: Published in IEEE Robotics & Automation Magazine, vol. 28, no. 3, pp. 29-49, Sept. 2021

Journal ref: IEEE Robotics & Automation Magazine, vol. 28, no. 3, pp. 29-49, Sept. 2021

arXiv:1910.13389 [pdf, ps, other]

Learning Sparse Distributions using Iterative Hard Thresholding

Authors: Jacky Y. Zhang, Rajiv Khanna, Anastasios Kyrillidis, Oluwasanmi Koyejo

Abstract: Iterative hard thresholding (IHT) is a projected gradient descent algorithm, known to achieve state of the art performance for a wide range of structured estimation problems, such as sparse inference. In this work, we consider IHT as a solution to the problem of learning sparse discrete distributions. We study the hardness of using IHT on the space of measures. As a practical alternative, we propose a greedy approximate projection which simultaneously captures appropriate notions of sparsity in distributions, while satisfying the simplex constraint, and investigate the convergence behavior of the resulting procedure in various settings. Our results show, both in theory and practice, that IHT can achieve state of the art results for learning sparse distributions.

Submitted 30 January, 2020; v1 submitted 29 October, 2019; originally announced October 2019.

Comments: NeurIPS 2019

arXiv:1909.09548 [pdf, other]

doi 10.1109/LRA.2020.2969191

An Efficient Sampling-based Method for Online Informative Path Planning in Unknown Environments

Authors: Lukas Schmid, Michael Pantic, Raghav Khanna, Lionel Ott, Roland Siegwart, Juan Nieto

Abstract: The ability to plan informative paths online is essential to robot autonomy. In particular, sampling-based approaches are often used as they are capable of using arbitrary information gain formulations. However, they are prone to local minima, resulting in sub-optimal trajectories, and sometimes do not reach global coverage. In this paper, we present a new RRT*-inspired online informative path planning algorithm. Our method continuously expands a single tree of candidate trajectories and rewires segments to maintain the tree and refine intermediate trajectories. This allows the algorithm to achieve global coverage and maximize the utility of a path in a global context, using a single objective function. We demonstrate the algorithm's capabilities in the applications of autonomous indoor exploration as well as accurate Truncated Signed Distance Field (TSDF)-based 3D reconstruction on-board a Micro Aerial Vehicle (MAV). We study the impact of commonly used information gain and cost formulations in these scenarios and propose a novel TSDF-based 3D reconstruction gain and cost-utility formulation. Detailed evaluation in realistic simulation environments shows that our approach outperforms state of the art methods in these tasks. Experiments on a real MAV demonstrate the ability of our method to robustly plan in real-time, exploring an indoor environment solely with on-board sensing and computation. We make our framework available for future research.

Submitted 14 January, 2020; v1 submitted 20 September, 2019; originally announced September 2019.

Comments: 8 pages, 6 figures, video: https://youtu.be/lEadqJ1_8Do, framework: https://github.com/ethz-asl/mav_active_3d_planning

Journal ref: IEEE Robotics and Automation Letters, Vol. 5, Iss. 2, April 2020

arXiv:1907.08410 [pdf, other]

Geometric Rates of Convergence for Kernel-based Sampling Algorithms

Authors: Rajiv Khanna, Liam Hodgkinson, Michael W. Mahoney

Abstract: The rate of convergence of weighted kernel herding (WKH) and sequential Bayesian quadrature (SBQ), two kernel-based sampling algorithms for estimating integrals with respect to some target probability measure, is investigated. Under verifiable conditions on the chosen kernel and target measure, we establish a near-geometric rate of convergence for target measures that are nearly atomic. Furthermore, we show these algorithms perform comparably to the theoretical best possible sampling algorithm under the maximum mean discrepancy. An analysis is also conducted in a distributed setting. Our theoretical developments are supported by empirical observations on simulated data as well as a real world application.

Submitted 31 October, 2021; v1 submitted 19 July, 2019; originally announced July 2019.

Comments: Accepted to UAI 2021 (Oral)

arXiv:1810.10118 [pdf, other]

Interpreting Black Box Predictions using Fisher Kernels

Authors: Rajiv Khanna, Been Kim, Joydeep Ghosh, Oluwasanmi Koyejo

Abstract: Research in both machine learning and psychology suggests that salient examples can help humans to interpret learning models. To this end, we take a novel look at black box interpretation of test predictions in terms of training examples. Our goal is to ask `which training examples are most responsible for a given set of predictions'? To answer this question, we make use of Fisher kernels as the defining feature embedding of each data point, combined with Sequential Bayesian Quadrature (SBQ) for efficient selection of examples. In contrast to prior work, our method is able to seamlessly handle any sized subset of test predictions in a principled way. We theoretically analyze our approach, providing novel convergence bounds for SBQ over discrete candidate atoms. Our approach recovers the application of influence functions for interpretability as a special case yielding novel insights from this connection. We also present applications of the proposed approach to three use cases: cleaning training data, fixing mislabeled examples and data summarization.

Submitted 23 October, 2018; originally announced October 2018.

arXiv:1810.03329 [pdf, ps, other]

The Pillars of Relative Quillen--Suslin Theory

Authors: Rabeya Basu, Reema Khanna, Ravi A. Rao

Abstract: We deduce the relative version of the equivalences relating the relative Local Global Principle and the Normality of the relative Elementary subgroups of the traditional classical groups, viz. general linear, symplectic and orthogonal groups. This generalizes our previous result for the absolute case.

Submitted 8 October, 2018; originally announced October 2018.

Comments: 12 pages

MSC Class: 13C10; 11E57; 11E70; 15A63; 19B10; 19B14

arXiv:1810.00457 [pdf, other]

doi 10.1109/LRA.2019.2894468

AgriColMap: Aerial-Ground Collaborative 3D Mapping for Precision Farming

Authors: Ciro Potena, Raghav Khanna, Juan Nieto, Roland Siegwart, Daniele Nardi, Alberto Pretto

Abstract: The combination of aerial survey capabilities of Unmanned Aerial Vehicles with targeted intervention abilities of agricultural Unmanned Ground Vehicles can significantly improve the effectiveness of robotic systems applied to precision agriculture. In this context, building and updating a common map of the field is an essential but challenging task. The maps built using robots of different types show differences in size, resolution and scale, the associated geolocation data may be inaccurate and biased, while the repetitiveness of both visual appearance and geometric structures found within agricultural contexts render classical map merging techniques ineffective. In this paper we propose AgriColMap, a novel map registration pipeline that leverages a grid-based multimodal environment representation which includes a vegetation index map and a Digital Surface Model. We cast the data association problem between maps built from UAVs and UGVs as a multimodal, large displacement dense optical flow estimation. The dominant, coherent flows, selected using a voting scheme, are used as point-to-point correspondences to infer a preliminary non-rigid alignment between the maps. A final refinement is then performed, by exploiting only meaningful parts of the registered maps. We evaluate our system using real world data for 3 fields with different crop species. The results show that our method outperforms several state of the art map registration and matching techniques by a large margin, and has a higher tolerance to large initial misalignments. We release an implementation of the proposed approach along with the acquired datasets with this paper.

Submitted 14 March, 2019; v1 submitted 30 September, 2018; originally announced October 2018.

Comments: Published in IEEE Robotics and Automation Letters, 2019

Journal ref: IEEE Robotics and Automation Letters, Vol: 4, Issue: 2, April 2019, pages 1085-1092

arXiv:1808.00100 [pdf, other]

WeedMap: A large-scale semantic weed mapping framework using aerial multispectral imaging and deep neural network for precision farming

Authors: Inkyu Sa, Marija Popovic, Raghav Khanna, Zetao Chen, Philipp Lottes, Frank Liebisch, Juan Nieto, Cyrill Stachniss, Achim Walter, Roland Siegwart

Abstract: We present a novel weed segmentation and mapping framework that processes multispectral images obtained from an unmanned aerial vehicle (UAV) using a deep neural network (DNN). Most studies on crop/weed semantic segmentation only consider single images for processing and classification. Images taken by UAVs often cover only a few hundred square meters with either color only or color and near-infrared (NIR) channels. Computing a single large and accurate vegetation map (e.g., crop/weed) using a DNN is non-trivial due to difficulties arising from: (1) limited ground sample distances (GSDs) in high-altitude datasets, (2) sacrificed resolution resulting from downsampling high-fidelity images, and (3) multispectral image alignment. To address these issues, we adopt a standard sliding window approach that operates on only small portions of multispectral orthomosaic maps (tiles), which are channel-wise aligned and calibrated radiometrically across the entire map. We define the tile size to be the same as that of the DNN input to avoid resolution loss. Compared to our baseline model (i.e., SegNet with 3 channel RGB inputs) yielding an area under the curve (AUC) of [background=0.607, crop=0.681, weed=0.576], our proposed model with 9 input channels achieves [0.839, 0.863, 0.782]. Additionally, we provide an extensive analysis of 20 trained models, both qualitatively and quantitatively, in order to evaluate the effects of varying input channels and tunable network hyperparameters. Furthermore, we release a large sugar beet/weed aerial dataset with expertly guided annotations for further research in the fields of remote sensing, precision agriculture, and agricultural robotics.

Submitted 6 September, 2018; v1 submitted 31 July, 2018; originally announced August 2018.

Comments: 25 pages, 14 figures, MDPI Remote Sensing

arXiv:1806.02185 [pdf, other]

Boosting Black Box Variational Inference

Authors: Francesco Locatello, Gideon Dresdner, Rajiv Khanna, Isabel Valera, Gunnar Rätsch

Abstract: Approximating a probability density in a tractable manner is a central task in Bayesian statistics. Variational Inference (VI) is a popular technique that achieves tractability by choosing a relatively simple variational family. Borrowing ideas from the classic boosting framework, recent approaches attempt to \emph{boost} VI by replacing the selection of a single density with a greedily constructed mixture of densities. In order to guarantee convergence, previous works impose stringent assumptions that require significant effort for practitioners. Specifically, they require a custom implementation of the greedy step (called the LMO) for every probabilistic model with respect to an unnatural variational family of truncated distributions. Our work fixes these issues with novel theoretical and algorithmic insights. On the theoretical side, we show that boosting VI satisfies a relaxed smoothness assumption which is sufficient for the convergence of the functional Frank-Wolfe (FW) algorithm. Furthermore, we rephrase the LMO problem and propose to maximize the Residual ELBO (RELBO) which replaces the standard ELBO optimization in VI. These theoretical enhancements allow for black box implementation of the boosting subroutine. Finally, we present a stopping criterion drawn from the duality gap in the classic FW analyses and exhaustive experiments to illustrate the usefulness of our theoretical and algorithmic contributions.

Submitted 28 November, 2018; v1 submitted 6 June, 2018; originally announced June 2018.

arXiv:1712.09379 [pdf, other]

IHT dies hard: Provable accelerated Iterative Hard Thresholding

Authors: Rajiv Khanna, Anastasios Kyrillidis

Abstract: We study --both in theory and practice-- the use of momentum motions in classic iterative hard thresholding (IHT) methods. By simply modifying plain IHT, we investigate its convergence behavior on convex optimization criteria with non-convex constraints, under standard assumptions. In diverse scenaria, we observe that acceleration in IHT leads to significant improvements, compared to state of the art projected gradient descent and Frank-Wolfe variants. As a byproduct of our inspection, we study the impact of selecting the momentum parameter: similar to convex settings, two modes of behavior are observed --"rippling" and linear-- depending on the level of momentum.

Submitted 13 September, 2019; v1 submitted 26 December, 2017; originally announced December 2017.

Comments: accepted to AISTATS 2018

arXiv:1711.00548 [pdf, other]

Autonomous Electric Race Car Design

Authors: Niklas Funk, Nikhilesh Alatur, Robin Deuber, Frederick Gonon, Nico Messikommer, Julian Nubert, Moritz Patriarca, Simon Schaefer, Dominic Scotoni, Nicholas Bünger, Renaud Dube, Raghav Khanna, Mark Pfeiffer, Erik Wilhelm, Roland Siegwart

Abstract: Autonomous driving and electric vehicles are nowadays very active research and development areas. In this paper we present the conversion of a standard Kyburz eRod into an autonomous vehicle that can be operated in challenging environments such as Swiss mountain passes. The overall hardware and software architectures are described in detail with a special emphasis on the sensor requirements for autonomous vehicles operating in partially structured environments. Furthermore, the design process itself and the finalized system architecture are presented. The work shows state of the art results in localization and controls for self-driving high-performance electric vehicles. Test results of the overall system are presented, which show the importance of generalizable state estimation algorithms to handle a plethora of conditions.

Submitted 1 November, 2017; originally announced November 2017.

Comments: Paper submitted and accepted to the EVS30 Symposium, from October 9-11, 2017 in Stuttgart, Germany

arXiv:1709.03329 [pdf, other]

weedNet: Dense Semantic Weed Classification Using Multispectral Images and MAV for Smart Farming

Authors: Inkyu Sa, Zetao Chen, Marija Popovic, Raghav Khanna, Frank Liebisch, Juan Nieto, Roland Siegwart

Abstract: Selective weed treatment is a critical step in autonomous crop management as related to crop health and yield. However, a key challenge is reliable, and accurate weed detection to minimize damage to surrounding plants. In this paper, we present an approach for dense semantic weed classification with multispectral images collected by a micro aerial vehicle (MAV). We use the recently developed encoder-decoder cascaded Convolutional Neural Network (CNN), Segnet, that infers dense semantic classes while allowing any number of input image channels and class balancing with our sugar beet and weed datasets. To obtain training datasets, we established an experimental field with varying herbicide levels resulting in field plots containing only either crop or weed, enabling us to use the Normalized Difference Vegetation Index (NDVI) as a distinguishable feature for automatic ground truth generation. We train 6 models with different numbers of input channels and condition (fine-tune) it to achieve about 0.8 F1-score and 0.78 Area Under the Curve (AUC) classification metrics. For model deployment, an embedded GPU system (Jetson TX2) is tested for MAV integration. Dataset used in this paper is released to support the community and future work.

Submitted 11 September, 2017; originally announced September 2017.

arXiv:1708.06652 [pdf, other]

doi 10.1109/MRA.2017.2771326

Build Your Own Visual-Inertial Drone: A Cost-Effective and Open-Source Autonomous Drone

Authors: Inkyu Sa, Mina Kamel, Michael Burri, Michael Bloesch, Raghav Khanna, Marija Popovic, Juan Nieto, Roland Siegwart

Abstract: This paper describes an approach to building a cost-effective and research grade visual-inertial odometry aided vertical taking-off and landing (VTOL) platform. We utilize an off-the-shelf visual-inertial sensor, an onboard computer, and a quadrotor platform that are factory-calibrated and mass-produced, thereby sharing similar hardware and sensor specifications (e.g., mass, dimensions, intrinsic and extrinsic of camera-IMU systems, and signal-to-noise ratio). We then perform a system calibration and identification enabling the use of our visual-inertial odometry, multi-sensor fusion, and model predictive control frameworks with the off-the-shelf products. This implies that we can partially avoid tedious parameter tuning procedures for building a full system. The complete system is extensively evaluated both indoors using a motion capture system and outdoors using a laser tracker while performing hover and step responses, and trajectory following tasks in the presence of external wind disturbances. We achieve root-mean-square (RMS) pose errors between a reference and actual trajectories of 0.036m, while performing hover. We also conduct relatively long distance flight (~180m) experiments on a farm site and achieve 0.82% drift error of the total distance flight. This paper conveys the insights we acquired about the platform and sensor module and returns to the community as open-source code with tutorial documentation.

Submitted 6 September, 2018; v1 submitted 22 August, 2017; originally announced August 2017.

Comments: 21 pages, 10 figures, accepted to IEEE Robotics & Automation Magazine

Journal ref: IEEE Robotics & Automation Magazine 2017

arXiv:1708.01733 [pdf, other]

Boosting Variational Inference: an Optimization Perspective

Authors: Francesco Locatello, Rajiv Khanna, Joydeep Ghosh, Gunnar Rätsch

Abstract: Variational inference is a popular technique to approximate a possibly intractable Bayesian posterior with a more tractable one. Recently, boosting variational inference has been proposed as a new paradigm to approximate the posterior by a mixture of densities by greedily adding components to the mixture. However, as is the case with many other variational inference algorithms, its theoretical properties have not been studied. In the present work, we study the convergence properties of this approach from a modern optimization viewpoint by establishing connections to the classic Frank-Wolfe algorithm. Our analysis yields novel theoretical insights regarding the sufficient conditions for convergence, explicit rates, and algorithmic simplifications. Since a lot of focus in previous works for variational inference has been on tractability, our work is especially important as a much needed attempt to bridge the gap between probabilistic models and their corresponding theoretical properties.

Submitted 7 March, 2018; v1 submitted 5 August, 2017; originally announced August 2017.

Journal ref: AISTATS 2018

arXiv:1703.02723 [pdf, other]

Scalable Greedy Feature Selection via Weak Submodularity

Authors: Rajiv Khanna, Ethan Elenberg, Alexandros G. Dimakis, Sahand Negahban, Joydeep Ghosh

Abstract: Greedy algorithms are widely used for problems in machine learning such as feature selection and set function optimization. Unfortunately, for large datasets, the running time of even greedy algorithms can be quite high. This is because for each greedy step we need to refit a model or calculate a function using the previously selected choices and the new candidate. Two algorithms that are faster approximations to the greedy forward selection were introduced recently ([Mirzasoleiman et al. 2013, 2015]). They achieve better performance by exploiting distributed computation and stochastic evaluation respectively. Both algorithms have provable performance guarantees for submodular functions. In this paper we show that divergent from previously held opinion, submodularity is not required to obtain approximation guarantees for these two algorithms. Specifically, we show that a generalized concept of weak submodularity suffices to give multiplicative approximation guarantees. Our result extends the applicability of these algorithms to a larger class of functions. Furthermore, we show that a bounded submodularity ratio can be used to provide data dependent bounds that can sometimes be tighter also for submodular functions. We empirically validate our work by showing superior performance of fast greedy approximations versus several established baselines on artificial and real datasets.

Submitted 8 March, 2017; originally announced March 2017.

Comments: To appear in AISTATS 2017

arXiv:1703.02721 [pdf, other]

On Approximation Guarantees for Greedy Low Rank Optimization

Authors: Rajiv Khanna, Ethan Elenberg, Alexandros G. Dimakis, Sahand Negahban

Abstract: We provide new approximation guarantees for greedy low rank matrix estimation under standard assumptions of restricted strong convexity and smoothness. Our novel analysis also uncovers previously unknown connections between the low rank estimation and combinatorial optimization, so much so that our bounds are reminiscent of corresponding approximation bounds in submodular maximization. Additionally, we also provide statistical recovery guarantees. Finally, we present empirical comparison of greedy estimation with established baselines on two important real-world problems.

Submitted 8 March, 2017; originally announced March 2017.

arXiv:1702.06457 [pdf, other]

A Unified Optimization View on Generalized Matching Pursuit and Frank-Wolfe

Authors: Francesco Locatello, Rajiv Khanna, Michael Tschannen, Martin Jaggi

Abstract: Two of the most fundamental prototypes of greedy optimization are the matching pursuit and Frank-Wolfe algorithms. In this paper, we take a unified view on both classes of methods, leading to the first explicit convergence rates of matching pursuit methods in an optimization sense, for general sets of atoms. We derive sublinear ($1/t$) convergence for both classes on general smooth objectives, and linear convergence on strongly convex objectives, as well as a clear correspondence of algorithm variants. Our presented algorithms and rates are affine invariant, and do not need any incoherence or sparsity assumptions.

Submitted 7 March, 2017; v1 submitted 21 February, 2017; originally announced February 2017.

Comments: appearing at AISTATS 2017

arXiv:1701.08623 [pdf, other]

Dynamic System Identification, and Control for a cost effective open-source VTOL MAV

Authors: Inkyu Sa, Mina Kamel, Raghav Khanna, Marija Popovic, Juan Nieto, Roland Siegwart

Abstract: This paper describes dynamic system identification, and full control of a cost-effective vertical take-off and landing (VTOL) multi-rotor micro-aerial vehicle (MAV) --- DJI Matrice 100. The dynamics of the vehicle and autopilot controllers are identified using only a built-in IMU and utilized to design a subsequent model predictive controller (MPC). Experimental results for the control performance are evaluated using a motion capture system while performing hover, step responses, and trajectory following tasks in the presence of external wind disturbances. We achieve root-mean-square (RMS) errors between the reference and actual trajectory of x=0.021m, y=0.016m, z=0.029m, roll=0.392deg, pitch=0.618deg, and yaw=1.087deg while performing hover. This paper also conveys the insights we have gained about the platform and returned to the community through open-source code, and documentation.

Submitted 9 March, 2017; v1 submitted 30 January, 2017; originally announced January 2017.

Comments: 8 pages, 12 figures

arXiv:1612.00804 [pdf, other]

Restricted Strong Convexity Implies Weak Submodularity

Authors: Ethan R. Elenberg, Rajiv Khanna, Alexandros G. Dimakis, Sahand Negahban

Abstract: We connect high-dimensional subset selection and submodular maximization. Our results extend the work of Das and Kempe (2011) from the setting of linear regression to arbitrary objective functions. For greedy feature selection, this connection allows us to obtain strong multiplicative performance bounds on several methods without statistical modeling assumptions. We also derive recovery guarantees of this form under standard assumptions. Our work shows that greedy algorithms perform within a constant factor from the best possible subset-selection solution for a broad class of general objective functions. Our methods allow a direct control over the number of obtained features as opposed to regularization parameters that only implicitly control sparsity. Our proof technique uses the concept of weak submodularity initially defined by Das and Kempe. We draw a connection between convex analysis and submodular set function theory which may be of independent interest for other statistical learning applications that have combinatorial structure.

Submitted 12 October, 2017; v1 submitted 2 December, 2016; originally announced December 2016.

arXiv:1608.07388 [pdf, other]

doi 10.3847/1538-4357/833/1/86

AstroSat CZT Imager observations of GRB 151006A: timing, spectroscopy, and polarisation study

Authors: A. R. Rao, Vikas Chand, M. K. Hingar, S. Iyyani, Rakesh Khanna, A. P. K. Kutty, J. P. Malkar, D. Paul, V. B. Bhalerao, D. Bhattacharya, G. C. Dewangan, Pramod Pawar, A. M. Vibhute, T. Chattopadhyay, N. P. S. Mithun, S. V. Vadawale, N. Vagshette, R. Basak, P. Pradeep, Essy Samuel, S. Sreekumar, P. Vinod, K. H. Navalgund, R. Pandiyan, K. S. Sarma, et al. (2 additional authors not shown)

Abstract: AstroSat is a multi-wavelength satellite launched on 2015 September 28. The CZT Imager of AstroSat on its very first day of operation detected a long duration gamma-ray burst (GRB) namely GRB 151006A. Using the off-axis imaging and spectral response of the instrument, we demonstrate that CZT Imager can localise this GRB correct to about a few degrees and it can provide, in conjunction with Swift, spectral parameters similar to that obtained from Fermi/GBM. Hence CZT Imager would be a useful addition to the currently operating GRB instruments (Swift and Fermi). Specifically, we argue that the CZT Imager will be most useful for the short hard GRBs by providing localisation for those detected by Fermi and spectral information for those detected only by Swift. We also provide preliminary results on a new exciting capability of this instrument: CZT Imager is able to identify Compton scattered events thereby providing polarisation information for bright GRBs. GRB 151006A, in spite of being relatively faint, shows hints of a polarisation signal at 100-300 keV (though at a low significance level). We point out that CZT Imager should provide significant time resolved polarisation measurements for GRBs that have fluence 3 times higher than that of GRB 151006A. We estimate that the number of such bright GRBs detectable by CZT Imager is 5 - 6 per year. CZT Imager can also act as a good hard X-ray monitoring device for possible electromagnetic counterparts of Gravitational Wave events.

Submitted 26 August, 2016; originally announced August 2016.

Comments: Submitted to ApJ

arXiv:1608.06038 [pdf, other]

doi 10.1007/s12036-017-9450-0

Charged Particle Monitor on the AstroSat mission

Authors: A. R. Rao, M. H. Patil, Yash Bhargava, Rakesh Khanna, M. K. Hingar, A. P. K. Kutty, J. P. Malkar, Rupal Basak, S. Sreekumar, Essy Samuel, P. Priya, P. Vinod, D. Bhattacharya, V. Bhalerao, S. V. Vadawale, N. P. S. Mithun, R. Pandiyan, K. Subbarao, S. Seetha, K. Suryanarayana Sarma

Abstract: Charged Particle Monitor (CPM) on-board the AstroSat satellite is an instrument designed to detect the flux of charged particles at the satellite location. A Cesium Iodide Thallium (CsI(Tl)) crystal is used with a Kapton window to detect protons with energies greater than 1 MeV. The ground calibration of CPM was done using gamma-rays from radioactive sources and protons from particle accelerators. Based on the ground calibration results, energy deposition above 1 MeV are accepted and particle counts are recorded. It is found that CPM counts are steady and the signal for the onset and exit of South Atlantic Anomaly (SAA) region are generated in a very reliable and stable manner.

Submitted 21 August, 2016; originally announced August 2016.

Comments: 9 pages, 8 figures, 5 tables. To appear in JAA special issue on AstroSat

arXiv:1608.03408 [pdf, other]

doi 10.1007/s12036-017-9447-8

The Cadmium Zinc Telluride Imager on AstroSat

Authors: V. Bhalerao, D. Bhattacharya, A. Vibhute, P. Pawar, A. R. Rao, M. K. Hingar, Rakesh Khanna, A. P. K. Kutty, J. P. Malkar, M. H. Patil, Y. K. Arora, S. Sinha, P. Priya, Essy Samuel, S. Sreekumar, P. Vinod, N. P. S. Mithun, S. V. Vadawale, N. Vagshette, K. H. Navalgund, K. S. Sarma, R. Pandiyan, S. Seetha, K. Subbarao

Abstract: The Cadmium Zinc Telluride Imager (CZTI) is a high energy, wide-field imaging instrument on AstroSat. CZT's namesake Cadmium Zinc Telluride detectors cover an energy range from 20 keV to > 200 keV, with 11% energy resolution at 60 keV. The coded aperture mask attains an angular resolution of 17' over a 4.6 deg x 4.6 deg (FWHM) field of view. CZTI functions as an open detector above 100 keV, continuously sensitive to GRBs and other transients in about 30% of the sky. The pixellated detectors are sensitive to polarisation above ~100 keV, with exciting possibilities for polarisation studies of transients and bright persistent sources. In this paper, we provide details of the complete CZTI instrument, detectors, coded aperture mask, mechanical and electronic configuration, as well as data and products.

Submitted 11 August, 2016; originally announced August 2016.

Comments: 9 pages, 6 figures, 1 table. To appear in Astrosat special issue of the Journal of Astronomy and Astrophysics

arXiv:1607.03204 [pdf, other]

Information Projection and Approximate Inference for Structured Sparse Variables

Authors: Rajiv Khanna, Joydeep Ghosh, Russell Poldrack, Oluwasanmi Koyejo

Abstract: Approximate inference via information projection has been recently introduced as a general-purpose approach for efficient probabilistic inference given sparse variables. This manuscript goes beyond classical sparsity by proposing efficient algorithms for approximate inference via information projection that are applicable to any structure on the set of variables that admits enumeration using a \emph{matroid}. We show that the resulting information projection can be reduced to combinatorial submodular optimization subject to matroid constraints. Further, leveraging recent advances in submodular optimization, we provide an efficient greedy algorithm with strong optimization-theoretic guarantees. The class of probabilistic models that can be expressed in this way is quite broad and, as we show, includes group sparse regression, group sparse principal components analysis and sparse canonical correlation analysis, among others. Moreover, empirical results on simulated data and high dimensional neuroimaging data highlight the superior performance of the information projection approach as compared to established baselines for a range of probabilistic models.

Submitted 11 July, 2016; originally announced July 2016.

arXiv:1602.04208 [pdf, other]

Pursuits in Structured Non-Convex Matrix Factorizations

Authors: Rajiv Khanna, Michael Tschannen, Martin Jaggi

Abstract: Efficiently representing real world data in a succinct and parsimonious manner is of central importance in many fields. We present a generalized greedy pursuit framework, allowing us to efficiently solve structured matrix factorization problems, where the factors are allowed to be from arbitrary sets of structured vectors. Such structure may include sparsity, non-negativeness, order, or a combination thereof. The algorithm approximates a given matrix by a linear combination of few rank-1 matrices, each factorized into an outer product of two vector atoms of the desired structure. For the non-convex subproblems of obtaining good rank-1 structured matrix atoms, we employ and analyze a general atomic power method. In addition to the above applications, we prove linear convergence for generalized pursuit variants in Hilbert spaces - for the task of approximation over the linear span of arbitrary dictionaries - which generalizes OMP and is useful beyond matrix problems. Our experiments on real datasets confirm both the efficiency and also the broad applicability of our framework in practice.

Submitted 12 February, 2016; originally announced February 2016.

arXiv:1511.02024 [pdf, other]

Towards a Better Understanding of Predict and Count Models

Authors: S. Sathiya Keerthi, Tobias Schnabel, Rajiv Khanna

Abstract: In a recent paper, Levy and Goldberg pointed out an interesting connection between prediction-based word embedding models and count models based on pointwise mutual information. Under certain conditions, they showed that both models end up optimizing equivalent objective functions. This paper explores this connection in more detail and lays out the factors leading to differences between these models. We find that the most relevant differences from an optimization perspective are (i) predict models work in a low dimensional space where embedding vectors can interact heavily; (ii) since predict models have fewer parameters, they are less prone to overfitting. Motivated by the insight of our analysis, we show how count models can be regularized in a principled manner and provide closed-form solutions for L1 and L2 regularization. Finally, we propose a new embedding model with a convex objective and the additional benefit of being intelligible.

Submitted 6 November, 2015; originally announced November 2015.

Comments: 17 pages

arXiv:1507.01135 [pdf, other]

DPM: A State Space Model for Large-Scale Direct Marketing

Authors: Yubin Park, Rajiv Khanna, Joydeep Ghosh, Daniel Mihalko

Abstract: We propose a novel statistical model to answer three challenges in direct marketing: which channel to use, which offer to make, and when to offer. There are several potential applications for the proposed model, for example, developing personalized marketing strategies and monitoring members' needs. Furthermore, the results from the model can complement and can be integrated with other existing models. The proposed model, named Dynamic Propensity Model, is a latent variable time series model that utilizes both marketing and purchase histories of a customer. The latent variable in the model represents the customer's propensity to buy a product. The propensity derives from purchases and other observable responses. Marketing touches increase a member's propensity, and propensity score attenuates and propagates over time as governed by data-driven parameters. To estimate the parameters of the model, a new statistical methodology has been developed. This methodology makes use of particle methods with a stochastic gradient descent approach, resulting in fast estimation of the model coefficients even from big datasets. The model is validated using six months' marketing records from one of the largest insurance companies in the U.S. Experimental results indicate that the effects of marketing touches vary depending on both channels and products. We compare the predictive performance of the proposed model with lagged variable logistic regression. Limitations and extensions of the proposed algorithm are also discussed.

Submitted 4 July, 2015; originally announced July 2015.

arXiv:1403.2556 [pdf, ps, other]

doi 10.1103/PhysRevE.90.020401

Kinetically engendered sub-spinodal length scales in spontaneous dewetting of thin liquid films

Authors: TirumalaRao Kotni, Jayati Sarkar, Rajesh Khanna

Abstract: Numerical simulations of spontaneous dewetting of non-slipping, variable viscosity unstable thin liquid films on homogeneous substrates reveal the existence of sub-spinodal lengthscales through formation of satellite holes, a marker of nucleated dewetting and/or heterogeneous substrates, in the late stages of dewetting if the liquid viscosity decreases continually with decreasing film thickness. These films also show established signatures of slipping films such as faster rupture and flatter morphologies in the early stages even without invoking any slippage.

Submitted 11 March, 2014; originally announced March 2014.

Comments: 5 pages, to be submitted to PRL

Showing 1–50 of 68 results for author: Khanna, R