-
GPT-4's assessment of its performance in a USMLE-based case study
Authors:
Uttam Dhakal,
Aniket Kumar Singh,
Suman Devkota,
Yogesh Sapkota,
Bishal Lamichhane,
Suprinsa Paudyal,
Chandra Dhakal
Abstract:
This study investigates GPT-4's assessment of its own performance in healthcare applications. A simple prompting technique was used to present the LLM with questions taken from the United States Medical Licensing Examination (USMLE) questionnaire. The model was asked to evaluate its confidence both before and after each question was posed. The questionnaire was divided into two groups: questions with feedback (WF) and questions with no feedback (NF) after the question. The model was asked to provide both absolute and relative confidence scores. The experimental findings were analyzed with statistical tools to study the variability of confidence in the WF and NF groups. Additionally, a sequential analysis was conducted to observe how performance varied in the WF and NF groups. Results indicate that feedback influences relative confidence but does not consistently increase or decrease it. Understanding the performance of LLMs is paramount when exploring their utility in sensitive areas like healthcare. This study contributes to the ongoing discourse on the reliability of AI within healthcare, particularly of LLMs like GPT-4. It also offers insights into how feedback mechanisms might be optimized to enhance AI-assisted medical education and decision support.
Submitted 26 March, 2024; v1 submitted 14 February, 2024;
originally announced February 2024.
-
Improved Differentially Private and Lazy Online Convex Optimization
Authors:
Naman Agarwal,
Satyen Kale,
Karan Singh,
Abhradeep Guha Thakurta
Abstract:
We study the task of $(ε, δ)$-differentially private online convex optimization (OCO). In the online setting, the release of each distinct decision or iterate carries with it the potential for privacy loss. This problem has a long history of research starting with Jain et al. [2012] and the best known results for the regime of ε not being very small are presented in Agarwal et al. [2023]. In this paper we improve upon the results of Agarwal et al. [2023] in terms of the dimension factors as well as removing the requirement of smoothness. Our results are now the best known rates for DP-OCO in this regime.
Our algorithm builds upon the work of [Asi et al., 2023], which introduced the idea of explicitly limiting the number of switches via rejection sampling. The main innovation in our algorithm is sampling from a strongly log-concave density. This lets us trade off the dimension factors better and leads to improved results.
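The switch-limiting idea can be illustrated with the classic maximal-coupling step below, in a deliberately simplified discrete form. The step keeps the previous decision with probability min(1, p_next/p_prev) and otherwise redraws from the excess mass. This is a hedged sketch, not the algorithm of Asi et al. [2023] or of this paper. The function name `lazy_resample` is hypothetical, and a finite distribution stands in for the strongly log-concave density.

```python
import numpy as np

def lazy_resample(x_prev, p_prev, p_next, rng):
    """Move from a sample x_prev ~ p_prev to a sample ~ p_next, switching rarely.

    p_prev, p_next: probability vectors over the same finite set of decisions.
    The returned index is distributed exactly as p_next. It differs from x_prev
    with probability equal to the total variation distance TV(p_prev, p_next).
    """
    # Rejection step: keep the previous decision with probability min(1, p_next/p_prev).
    accept = min(1.0, p_next[x_prev] / p_prev[x_prev])
    if rng.random() < accept:
        return x_prev
    # On rejection, draw from the normalized excess mass (p_next - p_prev)_+.
    residual = np.maximum(p_next - p_prev, 0.0)
    return int(rng.choice(len(p_next), p=residual / residual.sum()))
```

The output follows `p_next` exactly, and a switch occurs only with probability TV(p_prev, p_next). Distributions that change slowly from round to round therefore release few distinct decisions, which is the property the switch-limiting argument relies on.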
Submitted 20 December, 2023; v1 submitted 15 December, 2023;
originally announced December 2023.
-
Introduction to Online Nonstochastic Control
Authors:
Elad Hazan,
Karan Singh
Abstract:
This text presents an introduction to an emerging paradigm in control of dynamical systems and differentiable reinforcement learning called online nonstochastic control. The new approach applies techniques from online convex optimization and convex relaxations to obtain new methods with provable guarantees for classical settings in optimal and robust control.
The primary distinction between online nonstochastic control and other frameworks is the objective. In optimal control, robust control, and other control methodologies that assume stochastic noise, the goal is to perform comparably to an offline optimal strategy. In online nonstochastic control, an adversary chooses both the cost functions and the perturbations from the assumed dynamical model. The optimal policy is therefore not defined a priori. Instead, the target is to attain low regret against the best policy in hindsight from a benchmark class of policies.
This objective suggests the use of the decision making framework of online convex optimization as an algorithmic methodology. The resulting methods are based on iterative mathematical optimization algorithms, and are accompanied by finite-time regret and computational complexity guarantees.
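The regret objective can be made concrete with the basic OCO primitive these methods build on. The minimal sketch below assumes a toy setting of our own choosing: quadratic losses f_t(x) = ||x - z_t||^2 over a Euclidean ball, played by projected online gradient descent. It is not a controller from the text. The names `ogd_regret`, `eta`, and `radius` are hypothetical.

```python
import numpy as np

def ogd_regret(targets, eta, radius=1.0):
    """Projected online gradient descent on f_t(x) = ||x - z_t||^2 over a ball.

    targets: (T, d) array of z_t, revealed one per round.
    Returns (total loss of the online learner, regret vs. the best fixed x in hindsight).
    """
    x = np.zeros(targets.shape[1])
    total = 0.0
    for z in targets:
        total += np.sum((x - z) ** 2)        # commit to x, then suffer the round's loss
        x = x - eta * 2.0 * (x - z)          # gradient step on f_t
        norm = np.linalg.norm(x)
        if norm > radius:                    # project back onto the feasible ball
            x = x * (radius / norm)
    # sum_t ||x - z_t||^2 = T ||x - mean||^2 + const, so the best fixed
    # decision in hindsight is the mean target projected onto the ball.
    best = targets.mean(axis=0)
    n = np.linalg.norm(best)
    if n > radius:
        best = best * (radius / n)
    best_total = np.sum((targets - best) ** 2)
    return total, total - best_total
```

The comparator is computed only after all losses are known, which mirrors "the best policy in hindsight". In the control setting, the decision x would be replaced by the parameters of a policy from the benchmark class.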
Submitted 29 May, 2023; v1 submitted 17 November, 2022;
originally announced November 2022.
-
Towards Differential Relational Privacy and its use in Question Answering
Authors:
Simone Bombari,
Alessandro Achille,
Zijian Wang,
Yu-Xiang Wang,
Yusheng Xie,
Kunwar Yashraj Singh,
Srikar Appalaraju,
Vijay Mahadevan,
Stefano Soatto
Abstract:
Memorization of the relation between entities in a dataset can lead to privacy issues when using a trained model for question answering. We introduce Relational Memorization (RM) to understand, quantify and control this phenomenon. While bounding general memorization can have detrimental effects on the performance of a trained model, bounding RM does not prevent effective learning. The difference is most pronounced when the data distribution is long-tailed, with many queries having only a few training examples. In that case, impeding general memorization prevents effective learning, while impeding only relational memorization still allows learning general properties of the underlying concepts. We formalize the notion of Relational Privacy (RP) and, inspired by Differential Privacy (DP), we provide a possible definition of Differential Relational Privacy (DrP). These notions can be used to describe and compute bounds on the amount of RM in a trained model. We illustrate Relational Privacy concepts in experiments with large-scale models for Question Answering.
Submitted 30 March, 2022;
originally announced March 2022.
-
VQ-Flows: Vector Quantized Local Normalizing Flows
Authors:
Sahil Sidheekh,
Chris B. Dock,
Tushar Jain,
Radu Balan,
Maneesh K. Singh
Abstract:
Normalizing flows provide an elegant approach to generative modeling that allows for efficient sampling and exact density evaluation of unknown data distributions. However, current techniques have significant limitations in their expressivity when the data distribution is supported on a low-dimensional manifold or has a non-trivial topology. We introduce a novel statistical framework for learning a mixture of local normalizing flows as "chart maps" over the data manifold. Our framework augments the expressivity of recent approaches while preserving the signature property of normalizing flows, that they admit exact density evaluation. We learn a suitable atlas of charts for the data manifold via a vector quantized auto-encoder (VQ-AE) and the distributions over them using a conditional flow. We validate experimentally that our probabilistic framework enables existing approaches to better model data distributions over complex manifolds.
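The chart-selection part of a vector quantized auto-encoder assigns each encoded point to its nearest code vector. In this reading, a code vector plays the role of a chart index. A minimal NumPy sketch of that quantization step follows. It assumes squared Euclidean distance. The encoder, decoder, and per-chart conditional flows are omitted, and `assign_charts` is a hypothetical name, not part of the authors' code.

```python
import numpy as np

def assign_charts(z, codebook):
    """Assign each encoded point (row of z) to its nearest code vector.

    z: (n, k) encoder outputs; codebook: (m, k) code vectors.
    Returns an (n,) array of chart indices under squared Euclidean distance.
    """
    # Pairwise squared distances, shape (n, m), computed via broadcasting.
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)
```

Points that share an index would then be modeled by the same local flow, conditioned on that index.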
Submitted 18 June, 2022; v1 submitted 22 March, 2022;
originally announced March 2022.
-
Learning to Solve the AC-OPF using Sensitivity-Informed Deep Neural Networks
Authors:
Manish K. Singh,
Vassilis Kekatos,
Georgios B. Giannakis
Abstract:
Delay-critical power systems applications benefit from shifting the computational burden from real time to offline. To this end, recent works entertain the idea of using a deep neural network (DNN) to predict the solutions of the AC optimal power flow (AC-OPF) once presented with load demands. As network topologies may change, training this DNN in a sample-efficient manner becomes a necessity. To improve data efficiency, this work exploits the fact that OPF data are not simple training labels, but constitute the solutions of a parametric optimization problem. We thus advocate training a sensitivity-informed DNN (SI-DNN) to match not only the OPF optimizers, but also their partial derivatives with respect to the OPF parameters (loads). It is shown that the required Jacobian matrices exist under mild conditions and can be readily computed from the related primal/dual solutions. The proposed SI-DNN is compatible with a broad range of OPF solvers. These include a non-convex quadratically constrained quadratic program (QCQP), its semidefinite program (SDP) relaxation, and MATPOWER. SI-DNN can also be seamlessly integrated into other learning-to-OPF schemes. Numerical tests on three benchmark power systems corroborate the improved generalization and constraint satisfaction of OPF solutions predicted by an SI-DNN over a conventionally trained DNN, especially in low-data setups.
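Matching both the optimizers and their sensitivities amounts to a two-term training loss. The sketch below is a hedged illustration under our own assumptions. A one-layer model f(θ) = tanh(Wθ + b) stands in for the DNN. The weighting `lam` and the function name `si_loss` are hypothetical. The target Jacobians `jac_stars` are assumed to have been computed elsewhere; the abstract says they come from primal/dual solutions. This is not the authors' loss or training code.

```python
import numpy as np

def si_loss(W, b, thetas, x_stars, jac_stars, lam=1.0):
    """Sensitivity-informed loss for the toy model f(theta) = tanh(W @ theta + b).

    thetas: parameter vectors (e.g. loads); x_stars: optimizers x*(theta);
    jac_stars: optimizer sensitivities dx*/dtheta, each of shape (len(x), len(theta)).
    """
    total = 0.0
    for theta, x_star, J_star in zip(thetas, x_stars, jac_stars):
        h = np.tanh(W @ theta + b)
        # Chain rule: d tanh(W theta + b) / d theta = diag(1 - h^2) @ W.
        J_model = (1.0 - h ** 2)[:, None] * W
        # Optimizer-matching term plus lam times the sensitivity-matching term.
        total += np.sum((h - x_star) ** 2) + lam * np.sum((J_model - J_star) ** 2)
    return total / len(thetas)
```

Setting `lam=0` recovers a conventional mean-squared-error loss. Consider an unconstrained quadratic program $\min_x \tfrac12 x^\top Q x - \theta^\top x$ with $Q$ positive definite. Its solution is $x^*(\theta) = Q^{-1}\theta$ with $\partial x^*/\partial\theta = Q^{-1}$, which is one simple way to generate both kinds of labels when testing such a loss.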
Submitted 10 November, 2021; v1 submitted 26 March, 2021;
originally announced March 2021.
-
A Regret Minimization Approach to Iterative Learning Control
Authors:
Naman Agarwal,
Elad Hazan,
Anirudha Majumdar,
Karan Singh
Abstract:
We consider the setting of iterative learning control, or model-based policy learning in the presence of uncertain, time-varying dynamics. In this setting, we propose a new performance metric, planning regret, which replaces the standard stochastic uncertainty assumptions with worst case regret. Based on recent advances in non-stochastic control, we design a new iterative algorithm for minimizing planning regret that is more robust to model mismatch and uncertainty. We provide theoretical and empirical evidence that the proposed algorithm outperforms existing methods on several benchmarks.
Submitted 26 February, 2021;
originally announced February 2021.
-
Boosting for Online Convex Optimization
Authors:
Elad Hazan,
Karan Singh
Abstract:
We consider the decision-making framework of online convex optimization with a very large number of experts. This setting is ubiquitous in contextual and reinforcement learning problems, where the size of the policy class renders enumeration and search within the policy class infeasible.
Instead, we consider generalizing the methodology of online boosting. We define a weak learning algorithm as a mechanism that guarantees multiplicatively approximate regret against a base class of experts. In this access model, we give an efficient boosting algorithm that guarantees near-optimal regret against the convex hull of the base class. We consider both full and partial (a.k.a. bandit) information feedback models. We also give an analogous efficient boosting algorithm for the i.i.d. statistical setting.
Our results simultaneously generalize online boosting and gradient boosting guarantees to the contextual learning model, online convex optimization and bandit linear optimization settings.
Submitted 18 February, 2021;
originally announced February 2021.
-
Multitask Bandit Learning Through Heterogeneous Feedback Aggregation
Authors:
Zhi Wang,
Chicheng Zhang,
Manish Kumar Singh,
Laurel D. Riek,
Kamalika Chaudhuri
Abstract:
In many real-world applications, multiple agents seek to learn how to perform highly related yet slightly different tasks in an online bandit learning protocol. We formulate this problem as the $ε$-multi-player multi-armed bandit problem, in which a set of players concurrently interact with a set of arms, and for each arm, the reward distributions for all players are similar but not necessarily identical. We develop an upper confidence bound-based algorithm, RobustAgg$(ε)$, that adaptively aggregates rewards collected by different players. In the setting where an upper bound on the pairwise similarities of reward distributions between players is known, we achieve instance-dependent regret guarantees that depend on the amenability of information sharing across players. We complement these upper bounds with nearly matching lower bounds. In the setting where pairwise similarities are unknown, we provide a lower bound, as well as an algorithm that trades off minimax regret guarantees for adaptivity to unknown similarity structure.
Submitted 19 July, 2021; v1 submitted 29 October, 2020;
originally announced October 2020.
-
Detection of AI-Synthesized Speech Using Cepstral & Bispectral Statistics
Authors:
Arun Kumar Singh,
Priyanka Singh
Abstract:
Digital technology has made once-unimaginable applications possible. A handful of tools for easy editing and manipulation is exciting. However, such tools also raise alarming concerns, since they can propagate speech clones, duplicates, or even deepfakes. Validating the authenticity of speech is one of the primary problems of digital audio forensics. We propose an approach to distinguish human speech from AI-synthesized speech by exploiting bispectral and cepstral analysis. Higher-order statistics show less correlation for human speech than for synthesized speech. Cepstral analysis also revealed a durable power component in human speech that is missing from synthesized speech. We integrate both analyses and propose a machine learning model to detect AI-synthesized speech.
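The standard real cepstrum of a single frame provides background for the cepstral part of the approach. It is computed below as the inverse FFT of the log magnitude spectrum. This is the textbook definition, not the authors' feature pipeline. The frame length, the absence of windowing, the `eps` floor, and the name `real_cepstrum` are assumptions. The bispectral (higher-order) features are not shown.

```python
import numpy as np

def real_cepstrum(frame, eps=1e-12):
    """Real cepstrum of one audio frame: inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.rfft(frame)               # one-sided spectrum of a real signal
    log_mag = np.log(np.abs(spectrum) + eps)    # eps avoids log(0) in silent bins
    return np.fft.irfft(log_mag, n=len(frame))  # back to the quefrency domain
```

In practice such a function would be applied frame by frame over a windowed signal, with summary statistics of the resulting cepstra used as classifier features.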
Submitted 11 April, 2021; v1 submitted 3 September, 2020;
originally announced September 2020.
-
Dynamic Relational Inference in Multi-Agent Trajectories
Authors:
Ruichao Xiao,
Manish Kumar Singh,
Rose Yu
Abstract:
Inferring interactions from multi-agent trajectories has broad applications in physics, vision and robotics. Neural relational inference (NRI) is a deep generative model that can reason about relations in complex dynamics without supervision. In this paper, we take a careful look at this approach for relational inference in multi-agent trajectories. First, we discover that NRI can be fundamentally limited without sufficient long-term observations: its ability to accurately infer interactions degrades drastically for short output sequences. Next, we consider a more general setting of relational inference in which interactions change over time. We propose an extension of NRI, which we call the DYnamic multi-Agent Relational Inference (DYARI) model, that can reason about dynamic relations. Using a simulated physics system, we conduct exhaustive experiments to study how model architecture, underlying dynamics and training scheme affect the performance of dynamic relational inference. We also showcase the usage of our model on real-world multi-agent basketball trajectories.
Submitted 8 October, 2020; v1 submitted 16 July, 2020;
originally announced July 2020.
-
Crop Yield Prediction Integrating Genotype and Weather Variables Using Deep Learning
Authors:
Johnathon Shook,
Tryambak Gangopadhyay,
Linjiang Wu,
Baskar Ganapathysubramanian,
Soumik Sarkar,
Asheesh K. Singh
Abstract:
Accurate prediction of crop yield, supported by scientific and domain-relevant insights, can help improve agricultural breeding and provide monitoring across diverse climatic conditions. It can thereby protect against climatic challenges to crop production, including erratic rainfall and temperature variations. We used historical performance records from Uniform Soybean Tests (UST) in North America spanning 13 years to build a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) based model. The model dissects and predicts genotype response in multiple environments by leveraging pedigree relatedness measures along with weekly weather parameters. Additionally, to explain which time windows in the growing season are important, we developed a model based on a temporal attention mechanism. The combination of these two models outperformed random forest (RF), LASSO regression and the data-driven USDA model for yield prediction. We deployed this deep learning framework as a 'hypothesis generation tool' to unravel GxExM relationships. Attention-based time series models provide a significant advancement in the interpretability of yield prediction models. The insights provided by explainable models help show how plant breeding programs can adapt their approaches to global climate change. Examples include identifying superior varieties for commercial release, sampling testing environments intelligently in variety development, and integrating weather parameters into a targeted breeding approach. Using DL models as hypothesis generation tools will enable the development of varieties with plasticity response in variable climatic conditions. We envision broad applicability of this approach (via sensitivity analysis and "what-if" scenarios) for soybean and other crop species under different climatic conditions.
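Temporal attention assigns a weight to each week of the season, and those weights are what make the model interpretable. The minimal sketch below assumes a simple dot-product scoring vector `w` applied to LSTM hidden states `H`. Both names and the name `temporal_attention` are hypothetical, and the authors' exact attention formulation is not given in the abstract.

```python
import numpy as np

def temporal_attention(H, w):
    """Dot-product temporal attention over a season of hidden states.

    H: (T, d) hidden states, one row per week; w: (d,) scoring vector.
    Returns (context vector of shape (d,), attention weights of shape (T,)).
    """
    scores = H @ w                      # one relevance score per time step
    scores = scores - scores.max()      # shift for numerical stability (softmax is shift-invariant)
    weights = np.exp(scores)
    alpha = weights / weights.sum()     # non-negative weights summing to 1
    context = alpha @ H                 # attention-weighted summary of the season
    return context, alpha
```

Inspecting `alpha` across many samples is one way to read off which weeks the model relies on. The context vector would feed a downstream yield regressor.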
Submitted 24 June, 2020;
originally announced June 2020.
-
Training with Multi-Layer Embeddings for Model Reduction
Authors:
Benjamin Ghaemmaghami,
Zihao Deng,
Benjamin Cho,
Leo Orshansky,
Ashish Kumar Singh,
Mattan Erez,
Michael Orshansky
Abstract:
Modern recommendation systems rely on real-valued embeddings of categorical features. Increasing the dimension of embedding vectors improves model accuracy but comes at a high cost to model size. We introduce a multi-layer embedding training (MLET) architecture that trains embeddings via a sequence of linear layers to derive superior embedding accuracy vs. model size trade-off.
Our approach is fundamentally based on the ability of factorized linear layers to produce superior embeddings to those of a single linear layer. We focus on the analysis and implementation of a two-layer scheme. Harnessing recent results on the dynamics of backpropagation in linear neural networks, we attribute the superiority of multi-layer embeddings to their tendency to have lower effective rank. We show that substantial advantages are obtained in the regime where the width of the hidden layer is much larger than that of the final embedding (d). Crucially, at the conclusion of training, we convert the two-layer solution into a single-layer one; as a result, the inference-time model size scales as d.
We prototype the MLET scheme within Facebook's PyTorch-based open-source Deep Learning Recommendation Model. We show that, at a given model accuracy, it allows d to be reduced by 4-8X, with a corresponding improvement in memory footprint. The experiments are run on two publicly available click-through-rate prediction benchmarks (Criteo-Kaggle and Avazu). The runtime overhead of MLET is 25% on average.
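Both layers are linear, so the two-layer embedding can be folded into a single table after training, and inference memory then depends on d rather than on the hidden width. A minimal NumPy sketch of that conversion follows. It assumes that the first layer acts on one-hot category indices, so that applying it is a row lookup. The function names are hypothetical, and this is not code from the DLRM-based prototype.

```python
import numpy as np

def collapse_two_layer_embedding(W1, W2):
    """Fold a trained two-layer linear embedding into one lookup table.

    W1: (num_categories, k) first-layer table with a wide hidden width k.
    W2: (k, d) projection to the final embedding dimension d.
    Returns a (num_categories, d) table whose size scales with d, not k.
    """
    return W1 @ W2

def embed_two_layer(W1, W2, ids):
    # Training-time path: look up the wide row (one-hot times W1), then project it.
    return W1[ids] @ W2

def embed_collapsed(table, ids):
    # Inference-time path: a single table lookup.
    return table[ids]
```

The two paths give identical embeddings. The extra width k therefore costs memory and compute only during training, which is consistent with the claim that inference-time model size scales as d.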
Submitted 9 June, 2020;
originally announced June 2020.
-
DANR: Discrepancy-aware Network Regularization
Authors:
Hongyuan You,
Furkan Kocayusufoglu,
Ambuj K. Singh
Abstract:
Network regularization is an effective tool for incorporating structural prior knowledge to learn coherent models over networks. It has yielded provably accurate estimates in applications ranging from spatial economics to neuroimaging studies. Recently, there has been increasing interest in extending network regularization to the spatio-temporal case to accommodate the evolution of networks. However, in both static and spatio-temporal cases, missing or corrupted edge weights can compromise the ability of network regularization to discover desired solutions. To address these gaps, we propose a novel approach, discrepancy-aware network regularization (DANR). DANR is robust to inadequate regularizations and effectively captures model evolution and structural changes over spatio-temporal networks. We develop a distributed and scalable algorithm based on the alternating direction method of multipliers (ADMM) to solve the proposed problem with guaranteed convergence to globally optimal solutions. Experimental results on both synthetic and real-world networks demonstrate that our approach achieves improved performance on various tasks and enables interpretation of model changes in evolving networks.
Submitted 30 May, 2020;
originally announced June 2020.
-
Roles of Dynamic State Estimation in Power System Modeling, Monitoring and Operation
Authors:
Junbo Zhao,
Marcos Netto,
Zhenyu Huang,
Samson Shenglong Yu,
Antonio Gomez-Exposito,
Shaobu Wang,
Innocent Kamwa,
Shahrokh Akhlaghi,
Lamine Mili,
Vladimir Terzija,
A. P. Sakis Meliopoulos,
Bikash Pal,
Abhinav Kumar Singh,
Ali Abur,
Tianshu Bi,
Alireza Rouhani
Abstract:
Power system dynamic state estimation (DSE) remains an active research area. This is driven by the absence of accurate models, the increasing availability of fast-sampled, time-synchronized measurements, and advances in the capability, scalability, and affordability of computing and communications. This paper discusses the advantages of DSE compared to static state estimation. It also covers the implementation differences between the two, including the measurement configuration, modeling framework and support software features. The important roles of DSE are discussed from the modeling, monitoring and operation perspectives. The discussion covers both today's synchronous-machine-dominated systems and future power-electronics-interfaced generation systems. Several examples demonstrate how DSE, through time-critical applications, enhances the operational robustness and resilience of 21st-century power systems. Future research directions are identified and discussed, paving the way for developing the next generation of energy management systems.
Submitted 11 May, 2020;
originally announced May 2020.
-
On Learning Combinatorial Patterns to Assist Large-Scale Airline Crew Pairing Optimization
Authors:
Divyam Aggarwal,
Yash Kumar Singh,
Dhish Kumar Saxena
Abstract:
Airline Crew Pairing Optimization (CPO) aims at generating a set of legal flight sequences (crew pairings) that covers an airline's flight schedule at minimum cost. It is usually performed using Column Generation (CG), a mathematical programming technique for guided search-space exploration. CG exploits the interdependencies between the current and the preceding CG iteration to generate new variables (pairings) during the optimization search. However, given the unprecedented scale and complexity of emergent flight networks, it has become imperative to learn higher-order interdependencies among the flight-connection graphs and to use them to enhance the efficacy of CPO. In a first of its kind, and in a significant departure from the state of the art, this paper proposes a novel adaptation of the Variational Graph Auto-Encoder. The adaptation learns plausible combinatorial patterns among the flight-connection data obtained through search-space exploration by an Airline Crew Pairing Optimizer, AirCROP. AirCROP was developed by the authors and validated by the research consortium's industrial sponsor, GE Aviation. The resulting flight-connection predictions are combined on the fly using a novel heuristic to generate new pairings for the optimizer. The utility of the proposed approach is demonstrated on large-scale (over 4200 flights), real-world, complex flight networks of US-based airlines, characterized by multiple hub-and-spoke subnetworks and several crew bases.
Submitted 2 May, 2020; v1 submitted 28 April, 2020;
originally announced April 2020.
-
No-Regret Prediction in Marginally Stable Systems
Authors:
Udaya Ghai,
Holden Lee,
Karan Singh,
Cyril Zhang,
Yi Zhang
Abstract:
We consider the problem of online prediction in a marginally stable linear dynamical system subject to bounded adversarial or (non-isotropic) stochastic perturbations. This poses two challenges. Firstly, the system is in general unidentifiable, so recent and classical results on parameter recovery do not apply. Secondly, because we allow the system to be marginally stable, the state can grow polynomially with time; this causes standard regret bounds in online convex optimization to be vacuous. In spite of these challenges, we show that the online least-squares algorithm achieves sublinear regret (improvable to polylogarithmic in the stochastic setting), with polynomial dependence on the system's parameters. This requires a refined regret analysis, including a structural lemma showing the current state of the system to be a small linear combination of past states, even if the state grows polynomially. By applying our techniques to learning an autoregressive filter, we also achieve logarithmic regret in the partially observed setting under Gaussian noise, with polynomial dependence on the memory of the associated Kalman filter.
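The online least-squares predictor can be sketched for the fully observed case. At each step, the transition matrix is refit on all transitions seen so far, and x_{t+1} is predicted before it is revealed. This minimal sketch is ridge-regularized; the regularizer `reg` and the name `online_least_squares_predict` are assumptions. The paper's regret analysis and its autoregressive-filter extension are not reproduced.

```python
import numpy as np

def online_least_squares_predict(xs, reg=1.0):
    """One-step-ahead predictions for x_{t+1} = A x_t + w_t with A unknown.

    xs: (T+1, d) observed states. Returns a (T, d) array whose row t predicts xs[t+1].
    A is re-estimated each step by minimizing
    sum_s ||x_{s+1} - A x_s||^2 + reg * ||A||_F^2 over past transitions.
    """
    d = xs.shape[1]
    gram = reg * np.eye(d)           # reg * I + sum_s x_s x_s^T
    cross = np.zeros((d, d))         # sum_s x_{s+1} x_s^T
    preds = []
    for t in range(len(xs) - 1):
        # Ridge solution A_hat = cross @ inv(gram); gram is symmetric, so use solve.
        A_hat = np.linalg.solve(gram, cross.T).T
        preds.append(A_hat @ xs[t])  # predict before x_{t+1} is revealed
        gram += np.outer(xs[t], xs[t])
        cross += np.outer(xs[t + 1], xs[t])
    return np.array(preds)
```

A rotation matrix is a simple marginally stable example, since its eigenvalues lie on the unit circle. With such a system, the predictions quickly become far more accurate than the naive guess x_{t+1} ≈ x_t, even though the state never decays.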
Submitted 23 June, 2020; v1 submitted 5 February, 2020;
originally announced February 2020.
-
Improper Learning for Non-Stochastic Control
Authors:
Max Simchowitz,
Karan Singh,
Elad Hazan
Abstract:
We consider the problem of controlling a possibly unknown linear dynamical system with adversarial perturbations, adversarially chosen convex loss functions, and partially observed states, known as non-stochastic control. We introduce a controller parametrization based on the denoised observations. We prove that applying online gradient descent to this parametrization yields a new controller that attains sublinear regret against a large class of closed-loop policies. In the fully adversarial setting, our controller attains an optimal regret bound of $\sqrt{T}$ when the system is known. When combined with an initial stage of least-squares estimation, it attains $T^{2/3}$ when the system is unknown. Both results give the first sublinear regret for the partially observed setting.
Our bounds are the first in the non-stochastic control setting that compete with \emph{all} stabilizing linear dynamical controllers, not just state feedback. Moreover, in the presence of semi-adversarial noise containing both stochastic and adversarial components, our controller attains the optimal regret bounds of $\mathrm{poly}(\log T)$ when the system is known, and $\sqrt{T}$ when unknown. To our knowledge, this gives the first end-to-end $\sqrt{T}$ regret for the online Linear Quadratic Gaussian controller, and applies in a more general setting with adversarial losses and semi-adversarial noise.
Submitted 24 June, 2020; v1 submitted 24 January, 2020;
originally announced January 2020.
-
Federated Learning with Personalization Layers
Authors:
Manoj Ghuhan Arivazhagan,
Vinay Aggarwal,
Aaditya Kumar Singh,
Sunav Choudhary
Abstract:
The emerging paradigm of federated learning strives to enable collaborative training of machine learning models on the network edge without centrally aggregating raw data, hence improving data privacy. This sharply deviates from traditional machine learning and necessitates the design of algorithms robust to various sources of heterogeneity. Specifically, statistical heterogeneity of data across user devices can severely degrade the performance of standard federated averaging for traditional machine learning applications like personalization with deep learning. This paper proposes FedPer, a base + personalization layer approach for federated training of deep feedforward neural networks, which can combat the ill-effects of statistical heterogeneity. We demonstrate the effectiveness of FedPer for non-identical data partitions of CIFAR datasets and on a personalized image aesthetics dataset from Flickr.
Submitted 2 December, 2019;
originally announced December 2019.
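The base + personalization split described above can be sketched as a server-side aggregation step. This is a minimal sketch with several assumptions of ours. Each client model is a dict from layer name to a flat list of floats. Layers whose names start with a hypothetical `base.` prefix are shared, and all other layers stay on the client. Only the base layers are averaged, weighted by local sample counts as in federated averaging. FedPer's actual layer split, training loop, and weighting follow the paper, not this code.

```python
from typing import Dict, List

Model = Dict[str, List[float]]

def aggregate_base_layers(clients: List[Model], num_samples: List[int],
                          base_prefix: str = "base.") -> Model:
    """Sample-weighted average of the shared (base) layers across clients.
    Personalization layers are never read here, so they never leave the client."""
    total = float(sum(num_samples))
    base_keys = [k for k in clients[0] if k.startswith(base_prefix)]
    averaged: Model = {}
    for key in base_keys:
        dim = len(clients[0][key])
        averaged[key] = [
            sum(n * c[key][i] for c, n in zip(clients, num_samples)) / total
            for i in range(dim)
        ]
    return averaged

def apply_update(client: Model, shared: Model) -> Model:
    """Overwrite a client's base layers with the shared ones; keep its own
    personalization layers unchanged."""
    updated = dict(client)
    updated.update({k: list(v) for k, v in shared.items()})
    return updated
```

Keeping the personalization layers out of the averaged dictionary entirely is what lets each client retain a head fitted to its own data distribution.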
-
The Nonstochastic Control Problem
Authors:
Elad Hazan,
Sham M. Kakade,
Karan Singh
Abstract:
We consider the problem of controlling an unknown linear dynamical system in the presence of (nonstochastic) adversarial perturbations and adversarial convex loss functions. In contrast to classical control, the a priori determination of an optimal controller here is hindered by the latter's dependence on the yet unknown perturbations and costs. Instead, we measure regret against an optimal linear policy in hindsight, and give the first efficient algorithm that guarantees a sublinear regret bound, scaling as T^{2/3}, in this setting.
Submitted 20 January, 2020; v1 submitted 27 November, 2019;
originally announced November 2019.
-
Estimating localized complexity of white-matter wiring with GANs
Authors:
Haraldur T. Hallgrimsson,
Richika Sharan,
Scott T. Grafton,
Ambuj K. Singh
Abstract:
In-vivo examination of the physical connectivity of axonal projections through the white matter of the human brain is made possible by diffusion weighted magnetic resonance imaging (dMRI). Analysis of dMRI commonly considers derived scalar metrics such as fractional anisotropy as proxies for "white matter integrity," and differences in such measures have been observed to correlate significantly with various neurological diagnoses and clinical measures such as executive function, presence of multiple sclerosis, and genetic similarity. The analysis of such voxel measures is confounded in areas of more complicated fiber wiring due to crossing, kissing, and dispersing fibers. Recently, Volz et al. introduced a simple probabilistic measure of the count of distinct fiber populations within a voxel, which was shown to reduce variance in group comparisons. We propose a complementary measure that considers the complexity of a voxel in the context of its local region, with the aim of quantifying the localized wiring complexity of every part of white matter. This allows, for example, identification of particularly ambiguous regions of the brain for tractographic approaches to modeling global wiring connectivity. Our method builds on recent advances in image inpainting, in which the task is to plausibly fill in a missing region of an image. Specifically, we obtain a Bayesian estimate of the heteroscedastic aleatoric uncertainty of a region of white matter by inpainting it from its context. We define the localized wiring complexity of white matter as how accurately and confidently a well-trained model can predict the missing patch. In our results, we observe low aleatoric uncertainty along major neuronal pathways, which increases at junctions and towards cortex boundaries. This directly quantifies the difficulty of lesion inpainting of dMRI images at all parts of white matter.
Submitted 30 November, 2019; v1 submitted 2 October, 2019;
originally announced October 2019.
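Heteroscedastic aleatoric uncertainty is commonly learned by having the network predict, for each output, both a mean and a log-variance s. Training then uses a Gaussian negative log-likelihood. This is a minimal sketch of that per-voxel loss, assuming this standard formulation; the paper's exact architecture and loss may differ. A large residual can be "explained away" by a large predicted variance, at the cost of the 0.5*s penalty. The model therefore learns to report high uncertainty where the missing patch is hard to inpaint.

```python
import math
from typing import Sequence

def heteroscedastic_nll(targets: Sequence[float], means: Sequence[float],
                        log_vars: Sequence[float]) -> float:
    """Mean Gaussian negative log-likelihood, up to an additive constant,
    with a per-element predicted log-variance s:
        0.5 * exp(-s) * (y - mu)^2 + 0.5 * s
    """
    total = 0.0
    for y, mu, s in zip(targets, means, log_vars):
        total += 0.5 * math.exp(-s) * (y - mu) ** 2 + 0.5 * s
    return total / len(targets)
```

For a fixed residual r, setting the derivative in s to zero gives s = log(r^2). The learned variance thus tracks the squared error the model expects at that location.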
-
Elastic-InfoGAN: Unsupervised Disentangled Representation Learning in Class-Imbalanced Data
Authors:
Utkarsh Ojha,
Krishna Kumar Singh,
Cho-Jui Hsieh,
Yong Jae Lee
Abstract:
We propose a novel unsupervised generative model that learns to disentangle object identity from other low-level aspects in class-imbalanced data. We first investigate the issues surrounding the assumptions about uniformity made by InfoGAN, and demonstrate that it fails to properly disentangle object identity in imbalanced data. Our key idea is to make the discovery of the discrete latent factor of variation invariant to identity-preserving transformations in real images, and use that as a signal to learn the appropriate latent distribution representing object identity. Experiments on both artificial (MNIST, 3D cars, 3D chairs, ShapeNet) and real-world (YouTube-Faces) imbalanced datasets demonstrate the effectiveness of our method in disentangling object identity as a latent factor of variation.
Submitted 30 October, 2020; v1 submitted 1 October, 2019;
originally announced October 2019.
-
Effects of green revolution led agricultural expansion on net ecosystem service values in India
Authors:
Srikanta Sannigrahi,
Suman Chakraborti,
Pawan Kumar Joshi,
Saskia Keesstra,
P. S. Roy,
Paul C. Sutton,
Urs Kreuter,
Saikat Kumar Paul,
Somnath Sen,
Sandeep Bhatt,
Shahid Rahmat,
Shouvik Jha,
Qi Zhang,
Laishram Kanta Singh
Abstract:
Ecosystem Services (ESs) are a bundle of natural processes and functions that are essential for human well-being, subsistence, and livelihood. The expansion of cultivation and cropland, which is the backbone of the Indian economy, is one of the main drivers of rapid Land Use Land Cover changes in India. To assess the impact of the Green Revolution (GR)-led agrarian expansion on the total ecosystem service values (ESVs), we first estimated the ESVs from 1985 to 2005 for eight ecoregions in India using several value transfer approaches. Five explanatory factors, namely Total Crop Area (TCA), Crop Production (CP), Crop Yield, Net Irrigated Area (NIA), and Cropping Intensity (CI), representing the cropping scenarios in the country, were used to construct a local Geographically Weighted Regression model to explore their cumulative and individual effects on ESVs. A Multi-Layer Perceptron based Artificial Neural Network algorithm was employed to estimate the normalized importance of these explanatory factors. During the observation periods, cropland, forestland, and water bodies contributed the most and formed a significant proportion of ESVs, followed by grassland, mangrove, wetland, and urban built-up areas. In all three years, among the nine ESs, water regulation accounts for the highest ESV, followed by soil formation and soil water retention, biodiversity maintenance, waste treatment, climate regulation, and gas regulation. Among the five explanatory factors, TCA, NIA, and CP showed a strong positive association with ESVs, while CI exhibited a negative association. The study reveals a strong association between GR-led agricultural expansion and ESVs in India.
Submitted 15 November, 2020; v1 submitted 24 September, 2019;
originally announced September 2019.
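Geographically Weighted Regression (GWR), used in the study above, fits a separate weighted least-squares model at each location. Observations far from that location are down-weighted. This is a minimal NumPy sketch, assuming a Gaussian distance kernel with a fixed bandwidth `h`. The study's kernel, bandwidth selection, and data are not reproduced here. The rows of `X` stand in for explanatory factors such as crop area or irrigated area, and `y` for ESV.

```python
import numpy as np

def gwr_coefficients(coords: np.ndarray, X: np.ndarray, y: np.ndarray,
                     h: float) -> np.ndarray:
    """Return one coefficient vector (intercept first) per location.

    coords: (n, 2) locations; X: (n, p) explanatory factors; y: (n,) response.
    At location i, beta_i = (X1^T W_i X1)^{-1} X1^T W_i y, where W_i holds
    Gaussian kernel weights exp(-d_ij^2 / (2 h^2)) and X1 = [1, X].
    """
    n = X.shape[0]
    X1 = np.column_stack([np.ones(n), X])
    betas = np.empty((n, X1.shape[1]))
    for i in range(n):
        d2 = np.sum((coords - coords[i]) ** 2, axis=1)  # squared distances to site i
        w = np.exp(-d2 / (2.0 * h ** 2))
        XtW = X1.T * w  # equals X1.T @ diag(w) without forming the n x n matrix
        betas[i] = np.linalg.solve(XtW @ X1, XtW @ y)
    return betas
```

When the data follow one global linear relation, every local fit recovers the same coefficients. Spatial variation in the returned rows is what GWR uses to show where a factor's association with the response is stronger or weaker.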
-
Logarithmic Regret for Online Control
Authors:
Naman Agarwal,
Elad Hazan,
Karan Singh
Abstract:
We study optimal regret bounds for control in linear dynamical systems under adversarially changing strongly convex cost functions, given knowledge of the transition dynamics. This includes several well-studied and fundamental frameworks such as the Kalman filter and the linear quadratic regulator. State-of-the-art methods achieve regret which scales as $O(\sqrt{T})$, where $T$ is the time horizon.
We show that the optimal regret in this setting can be significantly smaller, scaling as $O(\text{poly}(\log T))$. This regret bound is achieved by two different efficient iterative methods, online gradient descent and online natural gradient.
Submitted 11 September, 2019;
originally announced September 2019.
-
Personalizing Smartwatch Based Activity Recognition Using Transfer Learning
Authors:
Karanpreet Singh,
Rajen Bhatt
Abstract:
Smartwatches are increasingly being used to recognize human daily life activities. These devices may employ different kinds of machine learning (ML) solutions. One such ML model is the Gradient Boosting Machine (GBM), which has shown excellent performance in the literature. The GBM can be trained on an available dataset before it is deployed on any device. However, this dataset may not represent every kind of human behavior in real life. For example, an ML model detecting the running activity of elderly and young people may give different results because of differences in their activity patterns, which may decrease the accuracy of activity recognition. Therefore, a transfer learning based method is proposed in which user-specific performance can be improved significantly by on-device calibration of the GBM, tuning only its parameters without retraining its estimators. Results show that this method can significantly improve user-based accuracy for activity recognition.
Submitted 3 September, 2019;
originally announced September 2019.
-
Deep Learning in the Automotive Industry: Recent Advances and Application Examples
Authors:
Kanwar Bharat Singh,
Mustafa Ali Arat
Abstract:
One of the most exciting technology breakthroughs in the last few years has been the rise of deep learning. State-of-the-art deep learning models are being widely deployed in academia and industry, across a variety of areas, from image analysis to natural language processing. These models have grown from fledgling research subjects to mature techniques in real-world use. The increasing scale of data, computational power and the associated algorithmic innovations are the main drivers for the progress we see in this field. These developments also have huge potential for the automotive industry, and interest in deep learning-based technology is therefore growing. Many product innovations, such as self-driving cars, parking and lane-change assist, or safety functions such as autonomous emergency braking, are powered by deep learning algorithms. Deep learning is poised to offer gains in performance and functionality for most ADAS (Advanced Driver Assistance System) solutions. Virtual sensing for vehicle dynamics applications, vehicle inspection/health monitoring, automated driving, and data-driven product development are key areas that are expected to get the most attention. This article provides an overview of the recent advances and some associated challenges in deep learning techniques in the context of automotive applications.
Submitted 24 June, 2019; v1 submitted 20 June, 2019;
originally announced June 2019.
-
Submodular Batch Selection for Training Deep Neural Networks
Authors:
K J Joseph,
Vamshi Teja R,
Krishnakant Singh,
Vineeth N Balasubramanian
Abstract:
Mini-batch gradient descent based methods are the de facto algorithms for training neural network architectures today. We introduce a mini-batch selection strategy based on submodular function maximization. Our novel submodular formulation captures the informativeness of each sample and diversity of the whole subset. We design an efficient, greedy algorithm which can give high-quality solutions to this NP-hard combinatorial optimization problem. Our extensive experiments on standard datasets show that the deep models trained using the proposed batch selection strategy provide better generalization than Stochastic Gradient Descent as well as a popular baseline sampling strategy across different learning rates, batch sizes, and distance metrics.
Submitted 20 June, 2019;
originally announced June 2019.
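The greedy algorithm mentioned above is the standard way to approximately maximize a monotone submodular function under a cardinality constraint, here the batch size. This is a minimal sketch, assuming an objective we chose for illustration. It adds a per-sample informativeness score (for example, a hypothetical per-sample loss or uncertainty value) to a facility-location diversity term over pairwise similarities. The paper's actual scoring functions and distance metrics are not reproduced here.

```python
from typing import List, Sequence

def greedy_batch(scores: Sequence[float], sim: Sequence[Sequence[float]],
                 batch_size: int, lam: float = 1.0) -> List[int]:
    """Greedily select `batch_size` indices maximizing
        f(S) = sum_{i in S} scores[i] + lam * sum_j max_{i in S} sim[j][i],
    which is monotone submodular when scores and similarities are >= 0.
    """
    n = len(scores)
    selected: List[int] = []
    # coverage[j] = highest similarity of sample j to any selected sample so far
    coverage = [0.0] * n
    for _ in range(min(batch_size, n)):
        best_idx, best_gain = -1, float("-inf")
        for i in range(n):
            if i in selected:
                continue
            # Marginal gain of the facility-location term if i were added.
            diversity_gain = sum(max(sim[j][i] - coverage[j], 0.0) for j in range(n))
            gain = scores[i] + lam * diversity_gain
            if gain > best_gain:
                best_idx, best_gain = i, gain
        selected.append(best_idx)
        coverage = [max(coverage[j], sim[j][best_idx]) for j in range(n)]
    return selected
```

With two near-duplicate samples and one distinct sample, the diversity term makes the greedy step skip the duplicate once its neighbor is already in the batch. With `lam=0.0`, the selection falls back to the highest scores.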
-
Online Control with Adversarial Disturbances
Authors:
Naman Agarwal,
Brian Bullins,
Elad Hazan,
Sham M. Kakade,
Karan Singh
Abstract:
We study the control of a linear dynamical system with adversarial disturbances (as opposed to statistical noise). The objective we consider is one of regret: we desire an online control procedure that can do nearly as well as a procedure that has full knowledge of the disturbances in hindsight. Our main result is an efficient algorithm that provides nearly tight regret bounds for this problem. From a technical standpoint, this work generalizes previous work in two main aspects: our model allows for adversarial noise in the dynamics, and allows for general convex costs.
Submitted 22 February, 2019;
originally announced February 2019.
-
Multi-task Learning for Target-dependent Sentiment Classification
Authors:
Divam Gupta,
Kushagra Singh,
Soumen Chakrabarti,
Tanmoy Chakraborty
Abstract:
Detecting and aggregating sentiments toward people, organizations, and events expressed in unstructured social media have become critical text mining operations. Early systems detected sentiments over whole passages, whereas more recently, target-specific sentiments have been of greater interest. In this paper, we present MTTDSC, a multi-task target-dependent sentiment classification system that is informed by feature representation learnt for the related auxiliary task of passage-level sentiment classification. The auxiliary task uses a gated recurrent unit (GRU) and pools GRU states, followed by an auxiliary fully-connected layer that outputs passage-level predictions. In the main task, these GRUs contribute auxiliary per-token representations over and above word embeddings. The main task has its own, separate GRUs. The auxiliary and main GRUs send their states to a different fully connected layer, trained for the main task. Extensive experiments using two auxiliary datasets and three benchmark datasets (of which one is new, introduced by us) for the main task demonstrate that MTTDSC outperforms state-of-the-art baselines. Using word-level sensitivity analysis, we present anecdotal evidence that prior systems can make incorrect target-specific predictions because they miss sentiments expressed by words independent of target.
Submitted 7 February, 2019;
originally announced February 2019.
-
Provably Efficient Maximum Entropy Exploration
Authors:
Elad Hazan,
Sham M. Kakade,
Karan Singh,
Abby Van Soest
Abstract:
Suppose an agent is in a (possibly unknown) Markov Decision Process in the absence of a reward signal; what might we hope that an agent can efficiently learn to do? This work studies a broad class of objectives that are defined solely as functions of the state-visitation frequencies that are induced by how the agent behaves. For example, one natural, intrinsically defined objective is for the agent to learn a policy which induces a distribution over state space that is as uniform as possible, which can be measured in an entropic sense. We provide an efficient algorithm to optimize such intrinsically defined objectives, when given access to a black box planning oracle (which is robust to function approximation). Furthermore, when restricted to the tabular setting where we have sample based access to the MDP, our proposed algorithm is provably efficient, both in terms of its sample and computational complexities. Key to our algorithmic methodology is utilizing the conditional gradient method (a.k.a. the Frank-Wolfe algorithm) which utilizes an approximate MDP solver.
Submitted 25 January, 2019; v1 submitted 6 December, 2018;
originally announced December 2018.
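The conditional gradient loop above can be illustrated on a toy problem. This is a minimal sketch in which the planning oracle is replaced by a hypothetical finite list of achievable state-visitation distributions, one per candidate policy; this is our assumption, not the paper's setup. Each round proceeds in three steps:

1. The gradient of the entropy at the current mixture defines a reward r(s) = -log d(s) - 1.
2. The "oracle" returns the candidate with the highest expected reward.
3. The mixture moves toward that candidate with step size 2/(t+2).

The returned weights describe a mixture over candidate policies.

```python
import math
from typing import List, Sequence, Tuple

def entropy(d: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a distribution."""
    return -sum(p * math.log(p) for p in d if p > 0)

def frank_wolfe_max_entropy(candidates: Sequence[Sequence[float]], iters: int = 200,
                            eps: float = 1e-8) -> Tuple[List[float], List[float]]:
    """Maximize the entropy of a mixture of candidate state distributions.

    Returns (mixture weights over candidates, resulting state distribution).
    """
    k = len(candidates)
    weights = [0.0] * k
    weights[0] = 1.0               # start from the first candidate policy
    d = list(candidates[0])
    for t in range(iters):
        # Gradient of H(d) = -sum d log d is -log d(s) - 1; use it as a reward.
        reward = [-math.log(p + eps) - 1.0 for p in d]
        # Linear oracle: the candidate with maximal expected reward.
        best = max(range(k),
                   key=lambda c: sum(r * q for r, q in zip(reward, candidates[c])))
        eta = 2.0 / (t + 2.0)
        weights = [(1.0 - eta) * w for w in weights]
        weights[best] += eta
        d = [(1.0 - eta) * p + eta * q for p, q in zip(d, candidates[best])]
    return weights, d
```

If every candidate visits a single state, no individual policy has positive entropy. The Frank-Wolfe mixture, however, approaches the uniform distribution over states.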
-
Power Maxwell distribution: Statistical Properties, Estimation and Application
Authors:
Abhimanyu Singh Yadav,
Hassan S. Bakouch,
Sanjay Kumar Singh,
Umesh Singh
Abstract:
In this article, we propose a new probability distribution, named the power Maxwell distribution (PMaD). It is another extension of the Maxwell distribution (MaD), offering more flexibility for analyzing data with a non-monotone failure rate. Different statistical properties such as reliability characteristics, moments, quantiles, mean deviation, generating function, conditional moments, stochastic ordering, residual lifetime function and various entropy measures have been derived. The estimation of the parameters of the proposed distribution has been addressed by the maximum likelihood and Bayes estimation methods. The Bayes estimates are obtained under a gamma prior using the squared error loss function. Lastly, real-life applications of the proposed distribution are illustrated using different lifetime datasets.
Submitted 3 July, 2018;
originally announced July 2018.
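Power-transformed families of this kind are usually constructed as X = Y^(1/beta) for a baseline variable Y, which gives the density f_X(x) = beta * x^(beta - 1) * f_Y(x^beta). This is a minimal sketch, assuming that construction with a Maxwell baseline of scale `a`; the paper's exact parametrization may differ. It also includes a numerical check that the density integrates to one.

```python
import math

def maxwell_pdf(y: float, a: float) -> float:
    """Maxwell density with scale a > 0: sqrt(2/pi) * y^2 * exp(-y^2 / (2 a^2)) / a^3."""
    if y <= 0:
        return 0.0
    return math.sqrt(2.0 / math.pi) * y * y * math.exp(-y * y / (2.0 * a * a)) / a ** 3

def power_maxwell_pdf(x: float, a: float, beta: float) -> float:
    """Density of X = Y**(1/beta) with Y ~ Maxwell(a), by change of variables."""
    if x <= 0:
        return 0.0
    return beta * x ** (beta - 1.0) * maxwell_pdf(x ** beta, a)

def integrate(f, lo: float, hi: float, n: int = 100_000) -> float:
    """Composite trapezoidal rule on [lo, hi] with n subintervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h
```

Setting `beta=1.0` recovers the Maxwell density exactly. Other values of `beta` reshape the body and tail, which is the source of the extra flexibility the abstract describes.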
-
Understanding Fashionability: What drives sales of a style?
Authors:
Aniket Jain,
Yadunath Gupta,
Pawan Kumar Singh,
Aruna Rajan
Abstract:
We use customer demand data for fashion articles on Myntra, and derive a fashionability or style quotient, which represents customer demand for the stylistic content of a fashion article, decoupled from its commercials (price, offers, etc.). We demonstrate learning for assortment planning in fashion that aims to keep a healthy mix of breadth and depth across various styles, and we show the relationship between a customer's perception of a style vs a merchandiser's catalogue of styles. We also backtest our method to calculate prediction errors in our style quotient and customer demand, and discuss various implications and findings.
Submitted 28 June, 2018;
originally announced June 2018.
-
Efficient Full-Matrix Adaptive Regularization
Authors:
Naman Agarwal,
Brian Bullins,
Xinyi Chen,
Elad Hazan,
Karan Singh,
Cyril Zhang,
Yi Zhang
Abstract:
Adaptive regularization methods pre-multiply a descent direction by a preconditioning matrix. Due to the large number of parameters of machine learning problems, full-matrix preconditioning methods are prohibitively expensive. We show how to modify full-matrix adaptive regularization in order to make it practical and effective. We also provide a novel theoretical analysis for adaptive regularization in non-convex optimization settings. The core of our algorithm, termed GGT, consists of the efficient computation of the inverse square root of a low-rank matrix. Our preliminary experiments show improved iteration-wise convergence rates across synthetic tasks and standard deep learning benchmarks, and that the more carefully-preconditioned steps sometimes lead to a better solution.
Submitted 17 November, 2020; v1 submitted 7 June, 2018;
originally announced June 2018.
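The core computation named above, applying the inverse square root of a low-rank matrix without forming a d x d matrix, can be sketched with a thin SVD. This is a minimal sketch under assumptions of ours. The preconditioner is taken to be (G G^T)^(1/2) + eps*I, where the d x r matrix `G` stacks the r most recent gradients as columns. Window handling, eps scheduling, and the other details of GGT are omitted.

If G = U S V^T, then (G G^T)^(1/2) = U S U^T. The preconditioned step therefore scales components in span(U) by 1/(s_i + eps) and the orthogonal complement by 1/eps, at O(d r^2) cost.

```python
import numpy as np

def low_rank_preconditioned_step(G: np.ndarray, g: np.ndarray, eps: float) -> np.ndarray:
    """Return ((G G^T)^{1/2} + eps*I)^{-1} g using only a thin SVD of the d x r matrix G."""
    U, sigma, _ = np.linalg.svd(G, full_matrices=False)  # U: (d, r), sigma: (r,)
    coeffs = U.T @ g                                       # coordinates of g in span(U)
    in_span = U @ (coeffs / (sigma + eps))                 # scaled by 1 / (sigma_i + eps)
    orthogonal = (g - U @ coeffs) / eps                    # complement scaled by 1 / eps
    return in_span + orthogonal
```

The dense d x d equivalent requires an eigendecomposition costing O(d^3), which is what makes full-matrix preconditioning prohibitive at deep-learning scale. The thin SVD only touches a d x r matrix.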
-
Spatial Coherence of Oriented White Matter Microstructure: Applications to White Matter Regions Associated with Genetic Similarity
Authors:
Haraldur T. Hallgrímsson,
Matthew Cieslak,
Luca Foschini,
Scott T. Grafton,
Ambuj K. Singh
Abstract:
We present a method to discover differences between populations with respect to the spatial coherence of their oriented white matter microstructure in arbitrarily shaped white matter regions. This method is applied to diffusion MRI scans of a subset of the Human Connectome Project dataset: 57 pairs of monozygotic and 52 pairs of dizygotic twins. After controlling for morphological similarity between twins, we identify 3.7% of all white matter as being associated with genetic similarity (35.1k voxels, $p < 10^{-4}$, false discovery rate 1.5%), 75% of which spatially clusters into twenty-two contiguous white matter regions. Furthermore, we show that the orientation similarity within these regions generalizes to a subset of 47 pairs of non-twin siblings, and show that these siblings are on average as similar as dizygotic twins. The regions are located in deep white matter including the superior longitudinal fasciculus, the optic radiations, the middle cerebellar peduncle, the corticospinal tract, and within the anterior temporal lobe, as well as the cerebellum, brain stem, and amygdalae.
These results extend previous work using undirected fractional anisotropy for measuring putative heritable influences in white matter. Our multidirectional extension better accounts for crossing fiber connections within voxels. This bottom-up approach is based on a novel measurement of coherence within neighboring voxel dyads between subjects, and avoids some of the fundamental ambiguities encountered with tractographic approaches to white matter analysis that estimate global connectivity.
Submitted 14 February, 2018;
originally announced February 2018.
-
Spectral Filtering for General Linear Dynamical Systems
Authors:
Elad Hazan,
Holden Lee,
Karan Singh,
Cyril Zhang,
Yi Zhang
Abstract:
We give a polynomial-time algorithm for learning latent-state linear dynamical systems without system identification, and without assumptions on the spectral radius of the system's transition matrix. The algorithm extends the recently introduced technique of spectral filtering, previously applied only to systems with a symmetric transition matrix, using a novel convex relaxation to allow for the efficient identification of phases.
Submitted 12 February, 2018;
originally announced February 2018.
-
Learning Linear Dynamical Systems via Spectral Filtering
Authors:
Elad Hazan,
Karan Singh,
Cyril Zhang
Abstract:
We present an efficient and practical algorithm for the online prediction of discrete-time linear dynamical systems with a symmetric transition matrix. We circumvent the non-convex optimization problem using improper learning: carefully overparameterize the class of LDSs by a polylogarithmic factor, in exchange for convexity of the loss functions. From this arises a polynomial-time algorithm with a near-optimal regret guarantee, with an analogous sample complexity bound for agnostic learning. Our algorithm is based on a novel filtering technique, which may be of independent interest: we convolve the time series with the eigenvectors of a certain Hankel matrix.
Submitted 2 November, 2017;
originally announced November 2017.
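The filtering technique above convolves the input series with eigenvectors of a fixed Hankel matrix. This is a minimal sketch built on several assumptions. The matrix is Z with entries Z_ij = 2 / ((i + j)^3 - (i + j)) for 1 <= i, j <= T, as we recall it from the spectral filtering literature; check this against the paper. We keep the k eigenvectors with the largest eigenvalues and scale each feature by sigma_j^(1/4), as we understand the construction. Features are formed from the T most recent inputs. The learned linear map on top of these features is left out.

```python
import numpy as np

def hankel_filters(T: int, k: int):
    """Top-k eigenpairs of Z_ij = 2 / ((i+j)^3 - (i+j)), 1 <= i, j <= T."""
    idx = np.arange(1, T + 1)
    s = idx[:, None] + idx[None, :]            # s = i + j
    Z = 2.0 / (s ** 3 - s)
    evals, evecs = np.linalg.eigh(Z)           # eigh returns ascending order
    order = np.argsort(evals)[::-1][:k]        # indices of the k largest
    return evals[order], evecs[:, order]

def filtered_features(inputs: np.ndarray, t: int, evals: np.ndarray,
                      filters: np.ndarray) -> np.ndarray:
    """Convolve the T inputs before time t (most recent first) with each filter.

    inputs: (n, d) sequence; returns a (k, d) array of features for time t,
    each scaled by sigma_j ** 0.25. Inputs before time 0 are treated as zeros.
    """
    T = filters.shape[0]
    window = np.zeros((T, inputs.shape[1]))
    for lag in range(1, T + 1):                # lag 1 corresponds to inputs[t - 1]
        if t - lag >= 0:
            window[lag - 1] = inputs[t - lag]
    return (evals[:, None] ** 0.25) * (filters.T @ window)
```

Because the filters are fixed and the prediction is linear in these features, fitting the remaining parameters is a convex problem. This is the trade the abstract describes: overparameterizing in exchange for convexity.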
-
Interpretable Deep Learning applied to Plant Stress Phenotyping
Authors:
Sambuddha Ghosal,
David Blystone,
Asheesh K. Singh,
Baskar Ganapathysubramanian,
Arti Singh,
Soumik Sarkar
Abstract:
Explainable deep learning models that can be applied to practical real-world scenarios, and that can consistently, rapidly and accurately identify specific and minute traits in applicable fields of the biological sciences, are scarce. Here we consider one such real-world example, viz., accurate identification, classification and quantification of biotic and abiotic stresses in crop research and production. Until now, this has predominantly been done manually by visual inspection, which requires specialized training. However, such techniques are hindered by subjectivity resulting from inter- and intra-rater cognitive variability. Here, we demonstrate the ability of a machine learning framework to identify and classify a diverse set of foliar stresses in the soybean plant with remarkable accuracy. We also present an explanation mechanism using gradient-weighted class activation mapping that isolates the visual symptoms used by the model to make predictions. This unsupervised identification of unique visual symptoms for each stress provides a quantitative measure of stress severity, allowing for identification, classification and quantification in one framework. The learnt model appears to be agnostic to species and makes good predictions for other (non-soybean) species, demonstrating a capacity for transfer learning.
Submitted 28 October, 2017; v1 submitted 24 October, 2017;
originally announced October 2017.
-
Comparative Benchmarking of Causal Discovery Techniques
Authors:
Karamjit Singh,
Garima Gupta,
Vartika Tewari,
Gautam Shroff
Abstract:
In this paper we present a comprehensive view of prominent causal discovery algorithms, categorized into two main categories: (1) those assuming acyclicity and no latent variables, and (2) those allowing both cycles and latent variables. We also present experimental results comparing them from three perspectives: (a) structural accuracy, (b) standard predictive accuracy, and (c) accuracy of counterfactual inference. For (b) and (c) we train causal Bayesian networks with structures as predicted by each causal discovery technique to carry out counterfactual or standard predictive inference. We compare causal algorithms on two publicly available datasets and one simulated dataset having different sample sizes: small, medium and large. Experiments show that the structural accuracy of a technique does not necessarily correlate with higher accuracy on inference tasks. Further, the surveyed structure learning algorithms do not perform well in terms of structural accuracy on datasets with a large number of variables.
Submitted 12 September, 2017; v1 submitted 18 August, 2017;
originally announced August 2017.
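Structural accuracy is often measured with the structural Hamming distance (SHD) between the true and the learned graph. Whether the authors use exactly this metric is an assumption; the sketch below, with the hypothetical helper `structural_hamming_distance`, counts one error per node pair whose edge is missing, extra, or reversed.

```python
import numpy as np

def structural_hamming_distance(true_adj, pred_adj) -> int:
    """Number of node pairs whose edge differs between two directed graphs.

    Adjacency convention (assumed): adj[i][j] = 1 means an edge i -> j.
    A missing, extra, or reversed edge each counts as one error.
    """
    A = np.asarray(true_adj, dtype=bool)
    B = np.asarray(pred_adj, dtype=bool)
    n = A.shape[0]
    dist = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Compare the edge status of the unordered pair {i, j}
            if (A[i, j], A[j, i]) != (B[i, j], B[j, i]):
                dist += 1
    return dist

true_g = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]   # 0 -> 1 -> 2
pred_g = [[0, 0, 1], [1, 0, 1], [0, 0, 0]]   # 1 -> 0 (reversed), 1 -> 2, 0 -> 2 (extra)
print(structural_hamming_distance(true_g, pred_g))  # 2
```

Counting a reversal once, rather than as a deletion plus an insertion, is one common convention; other variants exist.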
-
Efficient Regret Minimization in Non-Convex Games
Authors:
Elad Hazan,
Karan Singh,
Cyril Zhang
Abstract:
We consider regret minimization in repeated games with non-convex loss functions. Minimizing the standard notion of regret is computationally intractable. Thus, we define a natural notion of regret which permits efficient optimization and generalizes offline guarantees for convergence to an approximate local optimum. We give gradient-based methods that achieve optimal regret, which in turn guarantee convergence to equilibrium in this framework.
Submitted 31 July, 2017;
originally announced August 2017.
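One way to make a tractable regret notion concrete is to measure stationarity of a sliding-window average of recent losses, $F_{t,w}(x)=\frac{1}{w}\sum_{i=0}^{w-1} f_{t-i}(x)$, and to run gradient descent on it each round. The sketch below assumes that reading (window `w`, step size `eta`, tolerance `delta`, losses before the first round treated as zero); all names are hypothetical, and it illustrates the idea rather than reproducing the paper's algorithm or its guarantees.

```python
import numpy as np

def time_smoothed_gd(grad_fns, x0, w=5, eta=0.1, delta=1e-3, max_inner=1000):
    """Sketch of online play against a sequence of (possibly non-convex) losses.

    grad_fns[t](x) returns the gradient of loss f_t at x. After f_t is revealed,
    run gradient descent on F_{t,w}(x) = (1/w) * sum_{i<w} f_{t-i}(x), treating
    losses before round 0 as zero, until ||grad F_{t,w}|| <= delta.
    Returns the points played in each round.
    """
    x = np.asarray(x0, dtype=float)
    played = []
    for t in range(len(grad_fns)):
        played.append(x.copy())                      # commit to x_t before f_t is seen
        window = grad_fns[max(0, t - w + 1): t + 1]  # the last (up to) w losses
        for _ in range(max_inner):
            g = sum(gf(x) for gf in window) / w
            if np.linalg.norm(g) <= delta:
                break
            x = x - eta * g
    return played

c = np.array([1.0, -2.0])
losses = [lambda x: 2.0 * (x - c)] * 10   # gradients of ||x - c||^2 (hypothetical)
xs = time_smoothed_gd(losses, x0=[0.0, 0.0])
print(np.round(xs[-1], 2))
```

The inner loop only asks for a small gradient of the smoothed loss, which is why this style of guarantee stays meaningful for non-convex losses, where a global minimizer may be intractable to find.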
-
Dynamic Task Allocation for Crowdsourcing Settings
Authors:
Angela Zhou,
Irineo Cabreros,
Karan Singh
Abstract:
We consider the problem of optimal budget allocation for crowdsourcing problems, allocating users to tasks to maximize our final confidence in the crowdsourced answers. Such an optimized worker assignment method allows us to boost the efficacy of any popular crowdsourcing estimation algorithm. We consider a mutual information interpretation of the crowdsourcing problem, which leads to a stochastic subset selection problem with a submodular objective function. We present experimental simulation results which demonstrate the effectiveness of our dynamic task allocation method for achieving higher accuracy, possibly requiring fewer labels, as well as improving upon a previous method which is sensitive to the proportion of users to questions.
Submitted 25 February, 2017; v1 submitted 30 January, 2017;
originally announced January 2017.
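Submodular subset-selection problems like the one above are commonly approached with the greedy algorithm, which repeatedly adds the element with the largest marginal gain. The sketch below is only an illustration: it swaps the paper's mutual-information objective for a simpler probabilistic-coverage objective (the probability that each task receives at least one correct label), and the worker accuracies and helper names are hypothetical.

```python
def coverage(assignments, accuracy):
    """Hypothetical objective: sum over tasks of P(at least one assigned worker is correct)."""
    miss = {}
    for worker, task in assignments:
        miss[task] = miss.get(task, 1.0) * (1.0 - accuracy[worker])
    return sum(1.0 - m for m in miss.values())

def greedy_allocate(candidates, accuracy, budget):
    """Greedily add the (worker, task) pair with the largest marginal gain."""
    chosen = []
    for _ in range(budget):
        base = coverage(chosen, accuracy)
        best, best_gain = None, 0.0
        for pair in candidates:
            if pair in chosen:
                continue
            gain = coverage(chosen + [pair], accuracy) - base
            if best is None or gain > best_gain:
                best, best_gain = pair, gain
        if best is None:        # every candidate is already chosen
            break
        chosen.append(best)
    return chosen

accuracy = {"a": 0.9, "b": 0.6}                      # hypothetical worker accuracies
candidates = [(w, t) for w in accuracy for t in (1, 2)]
print(greedy_allocate(candidates, accuracy, budget=3))
```

For monotone submodular objectives under a cardinality budget, this greedy rule is known to achieve at least a $(1 - 1/e)$ fraction of the optimum, which is the usual reason to favor it over exhaustive search.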
-
The Price of Differential Privacy For Online Learning
Authors:
Naman Agarwal,
Karan Singh
Abstract:
We design differentially private algorithms for the problem of online linear optimization in the full-information and bandit settings with optimal $\tilde{O}(\sqrt{T})$ regret bounds. In the full-information setting, our results demonstrate that $\varepsilon$-differential privacy may be ensured for free; in particular, the regret bounds scale as $O(\sqrt{T})+\tilde{O}\left(\frac{1}{\varepsilon}\right)$. For bandit linear optimization, and as a special case, for non-stochastic multi-armed bandits, the proposed algorithm achieves a regret of $\tilde{O}\left(\frac{1}{\varepsilon}\sqrt{T}\right)$, while the previously known best regret bound was $\tilde{O}\left(\frac{1}{\varepsilon}T^{\frac{2}{3}}\right)$.
Submitted 13 June, 2017; v1 submitted 27 January, 2017;
originally announced January 2017.
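A quick way to read the bandit bounds above is to compare their dependence on $T$, holding $\varepsilon$ fixed and ignoring logarithmic factors and constants:

$$
\frac{\frac{1}{\varepsilon}\,T^{2/3}}{\frac{1}{\varepsilon}\,T^{1/2}} = T^{2/3-1/2} = T^{1/6},
\qquad T = 10^{6} \;\Rightarrow\; T^{1/6} = 10 .
$$

So over a million rounds the new bandit bound is smaller by a factor of roughly ten. In the full-information bound, the privacy cost $\tilde{O}(1/\varepsilon)$ is additive and does not grow with $T$, which is the sense in which privacy comes essentially for free once $\sqrt{T}$ dominates $1/\varepsilon$.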
-
An improved estimator for population mean using auxiliary information in stratified random sampling
Authors:
Sachin Malik,
Viplav Kumar Singh,
Rajesh Singh
Abstract:
In the present study, we propose a new estimator for the population mean of the study variable y in the case of stratified random sampling, using information on the auxiliary variable x. An expression for the mean squared error (MSE) of the proposed estimators is derived up to the first order of approximation. The theoretical conditions have also been verified by a numerical example. An empirical study is carried out to show the efficiency of the suggested estimator over the sample mean estimator, the usual separate ratio and separate product estimators, and other proposed estimators.
Submitted 3 July, 2014;
originally announced July 2014.
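For reference, the classical baselines named above can be computed as follows: the stratified sample mean $\sum_h W_h \bar y_h$, the separate ratio estimator $\sum_h W_h \bar y_h \bar X_h/\bar x_h$, and the separate product estimator $\sum_h W_h \bar y_h \bar x_h/\bar X_h$, with $W_h = N_h/N$ and known stratum means $\bar X_h$. This is a minimal sketch of those textbook estimators only; the paper's proposed estimator is not reproduced, and the input layout and data are hypothetical.

```python
from statistics import mean

def stratified_baselines(strata):
    """Classical estimators of the population mean of y under stratified sampling.

    Each stratum is a dict with N (stratum size), Xbar (known stratum mean of x),
    and the sampled values x and y (an assumed input layout).
    Returns (stratified mean, separate ratio, separate product).
    """
    N = sum(s["N"] for s in strata)
    est_mean = est_ratio = est_product = 0.0
    for s in strata:
        W = s["N"] / N                               # stratum weight W_h
        yb, xb = mean(s["y"]), mean(s["x"])
        est_mean += W * yb
        est_ratio += W * yb * s["Xbar"] / xb          # ratio adjustment per stratum
        est_product += W * yb * xb / s["Xbar"]        # product adjustment per stratum
    return est_mean, est_ratio, est_product

strata = [  # hypothetical data
    {"N": 60, "Xbar": 10.0, "x": [8.0, 12.0, 9.0], "y": [16.0, 24.0, 18.0]},
    {"N": 40, "Xbar": 20.0, "x": [18.0, 22.0], "y": [36.0, 44.0]},
]
print(stratified_baselines(strata))
```

In this toy data y is exactly proportional to x, so the separate ratio estimator recovers $2\bar X$ exactly, which illustrates why ratio-type estimators gain efficiency when y and x are strongly positively correlated.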
-
The inverse Lindley distribution: A stress-strength reliability model
Authors:
Vikas Kumar Sharma,
Sanjay Kumar Singh,
Umesh Singh,
Varun Agiwal
Abstract:
In this article, we proposed an inverse Lindley distribution and studied its fundamental properties such as quantiles, mode, stochastic ordering and entropy measure. The proposed distribution is observed to be heavy-tailed and to have an upside-down bathtub shape for its failure rate. Further, we proposed its applicability as a stress-strength reliability model for survival data analysis. The estimation of the stress-strength parameters and of the stress-strength reliability $R=P[X>Y]$ has been approached through both classical and Bayesian paradigms. Under the Bayesian set-up, non-informative (Jeffreys) and informative (gamma) priors are considered under a symmetric (squared error) and an asymmetric (entropy) loss function, and a Lindley-approximation technique is used for Bayesian computation. The proposed estimators are compared in terms of their mean squared errors through a simulation study. Two real data sets representing survival of head and neck cancer patients are fitted using the inverse Lindley distribution and used to estimate the stress-strength parameters and reliability.
Submitted 24 May, 2014;
originally announced May 2014.
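If $X$ follows a Lindley distribution with density $\frac{\theta^2}{1+\theta}(1+x)e^{-\theta x}$, then $1/X$ has density $f(y)=\frac{\theta^2}{1+\theta}\,\frac{1+y}{y^3}\,e^{-\theta/y}$ and CDF $F(y)=\left(1+\frac{\theta}{(1+\theta)y}\right)e^{-\theta/y}$ for $y>0$. The sketch below assumes this standard construction of the inverse Lindley distribution, samples it through the exponential/gamma mixture form of the Lindley distribution, and estimates $R=P[X>Y]$ by Monte Carlo. It is a sanity check on these quantities, not the paper's classical or Bayesian estimators, and the function names are hypothetical.

```python
import math
import random

def il_pdf(y, theta):
    """Inverse Lindley density for y > 0."""
    return theta**2 / (1 + theta) * (1 + y) / y**3 * math.exp(-theta / y)

def il_cdf(y, theta):
    """Inverse Lindley CDF for y > 0."""
    return (1 + theta / ((1 + theta) * y)) * math.exp(-theta / y)

def il_sample(theta, rng):
    """Lindley(theta) is Exp(theta) w.p. theta/(1+theta), else Gamma(2, scale 1/theta); invert it."""
    if rng.random() < theta / (1 + theta):
        x = rng.expovariate(theta)
    else:
        x = rng.gammavariate(2.0, 1.0 / theta)
    return 1.0 / x

def stress_strength_mc(theta_x, theta_y, n=100_000, seed=0):
    """Monte Carlo estimate of R = P[X > Y], strength X ~ IL(theta_x), stress Y ~ IL(theta_y)."""
    rng = random.Random(seed)
    hits = sum(il_sample(theta_x, rng) > il_sample(theta_y, rng) for _ in range(n))
    return hits / n

print(stress_strength_mc(2.0, 1.0))
```

Because a larger $\theta$ shrinks the Lindley variable, its reciprocal grows, so with these parameters the estimate of $R$ should exceed one half.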
-
Fast symmetric factorization of hierarchical matrices with applications
Authors:
Sivaram Ambikasaran,
Michael O'Neil,
Karan Raj Singh
Abstract:
We present a fast direct algorithm for computing symmetric factorizations, i.e. $A = WW^T$, of symmetric positive-definite hierarchical matrices with weak-admissibility conditions. The computational cost for the symmetric factorization scales as $\mathcal{O}(n \log^2 n)$ for hierarchically off-diagonal low-rank matrices. Once this factorization is obtained, the cost for inversion, application, and determinant computation scales as $\mathcal{O}(n \log n)$. In particular, this allows for the near-optimal generation of correlated random variates in the case where $A$ is a covariance matrix. This symmetric factorization algorithm depends on two key ingredients. First, we present a novel symmetric factorization formula for low-rank updates to the identity of the form $I+UKU^T$. This factorization can be computed in $\mathcal{O}(n)$ time if the rank of the perturbation is sufficiently small. Second, combining this formula with a recursive divide-and-conquer strategy, symmetric factorizations with near-linear complexity can be obtained for hierarchically structured matrices. We present numerical results for matrices relevant to problems in probability and statistics (Gaussian processes), interpolation (radial basis functions), and Brownian dynamics calculations in fluid mechanics (the Rotne-Prager-Yamakawa tensor).
Submitted 30 December, 2016; v1 submitted 1 May, 2014;
originally announced May 2014.
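The low-rank ingredient can be illustrated under one simple assumption: if the columns of $U$ are first orthonormalized by a QR factorization $U = QR$, then $I + UKU^T = I + Q\tilde K Q^T$ with $\tilde K = RKR^T$, and taking $L$ as the Cholesky factor of $I+\tilde K$ gives $W = I + Q(L-I)Q^T$ with $WW^T = I + UKU^T$. The sketch below checks that identity in NumPy; it is not necessarily the paper's exact formula, the helper name is hypothetical, and the recursive hierarchical part is omitted. After the small $r \times r$ Cholesky step, $W$ is applied to a vector in $O(nr)$ work.

```python
import numpy as np

def low_rank_symmetric_factor(U, K):
    """Return a function applying W, where W @ W.T == I + U @ K @ U.T.

    Assumes K is symmetric and I + U K U^T is positive definite.
    """
    Q, R = np.linalg.qr(U)                 # U = Q R, Q has orthonormal columns
    Kq = R @ K @ R.T                       # so U K U^T = Q Kq Q^T
    r = Kq.shape[0]
    L = np.linalg.cholesky(np.eye(r) + Kq)  # L L^T = I + Kq (r x r, cheap)
    M = L - np.eye(r)

    def apply_W(v):
        # W v = v + Q (L - I) Q^T v, never forming the n x n matrix W
        return v + Q @ (M @ (Q.T @ v))

    return apply_W

rng = np.random.default_rng(0)
n, r = 40, 3
U = rng.standard_normal((n, r))
B = rng.standard_normal((r, r))
K = B @ B.T                                # symmetric positive semidefinite
W = low_rank_symmetric_factor(U, K)(np.eye(n))   # materialize W only to check it
print(np.allclose(W @ W.T, np.eye(n) + U @ K @ U.T))
```

Expanding $WW^T$ and using $Q^TQ = I$ reduces the check to $(L-I) + (L-I)^T + (L-I)(L-I)^T = LL^T - I = \tilde K$, which is why only a small Cholesky factorization is needed.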
-
Block Outlier Methods for Malicious User Detection in Cooperative Spectrum Sensing
Authors:
Sanket S. Kalamkar,
Praveen Kumar Singh,
Adrish Banerjee
Abstract:
Block outlier detection methods, based on Tietjen-Moore (TM) and Shapiro-Wilk (SW) tests, are proposed to detect and suppress spectrum sensing data falsification (SSDF) attacks by malicious users in cooperative spectrum sensing. First, we consider basic and statistical SSDF attacks, where the malicious users attack independently. Then we propose a new SSDF attack, which involves cooperation among malicious users by masking. In practice, the number of malicious users is unknown. Thus, it is necessary to estimate the number of malicious users, which can be done using clustering and the largest gap method. However, we show using Monte Carlo simulations that these methods fail to estimate the exact number of malicious users when they cooperate. To overcome this, we propose a modified largest gap method.
Submitted 17 February, 2014;
originally announced February 2014.
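Two of the building blocks named above can be sketched directly. The Tietjen-Moore statistic for the $k$ largest values is $E_k=\sum_{i=1}^{n-k}(y_{(i)}-\bar y_k)^2 \big/ \sum_{i=1}^{n}(y_{(i)}-\bar y)^2$, where $y_{(i)}$ are the sorted values and $\bar y_k$ is the mean after removing the $k$ largest; small values point to outliers. The largest-gap count below is the plain version, not the paper's modified method, and treating malicious reports as the high values is an assumption. The critical values that the test needs are not computed here, and the reports are hypothetical.

```python
def largest_gap_count(values):
    """Plain largest-gap rule: count the values above the widest gap between sorted neighbours."""
    s = sorted(values)
    gaps = [s[i + 1] - s[i] for i in range(len(s) - 1)]
    cut = max(range(len(gaps)), key=gaps.__getitem__)   # index of the widest gap
    return len(s) - (cut + 1)

def tietjen_moore_upper(values, k):
    """Tietjen-Moore statistic E_k for the k largest values (smaller means more outlying)."""
    s = sorted(values)
    kept = s[: len(s) - k]
    mean_all = sum(s) / len(s)
    mean_kept = sum(kept) / len(kept)
    num = sum((v - mean_kept) ** 2 for v in kept)
    den = sum((v - mean_all) ** 2 for v in s)
    return num / den

reports = [1.0, 1.1, 0.9, 1.05, 0.95, 5.0, 5.2]   # hypothetical energy reports
k = largest_gap_count(reports)
print(k, tietjen_moore_upper(reports, k))
```

The two pieces fit together naturally: the gap rule supplies the suspected number of outliers $k$, which the Tietjen-Moore test requires as an input.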
-
A Statistical Peek into Average Case Complexity
Authors:
Niraj Kumar Singh,
Soubhik Chakraborty,
Dheeresh Kumar Mallick
Abstract:
The present paper gives a statistical adventure towards exploring the average case complexity behavior of computer algorithms. Rather than following the traditional count-based analytical (pen and paper) approach, we instead talk in terms of the weight-based analysis that permits mixing of distinct operations into a conceptual bound called the statistical bound and its empirical estimate, the so-called "empirical O". Based on careful analysis of the results obtained, we have introduced two new conjectures in the domain of algorithmic analysis. The analytical way of average case analysis falls flat when it comes to a data model for which the expectation does not exist (e.g. the Cauchy distribution for continuous input data and certain discrete distribution inputs such as those studied in the paper). The empirical side of our approach, with a thrust in computer experiments and applied statistics in its paradigm, lends a helping hand by complementing and supplementing its theoretical counterpart. Computer science is, or at least has aspects of, an experimental science as well, and hence, hopefully, our statistical findings will be equally recognized among theoretical scientists.
Submitted 17 December, 2013;
originally announced December 2013.
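The idea of an empirical estimate of an average-case bound can be illustrated by fitting a power law to measured cost. The sketch below is a toy under stated assumptions: the "weight" is simply the number of comparisons made by insertion sort on uniform random inputs (chosen to keep the example deterministic), and the exponent is the least-squares slope of log cost against log input size. Since insertion sort makes about $n^2/4$ comparisons on average for this input model, the slope should come out close to 2. Function names are hypothetical.

```python
import math
import random

def insertion_sort_comparisons(a):
    """Sort a copy of a with insertion sort and return the number of key comparisons."""
    a = list(a)
    count = 0
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0:
            count += 1
            if a[j] > key:
                a[j + 1] = a[j]   # shift the larger element right
                j -= 1
            else:
                break
        a[j + 1] = key
    return count

def empirical_exponent(sizes, trials=3, seed=0):
    """Least-squares slope of log(mean comparisons) versus log(n)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for n in sizes:
        total = sum(insertion_sort_comparisons([rng.random() for _ in range(n)])
                    for _ in range(trials))
        xs.append(math.log(n))
        ys.append(math.log(total / trials))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)

print(empirical_exponent([100, 200, 400, 800]))
```

The same fitting step works for any measured weight, including running time, which is where mixing distinct operations into a single cost becomes unavoidable.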
-
An Improved Estimator In Systematic Sampling
Authors:
Rajesh Singh,
Sachin Malik,
Viplav K. Singh
Abstract:
In this paper we have adapted the Bahl and Tuteja (1991) estimator to systematic sampling using auxiliary information. Using the Bedi (1996) transformation, an improved estimator is also proposed under systematic sampling. The expressions for bias and mean square error up to the first order of approximation are derived. It has been shown that the proposed estimator performs better than many existing estimators. A numerical example is included to support the theoretical results.
Submitted 19 July, 2013;
originally announced July 2013.
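For orientation, linear systematic sampling and the exponential ratio-type estimator usually attributed to Bahl and Tuteja (1991), $\bar y \exp\{(\bar X - \bar x)/(\bar X + \bar x)\}$, can be sketched as below. The sketch assumes $N = nk$ and a known population mean $\bar X$; the population values and helper names are hypothetical, and the paper's improved (Bedi-transformed) estimator is not reproduced.

```python
import math
import random

def systematic_sample(N, n, rng):
    """Linear systematic sample of n unit indices from 0..N-1, assuming N = n * k."""
    k = N // n                      # sampling interval
    start = rng.randrange(k)        # random start in 0..k-1
    return [start + j * k for j in range(n)]

def bahl_tuteja(ybar, xbar, Xbar):
    """Exponential ratio-type estimator: ybar * exp((Xbar - xbar) / (Xbar + xbar))."""
    return ybar * math.exp((Xbar - xbar) / (Xbar + xbar))

rng = random.Random(1)
x = [float(i) for i in range(1, 21)]                  # hypothetical auxiliary variable
y = [2.0 * v + (-1) ** i for i, v in enumerate(x)]    # hypothetical study variable
idx = systematic_sample(len(x), 4, rng)
xbar = sum(x[i] for i in idx) / len(idx)
ybar = sum(y[i] for i in idx) / len(idx)
print(idx, bahl_tuteja(ybar, xbar, sum(x) / len(x)))
```

When the sample mean of x falls below $\bar X$, the exponential factor exceeds one and scales $\bar y$ up, and vice versa, which is the ratio-type correction in a gentler form than the classical $\bar X/\bar x$.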
-
Reverse Engineering Gene Interaction Networks Using the Phi-Mixing Coefficient
Authors:
Nitin Kumar Singh,
M. Eren Ahsen,
Shiva Mankala,
Hyun-Seok Kim,
Michael A. White,
M. Vidyasagar
Abstract:
Constructing gene interaction networks (GINs) from high-throughput gene expression data is an important and challenging problem in systems biology. Existing algorithms produce networks that either have undirected and unweighted edges, or else are constrained to contain no cycles, both of which are biologically unrealistic. In the present paper we propose a new algorithm, based on a concept from probability theory known as the phi-mixing coefficient, that produces networks whose edges are weighted and directed, and are permitted to contain cycles. Because there is no "ground truth" for genome-wide networks on a human scale, we analyzed the outcomes of several experiments on lung cancer, and matched the predictions from the inferred networks with experimental results. Specifically, we inferred three networks (NSCLC, Neuro-endocrine NSCLC plus SCLC, and normal) from the gene expression measurements of 157 lung cancer and 59 normal cell lines, compared them with the outcomes of siRNA screening of 19,000+ genes on 11 NSCLC cell lines, and analyzed data from a ChIP-Seq experiment to determine putative downstream targets of the lineage-specific oncogenic transcription factor ASCL1. The inferred networks displayed scale-free (power-law) behavior between the degree of a node and the number of nodes with that degree. There was a strong correlation between the degree of a gene in the inferred NSCLC network and its essentiality for the survival of the cells. The inferred downstream neighborhood genes of ASCL1 in the SCLC network were significantly enriched for ChIP-Seq-determined putative target genes, while no such enrichment was found in the inferred NSCLC network.
Submitted 12 March, 2016; v1 submitted 20 August, 2012;
originally announced August 2012.
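For discrete (for example, quantized) expression levels, a phi-mixing coefficient can be computed by brute force over events. The sketch below assumes the definition $\varphi = \max_{S,T} |P(X\in S \mid Y\in T) - P(X\in S)|$ over events with $P(Y\in T)>0$, estimated from a joint probability table. For a fixed $T$, the maximum over $S$ equals the total-variation distance between $P(X \mid Y\in T)$ and $P(X)$, so only the events $T$ are enumerated, which is feasible for small alphabets. Because the coefficient is not symmetric, transposing the table gives the other edge direction. The quantization step and network construction are not shown, and the function name and table are hypothetical.

```python
from itertools import chain, combinations

import numpy as np

def nonempty_subsets(k):
    """All non-empty subsets of range(k), as tuples."""
    return chain.from_iterable(combinations(range(k), r) for r in range(1, k + 1))

def phi_mixing(joint):
    """joint[i, j] = P(X = i, Y = j); returns max_{S,T} |P(X in S | Y in T) - P(X in S)|."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1)                      # marginal of X
    best = 0.0
    for T in nonempty_subsets(joint.shape[1]):
        cols = list(T)
        pT = joint[:, cols].sum()
        if pT <= 0:
            continue                            # conditioning event has zero probability
        cond = joint[:, cols].sum(axis=1) / pT  # P(X = i | Y in T)
        diff = cond - px
        # Best S collects the states where the conditional exceeds the marginal
        best = max(best, diff[diff > 0].sum())
    return best

J = np.array([[0.5, 0.25], [0.0, 0.25]])   # hypothetical joint table P(X=i, Y=j)
print(phi_mixing(J), phi_mixing(J.T))      # 0.25 0.5
```

The asymmetry in this small example, $0.25$ one way and $0.5$ the other, is exactly what allows a phi-mixing-based network to assign directions to its edges.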
-
Discussion of "Is Bayes Posterior just Quick and Dirty Confidence?" by D. A. S. Fraser
Authors:
Kesar Singh,
Minge Xie
Abstract:
Discussion of "Is Bayes Posterior just Quick and Dirty Confidence?" by D. A. S. Fraser [arXiv:1112.5582].
Submitted 3 February, 2012;
originally announced February 2012.