Search | arXiv e-print repository

On Regularization and Inference with Label Constraints

Authors: Kaifu Wang, Hangfeng He, Tin D. Nguyen, Piyush Kumar, Dan Roth

Abstract: Prior knowledge and symbolic rules in machine learning are often expressed in the form of label constraints, especially in structured prediction problems. In this work, we compare two common strategies for encoding label constraints in a machine learning pipeline, regularization with constraints and constrained inference, by quantifying their impact on model performance. For regularization, we show that it narrows the generalization gap by precluding models that are inconsistent with the constraints. However, its preference for small violations introduces a bias toward a suboptimal model. For constrained inference, we show that it reduces the population risk by correcting a model's violation, and hence turns the violation into an advantage. Given these differences, we further explore the use of the two approaches together and propose conditions for constrained inference to compensate for the bias introduced by regularization, aiming to improve both the model complexity and optimal risk.

Submitted 7 July, 2023; originally announced July 2023.
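The two strategies compared above can be illustrated on a toy structured prediction problem. The following is a minimal sketch under stated assumptions, not the authors' formulation: it assumes per-position label scores (or probabilities) from a factorized model, a hypothetical constraint `equal_labels`, and exhaustive search over joint labelings, so it only scales to tiny label spaces. `constrained_argmax` performs constrained inference, while `regularized_loss` adds a penalty equal to the probability mass the model places on violating labelings, which is one simple choice of violation measure.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple

Labeling = Tuple[int, ...]
Constraint = Callable[[Labeling], bool]


def equal_labels(y: Labeling) -> bool:
    """Hypothetical example constraint: every position must carry the same label."""
    return len(set(y)) <= 1


def unconstrained_argmax(scores: Sequence[Sequence[float]]) -> Labeling:
    """Independent per-position argmax; the result may violate the constraint."""
    return tuple(max(range(len(s)), key=lambda k: s[k]) for s in scores)


def constrained_argmax(scores: Sequence[Sequence[float]],
                       constraint: Constraint) -> Optional[Labeling]:
    """Constrained inference: highest total score among labelings that satisfy
    the constraint (exhaustive search). Returns None if nothing is feasible."""
    best: Optional[Labeling] = None
    best_score = float("-inf")
    for y in product(*(range(len(s)) for s in scores)):
        if not constraint(y):
            continue
        total = sum(scores[i][label] for i, label in enumerate(y))
        if total > best_score:
            best, best_score = y, total
    return best


def violation_probability(probs: Sequence[Sequence[float]],
                          constraint: Constraint) -> float:
    """Probability mass a factorized model assigns to violating labelings."""
    total = 0.0
    for y in product(*(range(len(p)) for p in probs)):
        if not constraint(y):
            p_y = 1.0
            for i, label in enumerate(y):
                p_y *= probs[i][label]
            total += p_y
    return total


def regularized_loss(nll: float, probs: Sequence[Sequence[float]],
                     constraint: Constraint, lam: float) -> float:
    """Training objective: data loss plus lam times the expected violation."""
    return nll + lam * violation_probability(probs, constraint)
```

With scores [[2.0, 0.0], [0.0, 1.0]], the per-position argmax is (0, 1), which violates `equal_labels`, whereas constrained inference returns (0, 0).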

arXiv:2306.14817

doi 10.1007/JHEP10(2023)107

Black holes and the loss landscape in machine learning

Authors: Pranav Kumar, Taniya Mandal, Swapnamay Mondal

Abstract: Understanding the loss landscape is an important problem in machine learning. One key feature of the loss function, common to many neural network architectures, is the presence of exponentially many low-lying local minima. Physical systems with similar energy landscapes may provide useful insights. In this work, we point out that black holes naturally give rise to such landscapes, owing to the existence of black hole entropy. For definiteness, we consider 1/8 BPS black holes in $\mathcal{N} = 8$ string theory. These provide an infinite family of potential landscapes arising in the microscopic descriptions of corresponding black holes. The counting of minima amounts to black hole microstate counting. Moreover, the exact numbers of the minima for these landscapes are a priori known from dualities in string theory. Some of the minima are connected by paths of low loss values, resembling mode connectivity. We estimate the number of runs needed to find all the solutions. Initial explorations suggest that Stochastic Gradient Descent can find a significant fraction of the minima.

Submitted 26 June, 2023; originally announced June 2023.

Comments: 32 pages, 4 figures

Report number: DIAS-STP-23-11

Journal ref: JHEP10(2023)107

arXiv:2305.13991

Expressive Losses for Verified Robustness via Convex Combinations

Authors: Alessandro De Palma, Rudy Bunel, Krishnamurthy Dvijotham, M. Pawan Kumar, Robert Stanforth, Alessio Lomuscio

Abstract: In order to train networks for verified adversarial robustness, it is common to over-approximate the worst-case loss over perturbation regions, resulting in networks that attain verifiability at the expense of standard performance. As shown in recent work, better trade-offs between accuracy and robustness can be obtained by carefully coupling adversarial training with over-approximations. We hypothesize that the expressivity of a loss function, which we formalize as the ability to span a range of trade-offs between lower and upper bounds to the worst-case loss through a single parameter (the over-approximation coefficient), is key to attaining state-of-the-art performance. To support our hypothesis, we show that trivial expressive losses, obtained via convex combinations between adversarial attacks and IBP bounds, yield state-of-the-art results across a variety of settings in spite of their conceptual simplicity. We provide a detailed analysis of the relationship between the over-approximation coefficient and performance profiles across different expressive losses, showing that, while expressivity is essential, better approximations of the worst-case loss are not necessarily linked to superior robustness-accuracy trade-offs.

Submitted 18 March, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: ICLR 2024
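The convex-combination idea can be sketched for a single example. This is an illustrative sketch, assuming the adversarial logits (from some attack) and the worst-case logits implied by IBP bounds are already computed; the function names are hypothetical, and the paper's losses may combine these quantities at a different level than the loss values shown here. Setting `alpha` to 0 recovers the attack-based loss, a lower bound on the worst-case loss, and setting it to 1 recovers the bound-based loss, an upper bound.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Numerically stable softmax cross-entropy for one example."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


def expressive_loss(adv_logits: Sequence[float],
                    worst_case_logits: Sequence[float],
                    target: int, alpha: float) -> float:
    """Convex combination of an attack-based loss and a bound-based loss.

    adv_logits: logits at an adversarial point found by some attack (assumed given).
    worst_case_logits: logits built from IBP bounds, i.e. the lower bound for the
        target class and upper bounds for the others (assumed given).
    alpha: the single over-approximation coefficient, in [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    attack_loss = cross_entropy(adv_logits, target)
    bound_loss = cross_entropy(worst_case_logits, target)
    return (1.0 - alpha) * attack_loss + alpha * bound_loss
```

Sweeping `alpha` between 0 and 1 traces out the range of trade-offs between the two bounds with one scalar, which is the sense of "expressivity" used in the abstract.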

arXiv:2301.03847

Evaluating the Performance of Low-Cost PM2.5 Sensors in Mobile Settings

Authors: Priyanka deSouza, An Wang, Yuki Machida, Tiffany Duhl, Simone Mora, Prashant Kumar, Ralph Kahn, Carlo Ratti, John L. Durant, Neelakshi Hudda

Abstract: Low-cost sensors (LCS) for measuring air pollution are increasingly being deployed in mobile applications, but questions concerning the quality of the measurements remain unanswered. For example, what is the best way to correct LCS data in a mobile setting? Which factors most significantly contribute to differences between mobile LCS data and higher-quality instruments? Can data from LCS be used to identify hotspots and generate generalizable pollutant concentration maps? To help address these questions, we deployed low-cost PM2.5 sensors (Alphasense OPC-N3) and a research-grade instrument (TSI DustTrak) in a mobile laboratory in Boston, MA, USA. We first collocated these instruments with stationary PM2.5 reference monitors at nearby regulatory sites. Next, using the reference measurements, we developed different models to correct the OPC-N3 and DustTrak measurements, and then transferred the corrections to the mobile setting. We observed that more complex correction models appeared to perform better than simpler models in the stationary setting; however, when transferred to the mobile setting, corrected OPC-N3 measurements agreed less well with corrected DustTrak data. In general, corrections developed using minute-level collocation measurements transferred better to the mobile setting than corrections developed using hourly-averaged data.
Mobile laboratory speed, OPC-N3 orientation relative to the direction of travel, date, hour-of-the-day, and road class together explain a small but significant amount of variation between corrected OPC-N3 and DustTrak measurements during the mobile deployment. Persistent hotspots identified by the OPC-N3s agreed with those identified by the DustTrak. Similarly, maps of PM2.5 distribution produced from the mobile corrected OPC-N3 and DustTrak measurements agreed well.

Submitted 10 January, 2023; originally announced January 2023.

Comments: 43 pages
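To illustrate the correction step, the simplest candidate model is an ordinary least-squares line mapping low-cost sensor readings to collocated reference readings, fitted in the stationary setting and then applied to mobile data. This sketch assumes paired, time-aligned readings; it is not one of the specific correction models evaluated in the paper, and the function names are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def fit_linear_correction(lcs: Sequence[float],
                          reference: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of reference ~ slope * lcs + intercept,
    using collocated (time-aligned) readings from the stationary setting."""
    if len(lcs) != len(reference) or len(lcs) < 2:
        raise ValueError("need at least two paired readings")
    mx, my = fmean(lcs), fmean(reference)
    sxx = sum((x - mx) ** 2 for x in lcs)
    if sxx == 0:
        raise ValueError("sensor readings have no variance")
    sxy = sum((x - mx) * (y - my) for x, y in zip(lcs, reference))
    slope = sxy / sxx
    return slope, my - slope * mx


def apply_correction(readings: Sequence[float], slope: float,
                     intercept: float) -> List[float]:
    """Transfer the stationary-setting correction to new (e.g., mobile) readings."""
    return [slope * r + intercept for r in readings]
```

Whether the collocation data are minute-level or hourly-averaged only changes the inputs passed to `fit_linear_correction`; the abstract reports that minute-level corrections transferred better.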

arXiv:2206.14987

Lookback for Learning to Branch

Authors: Prateek Gupta, Elias B. Khalil, Didier Chetélat, Maxime Gasse, Yoshua Bengio, Andrea Lodi, M. Pawan Kumar

Abstract: The expressive and computationally inexpensive bipartite Graph Neural Networks (GNN) have been shown to be an important component of deep-learning-based Mixed-Integer Linear Program (MILP) solvers. Recent works have demonstrated the effectiveness of such GNNs in replacing the branching (variable selection) heuristic in branch-and-bound (B&B) solvers. These GNNs are trained, offline and on a collection of MILPs, to imitate a very good but computationally expensive branching heuristic, strong branching. Given that B&B results in a tree of sub-MILPs, we ask (a) whether there are strong dependencies exhibited by the target heuristic among the neighboring nodes of the B&B tree, and (b) if so, whether we can incorporate them in our training procedure. Specifically, we find that with the strong branching heuristic, a child node's best choice was often the parent's second-best choice. We call this the "lookback" phenomenon. Surprisingly, the typical branching GNN of Gasse et al. (2019) often misses this simple "answer". To imitate the target behavior more closely by incorporating the lookback phenomenon in GNNs, we propose two methods: (a) target smoothing for the standard cross-entropy loss function, and (b) adding a Parent-as-Target (PAT) Lookback regularizer term. Finally, we propose a model selection framework to incorporate harder-to-formulate objectives such as solving time in the final models.
Through extensive experimentation on standard benchmark instances, we show that our proposal results in up to a 22% decrease in the size of the B&B tree and up to a 15% improvement in solving times.

Submitted 29 December, 2022; v1 submitted 29 June, 2022; originally announced June 2022.

Comments: Published in Transactions on Machine Learning Research (TMLR)
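One way to picture target smoothing with the lookback phenomenon is a soft cross-entropy target that moves a small amount of probability mass from the strong-branching choice to the parent's second-best variable. The sketch below is a hypothetical reading of that idea, not the paper's exact loss: the smoothing weight `eps`, the candidate indexing, and the fallback when the parent's second-best variable is not a candidate at the child are all assumptions, and the PAT regularizer is not shown.

```python
import math
from typing import List, Optional, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]


def lookback_smoothed_target(num_candidates: int, expert_choice: int,
                             parent_second_best: Optional[int],
                             eps: float = 0.1) -> List[float]:
    """Soft target: 1 - eps on the strong-branching choice and eps on the parent's
    second-best variable (hypothetical scheme). Pass None for parent_second_best
    if that variable is not a branching candidate at this child node."""
    target = [0.0] * num_candidates
    target[expert_choice] += 1.0 - eps
    if parent_second_best is None:
        target[expert_choice] += eps  # nothing to smooth toward: keep a one-hot target
    else:
        target[parent_second_best] += eps
    return target


def soft_cross_entropy(logits: Sequence[float], target: Sequence[float]) -> float:
    """Cross-entropy between a soft target distribution and softmax(logits)."""
    return -sum(t * lp for t, lp in zip(target, log_softmax(logits)))
```

When the expert's choice coincides with the parent's second-best variable, the target collapses back to one-hot, so the smoothing only matters when the two disagree.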

arXiv:2206.14772

IBP Regularization for Verified Adversarial Robustness via Branch-and-Bound

Authors: Alessandro De Palma, Rudy Bunel, Krishnamurthy Dvijotham, M. Pawan Kumar, Robert Stanforth

Abstract: Recent works have tried to increase the verifiability of adversarially trained networks by running the attacks over domains larger than the original perturbations and adding various regularization terms to the objective. However, these algorithms either underperform or require complex and expensive stage-wise training procedures, hindering their practical applicability. We present IBP-R, a novel verified training algorithm that is both simple and effective. IBP-R induces network verifiability by coupling adversarial attacks on enlarged domains with a regularization term, based on inexpensive interval bound propagation, that minimizes the gap between the non-convex verification problem and its approximations. By leveraging recent branch-and-bound frameworks, we show that IBP-R obtains state-of-the-art verified robustness-accuracy trade-offs for small perturbations on CIFAR-10 while training significantly faster than relevant previous work. Additionally, we present UPB, a novel branching strategy that, relying on a simple heuristic based on $β$-CROWN, reduces the cost of state-of-the-art branching algorithms while yielding splits of comparable quality.

Submitted 31 May, 2023; v1 submitted 29 June, 2022; originally announced June 2022.

Comments: ICML 2022 Workshop on Formal Verification of Machine Learning
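Interval bound propagation, the inexpensive bounding method on which the IBP-R regularizer is based, fits in a few lines for fully connected layers. This is a minimal pure-Python sketch, assuming dense weight matrices given as nested lists and an input box [lower, upper]; it shows the standard center/radius form of IBP, not the IBP-R training procedure itself.

```python
from typing import List, Sequence, Tuple

Bounds = Tuple[List[float], List[float]]


def ibp_linear(W: Sequence[Sequence[float]], b: Sequence[float],
               lower: Sequence[float], upper: Sequence[float]) -> Bounds:
    """Propagate the box [lower, upper] through x -> W x + b.

    The box center maps exactly; the radius grows by |W| applied to the input radius.
    """
    center = [(l + u) / 2 for l, u in zip(lower, upper)]
    radius = [(u - l) / 2 for l, u in zip(lower, upper)]
    out_c = [sum(w * c for w, c in zip(row, center)) + bi for row, bi in zip(W, b)]
    out_r = [sum(abs(w) * r for w, r in zip(row, radius)) for row in W]
    return ([c - r for c, r in zip(out_c, out_r)],
            [c + r for c, r in zip(out_c, out_r)])


def ibp_relu(lower: Sequence[float], upper: Sequence[float]) -> Bounds:
    """ReLU is monotone, so the interval endpoints map directly."""
    return [max(l, 0.0) for l in lower], [max(u, 0.0) for u in upper]
```

For a single input x in [-1, 1] passed through weights [[1.0], [-2.0]] with bias [0.5, 0.0], a ReLU, and then weights [[1.0, -1.0]] with bias [0.0], the propagated output interval is [-2.0, 1.5].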

arXiv:2204.11418

Riemannian Hamiltonian methods for min-max optimization on manifolds

Authors: Andi Han, Bamdev Mishra, Pratik Jawanpuria, Pawan Kumar, Junbin Gao

Abstract: In this paper, we study min-max optimization problems on Riemannian manifolds. We introduce a Riemannian Hamiltonian function, minimization of which serves as a proxy for solving the original min-max problems. Under the Riemannian Polyak-Łojasiewicz condition on the Hamiltonian function, its minimizer corresponds to the desired min-max saddle point. We also provide cases where this condition is satisfied. For geodesic-bilinear optimization in particular, solving the proxy problem leads to the correct search direction towards global optimality, which becomes challenging with the min-max formulation. To minimize the Hamiltonian function, we propose Riemannian Hamiltonian methods (RHM) and present their convergence analyses. We extend RHM to include consensus regularization and to the stochastic setting. We illustrate the efficacy of the proposed RHM in applications such as subspace robust Wasserstein distance, robust training of neural networks, and generative adversarial networks.

Submitted 24 August, 2023; v1 submitted 24 April, 2022; originally announced April 2022.

Comments: Extended version with proofs

Journal ref: SIAM Journal on Optimization, 33(3), pp.1797-1827, 2023
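Minimizing a Hamiltonian as a proxy for a min-max problem can be seen in the flat (Euclidean) special case with the bilinear objective $f(x, y) = xy$, whose saddle point is $(0, 0)$. The sketch assumes the Hamiltonian $H = \tfrac{1}{2}(\|\nabla_x f\|^2 + \|\nabla_y f\|^2)$ and plain gradient steps; on a general Riemannian manifold, RHM would instead use Riemannian gradients and retractions, which are not implemented here. Simultaneous gradient descent-ascent is included only for contrast.

```python
import math
from typing import Tuple


def grad_f(x: float, y: float) -> Tuple[float, float]:
    """Gradient (df/dx, df/dy) of the bilinear objective f(x, y) = x * y."""
    return y, x


def hessian_f(x: float, y: float):
    """Hessian of f(x, y) = x * y (constant)."""
    return (0.0, 1.0), (1.0, 0.0)


def hamiltonian(x: float, y: float) -> float:
    """H = 0.5 * squared norm of the gradient of f; it vanishes at stationary points."""
    gx, gy = grad_f(x, y)
    return 0.5 * (gx * gx + gy * gy)


def grad_hamiltonian(x: float, y: float) -> Tuple[float, float]:
    """grad H = Hess(f) applied to grad f (the Hessian is symmetric)."""
    gx, gy = grad_f(x, y)
    (a, b), (c, d) = hessian_f(x, y)
    return a * gx + b * gy, c * gx + d * gy


def hamiltonian_descent(x: float, y: float, lr: float, steps: int) -> Tuple[float, float]:
    """Plain gradient descent on H in both variables."""
    for _ in range(steps):
        hx, hy = grad_hamiltonian(x, y)
        x, y = x - lr * hx, y - lr * hy
    return x, y


def gradient_descent_ascent(x: float, y: float, lr: float, steps: int) -> Tuple[float, float]:
    """Simultaneous descent in x and ascent in y, shown for contrast."""
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x, y = x - lr * gx, y + lr * gy
    return x, y


def distance_to_saddle(x: float, y: float) -> float:
    """Euclidean distance to the saddle point (0, 0)."""
    return math.hypot(x, y)
```

Starting from (1, 1) with step size 0.1, each descent-ascent step multiplies the squared distance to the saddle point by 1 + 0.1², so the iterates drift outward, while each Hamiltonian step shrinks both coordinates by a factor of 0.9.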

arXiv:2109.12784

Learning from Few Samples: Transformation-Invariant SVMs with Composition and Locality at Multiple Scales

Authors: Tao Liu, P. R. Kumar, Ruida Zhou, Xi Liu

Abstract: Motivated by the problem of learning with small sample sizes, this paper shows how to incorporate into support-vector machines (SVMs) those properties that have made convolutional neural networks (CNNs) successful. Particularly important is the ability to incorporate domain knowledge of invariances, e.g., translational invariance of images. Kernels based on the maximum similarity over a group of transformations are not generally positive definite. Perhaps it is for this reason that they have not been studied theoretically. We address this lacuna and show that positive definiteness indeed holds with high probability for kernels based on the maximum similarity in the small training sample set regime of interest, and that they do yield the best results in that regime. We also show how additional properties such as their ability to incorporate local features at multiple spatial scales, e.g., as done in CNNs through max pooling, and to provide the benefits of composition through the architecture of multiple layers, can also be embedded into SVMs. We verify through experiments on widely available image sets that the resulting SVMs do provide superior accuracy in comparison to well-established deep neural network benchmarks for small sample sizes.

Submitted 22 October, 2022; v1 submitted 27 September, 2021; originally announced September 2021.

Comments: Will appear in NeurIPS 2022
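A kernel of the kind described above, taking the maximum similarity over a group of transformations, can be written directly for 1-D signals under cyclic shifts. This sketch assumes a Gaussian (RBF) base kernel and the cyclic shift group as a stand-in for image translations; the function names are hypothetical. As the abstract notes, such a max-kernel is not positive definite in general, so its Gram matrix should be checked before use with a standard SVM solver.

```python
import math
from typing import List, Sequence


def rbf(a: Sequence[float], b: Sequence[float], gamma: float = 1.0) -> float:
    """Gaussian base kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def cyclic_shift(v: Sequence[float], k: int) -> List[float]:
    """Rotate a 1-D signal left by k positions."""
    k %= len(v)
    return list(v[k:]) + list(v[:k])


def max_shift_kernel(a: Sequence[float], b: Sequence[float], gamma: float = 1.0) -> float:
    """Maximum base-kernel similarity between a and any cyclic shift of b.

    Shift-invariant by construction: a pattern and its translated copy score 1.0.
    """
    return max(rbf(a, cyclic_shift(b, k), gamma) for k in range(len(b)))
```

For the signals [1, 0, 0, 0] and [0, 0, 1, 0], the base RBF kernel gives exp(-2), whereas the max-shift kernel gives 1.0 because a shift of two positions aligns them exactly.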

arXiv:2104.06718

Improved Branch and Bound for Neural Network Verification via Lagrangian Decomposition

Authors: Alessandro De Palma, Rudy Bunel, Alban Desmaison, Krishnamurthy Dvijotham, Pushmeet Kohli, Philip H. S. Torr, M. Pawan Kumar

Abstract: We improve the scalability of Branch and Bound (BaB) algorithms for formally proving input-output properties of neural networks. First, we propose novel bounding algorithms based on Lagrangian Decomposition. Previous works have used off-the-shelf solvers to solve relaxations at each node of the BaB tree, or constructed weaker relaxations that can be solved efficiently, but lead to unnecessarily weak bounds. Our formulation restricts the optimization to a subspace of the dual domain that is guaranteed to contain the optimum, resulting in accelerated convergence. Furthermore, it allows for a massively parallel implementation, which is amenable to GPU acceleration via modern deep learning frameworks. Second, we present a novel activation-based branching strategy. By coupling an inexpensive heuristic with fast dual bounding, our branching scheme greatly reduces the size of the BaB tree compared to previous heuristic methods. Moreover, it performs competitively with a recent strategy based on learning algorithms, without its large offline training cost. Finally, we design a BaB framework, named Branch and Dual Network Bound (BaDNB), based on our novel bounding and branching algorithms. We show that BaDNB outperforms previous complete verification systems by a large margin, cutting average verification times by factors of up to 50 on adversarial robustness properties.

Submitted 14 April, 2021; originally announced April 2021.

Comments: Submitted for review to JMLR. This is an extended version of our paper in the UAI-20 conference (arXiv:2002.10410)

arXiv:2103.15569

Risk Bounds for Learning via Hilbert Coresets

Authors: Spencer Douglas, Piyush Kumar, R. K. Prasanth

Abstract: We develop a formalism for constructing stochastic upper bounds on the expected full sample risk for supervised classification tasks via the Hilbert coresets approach within a transductive framework. We explicitly compute tight and meaningful bounds for complex datasets and complex hypothesis classes such as state-of-the-art deep neural network architectures. The bounds we develop have three useful properties: i) the bounds are non-uniform in the hypothesis space, ii) in many practical examples, the bounds become effectively deterministic by appropriate choice of prior and training data-dependent posterior distributions on the hypothesis space, and iii) the bounds become significantly tighter as the size of the training set increases. We also lay out some ideas to explore for future research.

Submitted 29 March, 2021; originally announced March 2021.

Comments: 16 pages, 2 figures

ACM Class: F.2.1; F.2.3

arXiv:2010.04091

Reward-Biased Maximum Likelihood Estimation for Linear Stochastic Bandits

Authors: Yu-Heng Hung, Ping-Chun Hsieh, Xi Liu, P. R. Kumar

Abstract: Modifying the reward-biased maximum likelihood method originally proposed in the adaptive control literature, we propose novel learning algorithms to handle the explore-exploit trade-off in linear bandit problems as well as generalized linear bandit problems. We develop novel index policies that we prove achieve order-optimality, and show that they achieve empirical performance competitive with the state-of-the-art benchmark methods in extensive experiments. The new policies achieve this with low computation time per pull for linear bandits, thereby achieving both favorable regret and computational efficiency.

Submitted 8 October, 2020; originally announced October 2020.

arXiv:2006.15212

Hybrid Models for Learning to Branch

Authors: Prateek Gupta, Maxime Gasse, Elias B. Khalil, M. Pawan Kumar, Andrea Lodi, Yoshua Bengio

Abstract: A recent Graph Neural Network (GNN) approach for learning to branch has been shown to successfully reduce the running time of branch-and-bound algorithms for Mixed Integer Linear Programming (MILP). While the GNN relies on a GPU for inference, MILP solvers are purely CPU-based. This severely limits its application, as many practitioners may not have access to high-end GPUs. In this work, we ask two key questions. First, in a more realistic setting where only a CPU is available, is the GNN model still competitive? Second, can we devise an alternative, computationally inexpensive model that retains the predictive power of the GNN architecture? We answer the first question in the negative, and address the second question by proposing a new hybrid architecture for efficient branching on CPU machines. The proposed architecture combines the expressive power of GNNs with computationally inexpensive multi-layer perceptrons (MLP) for branching. We evaluate our methods on four classes of MILP problems, and show that they lead to up to a 26% reduction in solver running time compared to state-of-the-art methods without a GPU, while extrapolating to harder problems than they were trained on. The code for this project is publicly available at https://github.com/pg2455/Hybrid-learn2branch.

Submitted 23 October, 2020; v1 submitted 26 June, 2020; originally announced June 2020.

Comments: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada

arXiv:2003.09596

Learning in Networked Control Systems

Authors: Rahul Singh, P. R. Kumar

Abstract: We design an adaptive controller (learning rule) for a networked control system (NCS) in which data packets containing control information are transmitted across a lossy wireless channel. We propose Upper Confidence Bounds for Networked Control Systems (UCB-NCS), a learning rule that maintains confidence intervals for the estimates of the plant parameters $(A_{(\star)},B_{(\star)})$ and the channel reliability $p_{(\star)}$, and utilizes the principle of optimism in the face of uncertainty while making control decisions. We provide non-asymptotic performance guarantees for UCB-NCS by analyzing its "regret", i.e., the performance gap relative to the scenario in which $(A_{(\star)},B_{(\star)},p_{(\star)})$ are known to the controller. We show that with high probability the regret can be upper-bounded as $\tilde{O}\left(C\sqrt{T}\right)$ (where $\tilde{O}$ hides logarithmic factors), where $T$ is the operating time horizon of the system and $C$ is a problem-dependent constant.

Submitted 21 March, 2020; originally announced March 2020.

Comments: Submitted to CDC and LCSS (http://ieee-cssletters.dei.unipd.it/index.php)

arXiv:2002.10410

Lagrangian Decomposition for Neural Network Verification

Authors: Rudy Bunel, Alessandro De Palma, Alban Desmaison, Krishnamurthy Dvijotham, Pushmeet Kohli, Philip H. S. Torr, M. Pawan Kumar

Abstract: A fundamental component of neural network verification is the computation of bounds on the values their outputs can take. Previous methods have either used off-the-shelf solvers, discarding the problem structure, or relaxed the problem even further, making the bounds unnecessarily loose. We propose a novel approach based on Lagrangian Decomposition. Our formulation admits an efficient supergradient ascent algorithm, as well as an improved proximal algorithm. Both algorithms offer three advantages: (i) they yield bounds that are provably at least as tight as previous dual algorithms relying on Lagrangian relaxations; (ii) they are based on operations analogous to forward/backward passes of neural network layers and are therefore easily parallelizable, amenable to GPU implementation and able to take advantage of the convolutional structure of problems; and (iii) they allow for anytime stopping while still providing valid bounds. Empirically, we show that we obtain bounds comparable with off-the-shelf solvers in a fraction of their running time, and obtain tighter bounds in the same time as previous dual algorithms. This results in an overall speed-up when employing the bounds for formal verification. Code for our algorithms is available at https://github.com/oval-group/decomposition-plnn-bounds.

Submitted 17 June, 2020; v1 submitted 24 February, 2020; originally announced February 2020.

Comments: UAI 2020 conference paper

arXiv:2001.07608

Analytic Properties of Trackable Weak Models

Authors: Mark Chilenski, George Cybenko, Isaac Dekine, Piyush Kumar, Gil Raz

Abstract: We present several new results on the feasibility of inferring the hidden states in strongly-connected trackable weak models. Here, a weak model is a directed graph in which each node is assigned a set of colors which may be emitted when that node is visited. A hypothesis is a node sequence which is consistent with a given color sequence. A weak model is said to be trackable if the worst-case number of such hypotheses grows as a polynomial in the sequence length. We show that the number of hypotheses in strongly-connected trackable models is bounded by a constant and give an expression for this constant. We also consider the problem of reconstructing which branch was taken at a node with same-colored out-neighbors, and show that it is always eventually possible to identify which branch was taken if the model is strongly connected and trackable. We illustrate these properties by assigning transition probabilities and employing standard tools for analyzing Markov chains. In addition, we present new results for the entropy rates of weak models according to whether they are trackable or not. These theorems indicate that the combination of trackability and strong connectivity dramatically simplifies the task of reconstructing which nodes were visited. This work has implications for any problem which can be described in terms of an agent traversing a colored graph, such as the reconstruction of hidden states in a hidden Markov model (HMM).

Submitted 8 January, 2020; originally announced January 2020.

Comments: 10 pages, 9 figures
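The definitions above translate directly into a dynamic program that counts hypotheses, i.e., node sequences consistent with an observed color sequence. This is an illustrative sketch; the graph encoding (a successor dictionary plus a set of colors per node) and the example model are assumptions for illustration.

```python
from typing import Dict, Hashable, Iterable, Sequence, Set

Node = Hashable


def count_hypotheses(successors: Dict[Node, Iterable[Node]],
                     colors: Dict[Node, Set[str]],
                     observed: Sequence[str]) -> int:
    """Number of node sequences v_1..v_n such that each v_i may emit observed[i]
    and each v_i -> v_{i+1} is an edge of the directed graph."""
    if not observed:
        return 1  # only the empty node sequence
    # counts[v] = number of consistent sequences so far that end at node v
    counts: Dict[Node, int] = {v: 1 for v in colors if observed[0] in colors[v]}
    for color in observed[1:]:
        nxt: Dict[Node, int] = {}
        for v, n in counts.items():
            for w in successors.get(v, ()):
                if color in colors[w]:
                    nxt[w] = nxt.get(w, 0) + n
        counts = nxt
    return sum(counts.values())
```

For instance, if a red node A leads to two blue nodes B and C that both lead back to A, each blue observation doubles the count (2 hypotheses for red, blue, red and 4 for red, blue, red, blue), so the count grows exponentially and such a model is not trackable; recoloring C green leaves at most one hypothesis per color sequence.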

arXiv:2001.00056

Deep Attentive Ranking Networks for Learning to Order Sentences

Authors: Pawan Kumar, Dhanajit Brahma, Harish Karnick, Piyush Rai

Abstract: We present an attention-based ranking framework for learning to order sentences given a paragraph. Our framework is built on a bidirectional sentence encoder and a self-attention-based transformer network to obtain an input-order-invariant representation of paragraphs. Moreover, it allows seamless training using a variety of ranking-based loss functions, such as pointwise, pairwise, and listwise ranking. We apply our framework to two tasks: Sentence Ordering and Order Discrimination. Our framework outperforms various state-of-the-art methods on these tasks on a variety of evaluation metrics. We also show that it achieves better results when using pairwise and listwise ranking losses, rather than the pointwise ranking loss, which suggests that incorporating relative positions of two or more sentences in the loss function contributes to better learning.

Submitted 31 December, 2019; originally announced January 2020.

Comments: Accepted in AAAI 2020
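Pairwise ranking for sentence ordering can be sketched with a margin-based hinge loss over sentence scores: for every pair in which sentence a precedes sentence b in the gold order, the model is penalized unless a's score exceeds b's by a margin. This assumes one scalar score per sentence and a hinge form with a hypothetical `margin` parameter; the paper's exact pairwise loss may differ. Decoding simply sorts sentences by descending score.

```python
from itertools import combinations
from typing import List, Sequence


def pairwise_hinge_loss(scores: Sequence[float], gold_order: Sequence[int],
                        margin: float = 1.0) -> float:
    """Sum of hinge penalties over all pairs (a, b) where a precedes b in gold_order."""
    loss = 0.0
    # combinations() preserves input order, so a always comes before b in the gold order
    for a, b in combinations(gold_order, 2):
        loss += max(0.0, margin - (scores[a] - scores[b]))
    return loss


def predict_order(scores: Sequence[float]) -> List[int]:
    """Decode an ordering by sorting sentence indices by descending score."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])
```

With scores [3.0, 1.0, 2.0], the gold order [0, 2, 1] incurs zero loss and is exactly what `predict_order` returns, whereas the gold order [1, 0, 2] incurs a loss of 5.0.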

arXiv:1912.01329

Neural Network Branching for Neural Network Verification

Authors: Jingyue Lu, M. Pawan Kumar

Abstract: Formal verification of neural networks is essential for their deployment in safety-critical areas. Many available formal verification methods have been shown to be instances of a unified Branch and Bound (BaB) formulation. We propose a novel framework for designing an effective branching strategy for BaB. Specifically, we train a graph neural network (GNN) to imitate the behaviour of the strong branching heuristic. Our framework differs from previous methods for learning to branch in two main aspects. Firstly, our framework directly treats the neural network we want to verify as a graph input for the GNN. Secondly, we develop an intuitive forward and backward embedding update schedule. Empirically, our framework achieves a roughly $50\%$ reduction in both the number of branches and the time required for verification on various convolutional networks when compared to the best available hand-designed branching strategy. In addition, we show that our GNN model enjoys both horizontal and vertical transferability. Horizontally, the model trained on easy properties performs well on properties of increased difficulty. Vertically, the model trained on small neural networks achieves similar performance on large neural networks.

Submitted 3 December, 2019; originally announced December 2019.

arXiv:1909.06588 [pdf, other]

Branch and Bound for Piecewise Linear Neural Network Verification

Authors: Rudy Bunel, Jingyue Lu, Ilker Turkaslan, Philip H. S. Torr, Pushmeet Kohli, M. Pawan Kumar

Abstract: The success of Deep Learning and its potential use in many safety-critical applications has motivated research on formal verification of Neural Network (NN) models. In this context, verification involves proving or disproving that an NN model satisfies certain input-output properties. Despite the reputation of learned NN models as black boxes, and the theoretical hardness of proving useful properties about them, researchers have been successful in verifying some classes of models by exploiting their piecewise linear structure and taking insights from formal methods such as Satisfiability Modulo Theories. However, these methods are still far from scaling to realistic neural networks. To facilitate progress in this crucial area, we exploit the Mixed Integer Linear Programming (MIP) formulation of verification to propose a family of algorithms based on Branch-and-Bound (BaB). We show that our family contains previous verification methods as special cases. With the help of the BaB framework, we make three key contributions. Firstly, we identify new methods that combine the strengths of multiple existing approaches, accomplishing significant performance improvements over the previous state of the art. Secondly, we introduce an effective branching strategy on ReLU non-linearities. This branching strategy allows us to efficiently and successfully deal with problems that have high-dimensional inputs and convolutional network architectures, on which previous methods frequently fail.
Finally, we propose comprehensive test data sets and benchmarks that include a collection of previously released test cases. We use the data sets to conduct a thorough experimental comparison of existing and new algorithms and to provide an inclusive analysis of the factors impacting the hardness of verification problems.

Submitted 26 October, 2020; v1 submitted 14 September, 2019; originally announced September 2019.
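The following toy sketch makes the Branch-and-Bound loop concrete. It assumes a one-hidden-layer ReLU network on a scalar input, naive interval arithmetic for the lower bounds, and splitting of the input interval. This is a simplification: the paper works from the MIP formulation and introduces branching on ReLU non-linearities, and neither is implemented here. The network, function names, and thresholds are hypothetical.

```python
def relu(v):
    return max(0.0, v)

def forward(net, x):
    """Exact output of a one-hidden-layer ReLU network on scalar input x."""
    hidden, out_w, out_b = net
    return sum(v * relu(w * x + b) for (w, b), v in zip(hidden, out_w)) + out_b

def interval_lower_bound(net, lo, hi):
    """Sound (but possibly loose) lower bound on the output over [lo, hi] via interval arithmetic."""
    hidden, out_w, out_b = net
    bound = out_b
    for (w, b), v in zip(hidden, out_w):
        z_lo = min(w * lo, w * hi) + b
        z_hi = max(w * lo, w * hi) + b
        h_lo, h_hi = relu(z_lo), relu(z_hi)  # ReLU is monotone, so bounds pass through
        bound += v * (h_lo if v >= 0 else h_hi)
    return bound

def branch_and_bound(net, lo, hi, threshold, min_width=1e-3):
    """Try to prove net(x) >= threshold for all x in [lo, hi] by splitting the input domain."""
    stack = [(lo, hi)]
    while stack:
        a, b = stack.pop()
        mid = 0.5 * (a + b)
        if forward(net, mid) < threshold:  # a concrete point violates the property
            return "counterexample", mid
        if interval_lower_bound(net, a, b) >= threshold:  # bound proves this sub-domain
            continue
        if b - a < min_width:  # give up on this sub-domain
            return "unknown", (a, b)
        stack.append((mid, b))  # branch: split the interval in two
        stack.append((a, mid))
    return "verified", None

# Hypothetical network f(x) = relu(x) - relu(x - 1), whose true range on [-1, 2] is [0, 1]
net = ([(1.0, 0.0), (1.0, -1.0)], [1.0, -1.0], 0.0)
print(branch_and_bound(net, -1.0, 2.0, threshold=-0.1))  # ('verified', None)
print(branch_and_bound(net, -1.0, 2.0, threshold=0.5))   # ('counterexample', -0.25)
```

On the whole domain the interval bound is -1, too loose to prove f(x) >= -0.1 directly. After a few splits, every sub-domain's bound clears the threshold. This tightening effect is what a good branching strategy tries to achieve with as few branches as possible.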

arXiv:1907.08674 [pdf]

Deep Learning to Address Candidate Generation and Cold Start Challenges in Recommender Systems: A Research Survey

Authors: Kiran Rama, Pradeep Kumar, Bharat Bhasker

Abstract: Among the machine learning applications to business, recommender systems rank among the most successful and widely adopted. They help users accelerate the process of search while helping businesses maximize sales. Following their phenomenal success in computer vision and speech recognition, deep learning methods are beginning to be applied to recommender systems. Current survey papers on deep learning in recommender systems provide a historical overview and taxonomy of recommender systems based on type. Our paper addresses the gap by providing a taxonomy of deep learning approaches to recommender system problems in the areas of cold start and candidate generation. We group the challenges in recommender systems into those related to the recommendations themselves (including relevance, speed, accuracy and scalability), those related to the nature of the data (cold start problem, imbalance and sparsity), and candidate generation. We then provide a taxonomy of deep learning techniques to address these challenges. Deep learning techniques are mapped to the different challenges in recommender systems, providing an overview of how deep learning techniques can be used to address them. We contribute a taxonomy of deep learning techniques to address the cold start and candidate generation problems in recommender systems. Cold start is addressed through additional features (for audio, images, text) and by learning hidden user and item representations.
Candidate generation has been addressed by separate networks, RNNs, autoencoders and hybrid methods. We also summarize the advantages and limitations of these techniques while outlining areas for future research.

Submitted 17 July, 2019; originally announced July 2019.

Comments: 22 pages, Submitted and Presented at PAN IIM Conference in IIM Bangalore

arXiv:1907.01287 [pdf, ps, other]

Exploration Through Reward Biasing: Reward-Biased Maximum Likelihood Estimation for Stochastic Multi-Armed Bandits

Authors: Xi Liu, Ping-Chun Hsieh, Anirban Bhattacharya, P. R. Kumar

Abstract: Inspired by the Reward-Biased Maximum Likelihood Estimate method of adaptive control, we propose RBMLE -- a novel family of learning algorithms for stochastic multi-armed bandits (SMABs). For a broad range of SMABs including both the parametric Exponential Family and the non-parametric sub-Gaussian/Exponential family, we show that RBMLE yields an index policy. To choose the bias-growth rate $α(t)$ in RBMLE, we reveal the nontrivial interplay between $α(t)$ and the regret bound that generally applies to both the Exponential Family and the sub-Gaussian/Exponential family bandits. To quantify the finite-time performance, we prove that RBMLE attains order-optimality by adaptively estimating the unknown constants in the expression of $α(t)$ for Gaussian and sub-Gaussian bandits. Extensive experiments demonstrate that the proposed RBMLE achieves empirical regret performance competitive with the state-of-the-art methods, while being more computationally efficient and scalable in comparison to the best-performing ones among them.

Submitted 23 October, 2020; v1 submitted 2 July, 2019; originally announced July 2019.

Comments: ICML 2020

arXiv:1906.05661 [pdf, other]

Training Neural Networks for and by Interpolation

Authors: Leonard Berrada, Andrew Zisserman, M. Pawan Kumar

Abstract: In modern supervised learning, many deep neural networks are able to interpolate the data: the empirical loss can be driven to near zero on all samples simultaneously. In this work, we explicitly exploit this interpolation property for the design of a new optimization algorithm for deep learning, which we term Adaptive Learning-rates for Interpolation with Gradients (ALI-G). ALI-G retains the two main advantages of Stochastic Gradient Descent (SGD), which are (i) a low computational cost per iteration and (ii) good generalization performance in practice. At each iteration, ALI-G exploits the interpolation property to compute an adaptive learning-rate in closed form. In addition, ALI-G clips the learning-rate to a maximal value, which we prove to be helpful for non-convex problems. Crucially, in contrast to the learning-rate of SGD, the maximal learning-rate of ALI-G does not require a decay schedule, which makes it considerably easier to tune. We provide convergence guarantees of ALI-G in various stochastic settings. Notably, we tackle the realistic case where the interpolation property is satisfied up to some tolerance. We provide experiments on a variety of architectures and tasks: (i) learning a differentiable neural computer; (ii) training a wide residual network on the SVHN data set; (iii) training a Bi-LSTM on the SNLI data set; and (iv) training wide residual networks and densely connected networks on the CIFAR data sets.
ALI-G produces state-of-the-art results among adaptive methods, and even yields comparable performance with SGD, which requires manually tuned learning-rate schedules. Furthermore, ALI-G is simple to implement in any standard deep learning framework and can be used as a drop-in replacement in existing code.

Submitted 1 August, 2020; v1 submitted 13 June, 2019; originally announced June 2019.

Comments: Published at ICML 2020
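The closed-form, clipped learning-rate described above can be sketched in a few lines. This sketch assumes the interpolation setting, where the minimum per-sample loss is zero. It uses a Polyak-style step loss / (||grad||^2 + delta), clipped at a maximal learning rate eta, on a hypothetical one-parameter least-squares problem whose data can be fitted exactly. It is an illustration under those assumptions, not the authors' implementation, and the function name is hypothetical.

```python
def alig_sgd(data, w=0.0, eta=1.0, delta=1e-8, epochs=50):
    """Fit y = w * x with per-sample steps of size min(loss / (grad**2 + delta), eta)."""
    for _ in range(epochs):
        for x, y in data:
            residual = w * x - y
            loss = 0.5 * residual ** 2  # per-sample loss; its minimum is 0 when the data are interpolable
            grad = residual * x         # derivative of the loss with respect to w
            step = min(loss / (grad ** 2 + delta), eta)  # closed-form adaptive step, clipped at eta
            w -= step * grad
    return w

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # y = 2x, so a zero-loss solution exists
print(alig_sgd(data))  # close to 2.0
```

With delta near zero, each step here has size 1 / (2 x^2) and halves the residual on the current sample, so no decay schedule is needed. The cap eta only takes effect when that ratio would exceed it, for example with eta=0.1 on the first sample above.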

arXiv:1903.01998 [pdf, other]

doi 10.1088/2632-2153/ac3843

Statistically-informed deep learning for gravitational wave parameter estimation

Authors: Hongyu Shen, E. A. Huerta, Eamonn O'Shea, Prayush Kumar, Zhizhen Zhao

Abstract: We introduce deep learning models to estimate the masses of the binary components of black hole mergers, $(m_1,m_2)$, and three astrophysical properties of the post-merger compact remnant, namely, the final spin, $a_f$, and the frequency and damping time of the ringdown oscillations of the fundamental $\ell=m=2$ bar mode, $(ω_R, ω_I)$. Our neural networks combine a modified $\texttt{WaveNet}$ architecture with contrastive learning and normalizing flow. We validate these models against a Gaussian conjugate prior family whose posterior distribution is described by a closed analytical expression. Upon confirming that our models produce statistically consistent results, we used them to estimate the astrophysical parameters $(m_1,m_2, a_f, ω_R, ω_I)$ of five binary black holes: $\texttt{GW150914}, \texttt{GW170104}, \texttt{GW170814}, \texttt{GW190521}$ and $\texttt{GW190630}$. We use $\texttt{PyCBC Inference}$ to directly compare traditional Bayesian methodologies for parameter estimation with our deep-learning-based posterior distributions. Our results show that our neural network models predict posterior distributions that encode physical correlations, and that our data-driven median results and 90$\%$ confidence intervals are similar to those produced with gravitational wave Bayesian analyses. This methodology requires a single V100 $\texttt{NVIDIA}$ GPU to produce median values and posterior distributions within two milliseconds for each event. This neural network, and a tutorial for its use, are available at the $\texttt{Data and Learning Hub for Science}$.

Submitted 19 December, 2021; v1 submitted 5 March, 2019; originally announced March 2019.

Comments: v4: 13 pages, 6 figures, First application of Neural Networks for gravitational wave parameter posterior estimation across multiple events with single training

MSC Class: 68T01; 68T35; 83C35; 83C57 ACM Class: I.2

Journal ref: Machine Learning: Science and Technology, Volume 3, Number 1, Year 2022

arXiv:1901.03860 [pdf, ps, other]

Prototypical Metric Transfer Learning for Continuous Speech Keyword Spotting With Limited Training Data

Authors: Harshita Seth, Pulkit Kumar, Muktabh Mayank Srivastava

Abstract: Continuous Speech Keyword Spotting (CSKS) is the problem of spotting keywords in recorded conversations when only a small number of instances of the keywords are available in training data. Unlike the more common Keyword Spotting, where an algorithm needs to detect lone keywords or short phrases like "Alexa", "Cortana", "Hi Alexa!", "Whatsup Octavia?" etc. in speech, CSKS needs to filter out embedded words from a continuous flow of speech, i.e., spot "Anna" and "github" in "I know a developer named Anna who can look into this github issue." Apart from the issue of limited training data availability, CSKS is an extremely imbalanced classification problem. We address the limitations of simple keyword spotting baselines for both aforementioned challenges by using a novel combination of loss functions (Prototypical networks' loss and metric loss) and transfer learning. Our method improves F1 score by over 10%.

Submitted 12 January, 2019; originally announced January 2019.
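The Prototypical-network part of the loss can be sketched as follows, assuming embeddings have already been computed by some encoder (not shown). Each class prototype is the mean of its support embeddings. A query is scored by a softmax over negative squared Euclidean distances to the prototypes. The metric-loss term and the transfer-learning setup from the paper are not included, and the labels, vectors, and function names are hypothetical.

```python
import math

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def prototypes(support):
    """Mean embedding per class from a dict {label: [embedding, ...]}."""
    protos = {}
    for label, embs in support.items():
        dim = len(embs[0])
        protos[label] = [sum(e[d] for e in embs) / len(embs) for d in range(dim)]
    return protos

def prototypical_loss(query, label, protos):
    """Negative log-probability of the true label under a softmax over negative squared distances."""
    logits = {c: -squared_distance(query, p) for c, p in protos.items()}
    m = max(logits.values())  # log-sum-exp shift for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return log_z - logits[label]

def classify(query, protos):
    """Predict the class whose prototype is nearest to the query embedding."""
    return min(protos, key=lambda c: squared_distance(query, protos[c]))

support = {
    "anna": [[1.0, 0.0], [1.0, 0.2]],      # hypothetical keyword embeddings
    "other": [[-1.0, 0.0], [-1.0, -0.2]],  # hypothetical non-keyword embeddings
}
protos = prototypes(support)
print(classify([0.9, 0.1], protos))  # anna
```

Because prototypes are averages over a handful of examples, the approach needs only a few labelled instances per keyword, which matches the limited-data setting described in the abstract.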

arXiv:1811.07591 [pdf, other]

Deep Frank-Wolfe For Neural Network Optimization

Authors: Leonard Berrada, Andrew Zisserman, M. Pawan Kumar

Abstract: Learning a deep neural network requires solving a challenging optimization problem: it is a high-dimensional, non-convex and non-smooth minimization problem with a large number of terms. The current practice in neural network optimization is to rely on the stochastic gradient descent (SGD) algorithm or its adaptive variants. However, SGD requires a hand-designed schedule for the learning rate. In addition, its adaptive variants tend to produce solutions that generalize less well on unseen data than SGD with a hand-designed schedule. We present an optimization method that offers empirically the best of both worlds: our algorithm yields good generalization performance while requiring only one hyper-parameter. Our approach is based on a composite proximal framework, which exploits the compositional nature of deep neural networks and can leverage powerful convex optimization algorithms by design. Specifically, we employ the Frank-Wolfe (FW) algorithm for SVM, which computes an optimal step-size in closed-form at each time-step. We further show that the descent direction is given by a simple backward pass in the network, yielding the same computational cost per iteration as SGD. We present experiments on the CIFAR and SNLI data sets, where we demonstrate the significant superiority of our method over Adam, Adagrad, as well as the recently proposed BPGrad and AMSGrad. Furthermore, we compare our algorithm to SGD with a hand-designed learning rate schedule, and show that it provides similar generalization while converging faster.
The code is publicly available at https://github.com/oval-group/dfw.

Submitted 21 February, 2021; v1 submitted 19 November, 2018; originally announced November 2018.

Comments: Published as a conference paper at ICLR 2019, last version fixing an inaccuracy (details in appendix A.5, Proposition 2)

Journal ref: International Conference on Learning Representations 2019

arXiv:1811.07209 [pdf, other]

A Statistical Approach to Assessing Neural Network Robustness

Authors: Stefan Webb, Tom Rainforth, Yee Whye Teh, M. Pawan Kumar

Abstract: We present a new approach to assessing the robustness of neural networks based on estimating the proportion of inputs for which a property is violated. Specifically, we estimate the probability of the event that the property is violated under an input model. Our approach critically varies from the formal verification framework in that when the property can be violated, it provides an informative notion of how robust the network is, rather than just the conventional assertion that the network is not verifiable. Furthermore, it provides an ability to scale to larger networks than formal verification approaches. Though the framework still provides a formal guarantee of satisfiability whenever it successfully finds one or more violations, these advantages do come at the cost of only providing a statistical estimate of unsatisfiability whenever no violation is found. Key to the practical success of our approach is an adaptation of multi-level splitting, a Monte Carlo approach for estimating the probability of rare events, to our statistical robustness framework. We demonstrate that our approach is able to emulate formal verification procedures on benchmark problems, while scaling to larger networks and providing reliable additional information in the form of accurate estimates of the violation probability.

Submitted 21 February, 2019; v1 submitted 17 November, 2018; originally announced November 2018.

Comments: To appear at the 7th International Conference on Learning Representations (ICLR 2019), New Orleans
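Multi-level splitting can be illustrated on a toy problem where the exact answer is known. The sketch below is a simplified fixed-quantile splitting scheme (closely related to subset simulation), not the paper's adaptation. It assumes a standard normal input model and treats x >= threshold as the "violation" event. At each level it keeps the top rho fraction of particles, clones them back to n, and decorrelates them with Metropolis moves that target the normal density restricted to the current level. All names and parameters are hypothetical.

```python
import math
import random

def ams_violation_probability(threshold, n=1000, rho=0.1, n_mcmc=20, step=0.5, seed=0, max_levels=100):
    """Estimate P(X >= threshold) for X ~ N(0, 1) by fixed-quantile multi-level splitting."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    estimate = 1.0
    for _ in range(max_levels):
        xs.sort()
        level = xs[int((1 - rho) * n)]  # (1 - rho)-quantile of the current particles
        if level >= threshold:
            break  # the target event is no longer rare among the particles
        survivors = [x for x in xs if x >= level]
        estimate *= len(survivors) / n  # conditional probability of reaching this level
        # Clone survivors back to n particles, then decorrelate them with Metropolis
        # moves whose target is N(0, 1) restricted to {x >= level}.
        xs = [rng.choice(survivors) for _ in range(n)]
        for i in range(n):
            x = xs[i]
            for _ in range(n_mcmc):
                proposal = x + step * rng.gauss(0.0, 1.0)
                accept_prob = math.exp(min(0.0, 0.5 * (x * x - proposal * proposal)))
                if proposal >= level and rng.random() < accept_prob:
                    x = proposal
            xs[i] = x
    return estimate * sum(1 for x in xs if x >= threshold) / n

p_hat = ams_violation_probability(3.0)
p_true = 0.5 * math.erfc(3.0 / math.sqrt(2.0))  # exact P(X >= 3) for X ~ N(0, 1), about 1.35e-3
print(p_hat, p_true)
```

Plain Monte Carlo with 1,000 samples would see only about one violation at this threshold. Splitting reaches the event through intermediate levels, which is what makes rare-event probabilities practical to estimate.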

arXiv:1811.04803 [pdf, other]

doi 10.1109/TNSE.2019.2948474

Observability Properties of Colored Graphs

Authors: Mark Chilenski, George Cybenko, Isaac Dekine, Piyush Kumar, Gil Raz

Abstract: A colored graph is a directed graph in which nodes or edges have been assigned colors that are not necessarily unique. Observability problems in such graphs consider whether an agent observing the colors of edges or nodes traversed on a path in the graph can determine which node they are at currently or which nodes were visited earlier in the traversal. Previous research efforts have identified several different notions of observability as well as the associated properties of graphs for which those observability properties hold. This paper unifies the prior work into a common framework with several new results about relationships between those notions and associated graph properties. The new framework provides an intuitive way to reason about the attainable accuracy as a function of lag and time spent observing, and identifies simple modifications to improve the observability of a given graph. We show that one form of the graph modification problem is NP-complete. The intuition of the new framework is borne out with numerical experiments. This work has implications for problems that can be described in terms of an agent traversing a colored graph, including the reconstruction of hidden states in a hidden Markov model (HMM).

Submitted 16 December, 2019; v1 submitted 9 November, 2018; originally announced November 2018.

Comments: 13 pages, 17 figures
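One observability question from the abstract is which node an agent is currently at, given the colors it has observed. A simple candidate-set filter can illustrate it. The sketch assumes a node-colored directed graph and that the agent observes the color of every node it visits, starting with the initial node. It is an illustrative sketch, not the paper's framework, and the graph and function name are hypothetical.

```python
def track_candidates(edges, colors, observed):
    """Return the set of nodes the agent could currently be at after a color sequence.

    edges: dict node -> list of successor nodes (directed graph)
    colors: dict node -> color (colors need not be unique)
    observed: colors of the visited nodes, starting with the initial node
    """
    candidates = {v for v, c in colors.items() if c == observed[0]}
    for color in observed[1:]:
        # Keep only successors of current candidates whose color matches the observation.
        candidates = {w for v in candidates for w in edges.get(v, []) if colors[w] == color}
    return candidates

# Hypothetical 4-cycle a -> b -> c -> d -> a with a repeated color
edges = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["a"]}
colors = {"a": "red", "b": "blue", "c": "red", "d": "green"}
print(track_candidates(edges, colors, ["red"]))          # a or c: ambiguous
print(track_candidates(edges, colors, ["red", "blue"]))  # {'b'}
```

A single "red" observation cannot distinguish a from c, but one more observation determines the current node. This is a small instance of accuracy improving with time spent observing.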

arXiv:1811.02629 [pdf, other]

Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge

Authors: Spyridon Bakas, Mauricio Reyes, Andras Jakab, Stefan Bauer, Markus Rempfler, Alessandro Crimi, Russell Takeshi Shinohara, Christoph Berger, Sung Min Ha, Martin Rozycki, Marcel Prastawa, Esther Alberts, Jana Lipkova, John Freymann, Justin Kirby, Michel Bilello, Hassan Fathallah-Shaykh, Roland Wiest, Jan Kirschke, Benedikt Wiestler, Rivka Colen, Aikaterini Kotrotsou, Pamela Lamontagne, Daniel Marcus, Mikhail Milchenko , et al. (402 additional authors not shown)

Abstract: Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multi-parametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumor is a factor also considered in longitudinal scans, when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses the state-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018.
Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO criteria, and iii) predicting the overall survival from pre-operative mpMRI scans of patients that underwent gross total resection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that apart from being diverse on each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been a continuously evolving/growing dataset.

Submitted 23 April, 2019; v1 submitted 5 November, 2018; originally announced November 2018.

Comments: The International Multimodal Brain Tumor Segmentation (BraTS) Challenge

arXiv:1810.12418 [pdf, ps, other]

Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging

Authors: Ping-Chun Hsieh, Xi Liu, Anirban Bhattacharya, P. R. Kumar

Abstract: Sequential decision making for lifetime maximization is a critical problem in many real-world applications, such as medical treatment and portfolio selection. In these applications, a "reneging" phenomenon, where participants may disengage from future interactions after observing an unsatisfactory outcome, is rather prevalent. To address the above issue, this paper proposes a model of heteroscedastic linear bandits with reneging, which allows each participant to have a distinct "satisfaction level," with any interaction outcome falling short of that level resulting in that participant reneging. Moreover, it allows the variance of the outcome to be context-dependent. Based on this model, we develop a UCB-type policy, namely HR-UCB, and prove that it achieves $\mathcal{O}\big(\sqrt{{T}(\log({T}))^{3}}\big)$ regret. Finally, we validate the performance of HR-UCB via simulations.

Submitted 15 May, 2019; v1 submitted 29 October, 2018; originally announced October 2018.

Comments: To appear in ICML 2019. Results are now averaged over more rounds of experiments than in previous versions.

arXiv:1808.01006 [pdf, other]

A Hybrid Variational Autoencoder for Collaborative Filtering

Authors: Kilol Gupta, Mukund Yelahanka Raghuprasad, Pankhuri Kumar

Abstract: In today's day and age when almost every industry has an online presence with users interacting in online marketplaces, personalized recommendations have become quite important. Traditionally, the problem of collaborative filtering has been tackled using Matrix Factorization which is linear in nature. We extend the work of [11] on using variational autoencoders (VAEs) for collaborative filtering with implicit feedback by proposing a hybrid, multi-modal approach. Our approach combines movie embeddings (learned from a sibling VAE network) with user ratings from the Movielens 20M dataset and applies it to the task of movie recommendation. We empirically show how the VAE network is empowered by incorporating movie embeddings. We also visualize movie and user embeddings by clustering their latent representations obtained from a VAE.

Submitted 23 September, 2018; v1 submitted 14 July, 2018; originally announced August 2018.

arXiv:1805.11861 [pdf, other]

Foresee: Attentive Future Projections of Chaotic Road Environments with Online Training

Authors: Anil Sharma, Prabhat Kumar

Abstract: In this paper, we train a recurrent neural network to learn the dynamics of a chaotic road environment and to project the future of the environment on an image. Future projection can be used to anticipate an unseen environment, for example in autonomous driving. The road environment is highly dynamic and complex due to the interaction among traffic participants such as vehicles and pedestrians. Even in this complex environment, a human driver can drive safely on chaotic roads irrespective of the number of traffic participants. The proliferation of deep learning research has shown the efficacy of neural networks in learning this human behavior. In the same direction, we investigate recurrent neural networks to understand the chaotic road environment, which is shared by pedestrians, vehicles (cars, trucks, bicycles etc.), and sometimes animals as well. We propose Foresee, a unidirectional gated recurrent unit (GRU) network with attention, to project the future of the environment in the form of images. We have collected several videos on Delhi roads consisting of various traffic participants, background and infrastructure differences (like 3D pedestrian crossing) at various times on various days. We train Foresee in an unsupervised way and use online training to project frames up to $0.5$ seconds in advance. We show that our proposed model performs better than state-of-the-art methods (PredNet and Enc. Dec. LSTM), and finally, we show that our trained model generalizes to a public dataset for future projections.

Submitted 30 May, 2018; originally announced May 2018.

arXiv:1802.00086 [pdf, other]

doi 10.1007/s10994-018-5736-y

Optimizing Non-decomposable Measures with Deep Networks

Authors: Amartya Sanyal, Pawan Kumar, Purushottam Kar, Sanjay Chawla, Fabrizio Sebastiani

Abstract: We present a class of algorithms capable of directly training deep neural networks with respect to large families of task-specific performance measures such as the F-measure and the Kullback-Leibler divergence that are structured and non-decomposable. This presents a departure from standard deep learning techniques that typically use squared or cross-entropy loss functions (that are decomposable) to train neural networks. We demonstrate that directly training with task-specific loss functions yields much faster and more stable convergence across problems and datasets. Our proposed algorithms and implementations have several novel features including (i) convergence to first order stationary points despite optimizing complex objective functions; (ii) use of fewer training samples to achieve a desired level of convergence, (iii) a substantial reduction in training time, and (iv) a seamless integration of our implementation into existing symbolic gradient frameworks. We implement our techniques on a variety of deep architectures including multi-layer perceptrons and recurrent neural networks and show that on a variety of benchmark and real data sets, our algorithms outperform traditional approaches to training deep networks, as well as some recent approaches to task-specific training of neural networks.

Submitted 31 January, 2018; originally announced February 2018.

Journal ref: Final version published in Machine Learning, 107(8-10):1597-1620, 2018
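A toy example shows the difference between decomposable and non-decomposable measures. Accuracy over a dataset equals the average accuracy of its equal-sized halves. The F-measure (here F1) generally does not, because it depends on pooled true-positive, false-positive, and false-negative counts. The helper names and labels are hypothetical, and this does not implement the paper's training algorithms.

```python
def accuracy(y_true, y_pred):
    """Fraction of matching labels: an average of per-example terms (decomposable)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred):
    """F1 from pooled counts (non-decomposable); defined as 0.0 when the denominator is zero."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

y_true = [1, 1, 0, 0]
y_pred = [1, 1, 1, 0]
halves = [(y_true[:2], y_pred[:2]), (y_true[2:], y_pred[2:])]

print(accuracy(y_true, y_pred), sum(accuracy(t, p) for t, p in halves) / 2)  # 0.75 0.75
print(f1_score(y_true, y_pred), sum(f1_score(t, p) for t, p in halves) / 2)  # 0.8 0.5
```

Because F1 over a mini-batch is not the mean of per-example terms, it cannot be optimized by simply averaging a per-sample loss. This is the gap the algorithms above are designed to address.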

arXiv:1801.06490 [pdf, other]

Worst-case Optimal Submodular Extensions for Marginal Estimation

Authors: Pankaj Pansari, Chris Russell, M. Pawan Kumar

Abstract: Submodular extensions of an energy function can be used to efficiently compute approximate marginals via variational inference. The accuracy of the marginals depends crucially on the quality of the submodular extension. To identify the best possible extension, we show an equivalence between the submodular extensions of the energy and the objective functions of linear programming (LP) relaxations for the corresponding MAP estimation problem. This allows us to (i) establish the worst-case optimality of the submodular extension for Potts model used in the literature; (ii) identify the worst-case optimal submodular extension for the more general class of metric labeling; and (iii) efficiently compute the marginals for the widely used dense CRF model with the help of a recently proposed Gaussian filtering method. Using synthetic and real data, we show that our approach provides comparable upper bounds on the log-partition function to those obtained using tree-reweighted message passing (TRW) in cases where the latter is computationally feasible. Importantly, unlike TRW, our approach provides the first practical algorithm to compute an upper bound on the dense CRF model.

Submitted 10 January, 2018; originally announced January 2018.

Comments: Accepted to AISTATS 2018

arXiv:1711.09279 [pdf, other]

A Big Data Analysis Framework Using Apache Spark and Deep Learning

Authors: Anand Gupta, Hardeo Thakur, Ritvik Shrivastava, Pulkit Kumar, Sreyashi Nag

Abstract: With the spreading prevalence of Big Data, many advances have recently been made in this field. Frameworks such as Apache Hadoop and Apache Spark have gained a lot of traction over the past decades and have become massively popular, especially in industry. It is becoming increasingly evident that effective big data analysis is key to solving artificial intelligence problems. Thus, a multi-algorithm library called MLlib was implemented in the Spark framework. While this library supports multiple machine learning algorithms, there is still scope to use the Spark setup efficiently for highly time-intensive and computationally expensive procedures like deep learning. In this paper, we propose a novel framework that combines the distributed computing capabilities of Apache Spark and the advanced machine learning architecture of a deep multi-layer perceptron (MLP), using the popular concept of Cascade Learning. We conduct an empirical analysis of our framework on two real-world datasets. The results are encouraging and corroborate our proposed framework, showing that it improves over traditional big data analysis methods that use either Spark or deep learning as individual elements.

Submitted 25 November, 2017; originally announced November 2017.

Comments: To be published in IEEE ICDM 2017 (International Conference on Data Mining) Workshop on Data Science and Big Data Analytics (DSBDA)

arXiv:1710.09180 [pdf, other]

Anatomical labeling of brain CT scan anomalies using multi-context nearest neighbor relation networks

Authors: Srikrishna Varadarajan, Muktabh Mayank Srivastava, Monika Grewal, Pulkit Kumar

Abstract: This work is an endeavor to develop a deep learning methodology for automated anatomical labeling of a given region of interest (ROI) in brain computed tomography (CT) scans. We combine both local and global context to obtain a representation of the ROI. We then use Relation Networks (RNs) to predict the corresponding anatomy of the ROI based on its relationship score for each class. Further, we propose a novel strategy employing a nearest-neighbor approach for training RNs. We train RNs to learn the relationship of the target ROI with the joint representation of its nearest neighbors in each class, instead of all data points in each class. The proposed strategy leads to better training of RNs, along with increased performance compared to training a baseline RN.

Submitted 22 January, 2018; v1 submitted 25 October, 2017; originally announced October 2017.

Comments: Accepted as a one page abstract at IEEE International Symposium on Biomedical Imaging (ISBI), 2018

arXiv:1710.04934 [pdf, ps, other]

RADNET: Radiologist Level Accuracy using Deep Learning for HEMORRHAGE detection in CT Scans

Authors: Monika Grewal, Muktabh Mayank Srivastava, Pulkit Kumar, Srikrishna Varadarajan

Abstract: We describe a deep learning approach for automated brain hemorrhage detection from computed tomography (CT) scans. Our model emulates the procedure followed by radiologists to analyse a 3D CT scan in the real world. Similar to radiologists, the model sifts through 2D cross-sectional slices while paying close attention to potential hemorrhagic regions. Further, the model utilizes 3D context from neighboring slices to improve predictions at each slice and subsequently aggregates the slice-level predictions to provide a diagnosis at the CT level. We refer to our proposed approach as Recurrent Attention DenseNet (RADnet), as it employs the original DenseNet architecture and adds attention components for slice-level predictions and a recurrent neural network layer for incorporating 3D context. The real-world performance of RADnet has been benchmarked against independent analysis performed by three senior radiologists for 77 brain CTs. RADnet demonstrates 81.82% hemorrhage prediction accuracy at the CT level, which is comparable to radiologists. Further, RADnet achieves higher recall than two of the three radiologists, which is remarkable.

Submitted 3 January, 2018; v1 submitted 13 October, 2017; originally announced October 2017.

Comments: Accepted at IEEE Symposium on Biomedical Imaging (ISBI) 2018 as conference paper

arXiv:1612.05021 [pdf, other]

Dynamic Modeling of Price Responsive Demand in Real-time Electricity Market: Empirical Analysis

Authors: Jaeyong An, P. R. Kumar, Le Xie

Abstract: In this paper, we study the price responsiveness of electricity consumption from empirical commercial and industrial load data obtained from Texas. Employing a dynamical system perspective, we show that price-responsive demand can be modeled as a hybrid of a Hammerstein model with delay following a price surge, and a linear ARX model under moderate price changes. We observe that electricity consumption therefore has unique characteristics, including (1) a qualitatively distinct response between moderate and extremely high prices; and (2) a time delay associated with the response to high prices. We show that these observed features may render traditional approaches to demand response and retail pricing based on classical economic theories ineffective. In particular, ultimate real-time retail pricing may be less beneficial than classical economic theories suggest.

Submitted 15 December, 2016; originally announced December 2016.

arXiv:1307.4048 [pdf, ps, other]

doi 10.1109/ASRU.2013.6707725

Modified SPLICE and its Extension to Non-Stereo Data for Noise Robust Speech Recognition

Authors: D. S. Pavan Kumar, N. Vishnu Prasad, Vikas Joshi, S. Umesh

Abstract: In this paper, we propose a modification to the training process of the popular SPLICE algorithm for noise robust speech recognition. The modification is based on feature correlations and enables this stereo-based algorithm to improve performance in all noise conditions, especially in unseen cases. Further, the modified framework is extended to work for non-stereo datasets, where clean and noisy training utterances, but not stereo counterparts, are required. Finally, we propose an MLLR-based, computationally efficient run-time noise adaptation method in the SPLICE framework. The modified SPLICE shows an 8.6% absolute improvement over SPLICE in Test C of the Aurora-2 database, and 2.93% overall. The non-stereo method shows 10.37% and 6.93% absolute improvements over the Aurora-2 and Aurora-4 baseline models, respectively. Run-time adaptation shows a 9.89% absolute improvement in the modified framework compared to SPLICE for Test C, and 4.96% overall with respect to standard MLLR adaptation on HMMs.

Submitted 15 July, 2013; originally announced July 2013.

Comments: Submitted to Automatic Speech Recognition and Understanding (ASRU) 2013 Workshop

arXiv:1202.5957 [pdf]

Parameterized Complexity on a New Sorting Algorithm: A Study in Simulation

Authors: Prashant Kumar, Anchala Kumari, Soubhik Chakraborty

Abstract: Sundararajan and Chakraborty (2007) introduced a new sorting algorithm by modifying the fast and popular Quicksort and removing the interchanges. In a subsequent empirical study, Sourabh, Sundararajan and Chakraborty (2007) demonstrated that this algorithm sorts inputs from certain probability distributions faster than others, and the authors listed some standard probability distributions in decreasing order of speed, namely, Continuous uniform < Discrete uniform < Binomial < Negative Binomial < Poisson < Geometric < Exponential < Standard Normal. This second study makes clear that the algorithm is sensitive to the input probability distribution. Motivated by these previous findings, in the present paper we study this sorting algorithm further through simulation and determine the appropriate empirical model that explains its average sorting time, with special emphasis on parameterized complexity.

Submitted 27 February, 2012; originally announced February 2012.

Comments: 14 pages

Journal ref: Ann. Univ. Tibiscus Comp. Sci. Series VII/2 (2009), 9-22

Showing 1–38 of 38 results for author: Kumar, P