Search | arXiv e-print repository

Mean Estimation with User-Level Privacy for Spatio-Temporal IoT Datasets

Authors: V. Arvind Rameshwar, Anshoo Tandon, Prajjwal Gupta, Aditya Vikram Singh, Novoneel Chakraborty, Abhay Sharma

Abstract: This paper considers the problem of the private release of sample means of speed values from traffic datasets. Our key contribution is the development of user-level differentially private algorithms that incorporate carefully chosen parameter values to ensure low estimation errors on real-world datasets, while ensuring privacy. We test our algorithms on ITMS (Intelligent Traffic Management System) data from an Indian city, where the speeds of different buses are drawn in a potentially non-i.i.d. manner from an unknown distribution, and where the number of speed samples contributed by different buses is potentially different. We then apply our algorithms to large synthetic datasets, generated based on the ITMS data. Here, we provide theoretical justification for the observed performance trends, and also provide recommendations for the choices of algorithm subroutines that result in low estimation errors. Finally, we characterize the best performance of pseudo-user creation-based algorithms on worst-case datasets via a minimax approach; this then gives rise to a novel procedure for the creation of pseudo-users, which optimizes the worst-case total estimation error. The algorithms discussed in the paper are readily applicable to general spatio-temporal IoT datasets for releasing a differentially private mean of a desired value.

Submitted 25 April, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

Comments: 14 pages, 5 figures, submitted to the ACM for possible publication

arXiv:2401.14283 [pdf, other]

Information Leakage Detection through Approximate Bayes-optimal Prediction

Authors: Pritha Gupta, Marcel Wever, Eyke Hüllermeier

Abstract: In today's data-driven world, the proliferation of publicly available information intensifies the challenge of information leakage (IL), raising security concerns. IL involves unintentionally exposing secret (sensitive) information to unauthorized parties via systems' observable information. Conventional statistical approaches, which estimate mutual information (MI) between observable and secret information for detecting IL, face challenges such as the curse of dimensionality, convergence, computational complexity, and MI misestimation. Furthermore, emerging supervised machine learning (ML) methods, though effective, are limited to binary system-sensitive information and lack a comprehensive theoretical framework. To address these limitations, we establish a theoretical framework using statistical learning theory and information theory to accurately quantify and detect IL. We demonstrate that MI can be accurately estimated by approximating the log-loss and accuracy of the Bayes predictor. As the Bayes predictor is typically unknown in practice, we propose to approximate it with the help of automated machine learning (AutoML). First, we compare our MI estimation approaches against current baselines, using synthetic data sets generated using the multivariate normal (MVN) distribution with known MI. Second, we introduce a cut-off technique using one-sided statistical tests to detect IL, employing the Holm-Bonferroni correction to increase confidence in detection decisions. Our study evaluates IL detection performance on real-world data sets, highlighting the effectiveness of the Bayes predictor's log-loss estimation, and finds our proposed method to effectively estimate MI on synthetic data sets and thus detect ILs accurately.

Submitted 25 January, 2024; originally announced January 2024.

Comments: Under submission in JMLR

MSC Class: 94A15; 62H30; 94A60 ACM Class: I.5.1; G.3; E.3

arXiv:2310.10745 [pdf, other]

Mori-Zwanzig latent space Koopman closure for nonlinear autoencoder

Authors: Priyam Gupta, Peter J. Schmid, Denis Sipp, Taraneh Sayadi, Georgios Rigas

Abstract: The Koopman operator presents an attractive approach to achieve global linearization of nonlinear systems, making it a valuable method for simplifying the understanding of complex dynamics. While data-driven methodologies have exhibited promise in approximating finite Koopman operators, they grapple with various challenges, such as the judicious selection of observables, dimensionality reduction, and the ability to predict complex system behaviors accurately. This study presents a novel approach termed Mori-Zwanzig autoencoder (MZ-AE) to robustly approximate the Koopman operator in low-dimensional spaces. The proposed method leverages a nonlinear autoencoder to extract key observables for approximating a finite invariant Koopman subspace and integrates a non-Markovian correction mechanism using the Mori-Zwanzig formalism. Consequently, this approach yields a closed representation of dynamics within the latent manifold of the nonlinear autoencoder, thereby enhancing the precision and stability of the Koopman operator approximation. Demonstrations showcase the technique's ability to capture regime transitions in the flow around a cylinder. It also provides a low dimensional approximation for Kuramoto-Sivashinsky with promising short-term predictability and robust long-term statistical performance. By bridging the gap between data-driven techniques and the mathematical foundations of Koopman theory, MZ-AE offers a promising avenue for improved understanding and prediction of complex nonlinear dynamics.

Submitted 16 April, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

Comments: 22 pages, 11 figures

arXiv:2207.12007 [pdf, other]

LETS-GZSL: A Latent Embedding Model for Time Series Generalized Zero Shot Learning

Authors: Sathvik Bhaskarpandit, Priyanka Gupta, Manik Gupta

Abstract: One of the recent developments in deep learning is generalized zero-shot learning (GZSL), which aims to recognize objects from both seen and unseen classes, when only the labeled examples from seen classes are provided. Over the past couple of years, GZSL has picked up traction and several models have been proposed to solve this problem. Whereas an extensive amount of research on GZSL has been carried out in fields such as computer vision and natural language processing, no such research has been carried out to deal with time series data. GZSL is used for applications such as detecting abnormalities from ECG and EEG data and identifying unseen classes from sensor, spectrograph and other devices' data. In this regard, we propose a Latent Embedding for Time Series - GZSL (LETS-GZSL) model that can solve the problem of GZSL for time series classification (TSC). We utilize an embedding-based approach and combine it with attribute vectors to predict the final class labels. We report our results on the widely popular UCR archive datasets. Our framework is able to achieve a harmonic mean value of at least 55% on most of the datasets except when the number of unseen classes is greater than 3 or the amount of data is very low (less than 100 training examples).

Submitted 25 July, 2022; originally announced July 2022.

Comments: 9 pages, 5 figures, 6 tables. Accepted at the IJCAI 2022 workshop on Artificial Intelligence for Time Series (AI4TS)

arXiv:2206.14987 [pdf, other]

Lookback for Learning to Branch

Authors: Prateek Gupta, Elias B. Khalil, Didier Chetélat, Maxime Gasse, Yoshua Bengio, Andrea Lodi, M. Pawan Kumar

Abstract: The expressive and computationally inexpensive bipartite Graph Neural Networks (GNN) have been shown to be an important component of deep learning based Mixed-Integer Linear Program (MILP) solvers. Recent works have demonstrated the effectiveness of such GNNs in replacing the branching (variable selection) heuristic in branch-and-bound (B&B) solvers. These GNNs are trained, offline and on a collection of MILPs, to imitate a very good but computationally expensive branching heuristic, strong branching. Given that B&B results in a tree of sub-MILPs, we ask (a) whether there are strong dependencies exhibited by the target heuristic among the neighboring nodes of the B&B tree, and (b) if so, whether we can incorporate them in our training procedure. Specifically, we find that with the strong branching heuristic, a child node's best choice was often the parent's second-best choice. We call this the "lookback" phenomenon. Surprisingly, the typical branching GNN of Gasse et al. (2019) often misses this simple "answer". To imitate the target behavior more closely by incorporating the lookback phenomenon in GNNs, we propose two methods: (a) target smoothing for the standard cross-entropy loss function, and (b) adding a Parent-as-Target (PAT) Lookback regularizer term. Finally, we propose a model selection framework to incorporate harder-to-formulate objectives such as solving time in the final models. Through extensive experimentation on standard benchmark instances, we show that our proposal results in up to 22% decrease in the size of the B&B tree and up to 15% improvement in the solving times.

Submitted 29 December, 2022; v1 submitted 29 June, 2022; originally announced June 2022.

Comments: Published in Transactions on Machine Learning Research (TMLR)

arXiv:2112.07067 [pdf, other]

Dynamic Learning of Correlation Potentials for a Time-Dependent Kohn-Sham System

Authors: Harish S. Bhat, Kevin Collins, Prachi Gupta, Christine M. Isborn

Abstract: We develop methods to learn the correlation potential for a time-dependent Kohn-Sham (TDKS) system in one spatial dimension. We start from a low-dimensional two-electron system for which we can numerically solve the time-dependent Schrödinger equation; this yields electron densities suitable for training models of the correlation potential. We frame the learning problem as one of optimizing a least-squares objective subject to the constraint that the dynamics obey the TDKS equation. Applying adjoints, we develop efficient methods to compute gradients and thereby learn models of the correlation potential. Our results show that it is possible to learn values of the correlation potential such that the resulting electron densities match ground truth densities. We also show how to learn correlation potential functionals with memory, demonstrating one such model that yields reasonable results for trajectories outside the training set.

Submitted 6 December, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

Comments: 20 pages, 5 figures

Journal ref: Proceedings of The 4th Annual Learning for Dynamics and Control Conference, PMLR 168:546-558, 2022

arXiv:2110.03594 [pdf]

doi 10.1016/j.oceaneng.2022.111094

Ship Performance Monitoring using Machine-learning

Authors: Prateek Gupta, Adil Rasheed, Sverre Steen

Abstract: The hydrodynamic performance of a sea-going ship varies over its lifespan due to factors like marine fouling and the condition of the anti-fouling paint system. In order to accurately estimate the power demand and fuel consumption for a planned voyage, it is important to assess the hydrodynamic performance of the ship. The current work uses machine-learning (ML) methods to estimate the hydrodynamic performance of a ship using the onboard recorded in-service data. Three ML methods, NL-PCR, NL-PLSR and probabilistic ANN, are calibrated using the data from two sister ships. The calibrated models are used to extract the varying trend in the ship's hydrodynamic performance over time and predict the change in performance through several propeller and hull cleaning events. The predicted change in performance is compared with the corresponding values estimated using the fouling friction coefficient ($ΔC_F$). The ML methods are found to perform well when modelling the hydrodynamic state variables of the ships, with the probabilistic ANN model performing best; the results from NL-PCR and NL-PLSR are not far behind, indicating that it may be possible to use simple methods to solve such problems with the help of domain knowledge.

Submitted 13 December, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

Journal ref: Ocean Engineering, Volume 254, 15 June 2022, 111094

arXiv:2108.13941 [pdf, other]

Bubblewrap: Online tiling and real-time flow prediction on neural manifolds

Authors: Anne Draelos, Pranjal Gupta, Na Young Jun, Chaichontat Sriworarat, John Pearson

Abstract: While most classic studies of function in experimental neuroscience have focused on the coding properties of individual neurons, recent developments in recording technologies have resulted in an increasing emphasis on the dynamics of neural populations. This has given rise to a wide variety of models for analyzing population activity in relation to experimental variables, but direct testing of many neural population hypotheses requires intervening in the system based on current neural state, necessitating models capable of inferring neural state online. Existing approaches, primarily based on dynamical systems, require strong parametric assumptions that are easily violated in the noise-dominated regime and do not scale well to the thousands of data channels in modern experiments. To address this problem, we propose a method that combines fast, stable dimensionality reduction with a soft tiling of the resulting neural manifold, allowing dynamics to be approximated as a probability flow between tiles. This method can be fit efficiently using online expectation maximization, scales to tens of thousands of tiles, and outperforms existing methods when dynamics are noise-dominated or feature multi-modal transition probabilities. The resulting model can be trained at kiloHertz data rates, produces accurate approximations of neural dynamics within minutes, and generates predictions on submillisecond time scales. It retains predictive performance throughout many time steps into the future and is fast enough to serve as a component of closed-loop causal experiments.

Submitted 1 November, 2021; v1 submitted 31 August, 2021; originally announced August 2021.

Comments: Version of the work appearing in NeurIPS 2021

arXiv:2108.07872 [pdf, other]

Aggregated Customer Engagement Model

Authors: Priya Gupta, Cuize Han

Abstract: E-commerce websites use machine learned ranking models to serve shopping results to customers. Typically, the websites log the customer search events, which include the query entered and the resulting engagement with the shopping results, such as clicks and purchases. Each customer search event serves as input training data for the models, and the individual customer engagement serves as a signal for customer preference. So a purchased shopping result, for example, is perceived to be more important than one that is not. However, new or under-impressed products do not have enough customer engagement signals and end up at a disadvantage when being ranked alongside popular products. In this paper, we propose a novel method for data curation that aggregates all customer engagements within a day for the same query to use as input training data. This aggregated customer engagement gives the models a complete picture of the relative importance of shopping results. Training models on this aggregated data leads to less reliance on behavioral features. This helps mitigate the cold start problem and boosts relevant new products to top search results. In this paper, we present the offline and online analysis and results comparing the individual and aggregated customer engagement models trained on e-commerce data.

Submitted 17 August, 2021; originally announced August 2021.

arXiv:2105.14890 [pdf, other]

doi 10.1145/3461702.3462592

Rawlsian Fair Adaptation of Deep Learning Classifiers

Authors: Kulin Shah, Pooja Gupta, Amit Deshpande, Chiranjib Bhattacharyya

Abstract: Group-fairness in classification aims for equality of a predictive utility across different sensitive sub-populations, e.g., race or gender. Equality or near-equality constraints in group-fairness often worsen not only the aggregate utility but also the utility for the least advantaged sub-population. In this paper, we apply the principles of Pareto-efficiency and least-difference to the utility being accuracy, as an illustrative example, and arrive at the Rawls classifier that minimizes the error rate on the worst-off sensitive sub-population. Our mathematical characterization shows that the Rawls classifier uniformly applies a threshold to an ideal score of features, in the spirit of fair equality of opportunity. In practice, such a score or a feature representation is often computed by a black-box model that has been useful but unfair. Our second contribution is practical Rawlsian fair adaptation of any given black-box deep learning model, without changing the score or feature representation it computes. Given any score function or feature representation and only its second-order statistics on the sensitive sub-populations, we seek a threshold classifier on the given score or a linear threshold classifier on the given feature representation that achieves the Rawls error rate restricted to this hypothesis class. Our technical contribution is to formulate the above problems using ambiguous chance constraints, and to provide efficient algorithms for Rawlsian fair adaptation, along with provable upper bounds on the Rawls error rate. Our empirical results show significant improvement over state-of-the-art group-fair algorithms, even without retraining for fairness.

Submitted 31 May, 2021; originally announced May 2021.

Comments: 24 figures, 19 figures

arXiv:2011.03729 [pdf, other]

Enhash: A Fast Streaming Algorithm For Concept Drift Detection

Authors: Aashi Jindal, Prashant Gupta, Debarka Sengupta, Jayadeva

Abstract: We propose Enhash, a fast ensemble learner that detects concept drift in a data stream. A stream may consist of abrupt, gradual, virtual, or recurring events, or a mixture of various types of drift. Enhash employs projection hash to insert an incoming sample. We show empirically that the proposed method has competitive performance to existing ensemble learners in much less time. Also, Enhash has moderate resource requirements. Experiments relevant to performance comparison were performed on 6 artificial and 4 real data sets consisting of various types of drifts.

Submitted 7 November, 2020; originally announced November 2020.

arXiv:2009.01571 [pdf, other]

MixBoost: Synthetic Oversampling with Boosted Mixup for Handling Extreme Imbalance

Authors: Anubha Kabra, Ayush Chopra, Nikaash Puri, Pinkesh Badjatiya, Sukriti Verma, Piyush Gupta, Balaji K

Abstract: Training a classification model on a dataset where the instances of one class outnumber those of the other class is a challenging problem. Such imbalanced datasets are standard in real-world situations such as fraud detection, medical diagnosis, and computational advertising. We propose an iterative data augmentation method, MixBoost, which intelligently selects (Boost) and then combines (Mix) instances from the majority and minority classes to generate synthetic hybrid instances that have characteristics of both classes. We evaluate MixBoost on 20 benchmark datasets, show that it outperforms existing approaches, and test its efficacy through significance testing. We also present ablation studies to analyze the impact of the different components of MixBoost.

Submitted 3 September, 2020; originally announced September 2020.

Comments: Work done as part of internship at MDSR

arXiv:2009.00149 [pdf, other]

GIF: Generative Interpretable Faces

Authors: Partha Ghosh, Pravir Singh Gupta, Roy Uziel, Anurag Ranjan, Michael Black, Timo Bolkart

Abstract: Photo-realistic visualization and animation of expressive human faces have been a long-standing challenge. 3D face modeling methods provide parametric control but generate unrealistic images; on the other hand, generative 2D models like GANs (Generative Adversarial Networks) output photo-realistic face images, but lack explicit control. Recent methods gain partial control, either by attempting to disentangle different factors in an unsupervised manner, or by adding control post hoc to a pre-trained model. Unconditional GANs, however, may entangle factors that are hard to undo later. We condition our generative model on pre-defined control parameters to encourage disentanglement in the generation process. Specifically, we condition StyleGAN2 on FLAME, a generative 3D face model. While conditioning on FLAME parameters yields unsatisfactory results, we find that conditioning on rendered FLAME geometry and photometric details works well. This gives us a generative 2D face model named GIF (Generative Interpretable Faces) that offers FLAME's parametric control. Here, interpretable refers to the semantic meaning of different parameters. Given FLAME parameters for shape, pose, expressions, parameters for appearance, lighting, and an additional style vector, GIF outputs photo-realistic face images. We perform an AMT based perceptual study to quantitatively and qualitatively evaluate how well GIF follows its conditioning. The code, data, and trained model are publicly available for research purposes at http://gif.is.tue.mpg.de.

Submitted 25 November, 2020; v1 submitted 31 August, 2020; originally announced September 2020.

Comments: International Conference on 3D Vision (3DV) 2020

arXiv:2007.00237 [pdf, other]

Unbiased Loss Functions for Extreme Classification With Missing Labels

Authors: Erik Schultheis, Mohammadreza Qaraei, Priyanshu Gupta, Rohit Babbar

Abstract: The goal in extreme multi-label classification (XMC) is to tag an instance with a small subset of relevant labels from an extremely large set of possible labels. In addition to the computational burden arising from the large number of training instances, features and labels, problems in XMC are faced with two statistical challenges, (i) a large number of 'tail-labels' -- those which occur very infrequently, and (ii) missing labels as it is virtually impossible to manually assign every relevant label to an instance. In this work, we derive an unbiased estimator for a general formulation of loss functions which decompose over labels, and then infer the forms for commonly used loss functions such as hinge- and squared-hinge-loss and binary cross-entropy loss. We show that the derived unbiased estimators, in the form of appropriate weighting factors, can be easily incorporated in state-of-the-art algorithms for extreme classification, thereby scaling to datasets with hundreds of thousands of labels. However, empirically, we find a slightly altered version that gives more relative weight to tail labels to perform even better. We suspect this is due to the label imbalance in the dataset, which is not explicitly addressed by our theoretically derived estimator. Minimizing the proposed loss functions leads to significant improvement over existing methods (up to 20% in some cases) on benchmark datasets in XMC.

Submitted 1 July, 2020; originally announced July 2020.

arXiv:2006.15212 [pdf, other]

Hybrid Models for Learning to Branch

Authors: Prateek Gupta, Maxime Gasse, Elias B. Khalil, M. Pawan Kumar, Andrea Lodi, Yoshua Bengio

Abstract: A recent Graph Neural Network (GNN) approach for learning to branch has been shown to successfully reduce the running time of branch-and-bound algorithms for Mixed Integer Linear Programming (MILP). While the GNN relies on a GPU for inference, MILP solvers are purely CPU-based. This severely limits its application as many practitioners may not have access to high-end GPUs. In this work, we ask two key questions. First, in a more realistic setting where only a CPU is available, is the GNN model still competitive? Second, can we devise an alternate computationally inexpensive model that retains the predictive power of the GNN architecture? We answer the first question in the negative, and address the second question by proposing a new hybrid architecture for efficient branching on CPU machines. The proposed architecture combines the expressive power of GNNs with computationally inexpensive multi-layer perceptrons (MLP) for branching. We evaluate our methods on four classes of MILP problems, and show that they lead to up to 26% reduction in solver running time compared to state-of-the-art methods without a GPU, while extrapolating to harder problems than it was trained on. The code for this project is publicly available at https://github.com/pg2455/Hybrid-learn2branch.

Submitted 23 October, 2020; v1 submitted 26 June, 2020; originally announced June 2020.

Comments: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada

arXiv:2005.08640 [pdf]

A Weighted Mutual k-Nearest Neighbour for Classification Mining

Authors: Joydip Dhar, Ashaya Shukla, Mukul Kumar, Prashant Gupta

Abstract: kNN is a very effective instance-based learning method, and it is easy to implement. Due to the heterogeneous nature of data, noise from different possible sources is also widespread, especially in large-scale databases. To eliminate noise and the effect of pseudo neighbours, in this paper we propose a new learning algorithm that performs anomaly detection and removes pseudo neighbours from the dataset so as to provide comparatively better results. The algorithm also tries to minimize the effect of distant neighbours. A concept of certainty measure is also introduced for the experimental results. The advantage of using mutual neighbours and distance-weighted voting is that the dataset is refined after the removal of anomalies, and the weighting gives more consideration to closer neighbours. Finally, the performance of the proposed algorithm is calculated.

Submitted 14 May, 2020; originally announced May 2020.

Comments: 5 pages, 1 figure, 5 tables

ACM Class: I.7.0

arXiv:2005.02595 [pdf, ps, other]

doi 10.1109/TAI.2020.3027279

Approaches and Applications of Early Classification of Time Series: A Review

Authors: Ashish Gupta, Hari Prabhat Gupta, Bhaskar Biswas, Tanima Dutta

Abstract: Early classification of time series has been extensively studied for minimizing class prediction delay in time-sensitive applications such as healthcare and finance. A primary task of an early classification approach is to classify an incomplete time series as soon as possible with some desired level of accuracy. Recent years have witnessed several approaches for early classification of time series. As most of the approaches have solved the early classification problem with different aspects, it becomes very important to make a thorough review of the existing solutions to know the current status of the area. These solutions have demonstrated reasonable performance in a wide range of applications including human activity recognition, gene expression based health diagnostic, industrial monitoring, and so on. In this paper, we present a systematic review of current literature on early classification approaches for both univariate and multivariate time series. We divide various existing approaches into four exclusive categories based on their proposed solution strategies. The four categories include prefix based, shapelet based, model based, and miscellaneous approaches. The authors also discuss the applications of early classification in many areas including industrial monitoring, intelligent transportation, and medical. Finally, we provide a quick summary of the current literature with future research directions.

Submitted 15 October, 2020; v1 submitted 6 May, 2020; originally announced May 2020.

Comments: 15 pages, 6 figures, 6 tables

Journal ref: IEEE Transactions on Artificial Intelligence (2020)

arXiv:2004.13828 [pdf, other]

DeepSubQE: Quality estimation for subtitle translations

Authors: Prabhakar Gupta, Anil Nelakanti

Abstract: Quality estimation (QE) for tasks involving language data is hard owing to numerous aspects of natural language like variations in paraphrasing, style, grammar, etc. There can be multiple answers with varying levels of acceptability depending on the application at hand. In this work, we look at estimating quality of translations for video subtitles. We show how existing QE methods are inadequate and propose our method DeepSubQE as a system to estimate quality of translation given subtitles data for a pair of languages. We rely on various data augmentation strategies for automated labelling and synthesis for training. We create a hybrid network which learns semantic and syntactic features of bilingual data and compare it with only-LSTM and only-CNN networks. Our proposed network outperforms them by significant margin.

Submitted 22 April, 2020; originally announced April 2020.

arXiv:2003.10662 [pdf, other]

doi 10.1109/LCSYS.2022.3230085

Towards Safer Self-Driving Through Great PAIN (Physically Adversarial Intelligent Networks)

Authors: Piyush Gupta, Demetris Coleman, Joshua E. Siegel

Abstract: Automated vehicles' neural networks suffer from overfit, poor generalizability, and untrained edge cases due to limited data availability. Researchers synthesize randomized edge-case scenarios to assist in the training process, though simulation introduces potential for overfit to latent rules and features. Automating worst-case scenario generation could yield informative data for improving self driving. To this end, we introduce a "Physically Adversarial Intelligent Network" (PAIN), wherein self-driving vehicles interact aggressively in the CARLA simulation environment. We train two agents, a protagonist and an adversary, using dueling double deep Q networks (DDDQNs) with prioritized experience replay. The coupled networks alternately seek-to-collide and to avoid collisions such that the "defensive" avoidance algorithm increases the mean-time-to-failure and distance traveled under non-hostile operating conditions. The trained protagonist becomes more resilient to environmental uncertainty and less prone to corner case failures resulting in collisions than the agent trained without an adversary.

Submitted 24 March, 2020; originally announced March 2020.

arXiv:2001.05166 [pdf, other]

ShapeVis: High-dimensional Data Visualization at Scale

Authors: Nupur Kumari, Siddarth R., Akash Rupela, Piyush Gupta, Balaji Krishnamurthy

Abstract: We present ShapeVis, a scalable visualization technique for point cloud data inspired from topological data analysis. Our method captures the underlying geometric and topological structure of the data in a compressed graphical representation. Much success has been reported by the data visualization technique Mapper, that discretely approximates the Reeb graph of a filter function on the data. However, when using standard dimensionality reduction algorithms as the filter function, Mapper suffers from considerable computational cost. This makes it difficult to scale to high-dimensional data. Our proposed technique relies on finding a subset of points called landmarks along the data manifold to construct a weighted witness-graph over it. This graph captures the structural characteristics of the point cloud, and its weights are determined using a Finite Markov Chain. We further compress this graph by applying induced maps from standard community detection algorithms. Using techniques borrowed from manifold tearing, we prune and reinstate edges in the induced graph based on their modularity to summarize the shape of data. We empirically demonstrate how our technique captures the structural characteristics of real and synthetic data sets. Further, we compare our approach with Mapper using various filter functions like t-SNE, UMAP, LargeVis and show that our algorithm scales to millions of data points while preserving the quality of data visualization.

Submitted 21 January, 2020; v1 submitted 15 January, 2020; originally announced January 2020.

Comments: Accepted at WWW 2020

arXiv:1910.11242 [pdf, other]

A context sensitive real-time Spell Checker with language adaptability

Authors: Prabhakar Gupta

Abstract: We present a novel language adaptable spell checking system which detects spelling errors and suggests context sensitive corrections in real-time. We show that our system can be extended to new languages with minimal language-specific processing. Available literature majorly discusses spell checkers for English but there are no publicly available systems which can be extended to work for other languages out of the box. Most of the systems do not work in real-time. We explain the process of generating a language's word dictionary and n-gram probability dictionaries using Wikipedia-articles data and manually curated video subtitles. We present the results of generating a list of suggestions for a misspelled word. We also propose three approaches to create noisy channel datasets of real-world typographic errors. We compare our system with industry-accepted spell checker tools for 11 languages. Finally, we show the performance of our system on synthetic datasets for 24 languages.

Submitted 23 October, 2019; originally announced October 2019.

Comments: 7 pages, 6 images

arXiv:1910.00314 [pdf, other]

BioNLP-OST 2019 RDoC Tasks: Multi-grain Neural Relevance Ranking Using Topics and Attention Based Query-Document-Sentence Interactions

Authors: Yatin Chaudhary, Pankaj Gupta, Hinrich Schütze

Abstract: This paper presents our system details and results of participation in the RDoC Tasks of BioNLP-OST 2019. Research Domain Criteria (RDoC) construct is a multi-dimensional and broad framework to describe mental health disorders by combining knowledge from genomics to behaviour. Non-availability of RDoC labelled dataset and tedious labelling process hinders the use of RDoC framework to reach its full potential in Biomedical research community and Healthcare industry. Therefore, Task-1 aims at retrieval and ranking of PubMed abstracts relevant to a given RDoC construct and Task-2 aims at extraction of the most relevant sentence from a given PubMed abstract. We investigate (1) attention based supervised neural topic model and SVM for retrieval and ranking of PubMed abstracts and, further utilize BM25 and other relevance measures for re-ranking, (2) supervised and unsupervised sentence ranking models utilizing multi-view representations comprising of query-aware attention-based sentence representation (QAR), bag-of-words (BoW) and TF-IDF. Our best systems achieved 1st rank and scored 0.86 mean average precision (mAP) and 0.58 macro average accuracy (MAA) in Task-1 and Task-2 respectively.

Submitted 2 October, 2019; v1 submitted 1 October, 2019; originally announced October 2019.

Comments: EMNLP2019, 10 pages, 2 figures, 7 tables

arXiv:1909.05362 [pdf, other]

Problems with automating translation of movie/TV show subtitles

Authors: Prabhakar Gupta, Mayank Sharma, Kartik Pitale, Keshav Kumar

Abstract: We present 27 problems encountered in automating the translation of movie/TV show subtitles. We categorize each problem in one of the three categories viz. problems directly related to textual translation, problems related to subtitle creation guidelines, and problems due to adaptability of machine translation (MT) engines. We also present the findings of a translation quality evaluation experiment where we share the frequency of 16 key problems. We show that the systems working at the frontiers of Natural Language Processing do not perform well for subtitles and require some post-processing solutions for redressal of these problems.

Submitted 4 September, 2019; originally announced September 2019.

arXiv:1909.00659 [pdf, other]

Guided Random Forest and its application to data approximation

Authors: Prashant Gupta, Aashi Jindal, Jayadeva, Debarka Sengupta

Abstract: We present a new way of constructing an ensemble classifier, named the Guided Random Forest (GRAF) in the sequel. GRAF extends the idea of building oblique decision trees with localized partitioning to obtain a global partitioning. We show that global partitioning bridges the gap between decision trees and boosting algorithms. We empirically demonstrate that global partitioning reduces the generalization error bound. Results on 115 benchmark datasets show that GRAF yields comparable or better results on a majority of datasets. We also present a new way of approximating the datasets in the framework of random forests.

Submitted 2 September, 2019; originally announced September 2019.

arXiv:1907.13257 [pdf, other]

doi 10.1109/MM.2019.2935967

Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training

Authors: Saptadeep Pal, Eiman Ebrahimi, Arslan Zulfiqar, Yaosheng Fu, Victor Zhang, Szymon Migacz, David Nellans, Puneet Gupta

Abstract: Deploying deep learning (DL) models across multiple compute devices to train large and complex models continues to grow in importance because of the demand for faster and more frequent training. Data parallelism (DP) is the most widely used parallelization strategy, but as the number of devices in data parallel training grows, so does the communication overhead between devices. Additionally, a larger aggregate batch size per step leads to statistical efficiency loss, i.e., a larger number of epochs are required to converge to a desired accuracy. These factors affect overall training time and beyond a certain number of devices, the speedup from leveraging DP begins to scale poorly. In addition to DP, each training step can be accelerated by exploiting model parallelism (MP). This work explores hybrid parallelization, where each data parallel worker is comprised of more than one device, across which the model dataflow graph (DFG) is split using MP. We show that at scale, hybrid training will be more effective at minimizing end-to-end training time than exploiting DP alone. We project that for Inception-V3, GNMT, and BigLSTM, the hybrid strategy provides an end-to-end training speedup of at least 26.5%, 8%, and 22% respectively compared to what DP alone can achieve at scale.

Submitted 30 July, 2019; originally announced July 2019.

arXiv:1907.08259 [pdf, ps, other]

WriterForcing: Generating more interesting story endings

Authors: Prakhar Gupta, Vinayshekhar Bannihatti Kumar, Mukul Bhutani, Alan W Black

Abstract: We study the problem of generating interesting endings for stories. Neural generative models have shown promising results for various text generation problems. Sequence to Sequence (Seq2Seq) models are typically trained to generate a single output sequence for a given input sequence. However, in the context of a story, multiple endings are possible. Seq2Seq models tend to ignore the context and generate generic and dull responses. Very few works have studied generating diverse and interesting story endings for a given story context. In this paper, we propose models which generate more diverse and interesting outputs by 1) training models to focus attention on important keyphrases of the story, and 2) promoting generation of non-generic words. We show that the combination of the two leads to more diverse and interesting endings.

Submitted 18 July, 2019; originally announced July 2019.

Comments: Accepted in ACL workshop on Storytelling 2019

arXiv:1907.01643 [pdf, other]

Pentagon at MEDIQA 2019: Multi-task Learning for Filtering and Re-ranking Answers using Language Inference and Question Entailment

Authors: Hemant Pugaliya, Karan Saxena, Shefali Garg, Sheetal Shalini, Prashant Gupta, Eric Nyberg, Teruko Mitamura

Abstract: Parallel deep learning architectures like fine-tuned BERT and MT-DNN have quickly become the state of the art, bypassing previous deep and shallow learning methods by a large margin. More recently, pre-trained models from large related datasets have been able to perform well on many downstream tasks by just fine-tuning on domain-specific datasets. However, using powerful models on non-trivial tasks, such as ranking and large document classification, still remains a challenge due to input size limitations of parallel architecture and extremely small datasets (insufficient for fine-tuning). In this work, we introduce an end-to-end system, trained in a multi-task setting, to filter and re-rank answers in the medical domain. We use task-specific pre-trained models as deep feature extractors. Our model achieves the highest Spearman's Rho and Mean Reciprocal Rank of 0.338 and 0.9622 respectively, on the ACL-BioNLP workshop MediQA Question Answering shared-task.

Submitted 1 July, 2019; originally announced July 2019.

arXiv:1904.00655 [pdf, other]

Transfer Learning for Clinical Time Series Analysis using Deep Neural Networks

Authors: Priyanka Gupta, Pankaj Malhotra, Jyoti Narwariya, Lovekesh Vig, Gautam Shroff

Abstract: Deep neural networks have shown promising results for various clinical prediction tasks. However, training deep networks such as those based on Recurrent Neural Networks (RNNs) requires large labeled data, significant hyper-parameter tuning effort and expertise, and high computational resources. In this work, we investigate as to what extent can transfer learning address these issues when using deep RNNs to model multivariate clinical time series. We consider two scenarios for transfer learning using RNNs: i) domain-adaptation, i.e., leveraging a deep RNN - namely, TimeNet - pre-trained for feature extraction on time series from diverse domains, and adapting it for feature extraction and subsequent target tasks in healthcare domain, ii) task-adaptation, i.e., pre-training a deep RNN - namely, HealthNet - on diverse tasks in healthcare domain, and adapting it to new target tasks in the same domain. We evaluate the above approaches on publicly available MIMIC-III benchmark dataset, and demonstrate that (a) computationally-efficient linear models trained using features extracted via pre-trained RNNs outperform or, in the worst case, perform as well as deep RNNs and statistical hand-crafted features based models trained specifically for target task; (b) models obtained by adapting pre-trained models for target tasks are significantly more robust to the size of labeled data compared to task-specific RNNs, while also being computationally efficient.
We, therefore, conclude that pre-trained deep models like TimeNet and HealthNet allow leveraging the advantages of deep learning for clinical time series analysis tasks, while also minimize dependence on hand-crafted features, deal robustly with scarce labeled training data scenarios without overfitting, as well as reduce dependence on expertise and resources required to train deep networks from scratch.

Submitted 4 March, 2021; v1 submitted 1 April, 2019; originally announced April 2019.

Comments: Updated version of this work appeared in Journal of Healthcare Informatics Research, Vol. 4, 2020. arXiv admin note: text overlap with arXiv:1807.01705

arXiv:1901.10860 [pdf, other]

doi 10.1016/j.ijar.2021.10.002

Learning Context-Dependent Choice Functions

Authors: Karlson Pfannschmidt, Pritha Gupta, Björn Haddenhorst, Eyke Hüllermeier

Abstract: Choice functions accept a set of alternatives as input and produce a preferred subset of these alternatives as output. We study the problem of learning such functions under conditions of context-dependence of preferences, which means that the preference in favor of a certain choice alternative may depend on what other options are also available. In spite of its practical relevance, this kind of context-dependence has received little attention in preference learning so far. We propose a suitable model based on context-dependent (latent) utility functions, thereby reducing the problem to the task of learning such utility functions. Practically, this comes with a number of challenges. For example, the set of alternatives provided as input to a choice function can be of any size, and the output of the function should not depend on the order in which the alternatives are presented. To meet these requirements, we propose two general approaches based on two representations of context-dependent utility functions, as well as instantiations in the form of appropriate end-to-end trainable neural network architectures. Moreover, to demonstrate the performance of both networks, we present extensive empirical evaluations on both synthetic and real-world datasets.

Submitted 20 October, 2021; v1 submitted 29 January, 2019; originally announced January 2019.

Comments: 45 pages, 21 figures

Journal ref: International Journal of Approximate Reasoning 140 (2022) 116-155

arXiv:1901.06152 [pdf, ps, other]

Protein Classification using Machine Learning and Statistical Techniques: A Comparative Analysis

Authors: Chhote Lal Prasad Gupta, Anand Bihari, Sudhakar Tripathi

Abstract: Predicting the enzyme class of an unknown protein is one of the challenging tasks in bioinformatics. The number of known proteins increases day by day, so enzyme class prediction offers a new opportunity to bioinformatics scholars. The prime objective of this article is to apply machine learning classification techniques to feature selection and prediction, and to identify an appropriate classification technique for function prediction. Seven classification techniques, namely CRT, QUEST, CHAID, C5.0, ANN (Artificial Neural Network), SVM and Bayesian, are applied to 4368 proteins extracted from the UniprotKB databank and categorized into six classes. The protein data are high-dimensional sequence data with a maximum of 48 features. SPSS is used as the experimental tool for handling the high-dimensional sequential protein data with the different classification techniques. The techniques give different results for every model and show that the data are imbalanced for classes C4, C5 and C6. The imbalanced data affect model performance: in these three classes the precision and recall values are very low or negligible. The experimental results indicate that the C5.0 classification technique is best suited for protein feature classification and prediction. It gives 95.56% accuracy along with high precision and recall.
Finally, we conclude that the selected features can be used for function prediction.

Submitted 18 January, 2019; originally announced January 2019.

arXiv:1811.00911 [pdf, other]

Online Diverse Learning to Rank from Partial-Click Feedback

Authors: Prakhar Gupta, Gaurush Hiranandani, Harvineet Singh, Branislav Kveton, Zheng Wen, Iftikhar Ahamath Burhanuddin

Abstract: Learning to rank is an important problem in machine learning and recommender systems. In a recommender system, a user is typically recommended a list of items. Since the user is unlikely to examine the entire recommended list, partial feedback arises naturally. At the same time, diverse recommendations are important because it is challenging to model all tastes of the user in practice. In this paper, we propose the first algorithm for online learning to rank diverse items from partial-click feedback. We assume that the user examines the list of recommended items until the user is attracted by an item, which is clicked, and does not examine the rest of the items. This model of user behavior is known as the cascade model. We propose an online learning algorithm, cascadelsb, for solving our problem. The algorithm actively explores the tastes of the user with the objective of learning to recommend the optimal diverse list. We analyze the algorithm and prove a gap-free upper bound on its n-step regret. We evaluate cascadelsb on both synthetic and real-world datasets, compare it to various baselines, and show that it learns even when our modeling assumptions do not hold exactly.

Submitted 21 November, 2018; v1 submitted 31 October, 2018; originally announced November 2018.

Comments: The first three authors contributed equally to this work. 24 pages, 4 figures, 1 table

arXiv:1807.01705 [pdf, other]

Transfer Learning for Clinical Time Series Analysis using Recurrent Neural Networks

Authors: Priyanka Gupta, Pankaj Malhotra, Lovekesh Vig, Gautam Shroff

Abstract: Deep neural networks have shown promising results for various clinical prediction tasks such as diagnosis, mortality prediction, predicting duration of stay in hospital, etc. However, training deep networks -- such as those based on Recurrent Neural Networks (RNNs) -- requires large labeled data, high computational resources, and significant hyperparameter tuning effort. In this work, we investigate as to what extent can transfer learning address these issues when using deep RNNs to model multivariate clinical time series. We consider transferring the knowledge captured in an RNN trained on several source tasks simultaneously using a large labeled dataset to build the model for a target task with limited labeled data. An RNN pre-trained on several tasks provides generic features, which are then used to build simpler linear models for new target tasks without training task-specific RNNs. For evaluation, we train a deep RNN to identify several patient phenotypes on time series from MIMIC-III database, and then use the features extracted using that RNN to build classifiers for identifying previously unseen phenotypes, and also for a seemingly unrelated task of in-hospital mortality.
We demonstrate that (i) models trained on features extracted using pre-trained RNN outperform or, in the worst case, perform as well as task-specific RNNs; (ii) the models using features from pre-trained models are more robust to the size of labeled data than task-specific RNNs; and (iii) features extracted using pre-trained RNN are generic enough and perform better than typical statistical hand-crafted features.

Submitted 4 July, 2018; originally announced July 2018.

Comments: Accepted at Machine Learning for Medicine and Healthcare Workshop at ACM KDD 2018 Conference

arXiv:1803.05796 [pdf, other]

Deep Architectures for Learning Context-dependent Ranking Functions

Authors: Karlson Pfannschmidt, Pritha Gupta, Eyke Hüllermeier

Abstract: Object ranking is an important problem in the realm of preference learning. On the basis of training data in the form of a set of rankings of objects, which are typically represented as feature vectors, the goal is to learn a ranking function that predicts a linear order of any new set of objects. Current approaches commonly focus on ranking by scoring, i.e., on learning an underlying latent utility function that seeks to capture the inherent utility of each object. These approaches, however, are not able to take possible effects of context-dependence into account, where context-dependence means that the utility or usefulness of an object may also depend on what other objects are available as alternatives. In this paper, we formalize the problem of context-dependent ranking and present two general approaches based on two natural representations of context-dependent ranking functions. Both approaches are instantiated by means of appropriate neural network architectures, which are evaluated on suitable benchmark task.

Submitted 6 December, 2018; v1 submitted 15 March, 2018; originally announced March 2018.

arXiv:1711.05923

Enhanced Array Aperture using Higher Order Statistics for DoA Estimation

Authors: Payal Gupta, Monika Agrawal

Abstract: Recently, the higher order statistics (HOS) and sparsity based array are most talked about techniques to estimate the Direction of Arrival (DoA). They not only provide enhanced Degree of Freedom (DoF) to handle underdetermined cases but also improve the estimation accuracy of the system. To achieve high accuracy and more number of DoF with limited number of sensors, here we have proposed a method based on the fourth order statistics. The aperture of virtual array becomes O(16N^4) using N physical sensors. Proposed method can be extended to the HOS which increases the DoF by many folds. Numeric simulation validates these claims that the proposed method increases the resolution capacity as well as maximize the DoF among all the earlier proposed method.

Submitted 19 April, 2018; v1 submitted 15 November, 2017; originally announced November 2017.

Comments: I want to withdraw the paper because of I have noticed many drawbacks of the paper. I got the review about this "it is not correct technically"

arXiv:1605.07913 [pdf, ps, other]

Solution of linear ill-posed problems using random dictionaries

Authors: Pawan Gupta, Marianna Pensky

Abstract: In the present paper we consider application of overcomplete dictionaries to solution of general ill-posed linear inverse problems. In the context of regression problems, there has been enormous amount of effort to recover an unknown function using such dictionaries. One of the most popular methods, lasso and its versions, is based on minimizing empirical likelihood and unfortunately, requires stringent assumptions on the dictionary, the, so called, compatibility conditions. Though compatibility conditions are hard to satisfy, it is well known that this can be accomplished by using random dictionaries. In the present paper, we show how one can apply random dictionaries to solution of ill-posed linear inverse problems. We put a theoretical foundation under the suggested methodology and study its performance via simulations.

Submitted 19 June, 2017; v1 submitted 25 May, 2016; originally announced May 2016.

MSC Class: 62G05 (Primary); 62C10 (Secondary)

arXiv:1402.3070 [pdf, other]

Squeezing bottlenecks: exploring the limits of autoencoder semantic representation capabilities

Authors: Parth Gupta, Rafael E. Banchs, Paolo Rosso

Abstract: We present a comprehensive study on the use of autoencoders for modelling text data, in which (differently from previous studies) we focus our attention on the following issues: i) we explore the suitability of two different models bDA and rsDA for constructing deep autoencoders for text data at the sentence level; ii) we propose and evaluate two novel metrics for better assessing the text-reconstruction capabilities of autoencoders; and iii) we propose an automatic method to find the critical bottleneck dimensionality for text language representations (below which structural information is lost).

Submitted 13 February, 2014; originally announced February 2014.

arXiv:1009.5785 [pdf, ps, other]

doi 10.1214/09-AOAS258

Profiling time course expression of virus genes---an illustration of Bayesian inference under shape restrictions

Authors: Li-Chu Chien, I-Shou Chang, Shih Sheng Jiang, Pramod K. Gupta, Chi-Chung Wen, Yuh-Jenn Wu, Chao A. Hsiung

Abstract: There have been several studies of the genome-wide temporal transcriptional program of viruses, based on microarray experiments, which are generally useful in the construction of gene regulation network. It seems that biological interpretations in these studies are directly based on the normalized data and some crude statistics, which provide rough estimates of limited features of the profile and may incur biases. This paper introduces a hierarchical Bayesian shape restricted regression method for making inference on the time course expression of virus genes. Estimates of many salient features of the expression profile like onset time, inflection point, maximum value, time to maximum value, area under curve, etc. can be obtained immediately by this method. Applying this method to a baculovirus microarray time course expression data set, we indicate that many biological questions can be formulated quantitatively and we are able to offer insights into the baculovirus biology.

Submitted 29 September, 2010; originally announced September 2010.

Comments: Published at http://dx.doi.org/10.1214/09-AOAS258 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS258

Journal ref: Annals of Applied Statistics 2009, Vol. 3, No. 4, 1542-1565

Showing 1–37 of 37 results for author: Gupta, P