-
Towards Provable Log Density Policy Gradient
Authors:
Pulkit Katdare,
Anant Joshi,
Katherine Driggs-Campbell
Abstract:
Policy gradient methods are a vital ingredient behind the success of modern reinforcement learning. Modern policy gradient methods, although successful, introduce a residual error in gradient estimation. In this work, we argue that this residual term is significant and that correcting for it could potentially improve the sample complexity of reinforcement learning methods. To that end, we propose the log density gradient to estimate the policy gradient, which corrects for this residual error term. The log density gradient method computes the policy gradient by utilizing the state-action discounted distributional formulation. We first present the equations needed to exactly find the log density gradient for tabular Markov Decision Processes (MDPs). For more complex environments, we propose a temporal difference (TD) method that approximates the log density gradient by utilizing backward on-policy samples. Since backward sampling from a Markov chain is highly restrictive, we also propose a min-max optimization that can approximate the log density gradient using just on-policy samples. We also prove uniqueness, and convergence under linear function approximation, for this min-max optimization. Finally, we show the sample complexity of our min-max optimization to be of the order of $m^{-1/2}$, where $m$ is the number of on-policy samples. We also demonstrate a proof-of-concept for our log density gradient method on a gridworld environment, and observe that our method is able to improve upon the classical policy gradient method by a clear margin, thus indicating a promising novel direction to develop reinforcement learning algorithms that require fewer samples.
Submitted 3 March, 2024;
originally announced March 2024.
-
Investigating Speed Deviation Patterns During Glucose Episodes: A Quantile Regression Approach
Authors:
Aparna Joshi,
Jennifer Merickel,
Cyrus V. Desouza,
Matthew Rizzo,
Pujitha Gunaratne,
Anuj Sharma
Abstract:
Given the growing prevalence of diabetes, there has been significant interest in determining how diabetes affects instrumental daily functions, like driving. Complications of glucose control in diabetes include hypoglycemic and hyperglycemic episodes, which may impair cognitive and psychomotor functions needed for safe driving. The goal of this paper was to determine patterns of speed behavior in drivers with diabetes during acute glucose episodes, compared to drivers with diabetes who were euglycemic or control drivers without diabetes, in a naturalistic driving environment. By employing distribution-based analytic methods which capture distribution patterns, our study advances prior literature that has focused on the conventional approach of average speed to explore speed deviation patterns.
Submitted 3 October, 2023;
originally announced October 2023.
-
No-regret Algorithms for Fair Resource Allocation
Authors:
Abhishek Sinha,
Ativ Joshi,
Rajarshi Bhattacharjee,
Cameron Musco,
Mohammad Hajiesmaili
Abstract:
We consider a fair resource allocation problem in the no-regret setting against an unrestricted adversary. The objective is to allocate resources equitably among several agents in an online fashion so that the difference of the aggregate $α$-fair utilities of the agents between an optimal static clairvoyant allocation and that of the online policy grows sub-linearly with time. The problem is challenging due to the non-additive nature of the $α$-fairness function. Previously, it was shown that no online policy can exist for this problem with a sublinear standard regret. In this paper, we propose an efficient online resource allocation policy, called Online Proportional Fair (OPF), that achieves $c_α$-approximate sublinear regret with the approximation factor $c_α=(1-α)^{-(1-α)}\leq 1.445,$ for $0\leq α< 1$. The upper bound to the $c_α$-regret for this problem exhibits a surprising phase transition phenomenon. The regret bound changes from a power-law to a constant at the critical exponent $α=\frac{1}{2}.$ As a corollary, our result also resolves an open problem raised by Even-Dar et al. [2009] on designing an efficient no-regret policy for the online job scheduling problem in certain parameter regimes. The proof of our results introduces new algorithmic and analytical techniques, including greedy estimation of the future gradients for non-additive global reward functions and bootstrapping adaptive regret bounds, which may be of independent interest.
Submitted 11 March, 2023;
originally announced March 2023.
-
Democratizing Aviation Emissions Estimation: Development of an Open-Source, Data-Driven Methodology
Authors:
Andy Eskenazi,
Landon Butler,
Arnav Joshi,
Megan Ryerson
Abstract:
Through an aviation emissions estimation tool that is both publicly accessible and comprehensive, researchers, planners, and community advocates can help shape a more sustainable and equitable U.S. air transportation system. To this end, we develop an open-source, data-driven methodology to calculate the system-wide emissions of the U.S. domestic civil aviation industry. This process utilizes and integrates six different public datasets provided by the Bureau of Transportation Statistics (BTS), the Federal Aviation Administration (FAA), EUROCONTROL, and the International Civil Aviation Organization (ICAO). At the individual flight level, our approach examines the specific aircraft type, equipped engine, and time in stage of flight to produce a more granular estimate than competing approaches. Enabled by our methodology, we then calculate system-wide emissions, considering four different greenhouse gases (CO2, NOx, CO, HC) during the Landing, Take-off (LTO) and Climb, Cruise, and Descent (CCD) flight cycles. Our results show that emissions on a particular route can vary significantly due to aircraft and engine choice, and that emission rates differ significantly from airline to airline. We also find that CO2 alone is not a sufficient proxy for emissions, as NOx, when converted to its CO2-equivalency, exceeds CO2 during both LTO and CCD.
Submitted 5 May, 2022; v1 submitted 13 February, 2022;
originally announced February 2022.
-
Differentiable Spline Approximations
Authors:
Minsu Cho,
Aditya Balu,
Ameya Joshi,
Anjana Deva Prasad,
Biswajit Khara,
Soumik Sarkar,
Baskar Ganapathysubramanian,
Adarsh Krishnamurthy,
Chinmay Hegde
Abstract:
The paradigm of differentiable programming has significantly enhanced the scope of machine learning via the judicious use of gradient-based optimization. However, standard differentiable programming methods (such as autodiff) typically require that the machine learning models be differentiable, limiting their applicability. Our goal in this paper is to use a new, principled approach to extend gradient-based optimization to functions well modeled by splines, which encompass a large family of piecewise polynomial models. We derive the form of the (weak) Jacobian of such functions and show that it exhibits a block-sparse structure that can be computed implicitly and efficiently. Overall, we show that leveraging this redesigned Jacobian in the form of a differentiable "layer" in predictive models leads to improved performance in diverse applications such as image segmentation, 3D point cloud reconstruction, and finite element analysis.
Submitted 4 October, 2021;
originally announced October 2021.
-
Deep Quantile Regression for Uncertainty Estimation in Unsupervised and Supervised Lesion Detection
Authors:
Haleh Akrami,
Anand Joshi,
Sergul Aydore,
Richard Leahy
Abstract:
Despite impressive state-of-the-art performance on a wide variety of machine learning tasks, deep learning methods can produce over-confident predictions, particularly with limited training data. Therefore, quantifying uncertainty is particularly important in critical applications such as lesion detection and clinical diagnosis, where a realistic assessment of uncertainty is essential in determining surgical margins, disease status and appropriate treatment. In this work, we propose a novel approach that uses quantile regression for quantifying aleatoric uncertainty in both supervised and unsupervised lesion detection problems. The resulting confidence intervals can be used for lesion detection and segmentation. In the unsupervised setting, we combine quantile regression with the Variational AutoEncoder (VAE). Here we address the problem of quantifying uncertainty in the images that are reconstructed by the VAE as the basis for principled outlier or lesion detection. The VAE models the output as a conditionally independent Gaussian characterized by its mean and variance. Unfortunately, joint optimization of both mean and variance in the VAE leads to the well-known problem of shrinkage or underestimation of variance. Here we describe an alternative Quantile-Regression VAE (QR-VAE) that avoids this variance shrinkage problem by directly estimating conditional quantiles for the input image. Using the estimated quantiles, we compute the conditional mean and variance for the input image from which we then detect outliers by thresholding at a false-discovery-rate corrected p-value. In the supervised setting, we develop binary quantile regression (BQR) for the supervised lesion segmentation task. We show how BQR can be used to capture uncertainty in lesion boundaries in a manner that characterizes expert disagreement.
Submitted 26 April, 2022; v1 submitted 20 September, 2021;
originally announced September 2021.
-
Manufacturing Process Optimization using Statistical Methodologies
Authors:
Karthik Srinivasan,
Amit Kumar,
Parameshwaran Iyer,
Abhinav Joshi
Abstract:
Response Surface Methodology (RSM) introduced in the paper (Box & Wilson, 1951) explores the relationships between explanatory and response variables in complex settings and provides a framework to identify correct settings for the explanatory variables to yield the desired response. RSM involves setting up sequential experimental designs followed by application of elementary optimization methods to identify direction of improvement in response. In this paper, an application of RSM using a two-factor two-level Central Composite Design (CCD) is explained for a diesel engine nozzle manufacturing sub-process. The analysis shows that one of the factors has a significant influence in improving desired values of the response. The implementation of RSM is done using the DoE plug-in available in R software.
Submitted 29 October, 2020;
originally announced October 2020.
-
Addressing Variance Shrinkage in Variational Autoencoders using Quantile Regression
Authors:
Haleh Akrami,
Anand A. Joshi,
Sergul Aydore,
Richard M. Leahy
Abstract:
Estimation of uncertainty in deep learning models is of vital importance, especially in medical imaging, where reliance on inference without taking into account uncertainty could lead to misdiagnosis. Recently, the probabilistic Variational AutoEncoder (VAE) has become a popular model for anomaly detection in applications such as lesion detection in medical images. The VAE is a generative graphical model that is used to learn the data distribution from samples and then generate new samples from this distribution. By training on normal samples, the VAE can be used to detect inputs that deviate from this learned distribution. The VAE models the output as a conditionally independent Gaussian characterized by means and variances for each output dimension. VAEs can therefore use reconstruction probability instead of reconstruction error for anomaly detection. Unfortunately, joint optimization of both mean and variance in the VAE leads to the well-known problem of shrinkage or underestimation of variance. We describe an alternative approach that avoids this variance shrinkage problem by using quantile regression. Using estimated quantiles to compute mean and variance under the Gaussian assumption, we compute reconstruction probability as a principled approach to outlier or anomaly detection. Results on simulated and Fashion MNIST data demonstrate the effectiveness of our approach. We also show how our approach can be used for principled heterogeneous thresholding for lesion detection in brain images.
Submitted 18 October, 2020;
originally announced October 2020.
-
Adversarially Robust Learning via Entropic Regularization
Authors:
Gauri Jagatap,
Ameya Joshi,
Animesh Basak Chowdhury,
Siddharth Garg,
Chinmay Hegde
Abstract:
In this paper we propose a new family of algorithms, ATENT, for training adversarially robust deep neural networks. We formulate a new loss function that is equipped with an additional entropic regularization. Our loss function considers the contribution of adversarial samples that are drawn from a specially designed distribution in the data space that assigns high probability to points with high loss and in the immediate neighborhood of training samples. Our proposed algorithms optimize this loss to seek adversarially robust valleys of the loss landscape. Our approach achieves competitive (or better) performance in terms of robust classification accuracy as compared to several state-of-the-art robust learning approaches on benchmark datasets such as MNIST and CIFAR-10.
Submitted 19 February, 2021; v1 submitted 27 August, 2020;
originally announced August 2020.
-
Deep Generative Models that Solve PDEs: Distributed Computing for Training Large Data-Free Models
Authors:
Sergio Botelho,
Ameya Joshi,
Biswajit Khara,
Soumik Sarkar,
Chinmay Hegde,
Santi Adavani,
Baskar Ganapathysubramanian
Abstract:
Recent progress in scientific machine learning (SciML) has opened up the possibility of training novel neural network architectures that solve complex partial differential equations (PDEs). Several (nearly data free) approaches have been recently reported that successfully solve PDEs, with examples including deep feed forward networks, generative networks, and deep encoder-decoder networks. However, practical adoption of these approaches is limited by the difficulty in training these models, especially to make predictions at large output resolutions ($\geq 1024 \times 1024$). Here we report on a software framework for data parallel distributed deep learning that resolves the twin challenges of training these large SciML models: training in reasonable time as well as distributing the storage requirements. Our framework provides several out-of-the-box functionalities including (a) loss integrity independent of number of processes, (b) synchronized batch normalization, and (c) distributed higher-order optimization methods. We show excellent scalability of this framework on both cloud and HPC clusters, and report on the interplay between bandwidth, network topology and bare metal vs cloud. We deploy this approach to train generative models of sizes hitherto not possible, showing that neural PDE solvers can be viably trained for practical applications. We also demonstrate that distributed higher-order optimization methods are $2-3\times$ faster than stochastic gradient-based methods and provide minimal convergence drift with higher batch-size.
Submitted 24 July, 2020;
originally announced July 2020.
-
ESPN: Extremely Sparse Pruned Networks
Authors:
Minsu Cho,
Ameya Joshi,
Chinmay Hegde
Abstract:
Deep neural networks are often highly overparameterized, prohibiting their use in compute-limited systems. However, a line of recent works has shown that the size of deep networks can be considerably reduced by identifying a subset of neuron indicators (or mask) that correspond to significant weights prior to training. We demonstrate that a simple iterative mask discovery method can achieve state-of-the-art compression of very deep networks. Our algorithm represents a hybrid approach between single shot network pruning methods (such as SNIP) and Lottery-Ticket type approaches. We validate our approach on several datasets and outperform several existing pruning approaches in both test accuracy and compression ratio.
Submitted 28 June, 2020;
originally announced June 2020.
-
Robust Variational Autoencoder for Tabular Data with Beta Divergence
Authors:
Haleh Akrami,
Sergul Aydore,
Richard M. Leahy,
Anand A. Joshi
Abstract:
We propose a robust variational autoencoder with $β$ divergence for tabular data (RTVAE) with mixed categorical and continuous features. Variational autoencoders (VAE) and their variations are popular frameworks for anomaly detection problems. The primary assumption is that we can learn representations for normal patterns via VAEs and any deviation from that can indicate anomalies. However, the training data itself can contain outliers. Sources of outliers in training data include the data collection process itself (random noise) or a malicious attacker (data poisoning) who may aim to degrade the performance of the machine learning model. In either case, these outliers can disproportionately affect the training process of VAEs and may lead to wrong conclusions about what the normal behavior is. In this work, we derive a novel form of a variational autoencoder for tabular data sets with categorical and continuous features that is robust to outliers in training data. Our results on the anomaly detection application for network traffic datasets demonstrate the effectiveness of our approach.
Submitted 15 June, 2020; v1 submitted 15 June, 2020;
originally announced June 2020.
-
Independent Component Analysis for Trustworthy Cyberspace during High Impact Events: An Application to Covid-19
Authors:
Zois Boukouvalas,
Christine Mallinson,
Evan Crothers,
Nathalie Japkowicz,
Aritran Piplai,
Sudip Mittal,
Anupam Joshi,
Tülay Adalı
Abstract:
Social media has become an important communication channel during high impact events, such as the COVID-19 pandemic. As misinformation in social media can rapidly spread, creating social unrest, curtailing the spread of misinformation during such events is a significant data challenge. While recent solutions that are based on machine learning have shown promise for the detection of misinformation, most widely used methods include approaches that rely on either handcrafted features that cannot be optimal for all scenarios, or those that are based on deep learning where the interpretation of the prediction results is not directly accessible. In this work, we propose a data-driven solution that is based on the ICA model, such that knowledge discovery and detection of misinformation are achieved jointly. To demonstrate the effectiveness of our method and compare its performance with deep learning methods, we developed a labeled COVID-19 Twitter dataset based on socio-linguistic criteria.
Submitted 30 June, 2020; v1 submitted 1 June, 2020;
originally announced June 2020.
-
Scalability in Perception for Autonomous Driving: Waymo Open Dataset
Authors:
Pei Sun,
Henrik Kretzschmar,
Xerxes Dotiwalla,
Aurelien Chouard,
Vijaysai Patnaik,
Paul Tsui,
James Guo,
Yin Zhou,
Yuning Chai,
Benjamin Caine,
Vijay Vasudevan,
Wei Han,
Jiquan Ngiam,
Hang Zhao,
Aleksei Timofeev,
Scott Ettinger,
Maxim Krivokon,
Amy Gao,
Aditya Joshi,
Sheng Zhao,
Shuyang Cheng,
Yu Zhang,
Jonathon Shlens,
Zhifeng Chen,
Dragomir Anguelov
Abstract:
The research community has increasing interest in autonomous driving research, despite the resource intensity of obtaining representative real world data. Existing self-driving datasets are limited in the scale and variation of the environments they capture, even though generalization within and between operating regions is crucial to the overall viability of the technology. In an effort to help align the research community's contributions with real-world self-driving problems, we introduce a new large scale, high quality, diverse dataset. Our new dataset consists of 1150 scenes that each span 20 seconds, consisting of well synchronized and calibrated high quality LiDAR and camera data captured across a range of urban and suburban geographies. It is 15x more diverse than the largest camera+LiDAR dataset available based on our proposed diversity metric. We exhaustively annotated this data with 2D (camera image) and 3D (LiDAR) bounding boxes, with consistent identifiers across frames. Finally, we provide strong baselines for 2D as well as 3D detection and tracking tasks. We further study the effects of dataset size and generalization across geographies on 3D detection methods. Find data, code and more up-to-date information at http://www.waymo.com/open.
Submitted 12 May, 2020; v1 submitted 10 December, 2019;
originally announced December 2019.
-
Zeroth Order Non-convex optimization with Dueling-Choice Bandits
Authors:
Yichong Xu,
Aparna Joshi,
Aarti Singh,
Artur Dubrawski
Abstract:
We consider a novel setting of zeroth order non-convex optimization, where in addition to querying the function value at a given point, we can also duel two points and get the point with the larger function value. We refer to this setting as optimization with dueling-choice bandits since both direct queries and duels are available for optimization. We give the COMP-GP-UCB algorithm based on GP-UCB (Srinivas et al., 2009), where instead of directly querying the point with the maximum Upper Confidence Bound (UCB), we perform a constrained optimization and use comparisons to filter out suboptimal points. COMP-GP-UCB comes with a theoretical guarantee of $O(\frac{Φ}{\sqrt{T}})$ on simple regret, where $T$ is the number of direct queries and $Φ$ is an improved information gain corresponding to a comparison based constraint set that restricts the search space for the optimum. In contrast, in the direct query only setting, $Φ$ depends on the entire domain. Finally, we present experimental results to show the efficacy of our algorithm.
Submitted 3 November, 2019;
originally announced November 2019.
-
CUDA optimized Neural Network predicts blood glucose control from quantified joint mobility and anthropometrics
Authors:
Sterling Ramroach,
Andrew Dhanoo,
Brian Cockburn,
Ajay Joshi
Abstract:
Neural network training entails heavy computation with obvious bottlenecks. The Compute Unified Device Architecture (CUDA) programming model allows us to accelerate computation by passing the processing workload from the CPU to the graphics processing unit (GPU). In this paper, we leveraged the power of Nvidia GPUs to parallelize all of the computation involved in training, to accelerate a backpropagation feed-forward neural network with one hidden layer using CUDA and C++. This optimized neural network was tasked with predicting the level of glycated hemoglobin (HbA1c) from non-invasive markers. The rate of increase in the prevalence of Diabetes Mellitus has resulted in an urgent need for early detection and accurate diagnosis. However, due to the invasiveness and limitations of conventional tests, alternate means are being considered. Limited Joint Mobility (LJM) has been reported as an indicator for poor glycemic control. LJM of the fingers is quantified and its link to HbA1c is investigated along with other potential non-invasive markers of HbA1c. We collected readings of 33 potential markers from 120 participants at a clinic in south Trinidad. Our neural network achieved 95.65% accuracy on the training set and 86.67% accuracy on the testing set for male participants, and 97.73% and 66.67% accuracy on the training and testing sets for female participants. Using 960 CUDA cores from an Nvidia GeForce GTX 660, our parallelized neural network was trained 50 times faster on both subsets than its corresponding CPU implementation on an Intel Core (TM) i7-3630QM 2.40 GHz CPU.
Submitted 19 August, 2019;
originally announced August 2019.
-
The efficacy of various machine learning models for multi-class classification of RNA-seq expression data
Authors:
Sterling Ramroach,
Melford John,
Ajay Joshi
Abstract:
Late diagnosis and high costs are key factors that negatively impact the care of cancer patients worldwide. Although the availability of biological markers for the diagnosis of cancer type is increasing, costs and reliability of tests currently present a barrier to the adoption of their routine use. There is a pressing need for accurate methods that enable early diagnosis and cover a broad range of cancers. The use of machine learning and RNA-seq expression analysis has shown promise in the classification of cancer type. However, research is inconclusive about which types of machine learning models are optimal. The suitability of five algorithms was assessed for the classification of 17 different cancer types. Each algorithm was fine-tuned and trained on the full array of 18,015 genes per sample, for 4,221 samples (75% of the dataset). They were then tested with 1,408 samples (25% of the dataset) for which cancer types were withheld to determine the accuracy of prediction. The results show that ensemble algorithms achieve 100% accuracy in the classification of 14 out of 17 types of cancer. The clustering and classification models, while faster than the ensembles, performed poorly due to the high level of noise in the dataset. When the features were reduced to a list of 20 genes, the ensemble algorithms maintained an accuracy above 95% as opposed to the clustering and classification models.
Submitted 19 August, 2019;
originally announced August 2019.
-
Encoding Invariances in Deep Generative Models
Authors:
Viraj Shah,
Ameya Joshi,
Sambuddha Ghosal,
Balaji Pokuri,
Soumik Sarkar,
Baskar Ganapathysubramanian,
Chinmay Hegde
Abstract:
Reliable training of generative adversarial networks (GANs) typically requires massive datasets in order to model complicated distributions. However, in several applications, training samples obey invariances that are \textit{a priori} known; for example, in complex physics simulations, the training data obey universal laws encoded as well-defined mathematical equations. In this paper, we propose a new generative modeling approach, InvNet, that can efficiently model data spaces with known invariances. We devise an adversarial training algorithm to encode them into the data distribution. We validate our framework in three experimental settings: generating images with fixed motifs; solving nonlinear partial differential equations (PDEs); and reconstructing two-phase microstructures with desired statistical properties. We complement our experiments with several theoretical results.
Submitted 4 June, 2019;
originally announced June 2019.
-
Robust Variational Autoencoder
Authors:
Haleh Akrami,
Anand A. Joshi,
Jian Li,
Sergul Aydore,
Richard M. Leahy
Abstract:
Machine learning methods often need a large amount of labeled training data. Since the training data is assumed to be the ground truth, outliers can severely degrade learned representations and performance of trained models. Here we apply concepts from robust statistics to derive a novel variational autoencoder that is robust to outliers in the training data. Variational autoencoders (VAEs) extract a lower-dimensional encoded feature representation from which we can generate new data samples. Robustness of autoencoders to outliers is critical for generating a reliable representation of particular data types in the encoded space when using corrupted training data. Our robust VAE is based on $β$-divergence rather than the standard Kullback-Leibler (KL) divergence. Our proposed lower bound leads to an RVAE model that has the same computational complexity as the VAE and contains a single tuning parameter to control the degree of robustness. We demonstrate the performance of our $β$-divergence based autoencoder for a range of image datasets, showing improved robustness to outliers both qualitatively and quantitatively. We also illustrate the use of our robust VAE for outlier detection.
Submitted 21 December, 2019; v1 submitted 23 May, 2019;
originally announced May 2019.
-
DeepIrisNet2: Learning Deep-IrisCodes from Scratch for Segmentation-Robust Visible Wavelength and Near Infrared Iris Recognition
Authors:
Abhishek Gangwar,
Akanksha Joshi,
Padmaja Joshi,
R. Raghavendra
Abstract:
We first introduce a deep learning based framework named DeepIrisNet2 for visible spectrum and NIR iris representation. The framework can work without the classical iris normalization step or very accurate iris segmentation, allowing it to work under non-ideal conditions. The framework contains spatial transformer layers to handle deformation and supervision branches after certain intermediate layers to mitigate overfitting. In addition, we present a dual CNN iris segmentation pipeline comprising an iris/pupil bounding box detection network and a semantic pixel-wise segmentation network. Furthermore, to get compact templates, we present a strategy to generate binary iris codes using DeepIrisNet2. Since no ground truth datasets are available for CNN training for iris segmentation, we build large-scale hand-labeled datasets and make them public: i) iris and pupil bounding boxes, ii) labeled iris texture. The networks are evaluated on the challenging ND-IRIS-0405, UBIRIS.v2, MICHE-I, and CASIA v4 Interval datasets. The proposed approach significantly improves on the state of the art and achieves outstanding performance, surpassing all previous methods.
Submitted 6 February, 2019;
originally announced February 2019.
-
Field of Groves: An Energy-Efficient Random Forest
Authors:
Zafar Takhirov,
Joseph Wang,
Marcia S. Louis,
Venkatesh Saligrama,
Ajay Joshi
Abstract:
Machine Learning (ML) algorithms, like Convolutional Neural Networks (CNN), Support Vector Machines (SVM), etc., have become widespread and can achieve high statistical performance. However, their accuracy decreases significantly in the energy-constrained mobile and embedded systems space, where all computations need to be completed under a tight energy budget. In this work, we present a field of groves (FoG) implementation of random forests (RF) that achieves an accuracy comparable to CNNs and SVMs under tight energy budgets. Evaluation of the FoG shows that at comparable accuracy it consumes ~1.48x, ~24x, ~2.5x, and ~34.7x lower energy per classification compared to conventional RF, SVM_RBF, MLP, and CNN, respectively. FoG is ~6.5x less energy efficient than SVM_LR, but achieves 18% higher accuracy on average across all considered datasets.
Submitted 10 April, 2017;
originally announced April 2017.