Search | arXiv e-print repository

Unraveling overoptimism and publication bias in ML-driven science

Authors: Pouria Saidi, Gautam Dasarathy, Visar Berisha

Abstract: Machine Learning (ML) is increasingly used across many disciplines with impressive reported results across many domain areas. However, recent studies suggest that the published performance of ML models are often overoptimistic. Validity concerns are underscored by findings of an inverse relationship between sample size and reported accuracy in published ML models, contrasting with the theory of le… ▽ More Machine Learning (ML) is increasingly used across many disciplines with impressive reported results across many domain areas. However, recent studies suggest that the published performance of ML models are often overoptimistic. Validity concerns are underscored by findings of an inverse relationship between sample size and reported accuracy in published ML models, contrasting with the theory of learning curves where accuracy should improve or remain stable with increasing sample size. This paper investigates factors contributing to overoptimistic accuracy reports in ML-driven science, focusing on data leakage and publication bias. We introduce a novel stochastic model for observed accuracy, integrating parametric learning curves and the aforementioned biases. We then construct an estimator that corrects for these biases in observed data. Theoretical and empirical results show that our framework can estimate the underlying learning curve, providing realistic performance assessments from published results. Applying the model to meta-analyses in ML-driven science, including neuroimaging-based and speech-based classifications of neurological conditions, we find prevalent overoptimism and estimate the inherent limits of ML-based prediction in each domain. △ Less

Submitted 10 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

Comments: 31 pages, 7 figures, 6 tables

arXiv:2405.06253 [pdf, ps, other]

On Characterizations of Potential and Ordinal Potential Games

Authors: Sina Arefizadeh, Angelia Nedich, Gautam Dasarathy

Abstract: This paper investigates some necessary and sufficient conditions for a game to be a potential game. At first, we extend the classical results of Slade and Monderer and Shapley from games with one-dimensional action spaces to games with multi-dimensional action spaces, which require differentiable cost functions. Then, we provide a necessary and sufficient conditions for a game to have a potential… ▽ More This paper investigates some necessary and sufficient conditions for a game to be a potential game. At first, we extend the classical results of Slade and Monderer and Shapley from games with one-dimensional action spaces to games with multi-dimensional action spaces, which require differentiable cost functions. Then, we provide a necessary and sufficient conditions for a game to have a potential function by investigating the structure of a potential function in terms of the players' cost differences, as opposed to differentials. This condition provides a systematic way for construction of a potential function, which is applied to network congestion games, as an example. Finally, we provide some sufficient conditions for a game to be ordinal potential and generalized ordinal potential. △ Less

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2301.12616 [pdf, other]

Active Sequential Two-Sample Testing

Authors: Weizhi Li, Prad Kadambi, Pouria Saidi, Karthikeyan Natesan Ramamurthy, Gautam Dasarathy, Visar Berisha

Abstract: A two-sample hypothesis test is a statistical procedure used to determine whether the distributions generating two samples are identical. We consider the two-sample testing problem in a new scenario where the sample measurements (or sample features) are inexpensive to access, but their group memberships (or labels) are costly. To address the problem, we devise the first \emph{active sequential two… ▽ More A two-sample hypothesis test is a statistical procedure used to determine whether the distributions generating two samples are identical. We consider the two-sample testing problem in a new scenario where the sample measurements (or sample features) are inexpensive to access, but their group memberships (or labels) are costly. To address the problem, we devise the first \emph{active sequential two-sample testing framework} that not only sequentially but also \emph{actively queries}. Our test statistic is a likelihood ratio where one likelihood is found by maximization over all class priors, and the other is provided by a probabilistic classification model. The classification model is adaptively updated and used to predict where the (unlabelled) features have a high dependency on labels; labeling the ``high-dependency'' features leads to the increased power of the proposed testing framework. In theory, we provide the proof that our framework produces an \emph{anytime-valid} $p$-value. In addition, we characterize the proposed framework's gain in testing power by analyzing the mutual information between the feature and label variables in asymptotic and finite-sample scenarios. In practice, we introduce an instantiation of our framework and evaluate it using several experiments; the experiments on the synthetic, MNIST, and application-specific datasets demonstrate that the testing power of the instantiated active sequential test significantly increases while the Type I error is under control. △ Less

Submitted 27 June, 2024; v1 submitted 29 January, 2023; originally announced January 2023.

arXiv:2211.05690 [pdf, other]

Robust Model Selection of Gaussian Graphical Models

Authors: Abrar Zahin, Rajasekhar Anguluri, Lalitha Sankar, Oliver Kosut, Gautam Dasarathy

Abstract: In Gaussian graphical model selection, noise-corrupted samples present significant challenges. It is known that even minimal amounts of noise can obscure the underlying structure, leading to fundamental identifiability issues. A recent line of work addressing this "robust model selection" problem narrows its focus to tree-structured graphical models. Even within this specific class of models, exac… ▽ More In Gaussian graphical model selection, noise-corrupted samples present significant challenges. It is known that even minimal amounts of noise can obscure the underlying structure, leading to fundamental identifiability issues. A recent line of work addressing this "robust model selection" problem narrows its focus to tree-structured graphical models. Even within this specific class of models, exact structure recovery is shown to be impossible. However, several algorithms have been developed that are known to provably recover the underlying tree-structure up to an (unavoidable) equivalence class. In this paper, we extend these results beyond tree-structured graphs. We first characterize the equivalence class up to which general graphs can be recovered in the presence of noise. Despite the inherent ambiguity (which we prove is unavoidable), the structure that can be recovered reveals local clustering information and global connectivity patterns in the underlying model. Such information is useful in a range of real-world problems, including power grids, social networks, protein-protein interactions, and neural structures. We then propose an algorithm which provably recovers the underlying graph up to the identified ambiguity. We further provide finite sample guarantees in the high-dimensional regime for our algorithm and validate our results through numerical simulations. △ Less

Submitted 7 May, 2024; v1 submitted 10 November, 2022; originally announced November 2022.

arXiv:2206.10569 [pdf, other]

Controllability of Coarsely Measured Networked Linear Dynamical Systems (Extended Version)

Authors: Nafiseh Ghoroghchian, Rajasekhar Anguluri, Gautam Dasarathy, Stark C. Draper

Abstract: We consider the controllability of large-scale linear networked dynamical systems when complete knowledge of network structure is unavailable and knowledge is limited to coarse summaries. We provide conditions under which average controllability of the fine-scale system can be well approximated by average controllability of the (synthesized, reduced-order) coarse-scale system. To this end, we requ… ▽ More We consider the controllability of large-scale linear networked dynamical systems when complete knowledge of network structure is unavailable and knowledge is limited to coarse summaries. We provide conditions under which average controllability of the fine-scale system can be well approximated by average controllability of the (synthesized, reduced-order) coarse-scale system. To this end, we require knowledge of some inherent parametric structure of the fine-scale network that makes this type of approximation possible. Therefore, we assume that the underlying fine-scale network is generated by the stochastic block model (SBM) -- often studied in community detection. We then provide an algorithm that directly estimates the average controllability of the fine-scale system using a coarse summary of SBM. Our analysis indicates the necessity of underlying structure (e.g., in-built communities) to be able to quantify accurately the controllability from coarsely characterized networked dynamics. We also compare our method to that of the reduced-order method and highlight the regimes where both can outperform each other. Finally, we provide simulations to confirm our theoretical results for different scalings of network size and density, and the parameter that captures how much community-structure is retained in the coarse summary. △ Less

Submitted 21 June, 2022; originally announced June 2022.

arXiv:2206.07083 [pdf, other]

Learning the Structure of Large Networked Systems Obeying Conservation Laws

Authors: Anirudh Rayas, Rajasekhar Anguluri, Gautam Dasarathy

Abstract: Many networked systems such as electric networks, the brain, and social networks of opinion dynamics are known to obey conservation laws. Examples of this phenomenon include the Kirchoff laws in electric networks and opinion consensus in social networks. Conservation laws in networked systems may be modeled as balance equations of the form $X = B^{*} Y$, where the sparsity pattern of $B^{*}$ captu… ▽ More Many networked systems such as electric networks, the brain, and social networks of opinion dynamics are known to obey conservation laws. Examples of this phenomenon include the Kirchoff laws in electric networks and opinion consensus in social networks. Conservation laws in networked systems may be modeled as balance equations of the form $X = B^{*} Y$, where the sparsity pattern of $B^{*}$ captures the connectivity of the network, and $Y, X \in \mathbb{R}^p$ are vectors of "potentials" and "injected flows" at the nodes respectively. The node potentials $Y$ cause flows across edges and the flows $X$ injected at the nodes are extraneous to the network dynamics. In several practical systems, the network structure is often unknown and needs to be estimated from data. Towards this, one has access to samples of the node potentials $Y$, but only the statistics of the node injections $X$. Motivated by this important problem, we study the estimation of the sparsity structure of the matrix $B^{*}$ from $n$ samples of $Y$ under the assumption that the node injections $X$ follow a Gaussian distribution with a known covariance $Σ_X$. We propose a new $\ell_{1}$-regularized maximum likelihood estimator for this problem in the high-dimensional regime where the size of the network $p$ is larger than sample size $n$. We show that this optimization problem is convex in the objective and admits a unique solution. Under a new mutual incoherence condition, we establish sufficient conditions on the triple $(n,p,d)$ for which exact sparsity recovery of $B^{*}$ is possible with high probability; $d$ is the degree of the graph. We also establish guarantees for the recovery of $B^{*}$ in the element-wise maximum, Frobenius, and operator norms. Finally, we complement these theoretical results with experimental validation of the performance of the proposed estimator on synthetic and real-world data. △ Less

Submitted 14 June, 2022; originally announced June 2022.

arXiv:2202.06836 [pdf, other]

doi 10.1109/TPWRS.2022.3212323

A Machine Learning Framework for Event Identification via Modal Analysis of PMU Data

Authors: Nima T. Bazargani, Gautam Dasarathy, Lalitha Sankar, Oliver Kosut

Abstract: Power systems are prone to a variety of events (e.g. line trips and generation loss) and real-time identification of such events is crucial in terms of situational awareness, reliability, and security. Using measurements from multiple synchrophasors, i.e., phasor measurement units (PMUs), we propose to identify events by extracting features based on modal dynamics. We combine such traditional phys… ▽ More Power systems are prone to a variety of events (e.g. line trips and generation loss) and real-time identification of such events is crucial in terms of situational awareness, reliability, and security. Using measurements from multiple synchrophasors, i.e., phasor measurement units (PMUs), we propose to identify events by extracting features based on modal dynamics. We combine such traditional physics-based feature extraction methods with machine learning to distinguish different event types. Including all measurement channels at each PMU allows exploiting diverse features but also requires learning classification models over a high-dimensional space. To address this issue, various feature selection methods are implemented to choose the best subset of features. Using the obtained subset of features, we investigate the performance of two well-known classification models, namely, logistic regression (LR) and support vector machines (SVM) to identify generation loss and line trip events in two datasets. The first dataset is obtained from simulated generation loss and line trip events in the Texas 2000-bus synthetic grid. The second is a proprietary dataset with labeled events obtained from a large utility in the USA involving measurements from nearly 500 PMUs. Our results indicate that the proposed framework is promising for identifying the two types of events. △ Less

Submitted 3 October, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

Comments: 12 pages, Accepted for the publication in the IEEE Transactions on Power Systems

Journal ref: IEEE Transactions on Power Systems, 2022

arXiv:2111.08861 [pdf, other]

A label-efficient two-sample test

Authors: Weizhi Li, Gautam Dasarathy, Karthikeyan Natesan Ramamurthy, Visar Berisha

Abstract: Two-sample tests evaluate whether two samples are realizations of the same distribution (the null hypothesis) or two different distributions (the alternative hypothesis). We consider a new setting for this problem where sample features are easily measured whereas sample labels are unknown and costly to obtain. Accordingly, we devise a three-stage framework in service of performing an effective two… ▽ More Two-sample tests evaluate whether two samples are realizations of the same distribution (the null hypothesis) or two different distributions (the alternative hypothesis). We consider a new setting for this problem where sample features are easily measured whereas sample labels are unknown and costly to obtain. Accordingly, we devise a three-stage framework in service of performing an effective two-sample test with only a small number of sample label queries: first, a classifier is trained with samples uniformly labeled to model the posterior probabilities of the labels; second, a novel query scheme dubbed \emph{bimodal query} is used to query labels of samples from both classes, and last, the classical Friedman-Rafsky (FR) two-sample test is performed on the queried samples. Theoretical analysis and extensive experiments performed on several datasets demonstrate that the proposed test controls the Type I error and has decreased Type II error relative to uniform querying and certainty-based querying. Source code for our algorithms and experimental results is available at \url{https://github.com/wayne0908/Label-Efficient-Two-Sample}. △ Less

Submitted 19 July, 2022; v1 submitted 16 November, 2021; originally announced November 2021.

Comments: Accepted to the 38th conference on Uncertainty in Artificial Intelligence (UAI2022)

arXiv:2108.01152 [pdf, ps, other]

Maximizing and Satisficing in Multi-armed Bandits with Graph Information

Authors: Parth K. Thaker, Mohit Malu, Nikhil Rao, Gautam Dasarathy

Abstract: Pure exploration in multi-armed bandits has emerged as an important framework for modeling decision-making and search under uncertainty. In modern applications, however, one is often faced with a tremendously large number of options. Even obtaining one observation per option may be too costly rendering traditional pure exploration algorithms ineffective. Fortunately, one often has access to simila… ▽ More Pure exploration in multi-armed bandits has emerged as an important framework for modeling decision-making and search under uncertainty. In modern applications, however, one is often faced with a tremendously large number of options. Even obtaining one observation per option may be too costly rendering traditional pure exploration algorithms ineffective. Fortunately, one often has access to similar relationships amongst the options that can be leveraged. In this paper, we consider the pure exploration problem in stochastic multi-armed bandits where the similarities between the arms are captured by a graph and the rewards may be represented as a smooth signal on this graph. In particular, we consider the problem of finding the arm with the maximum reward (i.e., the maximizing problem) or one with a sufficiently high reward (i.e., the satisficing problem) under this model. We propose novel algorithms \textbf{\algoname{}} (GRaph-based UcB) and $ζ$-\textbf{\algoname{}} for these problems and provide a theoretical characterization of their performance which specifically elicits the benefit of the graph side information. We also prove a lower bound on the data requirement, showing a large class of problems where these algorithms are near-optimal. We complement our theory with experimental results that show the benefit of capitalizing on such side information. △ Less

Submitted 20 November, 2022; v1 submitted 2 August, 2021; originally announced August 2021.

arXiv:2104.07208 [pdf]

State and Topology Estimation for Unobservable Distribution Systems using Deep Neural Networks

Authors: Behrouz Azimian, Reetam Sen Biswas, Shiva Moshtagh, Anamitra Pal, Lang Tong, Gautam Dasarathy

Abstract: Time-synchronized state estimation for reconfigurable distribution networks is challenging because of limited real-time observability. This paper addresses this challenge by formulating a deep learning (DL)-based approach for topology identification (TI) and unbalanced three-phase distribution system state estimation (DSSE). Two deep neural networks (DNNs) are trained for time-synchronized DNN-bas… ▽ More Time-synchronized state estimation for reconfigurable distribution networks is challenging because of limited real-time observability. This paper addresses this challenge by formulating a deep learning (DL)-based approach for topology identification (TI) and unbalanced three-phase distribution system state estimation (DSSE). Two deep neural networks (DNNs) are trained for time-synchronized DNN-based TI and DSSE, respectively, for systems that are incompletely observed by synchrophasor measurement devices (SMDs) in real-time. A data-driven approach for judicious SMD placement to facilitate reliable TI and DSSE is also provided. Robustness of the proposed methodology is demonstrated by considering non-Gaussian noise in the SMD measurements. A comparison of the DNN-based DSSE with more conventional approaches indicates that the DL-based approach gives better accuracy with smaller number of SMDs. △ Less

Submitted 26 March, 2022; v1 submitted 14 April, 2021; originally announced April 2021.

Comments: 13 pages. arXiv admin note: substantial text overlap with arXiv:2011.04272

Journal ref: IEEE Transactions on Instrumentation and Measurement, 2022

arXiv:2102.13135 [pdf, other]

Graph Community Detection from Coarse Measurements: Recovery Conditions for the Coarsened Weighted Stochastic Block Model

Authors: Nafiseh Ghoroghchian, Gautam Dasarathy, Stark C. Draper

Abstract: We study the problem of community recovery from coarse measurements of a graph. In contrast to the problem of community recovery of a fully observed graph, one often encounters situations when measurements of a graph are made at low-resolution, each measurement integrating across multiple graph nodes. Such low-resolution measurements effectively induce a coarse graph with its own communities. Our… ▽ More We study the problem of community recovery from coarse measurements of a graph. In contrast to the problem of community recovery of a fully observed graph, one often encounters situations when measurements of a graph are made at low-resolution, each measurement integrating across multiple graph nodes. Such low-resolution measurements effectively induce a coarse graph with its own communities. Our objective is to develop conditions on the graph structure, the quantity, and properties of measurements, under which we can recover the community organization in this coarse graph. In this paper, we build on the stochastic block model by mathematically formalizing the coarsening process, and characterizing its impact on the community members and connections. Through this novel setup and modeling, we characterize an error bound for community recovery. The error bound yields simple and closed-form asymptotic conditions to achieve the perfect recovery of the coarse graph communities. △ Less

Submitted 25 February, 2021; originally announced February 2021.

arXiv:2011.09645 [pdf, other]

Finding the Homology of Decision Boundaries with Active Learning

Authors: Weizhi Li, Gautam Dasarathy, Karthikeyan Natesan Ramamurthy, Visar Berisha

Abstract: Accurately and efficiently characterizing the decision boundary of classifiers is important for problems related to model selection and meta-learning. Inspired by topological data analysis, the characterization of decision boundaries using their homology has recently emerged as a general and powerful tool. In this paper, we propose an active learning algorithm to recover the homology of decision b… ▽ More Accurately and efficiently characterizing the decision boundary of classifiers is important for problems related to model selection and meta-learning. Inspired by topological data analysis, the characterization of decision boundaries using their homology has recently emerged as a general and powerful tool. In this paper, we propose an active learning algorithm to recover the homology of decision boundaries. Our algorithm sequentially and adaptively selects which samples it requires the labels of. We theoretically analyze the proposed framework and show that the query complexity of our active learning algorithm depends naturally on the intrinsic complexity of the underlying manifold. We demonstrate the effectiveness of our framework in selecting best-performing machine learning models for datasets just using their respective homological summaries. Experiments on several standard datasets show the sample complexity improvement in recovering the homology and demonstrate the practical utility of the framework for model selection. Source code for our algorithms and experimental results is available at https://github.com/wayne0908/Active-Learning-Homology. △ Less

Submitted 18 November, 2020; originally announced November 2020.

Journal ref: Advances in Neural Information Processing Systems 33 (2020)

arXiv:2007.05996 [pdf, other]

Differentiable Programming for Hyperspectral Unmixing using a Physics-based Dispersion Model

Authors: John Janiczek, Parth Thaker, Gautam Dasarathy, Christopher S. Edwards, Philip Christensen, Suren Jayasuriya

Abstract: Hyperspectral unmixing is an important remote sensing task with applications including material identification and analysis. Characteristic spectral features make many pure materials identifiable from their visible-to-infrared spectra, but quantifying their presence within a mixture is a challenging task due to nonlinearities and factors of variation. In this paper, spectral variation is considere… ▽ More Hyperspectral unmixing is an important remote sensing task with applications including material identification and analysis. Characteristic spectral features make many pure materials identifiable from their visible-to-infrared spectra, but quantifying their presence within a mixture is a challenging task due to nonlinearities and factors of variation. In this paper, spectral variation is considered from a physics-based approach and incorporated into an end-to-end spectral unmixing algorithm via differentiable programming. The dispersion model is introduced to simulate realistic spectral variation, and an efficient method to fit the parameters is presented. Then, this dispersion model is utilized as a generative model within an analysis-by-synthesis spectral unmixing algorithm. Further, a technique for inverse rendering using a convolutional neural network to predict parameters of the generative model is introduced to enhance performance and speed when training data is available. Results achieve state-of-the-art on both infrared and visible-to-near-infrared (VNIR) datasets, and show promise for the synergy between physics-based models and deep learning in hyperspectral unmixing in the future. △ Less

Submitted 12 July, 2020; originally announced July 2020.

Comments: 36 pages, 11 figures. Accepted to European Conference on Computer Vision (ECCV) 2020

arXiv:2006.12406 [pdf, ps, other]

On the alpha-loss Landscape in the Logistic Model

Authors: Tyler Sypherd, Mario Diaz, Lalitha Sankar, Gautam Dasarathy

Abstract: We analyze the optimization landscape of a recently introduced tunable class of loss functions called $α$-loss, $α\in (0,\infty]$, in the logistic model. This family encapsulates the exponential loss ($α= 1/2$), the log-loss ($α= 1$), and the 0-1 loss ($α= \infty$) and contains compelling properties that enable the practitioner to discern among a host of operating conditions relevant to emerging l… ▽ More We analyze the optimization landscape of a recently introduced tunable class of loss functions called $α$-loss, $α\in (0,\infty]$, in the logistic model. This family encapsulates the exponential loss ($α= 1/2$), the log-loss ($α= 1$), and the 0-1 loss ($α= \infty$) and contains compelling properties that enable the practitioner to discern among a host of operating conditions relevant to emerging learning methods. Specifically, we study the evolution of the optimization landscape of $α$-loss with respect to $α$ using tools drawn from the study of strictly-locally-quasi-convex functions in addition to geometric techniques. We interpret these results in terms of optimization complexity via normalized gradient descent. △ Less

Submitted 22 June, 2020; originally announced June 2020.

Comments: 5 pages, appeared in ISIT 2020. arXiv admin note: text overlap with arXiv:1906.02314

arXiv:2002.01066 [pdf, ps, other]

doi 10.1109/ISIT44484.2020.9174368

On the Sample Complexity and Optimization Landscape for Quadratic Feasibility Problems

Authors: Parth Thaker, Gautam Dasarathy, Angelia Nedić

Abstract: We consider the problem of recovering a complex vector $\mathbf{x}\in \mathbb{C}^n$ from $m$ quadratic measurements $\{\langle A_i\mathbf{x}, \mathbf{x}\rangle\}_{i=1}^m$. This problem, known as quadratic feasibility, encompasses the well known phase retrieval problem and has applications in a wide range of important areas including power system state estimation and x-ray crystallography. In gener… ▽ More We consider the problem of recovering a complex vector $\mathbf{x}\in \mathbb{C}^n$ from $m$ quadratic measurements $\{\langle A_i\mathbf{x}, \mathbf{x}\rangle\}_{i=1}^m$. This problem, known as quadratic feasibility, encompasses the well known phase retrieval problem and has applications in a wide range of important areas including power system state estimation and x-ray crystallography. In general, not only is the the quadratic feasibility problem NP-hard to solve, but it may in fact be unidentifiable. In this paper, we establish conditions under which this problem becomes {identifiable}, and further prove isometry properties in the case when the matrices $\{A_i\}_{i=1}^m$ are Hermitian matrices sampled from a complex Gaussian distribution. Moreover, we explore a nonconvex {optimization} formulation of this problem, and establish salient features of the associated optimization landscape that enables gradient algorithms with an arbitrary initialization to converge to a \emph{globally optimal} point with a high probability. Our results also reveal sample complexity requirements for successfully identifying a feasible solution in these contexts. △ Less

Submitted 14 December, 2020; v1 submitted 3 February, 2020; originally announced February 2020.

Comments: 21 pages

arXiv:2001.01900 [pdf, other]

Regularization via Structural Label Smoothing

Authors: Weizhi Li, Gautam Dasarathy, Visar Berisha

Abstract: Regularization is an effective way to promote the generalization performance of machine learning models. In this paper, we focus on label smoothing, a form of output distribution regularization that prevents overfitting of a neural network by softening the ground-truth labels in the training data in an attempt to penalize overconfident outputs. Existing approaches typically use cross-validation to… ▽ More Regularization is an effective way to promote the generalization performance of machine learning models. In this paper, we focus on label smoothing, a form of output distribution regularization that prevents overfitting of a neural network by softening the ground-truth labels in the training data in an attempt to penalize overconfident outputs. Existing approaches typically use cross-validation to impose this smoothing, which is uniform across all training data. In this paper, we show that such label smoothing imposes a quantifiable bias in the Bayes error rate of the training data, with regions of the feature space with high overlap and low marginal likelihood having a lower bias and regions of low overlap and high marginal likelihood having a higher bias. These theoretical results motivate a simple objective function for data-dependent smoothing to mitigate the potential negative consequences of the operation while maintaining its desirable properties as a regularizer. We call this approach Structural Label Smoothing (SLS). We implement SLS and empirically validate on synthetic, Higgs, SVHN, CIFAR-10, and CIFAR-100 datasets. The results confirm our theoretical insights and demonstrate the effectiveness of the proposed method in comparison to traditional label smoothing. △ Less

Submitted 4 July, 2020; v1 submitted 7 January, 2020; originally announced January 2020.

arXiv:1906.02314 [pdf, other]

A Tunable Loss Function for Robust Classification: Calibration, Landscape, and Generalization

Authors: Tyler Sypherd, Mario Diaz, John Kevin Cava, Gautam Dasarathy, Peter Kairouz, Lalitha Sankar

Abstract: We introduce a tunable loss function called $α$-loss, parameterized by $α\in (0,\infty]$, which interpolates between the exponential loss ($α= 1/2$), the log-loss ($α= 1$), and the 0-1 loss ($α= \infty$), for the machine learning setting of classification. Theoretically, we illustrate a fundamental connection between $α$-loss and Arimoto conditional entropy, verify the classification-calibration o… ▽ More We introduce a tunable loss function called $α$-loss, parameterized by $α\in (0,\infty]$, which interpolates between the exponential loss ($α= 1/2$), the log-loss ($α= 1$), and the 0-1 loss ($α= \infty$), for the machine learning setting of classification. Theoretically, we illustrate a fundamental connection between $α$-loss and Arimoto conditional entropy, verify the classification-calibration of $α$-loss in order to demonstrate asymptotic optimality via Rademacher complexity generalization techniques, and build-upon a notion called strictly local quasi-convexity in order to quantitatively characterize the optimization landscape of $α$-loss. Practically, we perform class imbalance, robustness, and classification experiments on benchmark image datasets using convolutional-neural-networks. Our main practical conclusion is that certain tasks may benefit from tuning $α$-loss away from log-loss ($α= 1$), and to this end we provide simple heuristics for the practitioner. In particular, navigating the $α$ hyperparameter can readily provide superior model robustness to label flips ($α> 1$) and sensitivity to imbalanced classes ($α< 1$). △ Less

Submitted 21 December, 2022; v1 submitted 5 June, 2019; originally announced June 2019.

Comments: Published at the Transactions on Information Theory

arXiv:1905.09190 [pdf, other]

Thresholding Graph Bandits with GrAPL

Authors: Daniel LeJeune, Gautam Dasarathy, Richard G. Baraniuk

Abstract: In this paper, we introduce a new online decision making paradigm that we call Thresholding Graph Bandits. The main goal is to efficiently identify a subset of arms in a multi-armed bandit problem whose means are above a specified threshold. While traditionally in such problems, the arms are assumed to be independent, in our paradigm we further suppose that we have access to the similarity between… ▽ More In this paper, we introduce a new online decision making paradigm that we call Thresholding Graph Bandits. The main goal is to efficiently identify a subset of arms in a multi-armed bandit problem whose means are above a specified threshold. While traditionally in such problems, the arms are assumed to be independent, in our paradigm we further suppose that we have access to the similarity between the arms in the form of a graph, allowing us gain information about the arm means in fewer samples. Such settings play a key role in a wide range of modern decision making problems where rapid decisions need to be made in spite of the large number of options available at each time. We present GrAPL, a novel algorithm for the thresholding graph bandit problem. We demonstrate theoretically that this algorithm is effective in taking advantage of the graph structure when available and the reward function homophily (that strongly connected arms have similar rewards) when favorable. We confirm these theoretical findings via experiments on both synthetic and real data. △ Less

Submitted 24 March, 2020; v1 submitted 22 May, 2019; originally announced May 2019.

Comments: 14 pages, 3 figures. To appear in AISTATS 2020

arXiv:1905.08831 [pdf, other]

IdeoTrace: A Framework for Ideology Tracing with a Case Study on the 2016 U.S. Presidential Election

Authors: Indu Manickam, Andrew S. Lan, Gautam Dasarathy, Richard G. Baraniuk

Abstract: The 2016 United States presidential election has been characterized as a period of extreme divisiveness that was exacerbated on social media by the influence of fake news, trolls, and social bots. However, the extent to which the public became more polarized in response to these influences over the course of the election is not well understood. In this paper we propose IdeoTrace, a framework for (… ▽ More The 2016 United States presidential election has been characterized as a period of extreme divisiveness that was exacerbated on social media by the influence of fake news, trolls, and social bots. However, the extent to which the public became more polarized in response to these influences over the course of the election is not well understood. In this paper we propose IdeoTrace, a framework for (i) jointly estimating the ideology of social media users and news websites and (ii) tracing changes in user ideology over time. We apply this framework to the last two months of the election period for a group of 47508 Twitter users and demonstrate that both liberal and conservative users became more polarized over time. △ Less

Submitted 30 May, 2019; v1 submitted 21 May, 2019; originally announced May 2019.

Comments: 9 pages, 4 figures, submitted to ASONAM 2019

arXiv:1806.04310 [pdf, other]

MISSION: Ultra Large-Scale Feature Selection using Count-Sketches

Authors: Amirali Aghazadeh, Ryan Spring, Daniel LeJeune, Gautam Dasarathy, Anshumali Shrivastava, Richard G. Baraniuk

Abstract: Feature selection is an important challenge in machine learning. It plays a crucial role in the explainability of machine-driven decisions that are rapidly permeating throughout modern society. Unfortunately, the explosion in the size and dimensionality of real-world datasets poses a severe challenge to standard feature selection algorithms. Today, it is not uncommon for datasets to have billions… ▽ More Feature selection is an important challenge in machine learning. It plays a crucial role in the explainability of machine-driven decisions that are rapidly permeating throughout modern society. Unfortunately, the explosion in the size and dimensionality of real-world datasets poses a severe challenge to standard feature selection algorithms. Today, it is not uncommon for datasets to have billions of dimensions. At such scale, even storing the feature vector is impossible, causing most existing feature selection methods to fail. Workarounds like feature hashing, a standard approach to large-scale machine learning, helps with the computational feasibility, but at the cost of losing the interpretability of features. In this paper, we present MISSION, a novel framework for ultra large-scale feature selection that performs stochastic gradient descent while maintaining an efficient representation of the features in memory using a Count-Sketch data structure. MISSION retains the simplicity of feature hashing without sacrificing the interpretability of the features while using only O(log^2(p)) working memory. We demonstrate that MISSION accurately and efficiently performs feature selection on real-world, large-scale datasets with billions of dimensions. △ Less

Submitted 11 June, 2018; originally announced June 2018.

arXiv:1707.04300 [pdf, other]

Coalescent-based species tree estimation: a stochastic Farris transform

Authors: Gautam Dasarathy, Elchanan Mossel, Robert Nowak, Sebastien Roch

Abstract: The reconstruction of a species phylogeny from genomic data faces two significant hurdles: 1) the trees describing the evolution of each individual gene--i.e., the gene trees--may differ from the species phylogeny and 2) the molecular sequences corresponding to each gene often provide limited information about the gene trees themselves. In this paper we consider an approach to species tree reconst… ▽ More The reconstruction of a species phylogeny from genomic data faces two significant hurdles: 1) the trees describing the evolution of each individual gene--i.e., the gene trees--may differ from the species phylogeny and 2) the molecular sequences corresponding to each gene often provide limited information about the gene trees themselves. In this paper we consider an approach to species tree reconstruction that addresses both these hurdles. Specifically, we propose an algorithm for phylogeny reconstruction under the multispecies coalescent model with a standard model of site substitution. The multispecies coalescent is commonly used to model gene tree discordance due to incomplete lineage sorting, a well-studied population-genetic effect. In previous work, an information-theoretic trade-off was derived in this context between the number of loci, $m$, needed for an accurate reconstruction and the length of the locus sequences, $k$. It was shown that to reconstruct an internal branch of length $f$, one needs $m$ to be of the order of $1/[f^{2} \sqrt{k}]$. That previous result was obtained under the molecular clock assumption, i.e., under the assumption that mutation rates (as well as population sizes) are constant across the species phylogeny. Here we generalize this result beyond the restrictive molecular clock assumption, and obtain a new reconstruction algorithm that has the same data requirement (up to log factors). Our main contribution is a novel reduction to the molecular clock case under the multispecies coalescent. As a corollary, we also obtain a new identifiability result of independent interest: for any species tree with $n \geq 3$ species, the rooted species tree can be identified from the distribution of its unrooted weighted gene trees even in the absence of a molecular clock. △ Less

Submitted 13 July, 2017; originally announced July 2017.

Comments: Submitted. 49 pages

arXiv:1707.03386 [pdf, ps, other]

DeepCodec: Adaptive Sensing and Recovery via Deep Convolutional Neural Networks

Authors: Ali Mousavi, Gautam Dasarathy, Richard G. Baraniuk

Abstract: In this paper we develop a novel computational sensing framework for sensing and recovering structured signals. When trained on a set of representative signals, our framework learns to take undersampled measurements and recover signals from them using a deep convolutional neural network. In other words, it learns a transformation from the original signals to a near-optimal number of undersampled m… ▽ More In this paper we develop a novel computational sensing framework for sensing and recovering structured signals. When trained on a set of representative signals, our framework learns to take undersampled measurements and recover signals from them using a deep convolutional neural network. In other words, it learns a transformation from the original signals to a near-optimal number of undersampled measurements and the inverse transformation from measurements to signals. This is in contrast to traditional compressive sensing (CS) systems that use random linear measurements and convex optimization or iterative algorithms for signal recovery. We compare our new framework with $\ell_1$-minimization from the phase transition point of view and demonstrate that it outperforms $\ell_1$-minimization in the regions of phase transition plot where $\ell_1$-minimization cannot recover the exact solution. In addition, we experimentally demonstrate how learning measurements enhances the overall recovery performance, speeds up training of recovery framework, and leads to having fewer parameters to learn. △ Less

Submitted 11 July, 2017; originally announced July 2017.

arXiv:1610.09726 [pdf, other]

The Multi-fidelity Multi-armed Bandit

Authors: Kirthevasan Kandasamy, Gautam Dasarathy, Jeff Schneider, Barnabás Póczos

Abstract: We study a variant of the classical stochastic $K$-armed bandit where observing the outcome of each arm is expensive, but cheap approximations to this outcome are available. For example, in online advertising the performance of an ad can be approximated by displaying it for shorter time periods or to narrower audiences. We formalise this task as a multi-fidelity bandit, where, at each time step, t… ▽ More We study a variant of the classical stochastic $K$-armed bandit where observing the outcome of each arm is expensive, but cheap approximations to this outcome are available. For example, in online advertising the performance of an ad can be approximated by displaying it for shorter time periods or to narrower audiences. We formalise this task as a multi-fidelity bandit, where, at each time step, the forecaster may choose to play an arm at any one of $M$ fidelities. The highest fidelity (desired outcome) expends cost $λ^{(m)}$. The $m^{\text{th}}$ fidelity (an approximation) expends $λ^{(m)} < λ^{(M)}$ and returns a biased estimate of the highest fidelity. We develop MF-UCB, a novel upper confidence bound procedure for this setting and prove that it naturally adapts to the sequence of available approximations and costs thus attaining better regret than naive strategies which ignore the approximations. For instance, in the above online advertising example, MF-UCB would use the lower fidelities to quickly eliminate suboptimal ads and reserve the larger expensive experiments on a small set of promising candidates. We complement this result with a lower bound and show that MF-UCB is nearly optimal under certain conditions. △ Less

Submitted 30 October, 2016; originally announced October 2016.

Comments: To appear at NIPS 2016

arXiv:1603.06288 [pdf, other]

Multi-fidelity Gaussian Process Bandit Optimisation

Authors: Kirthevasan Kandasamy, Gautam Dasarathy, Junier B. Oliva, Jeff Schneider, Barnabas Poczos

Abstract: In many scientific and engineering applications, we are tasked with the maximisation of an expensive to evaluate black box function $f$. Traditional settings for this problem assume just the availability of this single function. However, in many cases, cheap approximations to $f$ may be obtainable. For example, the expensive real world behaviour of a robot can be approximated by a cheap computer s… ▽ More In many scientific and engineering applications, we are tasked with the maximisation of an expensive to evaluate black box function $f$. Traditional settings for this problem assume just the availability of this single function. However, in many cases, cheap approximations to $f$ may be obtainable. For example, the expensive real world behaviour of a robot can be approximated by a cheap computer simulation. We can use these approximations to eliminate low function value regions cheaply and use the expensive evaluations of $f$ in a small but promising region and speedily identify the optimum. We formalise this task as a \emph{multi-fidelity} bandit problem where the target function and its approximations are sampled from a Gaussian process. We develop MF-GP-UCB, a novel method based on upper confidence bound techniques. In our theoretical analysis we demonstrate that it exhibits precisely the above behaviour, and achieves better regret than strategies which ignore multi-fidelity information. Empirically, MF-GP-UCB outperforms such naive strategies and other multi-fidelity methods on several synthetic and real experiments. △ Less

Submitted 15 March, 2019; v1 submitted 20 March, 2016; originally announced March 2016.

Comments: Preliminary version appeared at NIPS 2016

arXiv:1602.00354 [pdf, other]

Active Learning Algorithms for Graphical Model Selection

Authors: Gautam Dasarathy, Aarti Singh, Maria-Florina Balcan, Jong Hyuk Park

Abstract: The problem of learning the structure of a high dimensional graphical model from data has received considerable attention in recent years. In many applications such as sensor networks and proteomics it is often expensive to obtain samples from all the variables involved simultaneously. For instance, this might involve the synchronization of a large number of sensors or the tagging of a large numbe… ▽ More The problem of learning the structure of a high dimensional graphical model from data has received considerable attention in recent years. In many applications such as sensor networks and proteomics it is often expensive to obtain samples from all the variables involved simultaneously. For instance, this might involve the synchronization of a large number of sensors or the tagging of a large number of proteins. To address this important issue, we initiate the study of a novel graphical model selection problem, where the goal is to optimize the total number of scalar samples obtained by allowing the collection of samples from only subsets of the variables. We propose a general paradigm for graphical model selection where feedback is used to guide the sampling to high degree vertices, while obtaining only few samples from the ones with the low degrees. We instantiate this framework with two specific active learning algorithms, one of which makes mild assumptions but is computationally expensive, while the other is more computationally efficient but requires stronger (nevertheless standard) assumptions. Whereas the sample complexity of passive algorithms is typically a function of the maximum degree of the graph, we show that the sample complexity of our algorithms is provable smaller and that it depends on a novel local complexity measure that is akin to the average degree of the graph. We finally demonstrate the efficacy of our framework via simulations. △ Less

Submitted 7 April, 2016; v1 submitted 31 January, 2016; originally announced February 2016.

Comments: 26 pages, 3 figures. Preliminary version to appear in AI & Statistics 2016

arXiv:1506.08760 [pdf, other]

S2: An Efficient Graph Based Active Learning Algorithm with Application to Nonparametric Classification

Authors: Gautam Dasarathy, Robert Nowak, Xiao** Zhu

Abstract: This paper investigates the problem of active learning for binary label prediction on a graph. We introduce a simple and label-efficient algorithm called S2 for this task. At each step, S2 selects the vertex to be labeled based on the structure of the graph and all previously gathered labels. Specifically, S2 queries for the label of the vertex that bisects the *shortest shortest* path between any… ▽ More This paper investigates the problem of active learning for binary label prediction on a graph. We introduce a simple and label-efficient algorithm called S2 for this task. At each step, S2 selects the vertex to be labeled based on the structure of the graph and all previously gathered labels. Specifically, S2 queries for the label of the vertex that bisects the *shortest shortest* path between any pair of oppositely labeled vertices. We present a theoretical estimate of the number of queries S2 needs in terms of a novel parametrization of the complexity of binary functions on graphs. We also present experimental results demonstrating the performance of S2 on both real and synthetic data. While other graph-based active learning algorithms have shown promise in practice, our algorithm is the first with both good performance and theoretical guarantees. Finally, we demonstrate the implications of the S2 algorithm to the theory of nonparametric active learning. In particular, we show that S2 achieves near minimax optimal excess risk for an important class of nonparametric classification problems. △ Less

Submitted 29 June, 2015; originally announced June 2015.

Comments: A version of this paper appears in the Conference on Learning Theory (COLT) 2015

arXiv:1404.7055 [pdf, other]

doi 10.1109/TCBB.2014.2361685

Data Requirement for Phylogenetic Inference from Multiple Loci: A New Distance Method

Authors: Gautam Dasarathy, Robert Nowak, Sebastien Roch

Abstract: We consider the problem of estimating the evolutionary history of a set of species (phylogeny or species tree) from several genes. It is known that the evolutionary history of individual genes (gene trees) might be topologically distinct from each other and from the underlying species tree, possibly confounding phylogenetic analysis. A further complication in practice is that one has to estimate g… ▽ More We consider the problem of estimating the evolutionary history of a set of species (phylogeny or species tree) from several genes. It is known that the evolutionary history of individual genes (gene trees) might be topologically distinct from each other and from the underlying species tree, possibly confounding phylogenetic analysis. A further complication in practice is that one has to estimate gene trees from molecular sequences of finite length. We provide the first full data-requirement analysis of a species tree reconstruction method that takes into account estimation errors at the gene level. Under that criterion, we also devise a novel reconstruction algorithm that provably improves over all previous methods in a regime of interest. △ Less

Submitted 30 June, 2014; v1 submitted 28 April, 2014; originally announced April 2014.

Comments: 19 pages, 2 figures. Preliminary version to appear in IEEE ISIT 2014. Added acknowledgements and made the proof of the "equality" part of Theorem 3 explicit in Appendix C

arXiv:1303.6544 [pdf, other]

Sketching Sparse Matrices

Authors: Gautam Dasarathy, Parikshit Shah, Badri Narayan Bhaskar, Robert Nowak

Abstract: This paper considers the problem of recovering an unknown sparse p\times p matrix X from an m\times m matrix Y=AXB^T, where A and B are known m \times p matrices with m << p. The main result shows that there exist constructions of the "sketching" matrices A and B so that even if X has O(p) non-zeros, it can be recovered exactly and efficiently using a convex program as long as these non-zeros ar… ▽ More This paper considers the problem of recovering an unknown sparse p\times p matrix X from an m\times m matrix Y=AXB^T, where A and B are known m \times p matrices with m << p. The main result shows that there exist constructions of the "sketching" matrices A and B so that even if X has O(p) non-zeros, it can be recovered exactly and efficiently using a convex program as long as these non-zeros are not concentrated in any single row/column of X. Furthermore, it suffices for the size of Y (the sketch dimension) to scale as m = O(\sqrt{# nonzeros in X} \times log p). The results also show that the recovery is robust and stable in the sense that if X is equal to a sparse matrix plus a perturbation, then the convex program we propose produces an approximation with accuracy proportional to the size of the perturbation. Unlike traditional results on sparse recovery, where the sensing matrix produces independent measurements, our sensing operator is highly constrained (it assumes a tensor product structure). Therefore, proving recovery guarantees require non-standard techniques. Indeed our approach relies on a novel result concerning tensor products of bipartite graphs, which may be of independent interest. This problem is motivated by the following application, among others. Consider a p\times n data matrix D, consisting of n observations of p variables. Assume that the correlation matrix X:=DD^{T} is (approximately) sparse in the sense that each of the p variables is significantly correlated with only a few others. Our results show that these significant correlations can be detected even if we have access to only a sketch of the data S=AD with A \in R^{m\times p}. △ Less

Submitted 26 March, 2013; originally announced March 2013.

arXiv:1102.3887 [pdf, ps, other]

Active Clustering: Robust and Efficient Hierarchical Clustering using Adaptively Selected Similarities

Authors: Brian Eriksson, Gautam Dasarathy, Aarti Singh, Robert Nowak

Abstract: Hierarchical clustering based on pairwise similarities is a common tool used in a broad range of scientific applications. However, in many problems it may be expensive to obtain or compute similarities between the items to be clustered. This paper investigates the hierarchical clustering of N items based on a small subset of pairwise similarities, significantly less than the complete set of N(N-1)… ▽ More Hierarchical clustering based on pairwise similarities is a common tool used in a broad range of scientific applications. However, in many problems it may be expensive to obtain or compute similarities between the items to be clustered. This paper investigates the hierarchical clustering of N items based on a small subset of pairwise similarities, significantly less than the complete set of N(N-1)/2 similarities. First, we show that if the intracluster similarities exceed intercluster similarities, then it is possible to correctly determine the hierarchical clustering from as few as 3N log N similarities. We demonstrate this order of magnitude savings in the number of pairwise similarities necessitates sequentially selecting which similarities to obtain in an adaptive fashion, rather than picking them at random. We then propose an active clustering method that is robust to a limited fraction of anomalous similarities, and show how even in the presence of these noisy similarity values we can resolve the hierarchical clustering using only O(N log^2 N) pairwise similarities. △ Less

Submitted 18 February, 2011; originally announced February 2011.

Showing 1–29 of 29 results for author: Dasarathy, G