Showing 1–13 of 13 results for author: Gayen, S

  1. arXiv:2407.00927  [pdf, ps, other]

    cs.LG cs.CC stat.ML

    Learnability of Parameter-Bounded Bayes Nets

    Authors: Arnab Bhattacharyya, Davin Choo, Sutanu Gayen, Dimitrios Myrisiotis

    Abstract: Bayes nets are extensively used in practice to efficiently represent joint probability distributions over a set of random variables and capture dependency relations. In a seminal paper, Chickering et al. (JMLR 2004) showed that given a distribution $P$, that is defined as the marginal distribution of a Bayes net, it is $\mathsf{NP}$-hard to decide whether there is a parameter-bounded Bayes net tha…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 15 pages, 2 figures

  2. arXiv:2405.08255  [pdf, ps, other]

    cs.CC

    Total Variation Distance for Product Distributions is $\#\mathsf{P}$-Complete

    Authors: Arnab Bhattacharyya, Sutanu Gayen, Kuldeep S. Meel, Dimitrios Myrisiotis, A. Pavan, N. V. Vinodchandran

    Abstract: We show that computing the total variation distance between two product distributions is $\#\mathsf{P}$-complete. This is in stark contrast with other distance measures such as Kullback-Leibler, Chi-square, and Hellinger, which tensorize over the marginals leading to efficient algorithms.

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 5 pages. An extended version of this paper appeared in the proceedings of IJCAI 2023, under the title "On approximating total variation distance" (see https://www.ijcai.org/proceedings/2023/387 and arXiv:2206.07209)
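The contrast drawn in the abstract above can be illustrated with a small sketch. It is not taken from the paper: the function names and the example bias vectors are hypothetical, and it assumes product distributions over $\{0,1\}^n$ with every bias strictly between 0 and 1. It shows that the Kullback-Leibler divergence of two product distributions is the sum of the per-coordinate divergences (so it needs $O(n)$ work), while the naive way to get the total variation distance is to sum over all $2^n$ points.

```python
import itertools
import math


def product_pmf(biases, x):
    """Probability of bit-vector x under independent Bernoulli(biases[i]) coordinates."""
    prob = 1.0
    for bias, bit in zip(biases, x):
        prob *= bias if bit else 1.0 - bias
    return prob


def tv_brute_force(p, q):
    """Total variation distance by enumerating all 2^n points (exponential time)."""
    points = itertools.product((0, 1), repeat=len(p))
    return 0.5 * sum(abs(product_pmf(p, x) - product_pmf(q, x)) for x in points)


def kl_bernoulli(a, b):
    """KL divergence between Bernoulli(a) and Bernoulli(b); assumes 0 < a, b < 1."""
    return a * math.log(a / b) + (1.0 - a) * math.log((1.0 - a) / (1.0 - b))


def kl_tensorized(p, q):
    """KL divergence of two product distributions as a sum over coordinates (linear time)."""
    return sum(kl_bernoulli(a, b) for a, b in zip(p, q))


def kl_brute_force(p, q):
    """Same quantity computed from the definition, only to check the tensorized formula."""
    total = 0.0
    for x in itertools.product((0, 1), repeat=len(p)):
        px, qx = product_pmf(p, x), product_pmf(q, x)
        total += px * math.log(px / qx)
    return total


if __name__ == "__main__":
    p = [0.5, 0.2, 0.9]   # hypothetical example biases
    q = [0.25, 0.3, 0.6]
    print("KL (tensorized): ", kl_tensorized(p, q))
    print("KL (brute force):", kl_brute_force(p, q))
    print("TV (brute force):", tv_brute_force(p, q))
```

The brute-force routine is only practical for small $n$; it is included to make the exponential enumeration concrete, not as a suggested algorithm.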

  3. arXiv:2405.07914  [pdf, ps, other]

    cs.LG cs.DS stat.ML

    Distribution Learning Meets Graph Structure Sampling

    Authors: Arnab Bhattacharyya, Sutanu Gayen, Philips George John, Sayantan Sen, N. V. Vinodchandran

    Abstract: This work establishes a novel link between the problem of PAC-learning high-dimensional graphical models and the task of (efficient) counting and sampling of graph structures, using an online learning framework. We observe that if we apply the exponentially weighted average (EWA) or randomized weighted majority (RWM) forecasters on a sequence of samples from a distribution P using the log loss f…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 48 pages, 2 figures. Shortened abstract as per arXiv criteria

  4. arXiv:2309.09134  [pdf, ps, other]

    cs.DS cs.CC cs.DM cs.LG

    Total Variation Distance Meets Probabilistic Inference

    Authors: Arnab Bhattacharyya, Sutanu Gayen, Kuldeep S. Meel, Dimitrios Myrisiotis, A. Pavan, N. V. Vinodchandran

    Abstract: In this paper, we establish a novel connection between total variation (TV) distance estimation and probabilistic inference. In particular, we present an efficient, structure-preserving reduction from relative approximation of TV distance to probabilistic inference over directed graphical models. This reduction leads to a fully polynomial randomized approximation scheme (FPRAS) for estimating TV d…

    Submitted 1 July, 2024; v1 submitted 16 September, 2023; originally announced September 2023.

    Comments: 25 pages. This work has been accepted for presentation at the International Conference on Machine Learning (ICML) 2024

  5. arXiv:2210.02401  [pdf, other]

    cs.CV cs.AI

    Medical Image Retrieval via Nearest Neighbor Search on Pre-trained Image Features

    Authors: Deepak Gupta, Russell Loane, Soumya Gayen, Dina Demner-Fushman

    Abstract: Nearest neighbor search (NNS) aims to locate the points in high-dimensional space that are closest to the query point. The brute-force approach for finding the nearest neighbor becomes computationally infeasible when the number of points is large. NNS has multiple applications in medicine, such as searching large medical imaging databases, disease classification, diagnosis, etc. With a focus on…

    Submitted 5 October, 2022; originally announced October 2022.

  6. arXiv:2206.07209  [pdf, ps, other]

    cs.DS cs.CC cs.DM

    On Approximating Total Variation Distance

    Authors: Arnab Bhattacharyya, Sutanu Gayen, Kuldeep S. Meel, Dimitrios Myrisiotis, A. Pavan, N. V. Vinodchandran

    Abstract: Total variation distance (TV distance) is a fundamental notion of distance between probability distributions. In this work, we introduce and study the problem of computing the TV distance of two product distributions over the domain $\{0,1\}^n$. In particular, we establish the following results. 1. The problem of exactly computing the TV distance of two product distributions is $\#\mathsf{P}$-co…

    Submitted 16 August, 2023; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: 20 pages, 1 figure

    Journal ref: Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (2023) Main Track. Pages 3479-3487

  7. arXiv:2107.11712  [pdf, ps, other]

    cs.DS cs.LG stat.ML

    Efficient inference of interventional distributions

    Authors: Arnab Bhattacharyya, Sutanu Gayen, Saravanan Kandasamy, Vedant Raval, N. V. Vinodchandran

    Abstract: We consider the problem of efficiently inferring interventional distributions in a causal Bayesian network from a finite number of observations. Let $\mathcal{P}$ be a causal model on a set $\mathbf{V}$ of observable variables on a given causal graph $G$. For sets $\mathbf{X},\mathbf{Y}\subseteq \mathbf{V}$, and setting ${\bf x}$ to $\mathbf{X}$, let $P_{\bf x}(\mathbf{Y})$ denote the intervention…

    Submitted 27 July, 2021; v1 submitted 24 July, 2021; originally announced July 2021.

    Comments: 16 pages, 2 figures

  8. arXiv:2107.10450  [pdf, other]

    cs.DS cs.LG math.ST stat.ML

    Learning Sparse Fixed-Structure Gaussian Bayesian Networks

    Authors: Arnab Bhattacharyya, Davin Choo, Rishikesh Gajjala, Sutanu Gayen, Yuhao Wang

    Abstract: Gaussian Bayesian networks (a.k.a. linear Gaussian structural equation models) are widely used to model causal interactions among continuous variables. In this work, we study the problem of learning a fixed-structure Gaussian Bayesian network up to a bounded error in total variation distance. We analyze the commonly used node-wise least squares regression (LeastSquares) and prove that it has a nea…

    Submitted 18 October, 2022; v1 submitted 22 July, 2021; originally announced July 2021.

    Comments: 30 pages, 11 figures, acknowledgement added

  9. arXiv:2012.14632  [pdf, ps, other]

    cs.DS cs.IT cs.LG

    Testing Product Distributions: A Closer Look

    Authors: Arnab Bhattacharyya, Sutanu Gayen, Saravanan Kandasamy, N. V. Vinodchandran

    Abstract: We study the problems of identity and closeness testing of $n$-dimensional product distributions. Prior works by Canonne, Diakonikolas, Kane and Stewart (COLT 2017) and Daskalakis and Pan (COLT 2017) have established tight sample complexity bounds for non-tolerant testing over a binary alphabet: given two product distributions $P$ and $Q$ over a binary alphabet, distinguish between the cases…

    Submitted 26 May, 2021; v1 submitted 29 December, 2020; originally announced December 2020.

    Comments: A version appears in ALT 2021

  10. arXiv:2011.04144  [pdf, ps, other]

    cs.DS cs.IT cs.LG

    Near-Optimal Learning of Tree-Structured Distributions by Chow-Liu

    Authors: Arnab Bhattacharyya, Sutanu Gayen, Eric Price, N. V. Vinodchandran

    Abstract: We provide finite sample guarantees for the classical Chow-Liu algorithm (IEEE Trans. Inform. Theory, 1968) to learn a tree-structured graphical model of a distribution. For a distribution $P$ on $\Sigma^n$ and a tree $T$ on $n$ nodes, we say $T$ is an $\varepsilon$-approximate tree for $P$ if there is a $T$-structured distribution $Q$ such that $D(P\;||\;Q)$ is at most $\varepsilon$ more than the best…

    Submitted 22 July, 2021; v1 submitted 8 November, 2020; originally announced November 2020.

    Comments: 33 pages, 3 figures
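For readers unfamiliar with the algorithm named in the abstract above, here is a minimal sketch of the textbook Chow-Liu procedure: estimate the mutual information of every pair of coordinates from the samples, then take a maximum-weight spanning tree over those weights. It is not the paper's code and says nothing about the paper's finite-sample guarantees; the function names are hypothetical, and it assumes samples are equal-length tuples over a finite alphabet.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(samples, i, j):
    """Plug-in (empirical) mutual information between coordinates i and j, in nats."""
    n = len(samples)
    joint = Counter((s[i], s[j]) for s in samples)
    marg_i = Counter(s[i] for s in samples)
    marg_j = Counter(s[j] for s in samples)
    mi = 0.0
    for (a, b), count in joint.items():
        # (count/n) * log( (count/n) / ((marg_i[a]/n) * (marg_j[b]/n)) )
        mi += (count / n) * math.log(count * n / (marg_i[a] * marg_j[b]))
    return mi


def chow_liu_tree(samples):
    """Return the edges of a maximum-weight spanning tree under empirical mutual information."""
    d = len(samples[0])
    weighted_edges = sorted(
        ((mutual_information(samples, i, j), i, j) for i, j in combinations(range(d), 2)),
        reverse=True,
    )

    # Kruskal's algorithm with a simple union-find structure.
    parent = list(range(d))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    tree = []
    for _, i, j in weighted_edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            tree.append((i, j))
    return tree


if __name__ == "__main__":
    # Hypothetical toy data: coordinates 0 and 1 always agree, coordinate 2 is independent of both.
    samples = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
    print(chow_liu_tree(samples))
```

The sketch returns only the tree structure; fitting the conditional distributions along the tree edges is a separate step that is omitted here.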

  11. arXiv:2005.09067  [pdf, other]

    cs.CL

    Question-Driven Summarization of Answers to Consumer Health Questions

    Authors: Max Savery, Asma Ben Abacha, Soumya Gayen, Dina Demner-Fushman

    Abstract: Automatic summarization of natural language is a widely studied area in computer science, one that is broadly applicable to anyone who routinely needs to understand large quantities of information. For example, in the medical domain, recent developments in deep learning approaches to automatic summarization have the potential to make health information more easily accessible to patients and consum…

    Submitted 20 May, 2020; v1 submitted 18 May, 2020; originally announced May 2020.

  12. arXiv:2002.05378  [pdf, other]

    cs.DS cs.LG

    Efficient Distance Approximation for Structured High-Dimensional Distributions via Learning

    Authors: Arnab Bhattacharyya, Sutanu Gayen, Kuldeep S. Meel, N. V. Vinodchandran

    Abstract: We design efficient distance approximation algorithms for several classes of structured high-dimensional distributions. Specifically, we show algorithms for the following problems: - Given sample access to two Bayesian networks $P_1$ and $P_2$ over known directed acyclic graphs $G_1$ and $G_2$ having $n$ nodes and bounded in-degree, approximate $d_{tv}(P_1,P_2)$ to within additive error $\varepsilon$ usin…

    Submitted 13 February, 2020; v1 submitted 13 February, 2020; originally announced February 2020.

    Comments: 24 pages, 1 figure

  13. arXiv:2002.04232  [pdf, other]

    cs.LG cs.AI cs.DS stat.ML

    Learning and Sampling of Atomic Interventions from Observations

    Authors: Arnab Bhattacharyya, Sutanu Gayen, Saravanan Kandasamy, Ashwin Maran, N. V. Vinodchandran

    Abstract: We study the problem of efficiently estimating the effect of an intervention on a single variable (atomic interventions) using observational samples in a causal Bayesian network. Our goal is to give algorithms that are efficient in both time and sample complexity in a non-parametric setting. Tian and Pearl (AAAI '02) have exactly characterized the class of causal graphs for which causal effects…

    Submitted 5 August, 2020; v1 submitted 11 February, 2020; originally announced February 2020.

    Comments: 26 pages, 4 figures, a version appeared in ICML 2020

    ACM Class: I.2.6