Showing 1–50 of 158 results for author: Karthik

Searching in archive stat.
  1. arXiv:2404.13056  [pdf, other]

    cs.LG cs.CE stat.CO stat.ME stat.ML

    Variational Bayesian Optimal Experimental Design with Normalizing Flows

    Authors: Jiayuan Dong, Christian Jacobsen, Mehdi Khalloufi, Maryam Akram, Wanjiao Liu, Karthik Duraisamy, Xun Huan

    Abstract: Bayesian optimal experimental design (OED) seeks experiments that maximize the expected information gain (EIG) in model parameters. Directly estimating the EIG using nested Monte Carlo is computationally expensive and requires an explicit likelihood. Variational OED (vOED), in contrast, estimates a lower bound of the EIG without likelihood evaluations by approximating the posterior distributions w…

    Submitted 8 April, 2024; originally announced April 2024.

    MSC Class: 62K05; 94A17; 62C10; 62F15

  2. arXiv:2404.12556  [pdf, other]

    stat.CO

    Variance-informed Rounding Uncertainty Analysis for Floating-point Statistical Models

    Authors: Sahil Bhola, Karthik Duraisamy

    Abstract: Advancements in computer hardware have made it possible to utilize low- and mixed-precision arithmetic for enhanced computational efficiency. In practical predictive modeling, however, it is vital to quantify uncertainty due to rounding alongside other sources like measurement, sampling, and numerical discretization. Traditional deterministic rounding uncertainty analysis (DBEA) assumes that the round…

    Submitted 18 April, 2024; originally announced April 2024.

  3. arXiv:2403.18124  [pdf, other]

    math.OC stat.CO

    Stochastic Finite Volume Method for Uncertainty Management in Gas Pipeline Network Flows

    Authors: Saif R. Kazi, Sidhant Misra, Svetlana Tokareva, Kaarthik Sundar, Anatoly Zlotnik

    Abstract: Natural gas consumption by users of pipeline networks is subject to increasing uncertainty that originates from the intermittent nature of electric power loads serviced by gas-fired generators. To enable computationally efficient optimization of gas network flows subject to uncertainty, we develop a finite volume representation of stochastic solutions of hyperbolic partial differential equation (P…

    Submitted 26 March, 2024; originally announced March 2024.

    Report number: LA-UR-24-22647 MSC Class: 65Kxx; 90C35; 90C15

  4. arXiv:2403.04033  [pdf, ps, other]

    cs.LG cs.AI math.ST stat.ML

    Online Learning with Unknown Constraints

    Authors: Karthik Sridharan, Seung Won Wilson Yoo

    Abstract: We consider the problem of online learning where the sequence of actions played by the learner must adhere to an unknown safety constraint at every round. The goal is to minimize regret with respect to the best safe action in hindsight while simultaneously satisfying the safety constraint with high probability on each round. We provide a general meta-algorithm that leverages an online regression o…

    Submitted 6 March, 2024; originally announced March 2024.

  5. arXiv:2401.11515  [pdf, other]

    stat.ME

    Geometry-driven Bayesian Inference for Ultrametric Covariance Matrices

    Authors: Tsung-Hung Yao, Zhenke Wu, Karthik Bharath, Veerabhadran Baladandayuthapani

    Abstract: Ultrametric matrices arise as covariance matrices in latent tree models for multivariate data with hierarchically correlated components. As a parameter space in a model, the set of ultrametric matrices is neither convex nor a smooth manifold, and focus in literature has hitherto mainly been restricted to estimation through projections and relaxation-based techniques. Leveraging the link between an…

    Submitted 21 January, 2024; originally announced January 2024.

  6. arXiv:2401.09073  [pdf, other]

    cs.LG cs.AI cs.IT math.ST stat.ML

    Fixed-Budget Differentially Private Best Arm Identification

    Authors: Zhirui Chen, P. N. Karthik, Yeow Meng Chee, Vincent Y. F. Tan

    Abstract: We study best arm identification (BAI) in linear bandits in the fixed-budget regime under differential privacy constraints, when the arm rewards are supported on the unit interval. Given a finite budget $T$ and a privacy parameter $\varepsilon>0$, the goal is to minimise the error probability in finding the arm with the largest mean after $T$ sampling rounds, subject to the constraint that the pol…

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: Accepted to ICLR 2024

  7. arXiv:2401.01231  [pdf, other]

    stat.AP

    Movement of insurgent gangs: A Bayesian kernel density model for incomplete temporal data

    Authors: Karthik Sriram, Dhruv Gupta, Rajiv Parikh

    Abstract: We develop a Bayesian modeling framework to address a pressing real-life problem faced by the police in tackling insurgent gangs. Unlike criminals associated with common crimes such as robbery, theft or street crime, insurgent gangs are trained in sophisticated arms and strategise against the government to weaken its resolve. They are constantly on the move, operating over large areas causing dama…

    Submitted 2 January, 2024; originally announced January 2024.

  8. arXiv:2312.14882  [pdf, ps, other]

    math.ST math.NA math.PR stat.CO stat.ML

    Sampling and estimation on manifolds using the Langevin diffusion

    Authors: Karthik Bharath, Alexander Lewis, Akash Sharma, Michael V Tretyakov

    Abstract: Error bounds are derived for sampling and estimation using a discretization of an intrinsically defined Langevin diffusion with invariant measure $\mathrm{d}\mu_\phi \propto e^{-\phi}\,\mathrm{dvol}_g$ on a compact Riemannian manifold. Two estimators of linear functionals of $\mu_\phi$ based on the discretized Markov process are considered: a time-averaging estimator based on a single trajectory and an ensemble-a…

    Submitted 15 June, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

  9. arXiv:2312.12361  [pdf, other]

    stat.ME

    Improved multifidelity Monte Carlo estimators based on normalizing flows and dimensionality reduction techniques

    Authors: Andrea Zanoni, Gianluca Geraci, Matteo Salvador, Karthik Menon, Alison L. Marsden, Daniele E. Schiavazzi

    Abstract: We study the problem of multifidelity uncertainty propagation for computationally expensive models. In particular, we consider the general setting where the high-fidelity and low-fidelity models have a dissimilar parameterization both in terms of number of random inputs and their probability distributions, which can be either known in closed form or provided through samples. We derive novel multif…

    Submitted 14 June, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

  10. arXiv:2312.11230  [pdf, other]

    stat.ML cs.LG

    Dirichlet-based Uncertainty Quantification for Personalized Federated Learning with Improved Posterior Networks

    Authors: Nikita Kotelevskii, Samuel Horváth, Karthik Nandakumar, Martin Takáč, Maxim Panov

    Abstract: In modern federated learning, one of the main challenges is to account for inherent heterogeneity and the diverse nature of data distributions for different clients. This problem is often addressed by introducing personalization of the models towards the data distribution of the particular client. However, a personalized model might be unreliable when applied to the data that is not typical for th…

    Submitted 18 December, 2023; originally announced December 2023.

  11. arXiv:2312.00854  [pdf, other]

    physics.med-ph cs.AI cs.LG math.NA stat.CO

    A Probabilistic Neural Twin for Treatment Planning in Peripheral Pulmonary Artery Stenosis

    Authors: John D. Lee, Jakob Richter, Martin R. Pfaller, Jason M. Szafron, Karthik Menon, Andrea Zanoni, Michael R. Ma, Jeffrey A. Feinstein, Jacqueline Kreutzer, Alison L. Marsden, Daniele E. Schiavazzi

    Abstract: The substantial computational cost of high-fidelity models in numerical hemodynamics has, so far, relegated their use mainly to offline treatment planning. New breakthroughs in data-driven architectures and optimization techniques for fast surrogate modeling provide an exciting opportunity to overcome these limitations, enabling the use of such technology for time-critical decisions. We discuss an…

    Submitted 1 December, 2023; originally announced December 2023.

  12. arXiv:2310.13393  [pdf, ps, other]

    stat.ML cs.IT cs.LG

    Optimal Best Arm Identification with Fixed Confidence in Restless Bandits

    Authors: P. N. Karthik, Vincent Y. F. Tan, Arpan Mukherjee, Ali Tajer

    Abstract: We study best arm identification in a restless multi-armed bandit setting with finitely many arms. The discrete-time data generated by each arm forms a homogeneous Markov chain taking values in a common, finite state space. The state transitions in each arm are captured by an ergodic transition probability matrix (TPM) that is a member of a single-parameter exponential family of TPMs. The real-val…

    Submitted 23 June, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

    Comments: Accepted to the IEEE Transactions on Information Theory

  13. arXiv:2309.11512  [pdf, other]

    stat.AP cs.LG

    Multidimensional well-being of US households at a fine spatial scale using fused household surveys: fusionACS

    Authors: Kevin Ummel, Miguel Poblete-Cazenave, Karthik Akkiraju, Nick Graetz, Hero Ashman, Cora Kingdon, Steven Herrera Tenorio, Aaryaman "Sunny" Singhal, Daniel Aldana Cohen, Narasimha D. Rao

    Abstract: Social science often relies on surveys of households and individuals. Dozens of such surveys are regularly administered by the U.S. government. However, they field independent, unconnected samples with specialized questions, limiting research questions to those that can be answered by a single survey. The fusionACS project seeks to integrate data from multiple U.S. household surveys by statistical…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: 35 pages, 6 figures

  14. arXiv:2307.09423  [pdf, other]

    cs.LG cs.AI stat.ML

    Scaling Laws for Imitation Learning in Single-Agent Games

    Authors: Jens Tuyls, Dhruv Madeka, Kari Torkkola, Dean Foster, Karthik Narasimhan, Sham Kakade

    Abstract: Imitation Learning (IL) is one of the most widely used methods in machine learning. Yet, many works find it is often unable to fully recover the underlying expert behavior, even in constrained environments like single-agent games. However, none of these works deeply investigate the role of scaling up the model and data size. Inspired by recent work in Natural Language Processing (NLP) where "scali…

    Submitted 10 March, 2024; v1 submitted 18 July, 2023; originally announced July 2023.

  15. arXiv:2307.04998  [pdf, other]

    cs.LG cs.AI math.ST stat.ML

    Selective Sampling and Imitation Learning via Online Regression

    Authors: Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu

    Abstract: We consider the problem of Imitation Learning (IL) by actively querying a noisy expert for feedback. While imitation learning has been empirically successful, much of prior work assumes access to noiseless expert feedback which is not practical in many applications. In fact, when one only has access to noisy expert feedback, algorithms that rely on purely offline data (non-interactive IL) can be sho…

    Submitted 10 July, 2023; originally announced July 2023.

  16. arXiv:2305.06082  [pdf, ps, other]

    cs.LG cs.AI cs.IT math.ST stat.ML

    Best Arm Identification in Bandits with Limited Precision Sampling

    Authors: Kota Srinivas Reddy, P. N. Karthik, Nikhil Karamchandani, Jayakrishnan Nair

    Abstract: We study best arm identification in a variant of the multi-armed bandit problem where the learner has limited precision in arm selection. The learner can only sample arms via certain exploration bundles, which we refer to as boxes. In particular, at each sampling epoch, the learner selects a box, which in turn causes an arm to get pulled as per a box-specific probability distribution. The pulled a…

    Submitted 10 May, 2023; originally announced May 2023.

    Comments: ISIT 2023

  17. arXiv:2304.08740  [pdf, other]

    stat.ML cs.LG eess.SP

    Estimating Joint Probability Distribution With Low-Rank Tensor Decomposition, Radon Transforms and Dictionaries

    Authors: Pranava Singhal, Waqar Mirza, Ajit Rajwade, Karthik S. Gurumoorthy

    Abstract: In this paper, we describe a method for estimating the joint probability density from data samples by assuming that the underlying distribution can be decomposed as a mixture of product densities with few mixture components. Prior works have used such a decomposition to estimate the joint density from lower-dimensional marginals, which can be estimated more reliably with the same number of samples…

    Submitted 18 April, 2023; originally announced April 2023.

    MSC Class: 62G07

  18. Estimating Global Identifiability Using Conditional Mutual Information in a Bayesian Framework

    Authors: Sahil Bhola, Karthik Duraisamy

    Abstract: A novel information-theoretic approach is proposed to assess the global practical identifiability of Bayesian statistical models. Based on the concept of conditional mutual information, an estimate of information gained for each model parameter is used to quantify the identifiability with practical considerations. No assumptions are made about the structure of the statistical model or the prior di…

    Submitted 21 September, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

  19. arXiv:2303.04390  [pdf, other]

    stat.CO q-bio.PE

    Many-core algorithms for high-dimensional gradients on phylogenetic trees

    Authors: Karthik Gangavarapu, Xiang Ji, Guy Baele, Mathieu Fourment, Philippe Lemey, Frederick A. Matsen IV, Marc A. Suchard

    Abstract: The rapid growth in genomic pathogen data spurs the need for efficient inference techniques, such as Hamiltonian Monte Carlo (HMC) in a Bayesian framework, to estimate parameters of these phylogenetic models where the dimensions of the parameters increase with the number of sequences $N$. HMC requires repeated calculation of the gradient of the data log-likelihood with respect to (wrt) all branch-…

    Submitted 8 March, 2023; originally announced March 2023.

  20. arXiv:2211.07484  [pdf, ps, other]

    cs.LG stat.ML

    Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression

    Authors: Aleksandrs Slivkins, Xingyu Zhou, Karthik Abinav Sankararaman, Dylan J. Foster

    Abstract: We consider contextual bandits with linear constraints (CBwLC), a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption. This problem generalizes contextual bandits with knapsacks (CBwK), allowing for packing and covering constraints, as well as positive and negative resource consumption. We provide the first algorithm f…

    Submitted 29 June, 2024; v1 submitted 14 November, 2022; originally announced November 2022.

    Comments: A preliminary version of this paper, authored by A. Slivkins, K.A. Sankararaman and D.J. Foster, has been published at COLT 2023. The present version features an important improvement, due to Xingyu Zhou. Specifically, the $\sqrt{T}$-regret result in Theorem 3.6(a) holds under a much weaker assumption, and is now positioned as the main guarantee

  21. arXiv:2210.14843  [pdf, other]

    stat.ML cs.AI cs.LG

    TuneUp: A Simple Improved Training Strategy for Graph Neural Networks

    Authors: Weihua Hu, Kaidi Cao, Kexin Huang, Edward W Huang, Karthik Subbian, Kenji Kawaguchi, Jure Leskovec

    Abstract: Despite recent advances in Graph Neural Networks (GNNs), their training strategies remain largely under-explored. The conventional training strategy learns over all nodes in the original graph(s) equally, which can be sub-optimal as certain nodes are often more difficult to learn than others. Here we present TuneUp, a simple curriculum-based training strategy for improving the predictive performan…

    Submitted 26 August, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

  22. arXiv:2209.12667  [pdf, other]

    stat.ML cs.LG math.DG math.ST

    Shape And Structure Preserving Differential Privacy

    Authors: Carlos Soto, Karthik Bharath, Matthew Reimherr, Aleksandra Slavkovic

    Abstract: It is common for data structures such as images and shapes of 2D objects to be represented as points on a manifold. The utility of a mechanism to produce sanitized differentially private estimates from such data is intimately linked to how compatible it is with the underlying structure and geometry of the space. In particular, as recently shown, utility of the Laplace mechanism on a positively cur…

    Submitted 21 September, 2022; originally announced September 2022.

    Comments: 15 pages (including supplementary material and references), 3 figures (including supplementary material), to be published in NeurIPS 2022

  23. arXiv:2208.09215  [pdf, other]

    cs.LG cs.IT math.ST stat.ML

    Almost Cost-Free Communication in Federated Best Arm Identification

    Authors: Kota Srinivas Reddy, P. N. Karthik, Vincent Y. F. Tan

    Abstract: We study the problem of best arm identification in a federated learning multi-armed bandit setup with a central server and multiple clients. Each client is associated with a multi-armed bandit in which each arm yields i.i.d. rewards following a Gaussian distribution with an unknown mean and known variance. The set of arms is assumed to be the same at all the clients. We define two notions o…

    Submitted 19 December, 2022; v1 submitted 19 August, 2022; originally announced August 2022.

    Comments: Accepted to AAAI 2023

  24. arXiv:2208.06115  [pdf, other]

    stat.ML econ.EM math.OC

    A Nonparametric Approach with Marginals for Modeling Consumer Choice

    Authors: Yanqiu Ruan, Xiaobo Li, Karthyek Murthy, Karthik Natarajan

    Abstract: Given data on the choices made by consumers for different offer sets, a key challenge is to develop parsimonious models that describe and predict consumer choice behavior while being amenable to prescriptive tasks such as pricing and assortment optimization. The marginal distribution model (MDM) is one such model, that requires only the specification of marginal distributions of the random utiliti…

    Submitted 24 July, 2023; v1 submitted 12 August, 2022; originally announced August 2022.

  25. arXiv:2207.13797  [pdf, other]

    stat.ME econ.EM

    Identification and Inference with Min-over-max Estimators for the Measurement of Labor Market Fairness

    Authors: Karthik Rajkumar

    Abstract: These notes show how to do inference on the Demographic Parity (DP) metric. Although the metric is a complex statistic involving min and max computations, we propose a smooth approximation of those functions and derive its asymptotic distribution. The limit of these approximations and their gradients converge to those of the true max and min functions, wherever they exist. More importantly, when…

    Submitted 27 July, 2022; originally announced July 2022.

    Comments: 12 pages, 3 figures

  26. arXiv:2207.10914  [pdf, other]

    stat.ME stat.CO

    Spatially Penalised Registration of Multivariate Functional Data

    Authors: Xiaohan Guo, Sebastian Kurtek, Karthik Bharath

    Abstract: Registration of multivariate functional data involves handling of both cross-component and cross-observation phase variations. Allowing for the two phase variations to be modelled as general diffeomorphic time warpings, in this work we focus on the hitherto unconsidered setting where phase variation of the component functions are spatially correlated. We propose an algorithm to optimize a metric-b…

    Submitted 22 July, 2022; originally announced July 2022.

  27. arXiv:2206.13063  [pdf, other]

    cs.LG math.OC math.ST stat.ML

    On the Complexity of Adversarial Decision Making

    Authors: Dylan J. Foster, Alexander Rakhlin, Ayush Sekhari, Karthik Sridharan

    Abstract: A central problem in online learning and decision making -- from bandits to reinforcement learning -- is to understand what modeling assumptions lead to sample-efficient learning guarantees. We consider a general adversarial decision making framework that encompasses (structured) bandit problems with adversarial rewards and reinforcement learning problems with adversarial dynamics. Our main result…

    Submitted 27 June, 2022; originally announced June 2022.

  28. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  29. arXiv:2206.03040  [pdf, other]

    stat.ML cs.IR cs.LG

    Learning Backward Compatible Embeddings

    Authors: Weihua Hu, Rajas Bansal, Kaidi Cao, Nikhil Rao, Karthik Subbian, Jure Leskovec

    Abstract: Embeddings, low-dimensional vector representation of objects, are fundamental in building modern machine learning systems. In industrial settings, there is usually an embedding team that trains an embedding model to solve intended tasks (e.g., product recommendation). The produced embeddings are then widely consumed by consumer teams to solve their unintended tasks (e.g., fraud detection). However…

    Submitted 7 June, 2022; originally announced June 2022.

    Comments: KDD 2022, Applied Data Science Track

  30. arXiv:2203.15236  [pdf, ps, other]

    stat.ML cs.IT cs.LG

    Best Arm Identification in Restless Markov Multi-Armed Bandits

    Authors: P. N. Karthik, Kota Srinivas Reddy, Vincent Y. F. Tan

    Abstract: We study the problem of identifying the best arm in a multi-armed bandit environment when each arm is a time-homogeneous and ergodic discrete-time Markov process on a common, finite state space. The state evolution on each arm is governed by the arm's transition probability matrix (TPM). A decision entity that knows the set of arm TPMs but not the exact mapping of the TPMs to the arms, wishes to f…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: 41 pages

  31. arXiv:2203.01667  [pdf, other]

    cs.LG stat.ML

    Joint Probability Estimation Using Tensor Decomposition and Dictionaries

    Authors: Shaan ul Haque, Ajit Rajwade, Karthik S. Gurumoorthy

    Abstract: In this work, we study non-parametric estimation of joint probabilities of a given set of discrete and continuous random variables from their (empirically estimated) 2D marginals, under the assumption that the joint probability could be decomposed and approximated by a mixture of product densities/mass functions. The problem of estimating the joint probability density function (PDF) using semi-par…

    Submitted 3 March, 2022; originally announced March 2022.

  32. arXiv:2203.01161  [pdf, ps, other]

    math.OC cs.CC cs.LG stat.ML

    Discrete Optimal Transport with Independent Marginals is #P-Hard

    Authors: Bahar Taşkesen, Soroosh Shafieezadeh-Abadeh, Daniel Kuhn, Karthik Natarajan

    Abstract: We study the computational complexity of the optimal transport problem that evaluates the Wasserstein distance between the distributions of two K-dimensional discrete random vectors. The best known algorithms for this problem run in polynomial time in the maximum of the number of atoms of the two distributions. However, if the components of either random vector are independent, then this number ca…

    Submitted 14 October, 2022; v1 submitted 2 March, 2022; originally announced March 2022.

  33. Probabilistic Learning of Treatment Trees in Cancer

    Authors: Tsung-Hung Yao, Zhenke Wu, Karthik Bharath, Jinju Li, Veerabhadran Baladandayuthapani

    Abstract: Accurate identification of synergistic treatment combinations and their underlying biological mechanisms is critical across many disease domains, especially cancer. In translational oncology research, preclinical systems such as patient-derived xenografts (PDX) have emerged as a unique study design evaluating multiple treatments administered to samples from the same human tumor implanted into gene…

    Submitted 23 January, 2022; originally announced January 2022.

  34. arXiv:2111.02516  [pdf, other]

    math.ST math.DG stat.ML

    Differential Privacy Over Riemannian Manifolds

    Authors: Matthew Reimherr, Karthik Bharath, Carlos Soto

    Abstract: In this work we consider the problem of releasing a differentially private statistical summary that resides on a Riemannian manifold. We present an extension of the Laplace or K-norm mechanism that utilizes intrinsic distances and volumes on the manifold. We also consider in detail the specific case where the summary is the Fréchet mean of data residing on a manifold. We demonstrate that our mecha…

    Submitted 3 November, 2021; originally announced November 2021.

    Comments: 15 pages (including supplementary material and references), 2 figures (including supplementary material), published in NeurIPS

  35. arXiv:2109.01965  [pdf, other]

    stat.ML cs.LG

    Scalable Feature Selection for (Multitask) Gradient Boosted Trees

    Authors: Cuize Han, Nikhil Rao, Daria Sorokina, Karthik Subbian

    Abstract: Gradient Boosted Decision Trees (GBDTs) are widely used for building ranking and relevance models in search and recommendation. Considerations such as latency and interpretability dictate the use of as few features as possible to train these models. Feature selection in GBDT models typically involves heuristically ranking the features by importance and selecting the top few, or by performing a ful…

    Submitted 4 September, 2021; originally announced September 2021.

    Comments: Correct a mistake in the proof of Lemma B1 in http://proceedings.mlr.press/v108/han20a.html

    Journal ref: Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:885-894, 2020

  36. arXiv:2106.15436  [pdf, other]

    stat.ME

    Topo-Geometric Analysis of Variability in Point Clouds using Persistence Landscapes

    Authors: James Matuk, Sebastian Kurtek, Karthik Bharath

    Abstract: Topological data analysis provides a set of tools to uncover low-dimensional structure in noisy point clouds. Prominent amongst the tools is persistence homology, which summarizes birth-death times of homological features using data objects known as persistence diagrams. To better aid statistical analysis, a functional representation of the diagrams, known as persistence landscapes, enables use of…

    Submitted 1 February, 2024; v1 submitted 29 June, 2021; originally announced June 2021.

  37. arXiv:2106.11880  [pdf, other]

    cs.LG stat.ML

    Dynamic Customer Embeddings for Financial Service Applications

    Authors: Nima Chitsazan, Samuel Sharpe, Dwipam Katariya, Qianyu Cheng, Karthik Rajasethupathy

    Abstract: As financial services (FS) companies have experienced drastic technology driven changes, the availability of new data streams provides the opportunity for more comprehensive customer understanding. We propose Dynamic Customer Embeddings (DCE), a framework that leverages customers' digital activity and a wide range of financial context to learn dense representations of customers in the FS industry.…

    Submitted 22 June, 2021; originally announced June 2021.

    Comments: ICML Workshop on Representation Learning for Finance and E-Commerce Applications

  38. arXiv:2106.10941  [pdf, other]

    stat.ME stat.AP

    Tumor Radiogenomics with Bayesian Layered Variable Selection

    Authors: Shariq Mohammed, Sebastian Kurtek, Karthik Bharath, Arvind Rao, Veerabhadran Baladandayuthapani

    Abstract: We propose a statistical framework to integrate radiological magnetic resonance imaging (MRI) and genomic data to identify the underlying radiogenomic associations in lower grade gliomas (LGG). We devise a novel imaging phenotype by dividing the tumor region into concentric spherical layers that mimics the tumor evolution process. MRI data within each layer is represented by voxel-intensity-based…

    Submitted 21 June, 2021; originally announced June 2021.

  39. arXiv:2106.07548  [pdf, other]

    eess.SY math.DS stat.ML

    A scalable multi-step least squares method for network identification with unknown disturbance topology

    Authors: Stefanie J. M. Fonken, Karthik R. Ramaswamy, Paul M. J. Van den Hof

    Abstract: Identification methods for dynamic networks typically require prior knowledge of the network and disturbance topology, and often rely on solving poorly scalable non-convex optimization problems. While methods for estimating network topology are available in the literature, less attention has been paid to estimating the disturbance topology, i.e., the (spatial) noise correlation structure and the n…

    Submitted 25 May, 2022; v1 submitted 14 June, 2021; originally announced June 2021.

    Comments: 17 pages, 4 figures, accepted and published in Automatica Volume 141, July 2022

    Journal ref: Volume 141, July 2022, 110295

  40. arXiv:2105.03603  [pdf, ps, other]

    cs.IT cs.LG stat.ML

    Learning to Detect an Odd Restless Markov Arm with a Trembling Hand

    Authors: P. N. Karthik, Rajesh Sundaresan

    Abstract: This paper studies the problem of finding an anomalous arm in a multi-armed bandit when (a) each arm is a finite-state Markov process, and (b) the arms are restless. Here, anomaly means that the transition probability matrix (TPM) of one of the arms (the odd arm) is different from the common TPM of each of the non-odd arms. The TPMs are unknown to a decision entity that wishes to find the index of…

    Submitted 1 June, 2021; v1 submitted 8 May, 2021; originally announced May 2021.

    Comments: 49 pages. A shorter version of this manuscript has been accepted for presentation at the 2021 IEEE International Symposium on Information Theory. This manuscript contains the proofs of all the main results

  41. arXiv:2104.00510  [pdf, other]

    stat.ME stat.AP

    RADIOHEAD: Radiogenomic Analysis Incorporating Tumor Heterogeneity in Imaging Through Densities

    Authors: Shariq Mohammed, Karthik Bharath, Sebastian Kurtek, Arvind Rao, Veerabhadran Baladandayuthapani

    Abstract: Recent technological advancements have enabled detailed investigation of associations between the molecular architecture and tumor heterogeneity, through multi-source integration of radiological imaging and genomic (radiogenomic) data. In this paper, we integrate and harness radiogenomic data in patients with lower grade gliomas (LGG), a type of brain cancer, in order to develop a regression frame…

    Submitted 7 April, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

  42. arXiv:2103.11864  [pdf, ps, other]

    cs.LG eess.SP stat.ML

    Recovery of Joint Probability Distribution from one-way marginals: Low rank Tensors and Random Projections

    Authors: Jian Vora, Karthik S. Gurumoorthy, Ajit Rajwade

    Abstract: Joint probability mass function (PMF) estimation is a fundamental machine learning problem. The number of free parameters scales exponentially with respect to the number of random variables. Hence, most work on nonparametric PMF estimation is based on some structural assumptions such as clique factorization adopted by probabilistic graphical models, imposition of low rank on the joint probability…

    Submitted 24 March, 2021; v1 submitted 22 March, 2021; originally announced March 2021.

  43. arXiv:2103.10159  [pdf, other]

    cs.LG cs.AI stat.ML

    SPOT: A framework for selection of prototypes using optimal transport

    Authors: Karthik S. Gurumoorthy, Pratik Jawanpuria, Bamdev Mishra

    Abstract: In this work, we develop an optimal transport (OT) based framework to select informative prototypical examples that best represent a given target dataset. Summarizing a given target dataset via representative examples is an important problem in several machine learning applications where human understanding of the learning models and underlying data distribution is essential for decision making. W…

    Submitted 5 April, 2021; v1 submitted 18 March, 2021; originally announced March 2021.

  44. arXiv:2103.07501  [pdf, other]

    cs.LG stat.ML

    Beyond $\log^2(T)$ Regret for Decentralized Bandits in Matching Markets

    Authors: Soumya Basu, Karthik Abinav Sankararaman, Abishek Sankararaman

    Abstract: We design decentralized algorithms for regret minimization in the two-sided matching market with one-sided bandit feedback that significantly improve upon the prior works (Liu et al. 2020a, 2020b, Sankararaman et al. 2020). First, for general markets, for any $\varepsilon > 0$, we design an algorithm that achieves a $O(\log^{1+\varepsilon}(T))$ regret to the agent-optimal stable matching, with un…

    Submitted 12 March, 2021; originally announced March 2021.

  45. arXiv:2103.01097  [pdf, other]

    stat.ME stat.AP

    Tangent functional canonical correlation analysis for densities and shapes, with applications to multimodal imaging data

    Authors: Min Ho Cho, Sebastian Kurtek, Karthik Bharath

    Abstract: It is quite common for functional data arising from imaging data to assume values in infinite-dimensional manifolds. Uncovering associations between two or more such nonlinear functional data extracted from the same object across medical imaging modalities can assist development of personalized treatment strategies. We propose a method for canonical correlation analysis between paired probability…

    Submitted 24 September, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

  46. arXiv:2012.01705  [pdf, ps, other]

    cs.LG stat.ML

    Online learning with dynamics: A minimax perspective

    Authors: Kush Bhatia, Karthik Sridharan

    Abstract: We study the problem of online learning with dynamics, where a learner interacts with a stateful environment over multiple rounds. In each round of the interaction, the learner selects a policy to deploy and incurs a cost that depends on both the chosen policy and current state of the world. The state-evolution dynamics and the costs are allowed to be time-varying, in a possibly adversarial way. I…

    Submitted 3 December, 2020; originally announced December 2020.

    Comments: Published at NeurIPS 2020

  47. arXiv:2011.14005  [pdf, other]

    eess.IV cs.CV stat.AP

    Three-dimensional Segmentation of the Scoliotic Spine from MRI using Unsupervised Volume-based MR-CT Synthesis

    Authors: Enamundram M. V. Naga Karthik, Catherine Laporte, Farida Cheriet

    Abstract: Vertebral bone segmentation from magnetic resonance (MR) images is a challenging task. Due to the inherent nature of the modality to emphasize soft tissues of the body, common thresholding algorithms are ineffective in detecting bones in MR images. On the other hand, it is relatively easier to segment bones from CT images because of the high contrast between bones and the surrounding regions. For…

    Submitted 25 November, 2020; originally announced November 2020.

    Comments: To appear in the Proceedings of the SPIE Medical Imaging Conference 2021, San Diego, CA. 9 pages, 4 figures in total

    Journal ref: Proceedings Volume 11596, SPIE Medical Imaging 2021: Image Processing; 115961H

  48. arXiv:2011.01979  [pdf, other]

    stat.ML cs.LG stat.ME

    High-Dimensional Feature Selection for Sample Efficient Treatment Effect Estimation

    Authors: Kristjan Greenewald, Dmitriy Katz-Rogozhnikov, Karthik Shanmugam

    Abstract: The estimation of causal treatment effects from observational data is a fundamental problem in causal inference. To avoid bias, the effect estimator must control for all confounders. Hence practitioners often collect data for as many covariates as possible to raise the chances of including the relevant confounders. While this addresses the bias, this has the side effect of significantly increasing…

    Submitted 3 November, 2020; originally announced November 2020.

  49. arXiv:2010.15972  [pdf]

    stat.ME stat.AP

    Manufacturing Process Optimization using Statistical Methodologies

    Authors: Karthik Srinivasan, Amit Kumar, Parameshwaran Iyer, Abhinav Joshi

    Abstract: Response Surface Methodology (RSM) introduced in the paper (Box & Wilson, 1951) explores the relationships between explanatory and response variables in complex settings and provides a framework to identify correct settings for the explanatory variables to yield the desired response. RSM involves setting up sequential experimental designs followed by application of elementary optimization methods…

    Submitted 29 October, 2020; originally announced October 2020.

  50. arXiv:2010.09578  [pdf, other]

    stat.ME

    Variograms for spatial functional data with phase variation

    Authors: Xiaohan Guo, Sebastian Kurtek, Karthik Bharath

    Abstract: Spatial, amplitude and phase variations in spatial functional data are confounded. Conclusions from the popular functional trace variogram, which quantifies spatial variation, can be misleading when analysing misaligned functional data with phase variation. To remedy this, we describe a framework that extends amplitude-phase separation methods in functional data to the spatial setting, with a view…

    Submitted 19 October, 2020; originally announced October 2020.