Search | arXiv e-print repository

arXiv:2312.10072 [pdf, other]

Assessing the Usability of GutGPT: A Simulation Study of an AI Clinical Decision Support System for Gastrointestinal Bleeding Risk

Authors: Colleen Chan, Kisung You, Sunny Chung, Mauro Giuffrè, Theo Saarinen, Niroop Rajashekar, Yuan Pu, Yeo Eun Shin, Loren Laine, Ambrose Wong, René Kizilcec, Jasjeet Sekhon, Dennis Shung

Abstract: Applications of large language models (LLMs) like ChatGPT have potential to enhance clinical decision support through conversational interfaces. However, challenges of human-algorithmic interaction and clinician trust are poorly understood. GutGPT, a LLM for gastrointestinal (GI) bleeding risk prediction and management guidance, was deployed in clinical simulation scenarios alongside the electronic health record (EHR) with emergency medicine physicians, internal medicine physicians, and medical students to evaluate its effect on physician acceptance and trust in AI clinical decision support systems (AI-CDSS). GutGPT provides risk predictions from a validated machine learning model and evidence-based answers by querying extracted clinical guidelines. Participants were randomized to GutGPT and an interactive dashboard, or the interactive dashboard and a search engine. Surveys and educational assessments taken before and after measured technology acceptance and content mastery. Preliminary results showed mixed effects on acceptance after using GutGPT compared to the dashboard or search engine but appeared to improve content mastery based on simulation performance. Overall, this study demonstrates LLMs like GutGPT could enhance effective AI-CDSS if implemented optimally and paired with interactive interfaces.

Submitted 6 December, 2023; originally announced December 2023.

Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2023, December 10, 2023, New Orleans, United States, 11 pages

arXiv:2307.15213 [pdf, other]

PCA, SVD, and Centering of Data

Authors: Donggun Kim, Kisung You

Abstract: The research detailed in this paper scrutinizes Principal Component Analysis (PCA), a seminal method employed in statistics and machine learning for the purpose of reducing data dimensionality. Singular Value Decomposition (SVD) is often employed as the primary means for computing PCA, a process that indispensably includes the step of centering - the subtraction of the mean location from the data set. In our study, we delve into a detailed exploration of the influence of this critical yet often ignored or downplayed data centering step. Our research meticulously investigates the conditions under which two PCA embeddings, one derived from SVD with centering and the other without, can be viewed as aligned. As part of this exploration, we analyze the relationship between the first singular vector and the mean direction, subsequently linking this observation to the congruity between two SVDs of centered and uncentered matrices. Furthermore, we explore the potential implications arising from the absence of centering in the context of performing PCA via SVD from a spectral analysis standpoint. Our investigation emphasizes the importance of a comprehensive understanding and acknowledgment of the subtleties involved in the computation of PCA. As such, we believe this paper offers a crucial contribution to the nuanced understanding of this foundational statistical method and stands as a valuable addition to the academic literature in the field of statistics.

Submitted 1 April, 2024; v1 submitted 27 July, 2023; originally announced July 2023.

Comments: 16 pages, 2 figures
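
The centering question raised in the abstract above can be illustrated with a small numerical sketch. This is not taken from the paper: the synthetic data, the seed, and all function and variable names are assumptions chosen for illustration. It compares the first right singular vector of an uncentered data matrix with the mean direction, and then repeats the comparison after centering.

```python
import numpy as np

def leading_right_singular_vector(X):
    """Return the first right singular vector of X (first PCA axis if X is centered)."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[0]

def abs_cosine(u, v):
    """Absolute cosine of the angle between u and v (sign of singular vectors is arbitrary)."""
    return abs(float(u @ v)) / (np.linalg.norm(u) * np.linalg.norm(v))

# Synthetic data (assumption): most variance lies along the first axis,
# while the mean sits far from the origin along the last axis.
rng = np.random.default_rng(0)
n, p = 500, 5
scales = np.array([3.0, 1.0, 1.0, 1.0, 1.0])
mean = np.array([0.0, 0.0, 0.0, 0.0, 50.0])
X = rng.normal(size=(n, p)) * scales + mean

mean_dir = X.mean(axis=0)
v_uncentered = leading_right_singular_vector(X)
v_centered = leading_right_singular_vector(X - X.mean(axis=0))

# Without centering, the leading singular vector tends to follow the mean direction;
# with centering, it follows the direction of largest variance instead.
cos_uncentered_mean = abs_cosine(v_uncentered, mean_dir)
cos_centered_mean = abs_cosine(v_centered, mean_dir)
print(f"uncentered vs mean direction: {cos_uncentered_mean:.4f}")
print(f"centered   vs mean direction: {cos_centered_mean:.4f}")
```

In this toy setup the mean is large relative to the spread of the data, which is the regime where skipping the centering step is most likely to change the leading direction.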

arXiv:2209.03318 [pdf, other]

On the Wasserstein median of probability measures

Authors: Kisung You, Dennis Shung

Abstract: Measures of central tendency such as mean and median are a primary way to summarize a given collection of random objects. In the field of optimal transport, the Wasserstein barycenter corresponds to Fréchet or geometric mean of a set of probability measures, which is defined as a minimizer of the sum of squared distances to each element in a given set when the order is 2. We present the Wasserstein median, an equivalent of Fréchet median under the 2-Wasserstein metric, as a robust alternative to the Wasserstein barycenter. We first establish existence and consistency of the Wasserstein median. We also propose a generic algorithm that makes use of any established routine for the Wasserstein barycenter in an iterative manner and prove its convergence. Our proposal is validated with simulated and real data examples when the objects of interest are univariate distributions, centered Gaussian distributions, and discrete measures on regular grids.

Submitted 8 September, 2022; v1 submitted 7 September, 2022; originally announced September 2022.

Comments: 27 pages, 9 figures

MSC Class: 49Q22
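
For the univariate case mentioned in the abstract above, the idea of computing a median by repeatedly calling a barycenter routine can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes every distribution is given as an equal-size sample, so that the 2-Wasserstein distance reduces to an L2 distance between sorted samples and a weighted barycenter reduces to a weighted average of sorted samples. The Weiszfeld-style reweighting, the function names, and the toy data are all assumptions.

```python
import numpy as np

def w2_1d(a_sorted, b_sorted):
    """2-Wasserstein distance between two equal-size empirical measures given as sorted samples."""
    return float(np.sqrt(np.mean((a_sorted - b_sorted) ** 2)))

def wasserstein_median_1d(samples, n_iter=100, tol=1e-9, eps=1e-12):
    """Weiszfeld-style iteration: each step is a weighted barycenter with weights 1/distance."""
    Q = np.sort(np.asarray(samples, dtype=float), axis=1)  # each row acts as a quantile function
    current = Q.mean(axis=0)  # start from the (unweighted) barycenter
    for _ in range(n_iter):
        dists = np.array([w2_1d(current, q) for q in Q])
        weights = 1.0 / np.maximum(dists, eps)  # guard against division by zero
        weights /= weights.sum()
        updated = weights @ Q  # weighted barycenter in one dimension
        converged = w2_1d(updated, current) < tol
        current = updated
        if converged:
            break
    return current

# Toy data (assumption): nine samples centered near 0 and one outlying sample near 100.
rng = np.random.default_rng(1)
samples = np.vstack([
    rng.normal(loc=0.0, size=(9, 200)),
    rng.normal(loc=100.0, size=(1, 200)),
])
barycenter_q = np.sort(samples, axis=1).mean(axis=0)
median_q = wasserstein_median_1d(samples)
print("barycenter mean:", barycenter_q.mean())
print("median mean:    ", median_q.mean())
```

In this toy example the barycenter is pulled toward the outlying distribution, while the median stays close to the majority, which is the robustness property the abstract refers to.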

arXiv:2208.12435 [pdf, other]

Comparing multiple latent space embeddings using topological analysis

Authors: Kisung You, Ilmun Kim, Ick Hoon Jin, Minjeong Jeon, Dennis Shung

Abstract: The latent space model is one of the well-known methods for statistical inference of network data. While the model has been much studied for a single network, it has not attracted much attention to analyze collectively when multiple networks and their latent embeddings are present. We adopt a topology-based representation of latent space embeddings to learn over a population of network model fits, which allows us to compare networks of potentially varying sizes in an invariant manner to label permutation and rigid motion. This approach enables us to propose algorithms for clustering and multi-sample hypothesis tests by adopting well-established theories for Hilbert space-valued analysis. After the proposed method is validated via simulated examples, we apply the framework to analyze educational survey data from Korean innovative school reform.

Submitted 26 August, 2022; originally announced August 2022.

Comments: 46 pages, 11 figures

arXiv:2208.11929 [pdf, other]

On the spherical Laplace distribution

Authors: Kisung You, Dennis Shung

Abstract: The von Mises-Fisher (vMF) distribution has long been a mainstay for inference with data on the unit hypersphere in directional statistics. The performance of statistical inference based on the vMF distribution, however, may suffer when there are significant outliers and noise in the data. Based on an analogy of the median as a robust measure of central tendency and its relationship to the Laplace distribution, we proposed the spherical Laplace (SL) distribution, a novel probability measure for modelling directional data. We present a sampling scheme and theoretical results on maximum likelihood estimation. We derive efficient numerical routines for parameter estimation in the absence of closed-form formula. An application of model-based clustering is considered under the finite mixture model framework. Our numerical methods for parameter estimation and clustering are validated using simulated and real data experiments.

Submitted 7 September, 2022; v1 submitted 25 August, 2022; originally announced August 2022.

Comments: 28 pages, 6 figures

MSC Class: 62F10; 62H11; 62H12; 62H30; 62R30

arXiv:2112.02580 [pdf, other]

Bayesian Optimal Two-sample Tests in High-dimension

Authors: Kyoungjae Lee, Kisung You, Lizhen Lin

Abstract: We propose optimal Bayesian two-sample tests for testing equality of high-dimensional mean vectors and covariance matrices between two populations. In many applications including genomics and medical imaging, it is natural to assume that only a few entries of two mean vectors or covariance matrices are different. Many existing tests that rely on aggregating the difference between empirical means or covariance matrices are not optimal or yield low power under such setups. Motivated by this, we develop Bayesian two-sample tests employing a divide-and-conquer idea, which is powerful especially when the difference between two populations is sparse but large. The proposed two-sample tests manifest closed forms of Bayes factors and allow scalable computations even in high-dimensions. We prove that the proposed tests are consistent under relatively mild conditions compared to existing tests in the literature. Furthermore, the testable regions from the proposed tests turn out to be optimal in terms of rates. Simulation studies show clear advantages of the proposed tests over other state-of-the-art methods in various scenarios. Our tests are also applied to the analysis of the gene expression data of two cancer data sets.

Submitted 5 December, 2021; originally announced December 2021.

arXiv:2106.06375 [pdf, other]

doi 10.1016/j.csda.2022.107457

Parameter Estimation and Model-Based Clustering with Spherical Normal Distribution on the Unit Hypersphere

Authors: Kisung You

Abstract: In directional statistics, the von Mises-Fisher (vMF) distribution is one of the most basic and popular probability distributions for data on the unit hypersphere. Recently, the spherical normal (SN) distribution was proposed as an intrinsic counterpart to the vMF distribution by replacing the standard Euclidean norm with the great-circle distance, which is the shortest path joining two points on the unit sphere. We propose numerical approaches for parameter estimation since there are no analytic formula available. We consider the estimation problems in a general setting where non-negative weights are assigned to observations. This leads to a more interesting contribution for model-based clustering on the unit hypersphere by finite mixture model with SN distributions. We validate efficiency of optimization-based estimation procedures and effectiveness of SN mixture model using simulated and real data examples.

Submitted 11 June, 2021; originally announced June 2021.
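
The distinction the abstract above draws between the Euclidean norm and the great-circle distance can be made concrete with a short sketch. It is illustrative only and does not come from the paper or its software; the function names are assumptions. For unit vectors, the great-circle (geodesic) distance is the angle between them, while the Euclidean distance is the length of the chord through the sphere.

```python
import numpy as np

def great_circle_distance(x, y):
    """Geodesic distance on the unit sphere: the angle between x and y, in radians."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x / np.linalg.norm(x)  # project onto the unit sphere
    y = y / np.linalg.norm(y)
    # Clip to [-1, 1] so rounding error cannot push arccos out of its domain.
    return float(np.arccos(np.clip(x @ y, -1.0, 1.0)))

def chord_distance(x, y):
    """Euclidean (extrinsic) distance between the normalized points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.linalg.norm(x / np.linalg.norm(x) - y / np.linalg.norm(y)))

north = [0.0, 0.0, 1.0]
equator = [1.0, 0.0, 0.0]
south = [0.0, 0.0, -1.0]

print(great_circle_distance(north, equator), chord_distance(north, equator))
print(great_circle_distance(north, south), chord_distance(north, south))
```

The two distances agree closely for nearby points and diverge as points move apart: for antipodal points the geodesic distance is pi while the chord length is 2.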

arXiv:2106.02096 [pdf, ps, other]

Shape-Preserving Dimensionality Reduction : An Algorithm and Measures of Topological Equivalence

Authors: Byeongsu Yu, Kisung You

Abstract: We introduce a linear dimensionality reduction technique preserving topological features via persistent homology. The method is designed to find linear projection $L$ which preserves the persistent diagram of a point cloud $\mathbb{X}$ via simulated annealing. The projection $L$ induces a set of canonical simplicial maps from the Rips (or Čech) filtration of $\mathbb{X}$ to that of $L\mathbb{X}$. In addition to the distance between persistent diagrams, the projection induces a map between filtrations, called filtration homomorphism. Using the filtration homomorphism, one can measure the difference between shapes of two filtrations directly comparing simplicial complexes with respect to quasi-isomorphism $μ_{\operatorname{quasi-iso}}$ or strong homotopy equivalence $μ_{\operatorname{equiv}}$. These $μ_{\operatorname{quasi-iso}}$ and $μ_{\operatorname{equiv}}$ measures how much portion of corresponding simplicial complexes is quasi-isomorphic or homotopy equivalence respectively. We validate the effectiveness of our framework with simple examples.

Submitted 13 June, 2021; v1 submitted 3 June, 2021; originally announced June 2021.

Comments: 18 pages, 2 figures

arXiv:2005.11107 [pdf, other]

doi 10.1016/j.simpa.2022.100414

Rdimtools: An R package for Dimension Reduction and Intrinsic Dimension Estimation

Authors: Kisung You

Abstract: Discovering patterns of the complex high-dimensional data is a long-standing problem. Dimension Reduction (DR) and Intrinsic Dimension Estimation (IDE) are two fundamental thematic programs that facilitate geometric understanding of the data. We present Rdimtools - an R package that supports 133 DR and 17 IDE algorithms whose extent makes multifaceted scrutiny of the data in one place easier. Rdimtools is distributed under the MIT license and is accessible from CRAN, GitHub, and its package website, all of which deliver instruction for installation, self-contained examples, and API documentation.

Submitted 22 May, 2020; originally announced May 2020.

arXiv:2003.00433 [pdf, other]

Fully Asynchronous Policy Evaluation in Distributed Reinforcement Learning over Networks

Authors: Xingyu Sha, Jiaqi Zhang, Keyou You, Kaiqing Zhang, Tamer Başar

Abstract: This paper proposes a \emph{fully asynchronous} scheme for the policy evaluation problem of distributed reinforcement learning (DisRL) over directed peer-to-peer networks. Without waiting for any other node of the network, each node can locally update its value function at any time by using (possibly delayed) information from its neighbors. This is in sharp contrast to the gossip-based scheme where a pair of nodes concurrently update. Though the fully asynchronous setting involves a difficult multi-timescale decision problem, we design a novel stochastic average gradient (SAG) based distributed algorithm and develop a push-pull augmented graph approach to prove its exact convergence at a linear rate of $\mathcal{O}(c^k)$ where $c\in(0,1)$ and $k$ increases by one no matter on which node updates. Finally, numerical experiments validate that our method speeds up linearly with respect to the number of nodes, and is robust to straggler nodes.

Submitted 22 January, 2021; v1 submitted 1 March, 2020; originally announced March 2020.

arXiv:1911.02748 [pdf, other]

doi 10.1080/10618600.2019.1704295

Data transforming augmentation for heteroscedastic models

Authors: Hyungsuk Tak, Kisung You, Sujit K. Ghosh, Bingyue Su, Joseph Kelly

Abstract: Data augmentation (DA) turns seemingly intractable computational problems into simple ones by augmenting latent missing data. In addition to computational simplicity, it is now well-established that DA equipped with a deterministic transformation can improve the convergence speed of iterative algorithms such as an EM algorithm or Gibbs sampler. In this article, we outline a framework for the transformation-based DA, which we call data transforming augmentation (DTA), allowing augmented data to be a deterministic function of latent and observed data, and unknown parameters. Under this framework, we investigate a novel DTA scheme that turns heteroscedastic models into homoscedastic ones to take advantage of simpler computations typically available in homoscedastic cases. Applying this DTA scheme to fitting linear mixed models, we demonstrate simpler computations and faster convergence rates of resulting iterative algorithms, compared with those under a non-transformation-based DA scheme. We also fit a Beta-Binomial model using the proposed DTA scheme, which enables sampling approximate marginal posterior distributions that are available only under homoscedasticity. An R package, Rdta, is publicly available at CRAN.

Submitted 27 January, 2020; v1 submitted 6 November, 2019; originally announced November 2019.

arXiv:1909.02712 [pdf, other]

Decentralized Stochastic Gradient Tracking for Non-convex Empirical Risk Minimization

Authors: Jiaqi Zhang, Keyou You

Abstract: This paper studies a decentralized stochastic gradient tracking (DSGT) algorithm for non-convex empirical risk minimization problems over a peer-to-peer network of nodes, which is in sharp contrast to the existing DSGT only for convex problems. To ensure exact convergence and handle the variance among decentralized datasets, each node performs a stochastic gradient (SG) tracking step by using a mini-batch of samples, where the batch size is designed to be proportional to the size of the local dataset. We explicitly evaluate the convergence rate of DSGT with respect to the number of iterations in terms of algebraic connectivity of the network, mini-batch size, gradient variance, etc. Under certain conditions, we further show that DSGT has a network independence property in the sense that the network topology only affects the convergence rate up to a constant factor. Hence, the convergence rate of DSGT can be comparable to the centralized SGD method. Moreover, a linear speedup of DSGT with respect to the number of nodes is achievable for some scenarios. Numerical experiments for neural networks and logistic regression problems on CIFAR-10 finally illustrate the advantages of DSGT.

Submitted 28 August, 2020; v1 submitted 6 September, 2019; originally announced September 2019.

Comments: This paper has been revised and theoretical results are improved

arXiv:1908.01878 [pdf, other]

How Does Learning Rate Decay Help Modern Neural Networks?

Authors: Kaichao You, Mingsheng Long, Jianmin Wang, Michael I. Jordan

Abstract: Learning rate decay (lrDecay) is a \emph{de facto} technique for training modern neural networks. It starts with a large learning rate and then decays it multiple times. It is empirically observed to help both optimization and generalization. Common beliefs in how lrDecay works come from the optimization analysis of (Stochastic) Gradient Descent: 1) an initially large learning rate accelerates training or helps the network escape spurious local minima; 2) decaying the learning rate helps the network converge to a local minimum and avoid oscillation. Despite the popularity of these common beliefs, experiments suggest that they are insufficient in explaining the general effectiveness of lrDecay in training modern neural networks that are deep, wide, and nonconvex. We provide another novel explanation: an initially large learning rate suppresses the network from memorizing noisy data while decaying the learning rate improves the learning of complex patterns. The proposed explanation is validated on a carefully-constructed dataset with tractable pattern complexity. And its implication, that additional patterns learned in later stages of lrDecay are more complex and thus less transferable, is justified in real-world datasets. We believe that this alternative explanation will shed light into the design of better training strategies for modern neural networks.

Submitted 26 September, 2019; v1 submitted 5 August, 2019; originally announced August 2019.

Comments: title changed
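
The schedule described in the abstract above, a large initial learning rate that is decayed several times, is commonly realized as a step decay. The sketch below is a generic illustration rather than the authors' training setup: the base rate, decay factor, and milestone epochs are assumed values, and the function name is hypothetical.

```python
def step_decay_lr(epoch, base_lr=0.1, decay_factor=0.1, milestones=(30, 60, 90)):
    """Learning rate at a given epoch: multiply by decay_factor at each milestone reached."""
    # Count how many milestone epochs have been reached so far.
    milestones_passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * decay_factor ** milestones_passed

# Print the learning rate just before and at each assumed milestone.
for epoch in (0, 29, 30, 59, 60, 89, 90):
    print(epoch, step_decay_lr(epoch))
```

With these assumed settings the rate stays at its initial value until epoch 30 and then drops by a factor of ten at each milestone.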

arXiv:1810.05297 [pdf, other]

Bayesian Hierarchical Spatial Model for Small Area Estimation with Non-ignorable Nonresponses and Its Applications to the NHANES Dental Caries Assessments

Authors: Ick Hoon Jin, Fang Liu, Evercita C. Eugenio, Kisung You, Suyu Liu

Abstract: The National Health and Nutrition Examination Survey (NHANES) is a major program of the National Center for Health Statistics, designed to assess the health and nutritional status of adults and children in the United States. The analysis of NHANES dental caries data faces several challenges, including (1) the data were collected using a complex, multistage, stratified, unequal-probability sampling design; (2) the sample size of some primary sampling units (PSU), e.g., counties, is very small; (3) the measures of dental caries have complicated structure and correlation, and (4) there is a substantial percentage of nonresponses, for which the missing data are expected to be not missing at random or non-ignorable. We propose a Bayesian hierarchical spatial model to address these analysis challenges. We develop a two-level Potts model that closely resembles the caries evolution process and captures complicated spatial correlations between teeth and surfaces of the teeth. By adding Bayesian hierarchies to the Potts model, we account for the multistage survey sampling design and also enable information borrowing across PSUs for small area estimation. We incorporate sampling weights by including them as a covariate in the model and adopt flexible B-splines to achieve robust inference. We account for non-ignorable missing outcomes and covariates using the selection model. We use data augmentation coupled with the noisy exchange sampler to obtain the posterior of model parameters that involve doubly-intractable normalizing constants. Our analysis results show strong spatial associations between teeth and tooth surfaces and that dental hygienic factors, fluorosis and sealant reduce the risks of having dental diseases.

Submitted 14 October, 2019; v1 submitted 11 October, 2018; originally announced October 2018.

arXiv:1810.02906 [pdf, other]

Network Distance Based on Laplacian Flows on Graphs

Authors: Dianbin Bao, Kisung You, Lizhen Lin

Abstract: Distance plays a fundamental role in measuring similarity between objects. Various visualization techniques and learning tasks in statistics and machine learning such as shape matching, classification, dimension reduction and clustering often rely on some distance or similarity measure. It is of tremendous importance to have a distance that can incorporate the underlying structure of the object. In this paper, we focus on proposing such a distance between network objects. Our key insight is to define a distance based on the long term diffusion behavior of the whole network. We first introduce a dynamic system on graphs called Laplacian flow. Based on this Laplacian flow, a new version of diffusion distance between networks is proposed. We will demonstrate the utility of the distance and its advantage over various existing distances through explicit examples. The distance is also applied to subsequent learning tasks such as clustering network objects.

Submitted 5 October, 2018; originally announced October 2018.

Showing 1–15 of 15 results for author: You, K