
Showing 1–10 of 10 results for author: Terhorst, J

  1. arXiv:2403.01684  [pdf, other]

    stat.ME stat.ML

    Dendrogram of mixing measures: Hierarchical clustering and model selection for finite mixture models

    Authors: Dat Do, Linh Do, Scott A. McKinley, Jonathan Terhorst, XuanLong Nguyen

    Abstract: We present a new way to summarize and select mixture models via the hierarchical clustering tree (dendrogram) constructed from an overfitted latent mixing measure. Our proposed method bridges agglomerative hierarchical clustering and mixture modeling. The dendrogram's construction is derived from the theory of convergence of the mixing measures, and as a result, we can both consistently select the…

    Submitted 8 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

    Comments: 53 pages, 11 figures

  2. arXiv:2111.10841  [pdf, other]

    stat.ME

    A linear adjustment based approach to posterior drift in transfer learning

    Authors: Subha Maity, Diptavo Dutta, Jonathan Terhorst, Yuekai Sun, Moulinath Banerjee

    Abstract: We present a new model and methods for the posterior drift problem where the regression function in the target domain is modeled as a linear adjustment (on an appropriate scale) of that in the source domain, an idea that inherits the simplicity and the usefulness of generalized linear models and accelerated failure time models from the classical statistics literature, and study the theoretical pro…

    Submitted 12 December, 2021; v1 submitted 21 November, 2021; originally announced November 2021.

  3. arXiv:2008.06664  [pdf, other]

    math.ST math.PR

    Exact and arbitrarily accurate non-parametric two-sample tests based on rank spacings

    Authors: Dan D. Erdmann-Pham, Jonathan Terhorst, Yun S. Song

    Abstract: A common method for deriving non-parametric tests is to reformulate a parametric test in terms of sample ranks. Despite being distribution free (even in finite samples), the resulting tests often display remarkable asymptotic power properties, typically matching the efficiency of their parametric counterpart. Empirically, these favorable power properties have been shown to persist in non-asymptoti…

    Submitted 8 August, 2022; v1 submitted 15 August, 2020; originally announced August 2020.

    Comments: 33 pages, 6 figures

  4. arXiv:2003.01640  [pdf, other]

    cs.LG stat.ML

    Explaining Groups of Points in Low-Dimensional Representations

    Authors: Gregory Plumb, Jonathan Terhorst, Sriram Sankararaman, Ameet Talwalkar

    Abstract: A common workflow in data exploration is to learn a low-dimensional representation of the data, identify groups of points in that representation, and examine the differences between the groups to determine what they represent. We treat this workflow as an interpretable machine learning problem by leveraging the model that learned the low-dimensional representation to help identify the key differen…

    Submitted 14 August, 2020; v1 submitted 3 March, 2020; originally announced March 2020.

  5. arXiv:1807.02763  [pdf, other]

    q-bio.PE

    Inference of Population History using Coalescent HMMs: Review and Outlook

    Authors: Jeffrey P. Spence, Matthias Steinrücken, Jonathan Terhorst, Yun S. Song

    Abstract: Studying how diverse human populations are related is of historical and anthropological interest, in addition to providing a realistic null model for testing for signatures of natural selection or disease associations. Furthermore, understanding the demographic histories of other species is playing an increasingly important role in conservation genetics. A number of statistical methods have been d…

    Submitted 8 July, 2018; originally announced July 2018.

    Comments: 12 pages, 2 figures

  6. arXiv:1505.04228  [pdf, other]

    q-bio.PE math.ST

    Fundamental limits on the accuracy of demographic inference based on the sample frequency spectrum

    Authors: Jonathan Terhorst, Yun S. Song

    Abstract: The sample frequency spectrum (SFS) of DNA sequences from a collection of individuals is a summary statistic which is commonly used for parametric inference in population genetics. Despite the popularity of SFS-based inference methods, currently little is known about the information-theoretic limit on the estimation accuracy as a function of sample size. Here, we show that using the SFS to estimat…

    Submitted 15 May, 2015; originally announced May 2015.

    Comments: 17 pages, 1 figure

    Journal ref: Proc. Natl. Acad. Sci. U.S.A., Vol. 112, No. 25 (2015) 7677-7682

  7. Efficient computation of the joint sample frequency spectra for multiple populations

    Authors: John A. Kamm, Jonathan Terhorst, Yun S. Song

    Abstract: A wide range of studies in population genetics have employed the sample frequency spectrum (SFS), a summary statistic which describes the distribution of mutant alleles at a polymorphic site in a sample of DNA sequences. In particular, recently there has been growing interest in analyzing the joint SFS data from multiple populations to infer parameters of complex demographic histories, including v…

    Submitted 3 March, 2015; originally announced March 2015.

    Comments: 24 pages, 5 figures

  8. arXiv:1409.1458  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Communication-Efficient Distributed Dual Coordinate Ascent

    Authors: Martin Jaggi, Virginia Smith, Martin Takáč, Jonathan Terhorst, Sanjay Krishnan, Thomas Hofmann, Michael I. Jordan

    Abstract: Communication remains the most significant bottleneck in the performance of distributed optimization algorithms for large-scale machine learning. In this paper, we propose a communication-efficient framework, CoCoA, that uses local computation in a primal-dual setting to dramatically reduce the amount of necessary communication. We provide a strong convergence rate analysis for this class of algor…

    Submitted 29 September, 2014; v1 submitted 4 September, 2014; originally announced September 2014.

    Comments: NIPS 2014 version, including proofs. Published in Advances in Neural Information Processing Systems 27 (NIPS 2014)

    MSC Class: 90C25; 68W15

    ACM Class: G.1.6; C.1.4

  9. arXiv:1310.8420  [pdf, other]

    q-bio.GN q-bio.QM

    SMaSH: A Benchmarking Toolkit for Human Genome Variant Calling

    Authors: Ameet Talwalkar, Jesse Liptrap, Julie Newcomb, Christopher Hartl, Jonathan Terhorst, Kristal Curtis, Ma'ayan Bresler, Yun S. Song, Michael I. Jordan, David Patterson

    Abstract: Motivation: Computational methods are essential to extract actionable information from raw sequencing data, and to thus fulfill the promise of next-generation sequencing technology. Unfortunately, computational tools developed to call variants from human sequencing data disagree on many of their predictions, and current methods to evaluate accuracy and computational performance are ad-hoc and inco…

    Submitted 5 January, 2014; v1 submitted 31 October, 2013; originally announced October 2013.

  10. arXiv:1102.3177  [pdf, other]

    math.CO q-bio.QM

    The Kalmanson Complex

    Authors: Jonathan Terhorst

    Abstract: Let X be a finite set of cardinality n. The Kalmanson complex K_n is the simplicial complex whose vertices are non-trivial X-splits, and whose facets are maximal circular split systems over X. In this paper we examine K_n from three perspectives. In addition to the T-theoretic description, we show that K_n has a geometric realization as the Kalmanson conditions on a finite metric. A third descript…

    Submitted 6 March, 2011; v1 submitted 15 February, 2011; originally announced February 2011.

    Comments: Improved exposition. 24 pages, 2 figures, 1 table

    MSC Class: 05E45