Showing 1–5 of 5 results for author: Gordon-Rodriguez, E

  1. arXiv:2205.09906  [pdf, other]

    stat.ML cs.LG

    Data Augmentation for Compositional Data: Advancing Predictive Models of the Microbiome

    Authors: Elliott Gordon-Rodriguez, Thomas P. Quinn, John P. Cunningham

    Abstract: Data augmentation plays a key role in modern machine learning pipelines. While numerous augmentation strategies have been studied in the context of computer vision and natural language processing, less is known for other data modalities. Our work extends the success of data augmentation to compositional data, i.e., simplex-valued data, which is of particular interest in the context of the human mi…

    Submitted 19 May, 2022; originally announced May 2022.

  2. arXiv:2204.13290  [pdf, other]

    stat.ML cs.LG

    On the Normalizing Constant of the Continuous Categorical Distribution

    Authors: Elliott Gordon-Rodriguez, Gabriel Loaiza-Ganem, Andres Potapczynski, John P. Cunningham

    Abstract: Probability distributions supported on the simplex enjoy a wide range of applications across statistics and machine learning. Recently, a novel family of such distributions has been discovered: the continuous categorical. This family enjoys remarkable mathematical simplicity; its density function resembles that of the Dirichlet distribution, but with a normalizing constant that can be written in c…

    Submitted 28 April, 2022; originally announced April 2022.

  3. arXiv:2104.07266  [pdf, other]

    stat.ME q-bio.GN

    A Critique of Differential Abundance Analysis, and Advocacy for an Alternative

    Authors: Thomas P Quinn, Elliott Gordon-Rodriguez, Ionas Erb

    Abstract: It is largely taken for granted that differential abundance analysis is, by default, the best first step when analyzing genomic data. We argue that this is not necessarily the case. In this article, we identify key limitations that are intrinsic to differential abundance analysis: it is (a) dependent on unverifiable assumptions, (b) an unreliable construct, and (c) overly reductionist. We formulat…

    Submitted 7 June, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

  4. arXiv:2011.05231  [pdf, other]

    stat.ML cs.LG

    Uses and Abuses of the Cross-Entropy Loss: Case Studies in Modern Deep Learning

    Authors: Elliott Gordon-Rodriguez, Gabriel Loaiza-Ganem, Geoff Pleiss, John P. Cunningham

    Abstract: Modern deep learning is primarily an experimental science, in which empirical advances occasionally come at the expense of probabilistic rigor. Here we focus on one such example; namely the use of the categorical cross-entropy loss to model data that is not strictly categorical, but rather takes values on the simplex. This practice is standard in neural network architectures with label smoothing a…

    Submitted 10 November, 2020; originally announced November 2020.

  5. arXiv:2002.08563  [pdf, other]

    stat.ML cs.LG

    The continuous categorical: a novel simplex-valued exponential family

    Authors: Elliott Gordon-Rodriguez, Gabriel Loaiza-Ganem, John P. Cunningham

    Abstract: Simplex-valued data appear throughout statistics and machine learning, for example in the context of transfer learning and compression of deep networks. Existing models for this class of data rely on the Dirichlet distribution or other related loss functions; here we show these standard choices suffer systematically from a number of limitations, including bias and numerical issues that frustrate t…

    Submitted 8 June, 2020; v1 submitted 19 February, 2020; originally announced February 2020.