
Showing 1–10 of 10 results for author: Solís-Lemus, C

  1. arXiv:2312.16074  [pdf, other]

    q-bio.PE stat.ML

    Unsupervised Learning of Phylogenetic Trees via Split-Weight Embedding

    Authors: Yibo Kong, George P. Tiley, Claudia Solis-Lemus

    Abstract: Unsupervised learning has become a staple in classical machine learning, successfully identifying clustering patterns in data across a broad range of domain applications. Surprisingly, despite its accuracy and elegant simplicity, unsupervised learning has not been sufficiently exploited in the realm of phylogenetic tree inference. The main reason for the delay in adoption of unsupervised learning…

    Submitted 3 May, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

  2. arXiv:2306.11157  [pdf, other]

    stat.ML cs.LG stat.AP

    Human Limits in Machine Learning: Prediction of Plant Phenotypes Using Soil Microbiome Data

    Authors: Rosa Aghdam, Xudong Tang, Shan Shan, Richard Lankau, Claudia Solís-Lemus

    Abstract: The preservation of soil health is a critical challenge in the 21st century due to its significant impact on agriculture, human health, and biodiversity. We provide the first deep investigation of the predictive potential of machine learning models to understand the connections between soil and biological phenotypes. We investigate an integrative framework performing accurate machine learning-base…

    Submitted 16 February, 2024; v1 submitted 19 June, 2023; originally announced June 2023.

  3. arXiv:2211.16647  [pdf, other]

    q-bio.PE math.AG stat.AP

    Ultrafast learning of 4-node hybridization cycles in phylogenetic networks using algebraic invariants

    Authors: Zhaoxing Wu, Claudia Solis-Lemus

    Abstract: Motivation: The abundance of gene flow in the Tree of Life challenges the notion that evolution can be represented with a fully bifurcating process, as this process cannot capture important biological realities like hybridization, introgression, or horizontal gene transfer. Coalescent-based network methods are increasingly popular, yet not scalable for big data, because they need to perform a heur…

    Submitted 9 November, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

  4. arXiv:2209.11730  [pdf, other]

    q-bio.GN stat.AP

    BioKlustering: a web app for semi-supervised learning of maximally imbalanced genomic data

    Authors: Samuel Ozminkowski, Yuke Wu, Liule Yang, Zhiwen Xu, Luke Selberg, Chunrong Huang, Claudia Solis-Lemus

    Abstract: Summary: Accurate phenotype prediction from genomic sequences is a highly coveted task in biological and medical research. While machine-learning holds the key to accurate prediction in a variety of fields, the complexity of biological data can render many methodologies inapplicable. We introduce BioKlustering, a user-friendly open-source and publicly available web app for unsupervised and semi-su…

    Submitted 26 September, 2022; v1 submitted 23 September, 2022; originally announced September 2022.

  5. arXiv:2208.05600  [pdf, other]

    stat.AP

    Identifying microbial drivers in biological phenotypes with a Bayesian Network Regression model

    Authors: Samuel Ozminkowski, Claudia Solis-Lemus

    Abstract: 1. In Bayesian Network Regression models, networks are considered the predictors of continuous responses. These models have been successfully used in brain research to identify regions in the brain that are associated with specific human traits, yet their potential to elucidate microbial drivers in biological phenotypes for microbiome research remains unknown. In particular, microbial networks are…

    Submitted 20 January, 2024; v1 submitted 10 August, 2022; originally announced August 2022.

    Comments: 62 pages, 49 figures

  6. arXiv:2207.07020  [pdf, other]

    stat.ME

    Estimating sparse direct effects in multivariate regression with the spike-and-slab LASSO

    Authors: Yunyi Shen, Claudia Solís-Lemus, Sameer K. Deshpande

    Abstract: The multivariate regression interpretation of the Gaussian chain graph model simultaneously parametrizes (i) the direct effects of $p$ predictors on $q$ outcomes and (ii) the residual partial covariances between pairs of outcomes. We introduce a new method for fitting sparse Gaussian chain graph models with spike-and-slab LASSO (SSL) priors. We develop an Expectation Conditional Maximization algor…

    Submitted 26 March, 2024; v1 submitted 14 July, 2022; originally announced July 2022.

  7. arXiv:2107.13763  [pdf, other]

    stat.AP

    CARlasso: An R package for the estimation of sparse microbial networks with predictors

    Authors: Yunyi Shen, Claudia Solis-Lemus

    Abstract: Microbiome data analyses require statistical tools that can simultaneously decode microbes' reactions to the environment and interactions among microbes. We introduce CARlasso, the first user-friendly open-source and publicly available R package to fit a chain graph model for the inference of sparse microbial networks that represent both interactions among nodes and effects of a set of predictors.…

    Submitted 29 July, 2021; originally announced July 2021.

  8. arXiv:2107.01306  [pdf, other]

    stat.ME math.ST

    The Effect of the Prior and the Experimental Design on the Inference of the Precision Matrix in Gaussian Chain Graph Models

    Authors: Yunyi Shen, Claudia Solis-Lemus

    Abstract: Here, we investigate whether (and how) experimental design could aid in the estimation of the precision matrix in a Gaussian chain graph model, especially the interplay between the design, the effect of the experiment and prior knowledge about the effect. Estimation of the precision matrix is a fundamental task to infer biological graphical structures like microbial networks. We compare the margin…

    Submitted 29 November, 2023; v1 submitted 2 July, 2021; originally announced July 2021.

  9. arXiv:2012.08397  [pdf, other]

    stat.AP stat.ME

    Bayesian Chain Graph LASSO Models to Learn Sparse Microbial Networks with Predictors

    Authors: Yunyi Shen, Claudia Solis-Lemus

    Abstract: Microbiome data require statistical models that can simultaneously decode microbes' reaction to the environment and interactions among microbes. While a multiresponse linear regression model seems like a straight-forward solution, we argue that treating it as a graphical model is flawed given that the regression coefficient matrix does not encode the conditional dependence structure between respon…

    Submitted 23 July, 2022; v1 submitted 15 December, 2020; originally announced December 2020.

    MSC Class: 62H10; 62P10

  10. arXiv:1509.06075  [pdf, ps, other]

    q-bio.PE math.ST stat.AP stat.CO

    Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting

    Authors: Claudia Solís-Lemus, Cécile Ané

    Abstract: Phylogenetic networks are necessary to represent the tree of life expanded by edges to represent events such as horizontal gene transfers, hybridizations or gene flow. Not all species follow the paradigm of vertical inheritance of their genetic material. While a great deal of research has flourished into the inference of phylogenetic trees, statistical methods to infer phylogenetic networks are st…

    Submitted 12 February, 2016; v1 submitted 20 September, 2015; originally announced September 2015.