Showing 1–20 of 20 results for author: Theis, F J
  1. arXiv:2401.08868  [pdf, other]

    cs.CV

    B-Cos Aligned Transformers Learn Human-Interpretable Features

    Authors: Manuel Tran, Amal Lahiani, Yashin Dicente Cid, Melanie Boxberg, Peter Lienemann, Christian Matek, Sophia J. Wagner, Fabian J. Theis, Eldad Klaiman, Tingying Peng

    Abstract: Vision Transformers (ViTs) and Swin Transformers (Swin) are currently state-of-the-art in computational pathology. However, domain experts are still reluctant to use these models due to their lack of interpretability. This is not surprising, as critical decisions need to be transparent and understandable. The most common approach to understanding transformers is to visualize their attention. Howev…

    Submitted 18 January, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted at MICCAI 2023 (oral). Camera-ready available at https://doi.org/10.1007/978-3-031-43993-3_50

  2. arXiv:2311.07621  [pdf, other]

    q-bio.GN cs.LG

    To Transformers and Beyond: Large Language Models for the Genome

    Authors: Micaela E. Consens, Cameron Dufault, Michael Wainberg, Duncan Forster, Mehran Karimzadeh, Hani Goodarzi, Fabian J. Theis, Alan Moses, Bo Wang

    Abstract: In the rapidly evolving landscape of genomics, deep learning has emerged as a useful tool for tackling complex computational challenges. This review focuses on the transformative role of Large Language Models (LLMs), which are mostly based on the transformer architecture, in genomics. Building on the foundation of traditional convolutional neural networks and recurrent neural networks, we explore…

    Submitted 12 November, 2023; originally announced November 2023.

  3. arXiv:2311.02455  [pdf, other]

    cs.LG q-bio.GN q-bio.QM stat.AP

    Mixed Models with Multiple Instance Learning

    Authors: Jan P. Engelmann, Alessandro Palma, Jakub M. Tomczak, Fabian J. Theis, Francesco Paolo Casale

    Abstract: Predicting patient features from single-cell data can help identify cellular states implicated in health and disease. Linear models and average cell type expressions are typically favored for this task for their efficiency and robustness, but they overlook the rich cell heterogeneity inherent in single-cell data. To address this gap, we introduce MixMIL, a framework integrating Generalized Linear…

    Submitted 8 March, 2024; v1 submitted 4 November, 2023; originally announced November 2023.

    Comments: AISTATS 2024 Oral, Code: https://github.com/AIH-SGML/MixMIL

  4. arXiv:2310.14935  [pdf]

    cs.LG q-bio.GN

    Causal machine learning for single-cell genomics

    Authors: Alejandro Tejada-Lapuerta, Paul Bertin, Stefan Bauer, Hananeh Aliee, Yoshua Bengio, Fabian J. Theis

    Abstract: Advances in single-cell omics allow for unprecedented insights into the transcription profiles of individual cells. When combined with large-scale perturbation screens, through which specific biological mechanisms can be targeted, these technologies allow for measuring the effect of targeted perturbations on the whole transcriptome. These advances provide an opportunity to better understand the ca…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: 35 pages, 7 figures, 3 tables, 1 box

  5. arXiv:2307.00558  [pdf, other]

    cs.LG q-bio.QM

    Conditionally Invariant Representation Learning for Disentangling Cellular Heterogeneity

    Authors: Hananeh Aliee, Ferdinand Kapl, Soroor Hediyeh-Zadeh, Fabian J. Theis

    Abstract: This paper presents a novel approach that leverages domain variability to learn representations that are conditionally invariant to unwanted variability or distractors. Our approach identifies both spurious and invariant latent features necessary for achieving accurate reconstruction by placing distinct conditional priors on latent features. The invariant signals are disentangled from noise by enf…

    Submitted 2 July, 2023; originally announced July 2023.

  6. arXiv:2305.14243  [pdf, other]

    cs.AI cs.CV

    Training Transitive and Commutative Multimodal Transformers with LoReTTa

    Authors: Manuel Tran, Yashin Dicente Cid, Amal Lahiani, Fabian J. Theis, Tingying Peng, Eldad Klaiman

    Abstract: Training multimodal foundation models is challenging due to the limited availability of multimodal datasets. While many public datasets pair images with text, few combine images with audio or text with audio. Even rarer are datasets that align all three modalities at once. Critical domains such as healthcare, infrastructure, or transportation are particularly affected by missing modalities. This m…

    Submitted 16 January, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted at NeurIPS 2023 (poster). Camera-ready version

  7. Noise transfer for unsupervised domain adaptation of retinal OCT images

    Authors: Valentin Koch, Olle Holmberg, Hannah Spitzer, Johannes Schiefelbein, Ben Asani, Michael Hafner, Fabian J Theis

    Abstract: Optical coherence tomography (OCT) imaging from different camera devices causes challenging domain shifts and can cause a severe drop in accuracy for machine learning models. In this work, we introduce a minimal noise adaptation method based on a singular value decomposition (SVDNA) to overcome the domain gap between target domains from three different device manufacturers in retinal OCT imaging.…

    Submitted 16 September, 2022; originally announced September 2022.

    Comments: Published at MICCAI 2022

  8. arXiv:2205.07110  [pdf, other]

    cs.LG q-bio.QM

    SystemMatch: optimizing preclinical drug models to human clinical outcomes via generative latent-space matching

    Authors: Scott Gigante, Varsha G. Raghavan, Amanda M. Robinson, Robert A. Barton, Adeeb H. Rahman, Drausin F. Wulsin, Jacques Banchereau, Noam Solomon, Luis F. Voloch, Fabian J. Theis

    Abstract: Translating the relevance of preclinical models ($\textit{in vitro}$, animal models, or organoids) to their relevance in humans presents an important challenge during drug development. The rising abundance of single-cell genomic data from human tumors and tissue offers a new opportunity to optimize model systems by their similarity to targeted human cell types in disease. In this work, we introduc…

    Submitted 14 May, 2022; originally announced May 2022.

    Comments: Published at the MLDD workshop, ICLR 2022

  9. arXiv:2106.12430  [pdf, other]

    cs.LG cs.AI

    Beyond Predictions in Neural ODEs: Identification and Interventions

    Authors: Hananeh Aliee, Fabian J. Theis, Niki Kilbertus

    Abstract: Spurred by tremendous success in pattern matching and prediction tasks, researchers increasingly resort to machine learning to aid original scientific discovery. Given large amounts of observational data about a system, can we uncover the rules that govern its evolution? Solving this task holds the great promise of fully understanding the causal interactions and being able to make reliable predict…

    Submitted 23 June, 2021; originally announced June 2021.

  10. arXiv:2104.11364  [pdf]

    q-bio.OT cs.CY

    A field guide to cultivating computational biology

    Authors: Anne E Carpenter, Casey S Greene, Piero Carnici, Benilton S Carvalho, Michiel de Hoon, Stacey Finley, Kim-Anh Le Cao, Jerry SH Lee, Luigi Marchionni, Suzanne Sindi, Fabian J Theis, Gregory P Way, Jean YH Yang, Elana J Fertig

    Abstract: Biomedical research centers can empower basic discovery and novel therapeutic strategies by leveraging their large-scale datasets from experiments and patients. This data, together with new technologies to create and analyze it, has ushered in an era of data-driven discovery which requires moving beyond the traditional individual, single-discipline investigator research model. This interdisciplina…

    Submitted 22 April, 2021; originally announced April 2021.

  11. arXiv:1910.01791  [pdf, other]

    cs.LG eess.IV q-bio.CB q-bio.GN stat.ML

    Conditional out-of-sample generation for unpaired data using trVAE

    Authors: Mohammad Lotfollahi, Mohsen Naghipourfar, Fabian J. Theis, F. Alexander Wolf

    Abstract: While generative models have shown great success in generating high-dimensional samples conditional on low-dimensional descriptors (learning e.g. stroke thickness in MNIST, hair color in CelebA, or speaker identity in Wavenet), their generation out-of-sample poses fundamental problems. The conditional variational autoencoder (CVAE) as a simple conditional generative model does not explicitly relat…

    Submitted 30 October, 2019; v1 submitted 3 October, 2019; originally announced October 2019.

    Comments: Added reference to Johansson et al. (2016) and removed sentences from Lopez et al. (2018) in the background section (see acknowledgements)

  12. arXiv:1909.12550  [pdf]

    q-bio.GN q-bio.MN q-bio.PE

    Single-cell eQTLGen Consortium: a personalized understanding of disease

    Authors: Monique G. P. van der Wijst, Dylan H. de Vries, Hilde E. Groot, Gosia Trynka, Chung-Chau Hon, Martijn C. Nawijn, Youssef Idaghdour, Pim van der Harst, Chun J. Ye, Joseph Powell, Fabian J. Theis, Ahmed Mahfouz, Matthias Heinig, Lude Franke

    Abstract: In recent years, functional genomics approaches combining genetic information with bulk RNA-sequencing data have identified the downstream expression effects of disease-associated genetic risk factors through so-called expression quantitative trait locus (eQTL) analysis. Single-cell RNA-sequencing creates enormous opportunities for mapping eQTLs across different cell types and in dynamic processes…

    Submitted 27 September, 2019; originally announced September 2019.

    Comments: 26 pages, 5 figures, position paper of sc-eQTLGen consortium

  13. arXiv:1810.04281  [pdf, other]

    stat.AP q-bio.QM

    Fully integrative data analysis of NMR metabolic fingerprints with comprehensive patient data: a case report based on the German Chronic Kidney Disease (GCKD) study

    Authors: Helena U. Zacharias, Michael Altenbuchinger, Stefan Solbrig, Andreas Schäfer, Mustafa Buyukozkan, Ulla T. Schultheiß, Fruzsina Kotsis, Anna Köttgen, Jan Krumsiek, Fabian J. Theis, Rainer Spang, Peter J. Oefner, Wolfram Gronwald, GCKD study investigators

    Abstract: Omics data facilitate the gain of novel insights into the pathophysiology of diseases and, consequently, their diagnosis, treatment, and prevention. To that end, it is necessary to integrate omics data with other data types such as clinical, phenotypic, and demographic parameters of categorical or continuous nature. Here, we exemplify this data integration issue for a study on chronic kidney disea…

    Submitted 8 October, 2018; originally announced October 2018.

  14. arXiv:1511.01658  [pdf, other]

    math.OC q-bio.MN

    A simulation-based approach for solving optimisation problems with ODE-type steady state constraints

    Authors: Anna Fiedler, Fabian J. Theis, Jan Hasenauer

    Abstract: Ordinary differential equations (ODEs) are widely used to model biological, (bio-)chemical and technical processes. The parameters of these ODEs are often estimated from experimental data using ODE-constrained optimisation. This article proposes a simple simulation-based approach for solving optimisation problems with steady state constraints relying on an ODE. This simulation-based optimisation m…

    Submitted 5 November, 2015; originally announced November 2015.

    Comments: 11 pages, 3 figures

  15. arXiv:1506.06392  [pdf, other]

    q-bio.MN q-bio.QM

    Data-driven modelling of biological multi-scale processes

    Authors: Jan Hasenauer, Nick Jagiella, Sabrina Hross, Fabian J. Theis

    Abstract: Biological processes involve a variety of spatial and temporal scales. A holistic understanding of many biological processes therefore requires multi-scale models which capture the relevant properties on all these scales. In this manuscript we review mathematical modelling approaches used to describe the individual spatial scales and how they are integrated into holistic models. We discuss the rel…

    Submitted 21 June, 2015; originally announced June 2015.

    Comments: This manuscript will appear in the Journal of Coupled Systems and Multiscale Dynamics (American Scientific Publishers)

    MSC Class: 92Bxx; 93A30

  16. arXiv:1407.2112  [pdf]

    cs.GR cs.HC q-bio.QM

    MCA: Multiresolution Correlation Analysis, a graphical tool for subpopulation identification in single-cell gene expression data

    Authors: Justin Feigelman, Fabian J. Theis, Carsten Marr

    Abstract: Background: Biological data often originate from samples containing mixtures of subpopulations, corresponding e.g. to distinct cellular phenotypes. However, identification of distinct subpopulations may be difficult if biological measurements yield distributions that are not easily separable. Results: We present Multiresolution Correlation Analysis (MCA), a method for visually identifying subpopul…

    Submitted 8 July, 2014; originally announced July 2014.

    Comments: BioVis 2014 conference

  17. Separation of uncorrelated stationary time series using autocovariance matrices

    Authors: Jari Miettinen, Katrin Illner, Klaus Nordhausen, Hannu Oja, Sara Taskinen, Fabian J. Theis

    Abstract: Blind source separation (BSS) is a signal processing tool, which is widely used in various fields. Examples include biomedical signal separation, brain imaging and economic time series applications. In BSS, one assumes that the observed $p$ time series are linear combinations of $p$ latent uncorrelated weakly stationary time series. The aim is then to find an estimate for an unmixing matrix, which…

    Submitted 14 May, 2014; originally announced May 2014.

    MSC Class: 62H05; 62H10

    Journal ref: Journal of Time Series Analysis, Vol 37, 337-354 (2016)

  18. arXiv:1202.4605  [pdf, other]

    physics.data-an

    Joining Forces of Bayesian and Frequentist Methodology: A Study for Inference in the Presence of Non-Identifiability

    Authors: Andreas Raue, Clemens Kreutz, Fabian Joachim Theis, Jens Timmer

    Abstract: Increasingly complex applications involve large datasets in combination with non-linear and high dimensional mathematical models. In this context, statistical inference is a challenging issue that calls for pragmatic approaches that take advantage of both Bayesian and frequentist methods. The elegance of Bayesian methodology is founded in the propagation of information content provided by experime…

    Submitted 21 February, 2012; originally announced February 2012.

    Comments: Article to appear in Phil. Trans. Roy. Soc. A

    Journal ref: Phil. Trans. R. Soc. A. 371, 20110544, 2013

  19. Stability and multi-attractor dynamics of a toggle switch based on a two-stage model of stochastic gene expression

    Authors: Michael K. Strasser, Fabian J. Theis, Carsten Marr

    Abstract: A toggle switch consists of two genes that mutually repress each other. This regulatory motif is active during cell differentiation and is thought to act as a memory device, being able to choose and maintain cell fate decisions. In this contribution, we study the stability and dynamics of a two-stage gene expression switch within a probabilistic framework inspired by the properties of the Pu/Gata…

    Submitted 1 December, 2011; originally announced December 2011.

    Comments: To appear in the Biophysical Journal

  20. Patterns of subnet usage reveal distinct scales of regulation in the transcriptional regulatory network of Escherichia coli

    Authors: Carsten Marr, Fabian J. Theis, Larry S. Liebovitch, Marc-Thorsten Hütt

    Abstract: The set of regulatory interactions between genes, mediated by transcription factors, forms a species' transcriptional regulatory network (TRN). By comparing this network with measured gene expression data one can identify functional properties of the TRN and gain general insight into transcriptional control. We define the subnet of a node as the subgraph consisting of all nodes topologically downs…

    Submitted 24 May, 2010; originally announced May 2010.

    Comments: 14 pages, 8 figures, to be published in PLoS Computational Biology