Search | arXiv e-print repository

Multi-modal Self-supervised Pre-training for Regulatory Genome Across Cell Types

Authors: Shentong Mo, Xi Fu, Chenyang Hong, Yizhen Chen, Yuxuan Zheng, Xiangru Tang, Zhiqiang Shen, Eric P Xing, Yanyan Lan

Abstract: In the genome biology research, regulatory genome modeling is an important topic for many regulatory downstream tasks, such as promoter classification, transaction factor binding sites prediction. The core problem is to model how regulatory elements interact with each other and its variability across different cell types. However, current deep learning methods often focus on modeling genome sequen… ▽ More In the genome biology research, regulatory genome modeling is an important topic for many regulatory downstream tasks, such as promoter classification, transaction factor binding sites prediction. The core problem is to model how regulatory elements interact with each other and its variability across different cell types. However, current deep learning methods often focus on modeling genome sequences of a fixed set of cell types and do not account for the interaction between multiple regulatory elements, making them only perform well on the cell types in the training set and lack the generalizability required in biological applications. In this work, we propose a simple yet effective approach for pre-training genome data in a multi-modal and self-supervised manner, which we call GeneBERT. Specifically, we simultaneously take the 1d sequence of genome data and a 2d matrix of (transcription factors x regions) as the input, where three pre-training tasks are proposed to improve the robustness and generalizability of our model. We pre-train our model on the ATAC-seq dataset with 17 million genome sequences. We evaluate our GeneBERT on regulatory downstream tasks across different cell types, including promoter classification, transaction factor binding sites prediction, disease risk estimation, and splicing sites prediction. Extensive experiments demonstrate the effectiveness of multi-modal and self-supervised pre-training for large-scale regulatory genomics data. △ Less

Submitted 3 November, 2021; v1 submitted 11 October, 2021; originally announced October 2021.

arXiv:2105.06766 [pdf, other]

doi 10.1038/s41467-021-26320-w

Objective comparison of methods to decode anomalous diffusion

Authors: Gorka Muñoz-Gil, Giovanni Volpe, Miguel Angel Garcia-March, Erez Aghion, Aykut Argun, Chang Beom Hong, Tom Bland, Stefano Bo, J. Alberto Conejero, Nicolás Firbas, Òscar Garibo i Orts, Alessia Gentili, Zihan Huang, Jae-Hyung Jeon, Hélène Kabbech, Yeong** Kim, Patrycja Kowalek, Diego Krapf, Hanna Loch-Olszewska, Michael A. Lomholt, Jean-Baptiste Masson, Philipp G. Meyer, Seongyu Park, Borja Requena, Ihor Smal , et al. (9 additional authors not shown)

Abstract: Deviations from Brownian motion leading to anomalous diffusion are ubiquitously found in transport dynamics, playing a crucial role in phenomena from quantum physics to life sciences. The detection and characterization of anomalous diffusion from the measurement of an individual trajectory are challenging tasks, which traditionally rely on calculating the mean squared displacement of the trajector… ▽ More Deviations from Brownian motion leading to anomalous diffusion are ubiquitously found in transport dynamics, playing a crucial role in phenomena from quantum physics to life sciences. The detection and characterization of anomalous diffusion from the measurement of an individual trajectory are challenging tasks, which traditionally rely on calculating the mean squared displacement of the trajectory. However, this approach breaks down for cases of important practical interest, e.g., short or noisy trajectories, ensembles of heterogeneous trajectories, or non-ergodic processes. Recently, several new approaches have been proposed, mostly building on the ongoing machine-learning revolution. Aiming to perform an objective comparison of methods, we gathered the community and organized an open competition, the Anomalous Diffusion challenge (AnDi). Participating teams independently applied their own algorithms to a commonly-defined dataset including diverse conditions. Although no single method performed best across all scenarios, the results revealed clear differences between the various approaches, providing practical advice for users and a benchmark for developers. △ Less

Submitted 14 May, 2021; originally announced May 2021.

Comments: 63 pages, 5 main figures, 1 table, 28 supplementary figures. Website: http://www.andi-challenge.org

arXiv:1709.00583 [pdf]

Training Spiking Neural Networks for Cognitive Tasks: A Versatile Framework Compatible to Various Temporal Codes

Authors: Chaofei Hong

Abstract: Conventional modeling approaches have found limitations in matching the increasingly detailed neural network structures and dynamics recorded in experiments to the diverse brain functionalities. On another approach, studies have demonstrated to train spiking neural networks for simple functions using supervised learning. Here, we introduce a modified SpikeProp learning algorithm, which achieved be… ▽ More Conventional modeling approaches have found limitations in matching the increasingly detailed neural network structures and dynamics recorded in experiments to the diverse brain functionalities. On another approach, studies have demonstrated to train spiking neural networks for simple functions using supervised learning. Here, we introduce a modified SpikeProp learning algorithm, which achieved better learning stability in different activity states. In addition, we show biological realistic features such as lateral connections and sparse activities can be included in the network. We demonstrate the versatility of this framework by implementing three well-known temporal codes for different types of cognitive tasks, which are MNIST digits recognition, spatial coordinate transformation, and motor sequence generation. Moreover, we find several characteristic features have evolved alongside the task training, such as selective activity, excitatory-inhibitory balance, and weak pair-wise correlation. The coincidence between the self-evolved and experimentally observed features indicates their importance on the brain functionality. Our results suggest a unified setting in which diverse cognitive computations and mechanisms can be studied. △ Less

Submitted 2 September, 2017; originally announced September 2017.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version will be superseded

Showing 1–3 of 3 results for author: Hong, C