Search | arXiv e-print repository

arXiv:2406.19113 [pdf, other]

MegIS: High-Performance, Energy-Efficient, and Low-Cost Metagenomic Analysis with In-Storage Processing

Authors: Nika Mansouri Ghiasi, Mohammad Sadrosadati, Harun Mustafa, Arvid Gollwitzer, Can Firtina, Julien Eudine, Haiyu Mao, Joël Lindegger, Meryem Banu Cavlak, Mohammed Alser, Jisung Park, Onur Mutlu

Abstract: Metagenomics has led to significant advances in many fields. Metagenomic analysis commonly involves the key tasks of determining the species present in a sample and their relative abundances. These tasks require searching large metagenomic databases. Metagenomic analysis suffers from significant data movement overhead due to moving large amounts of low-reuse data from the storage system. In-storage processing can be a fundamental solution for reducing this overhead. However, designing an in-storage processing system for metagenomics is challenging because existing approaches to metagenomic analysis cannot be directly implemented in storage effectively due to the hardware limitations of modern SSDs. We propose MegIS, the first in-storage processing system designed to significantly reduce the data movement overhead of the end-to-end metagenomic analysis pipeline. MegIS is enabled by our lightweight design that effectively leverages and orchestrates processing inside and outside the storage system. We address in-storage processing challenges for metagenomics via specialized and efficient 1) task partitioning, 2) data/computation flow coordination, 3) storage technology-aware algorithmic optimizations, 4) data mapping, and 5) lightweight in-storage accelerators. MegIS's design is flexible, capable of supporting different types of metagenomic input datasets, and can be integrated into various metagenomic analysis pipelines.
Our evaluation shows that MegIS outperforms the state-of-the-art performance- and accuracy-optimized software metagenomic tools by 2.7$\times$-37.2$\times$ and 6.9$\times$-100.2$\times$, respectively, while matching the accuracy of the accuracy-optimized tool. MegIS achieves 1.5$\times$-5.1$\times$ speedup compared to the state-of-the-art metagenomic hardware-accelerated (using processing-in-memory) tool, while achieving significantly higher accuracy.

Submitted 27 June, 2024; originally announced June 2024.

Comments: To appear in ISCA 2024. arXiv admin note: substantial text overlap with arXiv:2311.12527

arXiv:2403.20109 [pdf, ps, other]

Mol-AIR: Molecular Reinforcement Learning with Adaptive Intrinsic Rewards for Goal-directed Molecular Generation

Authors: Jinyeong Park, Jaegyoon Ahn, Jonghwan Choi, Jibum Kim

Abstract: Optimizing techniques for discovering molecular structures with desired properties is crucial in artificial intelligence (AI)-based drug discovery. Combining deep generative models with reinforcement learning has emerged as an effective strategy for generating molecules with specific properties. Despite its potential, this approach is ineffective in exploring the vast chemical space and optimizing particular chemical properties. To overcome these limitations, we present Mol-AIR, a reinforcement learning-based framework using adaptive intrinsic rewards for effective goal-directed molecular generation. Mol-AIR leverages the strengths of both history-based and learning-based intrinsic rewards by exploiting random distillation network and counting-based strategies. In benchmark tests, Mol-AIR demonstrates superior performance over existing approaches in generating molecules with desired properties without any prior knowledge, including penalized LogP, QED, and celecoxib similarity. We believe that Mol-AIR represents a significant advancement in drug discovery, offering a more efficient path to discovering novel therapeutics.

Submitted 29 March, 2024; originally announced March 2024.

arXiv:2402.05982 [pdf, other]

Decoupled Sequence and Structure Generation for Realistic Antibody Design

Authors: Nayoung Kim, Minsu Kim, Sungsoo Ahn, Jinkyoo Park

Abstract: Antibody design plays a pivotal role in advancing therapeutics. Although deep learning has made rapid progress in this field, existing methods jointly generate antibody sequences and structures, limiting task-specific optimization. In response, we propose an antibody sequence-structure decoupling (ASSD) framework, which separates sequence generation and structure prediction. Although our approach is simple, such a decoupling strategy has been overlooked in previous works. We also find that the widely used non-autoregressive generators promote sequences with overly repeating tokens. Such sequences are both out-of-distribution and prone to undesirable developability properties that can trigger harmful immune responses in patients. To resolve this, we introduce a composition-based objective that allows an efficient trade-off between high performance and low token repetition. Our results demonstrate that ASSD consistently outperforms existing antibody design models, while the composition-based objective successfully mitigates token repetition of non-autoregressive models. Our code is available at https://github.com/lkny123/ASSD_public.

Submitted 27 May, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

Comments: 18 pages, 6 figures

arXiv:2402.05961 [pdf, other]

Genetic-guided GFlowNets for Sample Efficient Molecular Optimization

Authors: Hyeonah Kim, Minsu Kim, Sanghyeok Choi, Jinkyoo Park

Abstract: The challenge of discovering new molecules with desired properties is crucial in domains like drug discovery and material design. Recent advances in deep learning-based generative methods have shown promise but face the issue of sample efficiency due to the computational expense of evaluating the reward function. This paper proposes a novel algorithm for sample-efficient molecular optimization by distilling a powerful genetic algorithm into deep generative policy using GFlowNets training, the off-policy method for amortized inference. This approach enables the deep generative policy to learn from domain knowledge, which has been explicitly integrated into the genetic algorithm. Our method achieves state-of-the-art performance in the official molecular optimization benchmark, significantly outperforming previous methods. It also demonstrates effectiveness in designing inhibitors against SARS-CoV-2 with substantially fewer reward calls.

Submitted 25 May, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

Comments: 26 pages (including 13 pages of appendix)

arXiv:2402.05953 [pdf, other]

doi 10.1109/MCG.2023.3345742

idMotif: An Interactive Motif Identification in Protein Sequences

Authors: Ji Hwan Park, Vikash Prasad, Sydney Newsom, Fares Najar, Rakhi Rajan

Abstract: This article introduces idMotif, a visual analytics framework designed to aid domain experts in the identification of motifs within protein sequences. Motifs, short sequences of amino acids, are critical for understanding the distinct functions of proteins. Identifying these motifs is pivotal for predicting diseases or infections. idMotif employs a deep learning-based method for the categorization of protein sequences, enabling the discovery of potential motif candidates within protein groups through local explanations of deep learning model decisions. It offers multiple interactive views for the analysis of protein clusters or groups and their sequences. A case study, complemented by expert feedback, illustrates idMotif's utility in facilitating the analysis and identification of protein sequences and motifs.

Submitted 4 February, 2024; originally announced February 2024.

Comments: IEEE CGA

Journal ref: "idMotif: An Interactive Motif Identification in Protein Sequences," in IEEE Computer Graphics and Applications, 2023

arXiv:2311.12527 [pdf, other]

MetaStore: High-Performance Metagenomic Analysis via In-Storage Computing

Authors: Nika Mansouri Ghiasi, Mohammad Sadrosadati, Harun Mustafa, Arvid Gollwitzer, Can Firtina, Julien Eudine, Haiyu Mao, Joël Lindegger, Meryem Banu Cavlak, Mohammed Alser, Jisung Park, Onur Mutlu

Abstract: Metagenomics has led to significant advancements in many fields. Metagenomic analysis commonly involves the key tasks of determining the species present in a sample and their relative abundances. These tasks require searching large metagenomic databases containing information on different species' genomes. Metagenomic analysis suffers from significant data movement overhead due to moving large amounts of low-reuse data from the storage system to the rest of the system. In-storage processing can be a fundamental solution for reducing data movement overhead. However, designing an in-storage processing system for metagenomics is challenging because none of the existing approaches can be directly implemented in storage effectively due to the hardware limitations of modern SSDs. We propose MetaStore, the first in-storage processing system designed to significantly reduce the data movement overhead of end-to-end metagenomic analysis. MetaStore is enabled by our lightweight and cooperative design that effectively leverages and orchestrates processing inside and outside the storage system. Through our detailed analysis of the end-to-end metagenomic analysis pipeline and careful hardware/software co-design, we address in-storage processing challenges for metagenomics via specialized and efficient 1) task partitioning, 2) data/computation flow coordination, 3) storage technology-aware algorithmic optimizations, 4) lightweight in-storage accelerators, and 5) data mapping.
Our evaluation shows that MetaStore outperforms the state-of-the-art performance- and accuracy-optimized software metagenomic tools by 2.7-37.2$\times$ and 6.9-100.2$\times$, respectively, while matching the accuracy of the accuracy-optimized tool. MetaStore achieves 1.5-5.1$\times$ speedup compared to the state-of-the-art metagenomic hardware-accelerated tool, while achieving significantly higher accuracy.

Submitted 21 November, 2023; originally announced November 2023.

arXiv:2311.04468 [pdf]

A human brain atlas of chi-separation for normative iron and myelin distributions

Authors: Kyeongseon Min, Beomseok Sohn, Woo Jung Kim, Chae Jung Park, Soohwa Song, Dong Hoon Shin, Kyung Won Chang, Na-Young Shin, Minjun Kim, Hyeong-Geol Shin, Phil Hyu Lee, Jongho Lee

Abstract: Iron and myelin are primary susceptibility sources in the human brain. These substances are essential for a healthy brain, and their abnormalities are often related to various neurological disorders. Recently, an advanced susceptibility mapping technique, which is referred to as chi-separation, has been proposed, successfully disentangling paramagnetic iron from diamagnetic myelin. This method opened a potential for generating high resolution iron and myelin maps in the brain. Utilizing this technique, this study constructs a normative chi-separation atlas from 106 healthy human brains. The resulting atlas provides detailed anatomical structures associated with the distributions of iron and myelin, clearly delineating subcortical nuclei, thalamic nuclei, and white matter fiber bundles. Additionally, susceptibility values in a number of regions of interest are reported along with age-dependent changes. This atlas may have direct applications such as localization of subcortical structures for deep brain stimulation or high-intensity focused ultrasound and also serve as a valuable resource for future research.

Submitted 2 April, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

Comments: 19 pages, 9 figures

arXiv:2309.11438 [pdf, other]

doi 10.1073/pnas.2320242121

Brain-inspired computing with fluidic iontronic nanochannels

Authors: T. M. Kamsma, J. Kim, K. Kim, W. Q. Boon, C. Spitoni, J. Park, R. van Roij

Abstract: The brain's remarkable and efficient information processing capability is driving research into brain-inspired (neuromorphic) computing paradigms. Artificial aqueous ion channels are emerging as an exciting platform for neuromorphic computing, representing a departure from conventional solid-state devices by directly mimicking the brain's fluidic ion transport. Supported by a quantitative theoretical model, we present easy to fabricate tapered microchannels that embed a conducting network of fluidic nanochannels between a colloidal structure. Due to transient salt concentration polarisation our devices are volatile memristors (memory resistors) that are remarkably stable. The voltage-driven net salt flux and accumulation, that underpin the concentration polarisation, surprisingly combine into a diffusionlike quadratic dependence of the memory retention time on the channel length, allowing channel design for a specific timescale. We implement our device as a synaptic element for neuromorphic reservoir computing. Individual channels distinguish various time series, that together represent (handwritten) numbers, for subsequent in-silico classification with a simple readout function. Our results represent a significant step towards realising the promise of fluidic ion channels as a platform to emulate the rich aqueous dynamics of the brain.

Submitted 25 April, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

Journal ref: Proceedings of the National Academy of Sciences (2024), Vol 121, Issue 18

arXiv:2309.05768 [pdf]

The Past, Present, and Future of the Brain Imaging Data Structure (BIDS)

Authors: Russell A. Poldrack, Christopher J. Markiewicz, Stefan Appelhoff, Yoni K. Ashar, Tibor Auer, Sylvain Baillet, Shashank Bansal, Leandro Beltrachini, Christian G. Benar, Giacomo Bertazzoli, Suyash Bhogawar, Ross W. Blair, Marta Bortoletto, Mathieu Boudreau, Teon L. Brooks, Vince D. Calhoun, Filippo Maria Castelli, Patricia Clement, Alexander L Cohen, Julien Cohen-Adad, Sasha D'Ambrosio, Gilles de Hollander, María de la Iglesia-Vayá, Alejandro de la Vega, Arnaud Delorme, et al. (89 additional authors not shown)

Abstract: The Brain Imaging Data Structure (BIDS) is a community-driven standard for the organization of data and metadata from a growing range of neuroscience modalities. This paper is meant as a history of how the standard has developed and grown over time. We outline the principles behind the project, the mechanisms by which it has been extended, and some of the challenges being addressed as it evolves. We also discuss the lessons learned through the project, with the aim of enabling researchers in other domains to learn from the success of BIDS.

Submitted 8 January, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

arXiv:2309.04423 [pdf, other]

doi 10.1109/VIS54172.2023.00030

Vis-SPLIT: Interactive Hierarchical Modeling for mRNA Expression Classification

Authors: Braden Roper, James C. Mathews, Saad Nadeem, Ji Hwan Park

Abstract: We propose an interactive visual analytics tool, Vis-SPLIT, for partitioning a population of individuals into groups with similar gene signatures. Vis-SPLIT allows users to interactively explore a dataset and exploit visual separations to build a classification model for specific cancers. The visualization components reveal gene expression and correlation to assist specific partitioning decisions, while also providing overviews for the decision model and clustered genetic signatures. We demonstrate the effectiveness of our framework through a case study and evaluate its usability with domain experts. Our results show that Vis-SPLIT can classify patients based on their genetic signatures to effectively gain insights into RNA sequencing data, as compared to an existing classification system.

Submitted 8 September, 2023; originally announced September 2023.

Comments: To be published in IEEE Visualization and Visual Analytics (VIS), 2023

arXiv:2309.01670 [pdf, other]

Blind Biological Sequence Denoising with Self-Supervised Set Learning

Authors: Nathan Ng, Ji Won Park, Jae Hyeon Lee, Ryan Lewis Kelly, Stephen Ra, Kyunghyun Cho

Abstract: Biological sequence analysis relies on the ability to denoise the imprecise output of sequencing platforms. We consider a common setting where a short sequence is read out repeatedly using a high-throughput long-read platform to generate multiple subreads, or noisy observations of the same sequence. Denoising these subreads with alignment-based approaches often fails when too few subreads are available or error rates are too high. In this paper, we propose a novel method for blindly denoising sets of sequences without directly observing clean source sequence labels. Our method, Self-Supervised Set Learning (SSSL), gathers subreads together in an embedding space and estimates a single set embedding as the midpoint of the subreads in both the latent and sequence spaces. This set embedding represents the "average" of the subreads and can be decoded into a prediction of the clean sequence. In experiments on simulated long-read DNA data, SSSL methods denoise small reads of $\leq 6$ subreads with 17% fewer errors and large reads of $>6$ subreads with 8% fewer errors compared to the best baseline. On a real dataset of antibody sequences, SSSL improves over baselines on two self-supervised metrics, with a significant improvement on difficult small reads that comprise over 60% of the test set. By accurately denoising these reads, SSSL promises to better realize the potential of high-throughput DNA sequencing data for downstream scientific applications.

Submitted 4 September, 2023; originally announced September 2023.

arXiv:2306.16085 [pdf, other]

Mass Spectra Prediction with Structural Motif-based Graph Neural Networks

Authors: Jiwon Park, Jeonghee Jo, Sungroh Yoon

Abstract: Mass spectra, which are agglomerations of ionized fragments from targeted molecules, play a crucial role across various fields for the identification of molecular structures. A prevalent analysis method involves spectral library searches, where unknown spectra are cross-referenced with a database. The effectiveness of such search-based approaches, however, is restricted by the scope of the existing mass spectra database, underscoring the need to expand the database via mass spectra prediction. In this research, we propose the Motif-based Mass Spectrum Prediction Network (MoMS-Net), a system that predicts mass spectra using the information derived from structural motifs and the implementation of Graph Neural Networks (GNNs). We have tested our model across diverse mass spectra and have observed its superiority over other existing models. MoMS-Net considers substructure at the graph level, which facilitates the incorporation of long-range dependencies while using less memory compared to the graph transformer model.

Submitted 28 June, 2023; originally announced June 2023.

Comments: 19 pages, 3 figures

arXiv:2306.03111 [pdf, other]

Bootstrapped Training of Score-Conditioned Generator for Offline Design of Biological Sequences

Authors: Minsu Kim, Federico Berto, Sungsoo Ahn, Jinkyoo Park

Abstract: We study the problem of optimizing biological sequences, e.g., proteins, DNA, and RNA, to maximize a black-box score function that is only evaluated in an offline dataset. We propose a novel solution, bootstrapped training of score-conditioned generator (BootGen) algorithm. Our algorithm repeats a two-stage process. In the first stage, our algorithm trains the biological sequence generator with rank-based weights to enhance the accuracy of sequence generation based on high scores. The subsequent stage involves bootstrapping, which augments the training dataset with self-generated data labeled by a proxy score function. Our key idea is to align the score-based generation with a proxy score function, which distills the knowledge of the proxy score function to the generator. After training, we aggregate samples from multiple bootstrapped generators and proxies to produce a diverse design. Extensive experiments show that our method outperforms competitive baselines on biological sequential design tasks. We provide reproducible source code: https://github.com/kaist-silab/bootgen.

Submitted 22 March, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: NeurIPS 2023, 19 pages, 5 figures

arXiv:2305.12341 [pdf, other]

Enhancing biodiversity through intraspecific suppression in large ecosystems

Authors: Seong-Gyu Yang, Hye Jin Park

Abstract: The competitive exclusion principle (CEP) is a fundamental concept in the niche theory, which posits that the number of available resources constrains the coexistence of species. While the CEP offers an intuitive explanation on coexistence, it has been challenged by counterexamples observed in nature. One prominent counterexample is the phytoplankton community, known as the paradox of the plankton. Diverse phytoplankton species coexist in the ocean even though they demand a limited number of resources. To shed light on this remarkable biodiversity in large ecosystems quantitatively, we consider intraspecific suppression into the generalized MacArthur's consumer-resource model and study the relative diversity, the number ratio between coexisting consumers and resource kinds. By employing the cavity method and generating functional analysis, we demonstrate that, under intraspecific suppression, the number of consumer species can surpass the available resources. This phenomenon stems from the fact that intraspecific suppression prevents the emergence of dominant species, thereby fostering high biodiversity. Furthermore, our study highlights that the impact of this competition on biodiversity is contingent upon environmental conditions. Our work presents a comprehensive framework that encompasses the CEP and its counterexamples by introducing intraspecific suppression.

Submitted 1 April, 2024; v1 submitted 21 May, 2023; originally announced May 2023.

Comments: 40 pages (including Appendix), 25 figures (5 figures in main, 20 figures in Appendix)

arXiv:2304.10065 [pdf]

Machine learning traction force maps of cell monolayers

Authors: Changhao Li, Luyi Feng, Yang Jeong Park, Jian Yang, Ju Li, Sulin Zhang

Abstract: Cellular force transmission across a hierarchy of molecular switchers is central to mechanobiological responses. However, current cellular force microscopies suffer from low throughput and resolution. Here we introduce and train a generative adversarial network (GAN) to paint out traction force maps of cell monolayers with high fidelity to the experimental traction force microscopy (TFM). The GAN analyzes traction force maps as an image-to-image translation problem, where its generative and discriminative neural networks are simultaneously cross-trained by hybrid experimental and numerical datasets. In addition to capturing the colony-size and substrate-stiffness dependent traction force maps, the trained GAN predicts asymmetric traction force patterns for multicellular monolayers seeding on substrates with stiffness gradient, implicating collective durotaxis. Further, the neural network can extract the hidden, experimentally inaccessible relationship between substrate stiffness and cell contractility, which underlies cellular mechanotransduction. Trained solely on datasets for epithelial cells, the GAN can be extrapolated to other contractile cell types using only a single scaling factor. The digital TFM serves as a high-throughput tool for mapping out cellular forces of cell monolayers and paves the way toward data-driven discoveries in cell mechanobiology.

Submitted 19 April, 2023; originally announced April 2023.

arXiv:2301.00556 [pdf, ps, other]

doi 10.1016/j.chaos.2022.113004

Competition of alliances in a cyclically dominant eight-species population

Authors: Junpyo Park, Xiaojie Chen, Attila Szolnoki

Abstract: In a diverse population, where many species are present, competitors can fight for survival at individual and collective levels. In particular, species, which would beat each other individually, may form a specific alliance that ensures them stable coexistence against the invasion of an external species. Our principal goal is to identify those general features of a formation which determine its vitality. Therefore, we here study a traditional Lotka-Volterra model of eight-species where two four-species cycles can fight for space. Besides these formations, there are other solutions which may emerge when invasion rates are varied. The complete range of parameters is explored and we find that in most of the cases those alliances prevail which are formed by equally strong members. Interestingly, there are regions where the symmetry is broken and the system is dominated by a solution formed by seven species. Our work also highlights that serious finite-size effects may emerge which prevent observing the valid solution in a small system.

Submitted 2 January, 2023; originally announced January 2023.

Comments: 10 double-column pages, 11 figures

Journal ref: Chaos, Solitons and Fractals 166 (2023) 113004

arXiv:2212.05187 [pdf, other]

doi 10.1063/5.0142978

Invasion and Interaction Determine Population Composition in an Open Evolving System

Authors: Youngjai Park, Takashi Shimada, Seung-Woo Son, Hye Jin Park

Abstract: It is well-known that interactions between species determine the population composition in an ecosystem. Conventional studies have focused on fixed population structures to reveal how interactions shape population compositions. However, interaction structures are not fixed, but change over time due to invasions. Thus, invasion and interaction play an important role in shaping communities. Despite its importance, however, the interplay between invasion and interaction has not been well explored. Here, we investigate how invasion affects the population composition with interactions in open evolving systems considering generalized Lotka-Volterra-type dynamics. Our results show that the system has two distinct regimes. One is characterized by low diversity with abrupt changes of dominant species in time, appearing when the interaction between species is strong and invasion slowly occurs. On the other hand, frequent invasions can induce higher diversity with slow changes in abundances despite strong interactions. It is because invasion happens before the system reaches its equilibrium, which drags the system from its equilibrium all the time. All species have similar abundances in this regime, which implies that fast invasion induces regime shift. Therefore, whether invasion or interaction dominates determines the population composition.

Submitted 9 December, 2022; originally announced December 2022.

Comments: 15 pages (including supplementary material), 8 figures (4 figures in main, 4 figures in SI)

arXiv:2210.04096 [pdf, other]

PropertyDAG: Multi-objective Bayesian optimization of partially ordered, mixed-variable properties for biological sequence design

Authors: Ji Won Park, Samuel Stanton, Saeed Saremi, Andrew Watkins, Henri Dwyer, Vladimir Gligorijevic, Richard Bonneau, Stephen Ra, Kyunghyun Cho

Abstract: Bayesian optimization offers a sample-efficient framework for navigating the exploration-exploitation trade-off in the vast design space of biological sequences. Whereas it is possible to optimize the various properties of interest jointly using a multi-objective acquisition function, such as the expected hypervolume improvement (EHVI), this approach does not account for objectives with a hierarchical dependency structure. We consider a common use case where some regions of the Pareto frontier are prioritized over others according to a specified $\textit{partial ordering}$ in the objectives. For instance, when designing antibodies, we would like to maximize the binding affinity to a target antigen only if it can be expressed in live cell culture -- modeling the experimental dependency in which affinity can only be measured for antibodies that can be expressed and thus produced in viable quantities. In general, we may want to confer a partial ordering to the properties such that each property is optimized conditioned on its parent properties satisfying some feasibility condition. To this end, we present PropertyDAG, a framework that operates on top of the traditional multi-objective BO to impose this desired ordering on the objectives, e.g. expression $\rightarrow$ affinity. We demonstrate its performance over multiple simulated active learning iterations on a penicillin production task, toy numerical problem, and a real-world antibody design task.

Submitted 8 October, 2022; originally announced October 2022.

Comments: 9 pages, 7 figures. Submitted to NeurIPS 2022 AI4Science Workshop

arXiv:2208.14959 [pdf]

Inference of Mixed Graphical Models for Dichotomous Phenotypes using Markov Random Field Model

Authors: Jaehyun Park, Sungho Won

Abstract: In this article, we propose a new method named fused mixed graphical model (FMGM), which can infer network structures for dichotomous phenotypes. We assumed that the interplay of different omics markers is associated with disease status and proposed an FMGM-based method to detect the associated omics marker network difference. The statistical models of the networks were based on a pairwise Markov random field model, and penalty functions were added to minimize the effect of sparseness in the networks. The fast proximal gradient method (PGM) was used to optimize the target function. Method validity was measured using synthetic datasets that simulate power-law network structures, and it was found that FMGM showed superior performance, especially in terms of F1 scores, compared with the previous method inferring the networks sequentially (0.392 and 0.546). FMGM performed better not only in identifying the differences (0.217 and 0.410) but also in identifying the networks (0.492 and 0.572). The proposed method was applied to multi-omics profiles of 6-month-old infants with and without atopic dermatitis (AD), and different correlations were found between the abundance of microbial genes related to carotenoid biosynthesis and RNA degradation according to disease status, suggesting the importance of metabolism related to oxidative stress and microbial RNA balance.

Submitted 31 August, 2022; originally announced August 2022.

Comments: 31 pages (excluding figures and tables), 4 figures, 3 tables, submitted to Biometrics

MSC Class: 92B15 (Primary) 62P10 62H10 62-08 (Secondary)

arXiv:2208.10661 [pdf, other]

Therapeutic algebra of immunomodulatory drug responses at single-cell resolution

Authors: Jialong Jiang, Sisi Chen, Tiffany Tsou, Christopher S. McGinnis, Tahmineh Khazaei, Qin Zhu, Jong H. Park, Paul Rivaud, Inna-Marie Strazhnik, Eric D. Chow, David A. Sivak, Zev J. Gartner, Matt Thomson

Abstract: Therapeutic modulation of immune states is central to the treatment of human disease. However, how drugs and drug combinations impact the diverse cell types in the human immune system remains poorly understood at the transcriptome scale. Here, we apply single-cell mRNA-seq to profile the response of human immune cells to 502 immunomodulatory drugs alone and in combination. We develop a unified mathematical model that quantitatively describes the transcriptome scale response of myeloid and lymphoid cell types to individual drugs and drug combinations through a single inferred regulatory network. The mathematical model reveals how drug combinations generate novel, macrophage and T-cell states by recruiting combinations of gene expression programs through both additive and non-additive drug interactions. A simplified drug response algebra allows us to predict the continuous modulation of immune cell populations between activated, resting and hyper-inhibited states through combinatorial drug dose titrations. Our results suggest that transcriptome-scale mathematical models could enable the design of therapeutic strategies for programming the human immune system using combinations of therapeutics.

Submitted 22 August, 2022; originally announced August 2022.

Comments: 19 pages, 5 figures

arXiv:2205.04259 [pdf, other]

Multi-segment preserving sampling for deep manifold sampler

Authors: Daniel Berenberg, Jae Hyeon Lee, Simon Kelow, Ji Won Park, Andrew Watkins, Vladimir Gligorijević, Richard Bonneau, Stephen Ra, Kyunghyun Cho

Abstract: Deep generative modeling for biological sequences presents a unique challenge in reconciling the bias-variance trade-off between explicit biological insight and model flexibility. The deep manifold sampler was recently proposed as a means to iteratively sample variable-length protein sequences by exploiting the gradients from a function predictor. We introduce an alternative approach to this guided sampling procedure, multi-segment preserving sampling, that enables the direct inclusion of domain-specific knowledge by designating preserved and non-preserved segments along the input sequence, thereby restricting variation to only select regions. We present its effectiveness in the context of antibody design by training two models: a deep manifold sampler and a GPT-2 language model on nearly six million heavy chain sequences annotated with the IGHV1-18 gene. During sampling, we restrict variation to only the complementarity-determining region 3 (CDR3) of the input. We obtain log probability scores from a GPT-2 model for each sampled CDR3 and demonstrate that multi-segment preserving sampling generates reasonable designs while maintaining the desired, preserved regions.

Submitted 9 May, 2022; originally announced May 2022.

arXiv:2204.03742 [pdf, other]

doi 10.1016/j.media.2022.102699

Mitosis domain generalization in histopathology images -- The MIDOG challenge

Authors: Marc Aubreville, Nikolas Stathonikos, Christof A. Bertram, Robert Klopleisch, Natalie ter Hoeve, Francesco Ciompi, Frauke Wilm, Christian Marzahl, Taryn A. Donovan, Andreas Maier, Jack Breen, Nishant Ravikumar, You** Chung, **ah Park, Ramin Nateghi, Fattaneh Pourakpour, Rutger H. J. Fick, Saima Ben Hadj, Mostafa Jahanifar, Nasir Rajpoot, Jakob Dexl, Thomas Wittenberg, Satoshi Kondo, Maxime W. Lafarge, Viktor H. Koelzer, et al. (10 additional authors not shown)

Abstract: The density of mitotic figures within tumor tissue is known to be highly correlated with tumor proliferation and thus is an important marker in tumor grading. Recognition of mitotic figures by pathologists is known to be subject to a strong inter-rater bias, which limits the prognostic value. State-of-the-art deep learning methods can support the expert in this assessment but are known to strongly deteriorate when applied in a different clinical environment than was used for training. One decisive component in the underlying domain shift has been identified as the variability caused by using different whole slide scanners. The goal of the MICCAI MIDOG 2021 challenge has been to propose and evaluate methods that counter this domain shift and derive scanner-agnostic mitosis detection algorithms. The challenge used a training set of 200 cases, split across four scanning systems. As a test set, an additional 100 cases split across four scanning systems, including two previously unseen scanners, were given. The best approaches performed on an expert level, with the winning algorithm yielding an F_1 score of 0.748 (CI95: 0.704-0.781). In this paper, we evaluate and compare the approaches that were submitted to the challenge and identify methodological factors contributing to better performance.

Submitted 6 April, 2022; originally announced April 2022.

Comments: 19 pages, 9 figures, summary paper of the 2021 MICCAI MIDOG challenge

Journal ref: Medical Image Analysis 84 (2023) 102699

arXiv:2202.10400 [pdf, other]

GenStore: A High-Performance and Energy-Efficient In-Storage Computing System for Genome Sequence Analysis

Authors: Nika Mansouri Ghiasi, Jisung Park, Harun Mustafa, Jeremie Kim, Ataberk Olgun, Arvid Gollwitzer, Damla Senol Cali, Can Firtina, Haiyu Mao, Nour Almadhoun Alserr, Rachata Ausavarungnirun, Nandita Vijaykumar, Mohammed Alser, Onur Mutlu

Abstract: Read mapping is a fundamental, yet computationally-expensive step in many genomics applications. It is used to identify potential matches and differences between fragments (called reads) of a sequenced genome and an already known genome (called a reference genome). To address the computational challenges in genome analysis, many prior works propose various approaches such as filters that select the reads that must undergo expensive computation, efficient heuristics, and hardware acceleration. While effective at reducing the computation overhead, all such approaches still require the costly movement of a large amount of data from storage to the rest of the system, which can significantly lower the end-to-end performance of read mapping in conventional and emerging genomics systems. We propose GenStore, the first in-storage processing system designed for genome sequence analysis that greatly reduces both data movement and computational overheads of genome sequence analysis by exploiting low-cost and accurate in-storage filters. GenStore leverages hardware/software co-design to address the challenges of in-storage processing, supporting reads with 1) different read lengths and error rates, and 2) different degrees of genetic variation. Through rigorous analysis of read mapping processes, we meticulously design low-cost hardware accelerators and data/computation flows inside a NAND flash-based SSD. Our evaluation using a wide range of real genomic datasets shows that GenStore, when implemented in three modern SSDs, significantly improves the read mapping performance of state-of-the-art software (hardware) baselines by 2.07-6.05$\times$ (1.52-3.32$\times$) for read sets with high similarity to the reference genome and 1.45-33.63$\times$ (2.70-19.2$\times$) for read sets with low similarity to the reference genome.

Submitted 6 April, 2023; v1 submitted 21 February, 2022; originally announced February 2022.

Comments: Published at ASPLOS 2022

arXiv:2112.08687 [pdf, other]

doi 10.1093/nargab/lqad004

BLEND: A Fast, Memory-Efficient, and Accurate Mechanism to Find Fuzzy Seed Matches in Genome Analysis

Authors: Can Firtina, Jisung Park, Mohammed Alser, Jeremie S. Kim, Damla Senol Cali, Taha Shahroodi, Nika Mansouri Ghiasi, Gagandeep Singh, Konstantinos Kanellopoulos, Can Alkan, Onur Mutlu

Abstract: Generating the hash values of short subsequences, called seeds, enables quickly identifying similarities between genomic sequences by matching seeds with a single lookup of their hash values. However, these hash values can be used only for finding exact-matching seeds as the conventional hashing methods assign distinct hash values for different seeds, including highly similar seeds. Finding only exact-matching seeds causes either 1) increasing the use of the costly sequence alignment or 2) limited sensitivity. We introduce BLEND, the first efficient and accurate mechanism that can identify both exact-matching and highly similar seeds with a single lookup of their hash values, called fuzzy seed matches. BLEND 1) utilizes a technique called SimHash, that can generate the same hash value for similar sets, and 2) provides the proper mechanisms for using seeds as sets with the SimHash technique to find fuzzy seed matches efficiently. We show the benefits of BLEND when used in read overlapping and read mapping. For read overlapping, BLEND is faster by 2.4x - 83.9x (on average 19.3x), has a lower memory footprint by 0.9x - 14.1x (on average 3.8x), and finds higher quality overlaps leading to accurate de novo assemblies than the state-of-the-art tool, minimap2. For read mapping, BLEND is faster by 0.8x - 4.1x (on average 1.7x) than minimap2. Source code is available at https://github.com/CMU-SAFARI/BLEND.

Submitted 23 May, 2023; v1 submitted 16 December, 2021; originally announced December 2021.

Comments: Published in NARGAB

Journal ref: NAR Genomics and Bioinformatics, vol. 5, no. 1, p. lqad004, Mar. 2023

arXiv:2112.05782 [pdf, ps, other]

Dynamical clustering of U.S. states reveals four distinct infection patterns that predict SARS-CoV-2 pandemic behavior

Authors: Joseph L. Natale, Varun Viswanath, Oscar Trujillo Acevedo, Sophia Pérez Giottonini, Sandy Ihuiyan Romero Hernández, Diana G. Cruz Millán, A. Montserrat Palacios-Puga, Ammar Mandvi, Brian M. Khan, Martin Lilik, Jay Park, Benjamin L. Smarr

Abstract: The SARS-CoV-2 pandemic has so far unfolded diversely across the fifty United States of America, reflected both in different time progressions of infection "waves" and in magnitudes of local infection rates. Despite a marked diversity of presentations, most U.S. states experienced their single greatest surge in daily new cases during the transition from Fall 2020 to Winter 2021. Popular media also cite additional similarities between states -- often despite disparities in governmental policies, reported mask-wearing compliance rates, and vaccination percentages. Here, we identify a set of robust, low-dimensional clusters that 1) summarize the timings and relative heights of four historical COVID-19 "wave opportunities" accessible to all 50 U.S. states, 2) correlate with geographical and intervention patterns associated with those groups of states they encompass, and 3) predict aspects of the "fifth wave" of new infections in the late Summer of 2021. In particular, we argue that clustering elucidates a negative relationship between vaccination rates and subsequent case-load variabilities within state groups. We advance the hypothesis that vaccination acts as a "seat belt," in effect constraining the likely range of new-case upticks, even in the context of the Summer 2021, variant-driven surge.

Submitted 10 December, 2021; originally announced December 2021.

Comments: 22 pages, 4 figures; submitted to PLOS ONE

arXiv:2106.13202 [pdf, other]

SALT: Sea lice Adaptive Lattice Tracking -- An Unsupervised Approach to Generate an Improved Ocean Model

Authors: Ju An Park, Vikram Voleti, Kathryn E. Thomas, Alexander Wong, Jason L. Deglint

Abstract: Warming oceans due to climate change are leading to increased numbers of ectoparasitic copepods, also known as sea lice, which can cause significant ecological loss to wild salmon populations and major economic loss to aquaculture sites. The main transport mechanism driving the spread of sea lice populations are near-surface ocean currents. Present strategies to estimate the distribution of sea lice larvae are computationally complex and limit full-scale analysis. Motivated to address this challenge, we propose SALT: Sea lice Adaptive Lattice Tracking approach for efficient estimation of sea lice dispersion and distribution in space and time. Specifically, an adaptive spatial mesh is generated by merging nodes in the lattice graph of the Ocean Model based on local ocean properties, thus enabling highly efficient graph representation. SALT demonstrates improved efficiency while maintaining consistent results with the standard method, using near-surface current data for Hardangerfjord, Norway. The proposed SALT technique shows promise for enhancing proactive aquaculture management through predictive modelling of sea lice infestation pressure maps in a changing climate.

Submitted 24 June, 2021; originally announced June 2021.

Comments: 5 pages, 3 figures, 3 tables

arXiv:2106.10627 [pdf, other]

Experimentally testable whole brain manifolds that recapitulate behavior

Authors: Gerald M Pao, Cameron Smith, Joseph Park, Keichi Takahashi, Wassapon Watanakeesuntorn, Hiroaki Natsukawa, Sreekanth H Chalasani, Tom Lorimer, Ryousei Takano, Nuttida Rungratsameetaweemana, George Sugihara

Abstract: We propose an algorithm grounded in dynamical systems theory that generalizes manifold learning from a global state representation, to a network of local interacting manifolds termed a Generative Manifold Network (GMN). Manifolds are discovered using the convergent cross mapping (CCM) causal inference algorithm which are then compressed into a reduced redundancy network. The representation is a network of manifolds embedded from observational data where each orthogonal axis of a local manifold is an embedding of an individually identifiable neuron or brain area that has exact correspondence in the real world. As such these can be experimentally manipulated to test hypotheses derived from theory and data analysis. Here we demonstrate that this representation preserves the essential features of the brain of flies, larval zebrafish and humans. In addition to accurate near-term prediction, the GMN model can be used to synthesize realistic time series of whole brain neuronal activity and locomotion viewed over the long term. Thus, as a final validation of how well GMN captures essential dynamic information, we show that the artificially generated time series can be used as a training set to predict out-of-sample observed fly locomotion, as well as brain activity in out of sample withheld data not used in model building. Remarkably, the artificially generated time series show realistic novel behaviors that do not exist in the training data, but that do exist in the out-of-sample observational data. This suggests that GMN captures inherently emergent properties of the network. We suggest our approach may be a generic recipe for mapping time series observations of any complex nonlinear network into a model that is able to generate naturalistic system behaviors that identifies variables that have real world correspondence and can be experimentally manipulated.

Submitted 20 June, 2021; originally announced June 2021.

Comments: 20 pages, 15 figures; corresponding author: Gerald Pao [email protected]

arXiv:2011.13554 [pdf]

Towards decoding the coupled decision-making of metabolism and epithelial-mesenchymal transition in cancer

Authors: Dongya Jia, Jun Hyoung Park, Harsimran Kaur, Kwang Hwa Jung, Suk** Yang, Shubham Tripathi, Madeline Galbraith, Youyuan Deng, Mohit Kumar Jolly, Benny Abraham Kaipparettu, Jose N. Onuchic, Herbert Levine

Abstract: Cancer cells have the plasticity to adjust their metabolic phenotypes for survival and metastasis. During metastasis, a developmental program known as the epithelial-mesenchymal transition (EMT) plays a critical role. There is extensive cross-talk between metabolism and EMT, but how this leads to coordinated physiological changes is still uncertain. The elusive connection between metabolism and EMT compromises the efficacy of metabolic therapies targeting metastasis. In this review, we aim for clarifying causation between metabolism and EMT based on recent experimental studies and propose integrated theoretical-experimental efforts to better understand the coupled decision-making of metabolism and EMT.

Submitted 26 November, 2020; originally announced November 2020.

Comments: 31 pages, 3 figures

arXiv:2011.11082 [pdf, other]

Massively Parallel Causal Inference of Whole Brain Dynamics at Single Neuron Resolution

Authors: Wassapon Watanakeesuntorn, Keichi Takahashi, Kohei Ichikawa, Joseph Park, George Sugihara, Ryousei Takano, Jason Haga, Gerald M. Pao

Abstract: Empirical Dynamic Modeling (EDM) is a nonlinear time series causal inference framework. The latest implementation of EDM, cppEDM, has only been used for small datasets due to computational cost. With the growth of data collection capabilities, there is a great need to identify causal relationships in large datasets. We present mpEDM, a parallel distributed implementation of EDM optimized for modern GPU-centric supercomputers. We improve the original algorithm to reduce redundant computation and optimize the implementation to fully utilize hardware resources such as GPUs and SIMD units. As a use case, we run mpEDM on AI Bridging Cloud Infrastructure (ABCI) using datasets of an entire animal brain sampled at single neuron resolution to identify dynamical causation patterns across the brain. mpEDM is 1,530 X faster than cppEDM and a dataset containing 101,729 neurons was analyzed in 199 seconds on 512 nodes. This is the largest EDM causal inference achieved to date.

Submitted 22 November, 2020; originally announced November 2020.

Comments: 10 pages, 10 figures, accepted at IEEE International Conference on Parallel and Distributed Systems (ICPADS) 2020, corresponding authors: Keichi Takahashi, Gerald M Pao

ACM Class: K.6.3; G.4; J.3

arXiv:2008.05377 [pdf]

Network reinforcement driven drug repurposing for COVID-19 by exploiting disease-gene-drug associations

Authors: Yonghyun Nam, Jae-Seung Yun, Seung Mi Lee, Ji Won Park, Ziqi Chen, Brian Lee, Anurag Verma, Xia Ning, Li Shen, Dokyoon Kim

Abstract: Currently, the number of patients with COVID-19 has significantly increased. Thus, there is an urgent need for developing treatments for COVID-19. Drug repurposing, which is the process of reusing already-approved drugs for new medical conditions, can be a good way to solve this problem quickly and broadly. Many clinical trials for COVID-19 patients using treatments for other diseases have already been in place or will be performed at clinical sites in the near future. Additionally, patients with comorbidities such as diabetes mellitus, obesity, liver cirrhosis, kidney diseases, hypertension, and asthma are at higher risk for severe illness from COVID-19. Thus, the relationship of comorbidity disease with COVID-19 may help to find repurposable drugs. To reduce trial and error in finding treatments for COVID-19, we propose building a network-based drug repurposing framework to prioritize repurposable drugs. First, we utilized knowledge of COVID-19 to construct a disease-gene-drug network (DGDr-Net) representing a COVID-19-centric interactome with components for diseases, genes, and drugs. DGDr-Net consisted of 592 diseases, 26,681 human genes and 2,173 drugs, and medical information for 18 common comorbidities. The DGDr-Net recommended candidate repurposable drugs for COVID-19 through network reinforcement driven scoring algorithms. The scoring algorithms determined the priority of recommendations by utilizing graph-based semi-supervised learning. From the predicted scores, we recommended 30 drugs, including dexamethasone, resveratrol, methotrexate, indomethacin, quercetin, etc., as repurposable drugs for COVID-19, and the results were verified with drugs that have been under clinical trials. The list of drugs via a data-driven computational approach could help reduce trial-and-error in finding treatment for COVID-19.

Submitted 12 August, 2020; originally announced August 2020.

Comments: 4 figures

arXiv:2006.00688 [pdf, other]

A Mathematical Description of Bacterial Chemotaxis in Response to Two Stimuli

Authors: Jeungeun Park, Zahra Aminzare

Abstract: Bacteria are often exposed to multiple stimuli in complex environments, and their efficient chemotactic decisions are critical to survive and grow in their native environments. Bacterial responses to the environmental stimuli depend on the ratio of their corresponding chemoreceptors. By incorporating the signaling machinery of individual cells, we analyze the collective motion of a population of Escherichia coli bacteria in response to two stimuli, mainly serine and methyl-aspartate (MeAsp), in a one-dimensional and a two-dimensional environment, which is inspired by experimental results in Y. Kalinin et al., J. Bacteriol. 192(7):1796-1800, 2010. Under suitable conditions, we show that if the ratio of the main chemoreceptors of individual cells, namely Tar/Tsr is less than a specific threshold, the bacteria move to the gradient of serine, and if the ratio is greater than the threshold, the group of bacteria move toward the gradient of MeAsp. Finally, we examine the theory with Monte-Carlo agent-based simulations, and verify that our results qualitatively agree well with the experimental results in Y. Kalinin et al. (2010).

Submitted 8 June, 2021; v1 submitted 31 May, 2020; originally announced June 2020.

MSC Class: 35Q92; 58J55; 60J75; 92B05; 92C17; 92D25

arXiv:2005.12425 [pdf]

doi 10.1242/jeb.224121

Absolute ethanol intake drives ethanol preference in Drosophila

Authors: Scarlet J. Park, William W. Ja

Abstract: Factors that mediate ethanol preference in Drosophila melanogaster are not well understood. A major confound has been the use of diverse methods to estimate ethanol consumption. We measured fly consumptive ethanol preference on base diets varying in nutrients, taste, and ethanol concentration. Both sexes showed ethanol preference that was abolished on high nutrient concentration diets. Additionally, manipulating total food intake without altering the nutritive value of the base diet or the ethanol concentration was sufficient to evoke or eliminate ethanol preference. Absolute ethanol intake and food volume consumed were stronger predictors of ethanol preference than caloric intake or the dietary caloric content. Our findings suggest that the effect of the base diet on ethanol preference is largely mediated by total consumption associated with the delivery medium, which ultimately determines the level of ethanol intake. We speculate that a physiologically relevant threshold for ethanol intake is essential for preferential ethanol consumption.

Submitted 25 May, 2020; originally announced May 2020.

Comments: 11 pages, 2 figures, 1 table. Complete raw data accessible from https://github.com/HungryFly/JaLab/raw/master/publications/ethanol_JEB/SI_dataset.xlsx This version of the manuscript is original submission before undergoing peer review process. Final accepted and published version of this manuscript is available from https://doi.org/10.1242/jeb.224121 J Exp Biol (2020)

arXiv:2002.02601 [pdf, other]

Bidimensional linked matrix factorization for pan-omics pan-cancer analysis

Authors: Eric F. Lock, Jun Young Park, Katherine A. Hoadley

Abstract: Several modern applications require the integration of multiple large data matrices that have shared rows and/or columns. For example, cancer studies that integrate multiple omics platforms across multiple types of cancer, pan-omics pan-cancer analysis, have extended our knowledge of molecular heterogeneity beyond what was observed in single tumor and single platform studies. However, these studies have been limited by available statistical methodology. We propose a flexible approach to the simultaneous factorization and decomposition of variation across such bidimensionally linked matrices, BIDIFAC+. This decomposes variation into a series of low-rank components that may be shared across any number of row sets (e.g., omics platforms) or column sets (e.g., cancer types). This builds on a growing literature for the factorization and decomposition of linked matrices, which has primarily focused on multiple matrices that are linked in one dimension (rows or columns) only. Our objective function extends nuclear norm penalization, is motivated by random matrix theory, gives an identifiable decomposition under relatively mild conditions, and can be shown to give the mode of a Bayesian posterior distribution. We apply BIDIFAC+ to pan-omics pan-cancer data from TCGA, identifying shared and specific modes of variability across 4 different omics platforms and 29 different cancer types.

Submitted 7 April, 2022; v1 submitted 6 February, 2020; originally announced February 2020.

Comments: 26 pages, 5 figures

Journal ref: Annals of Applied Statistics 2022, Vol. 16, No. 1, 193-215

arXiv:1909.03992 [pdf]

Acoustomicrofluidic separation of tardigrades from raw cultures for sample preparation

Authors: Muhammad Afzal, Jinsoo Park, Ghulam Destgeer, Husnain Ahmed, Syed Atif Iqrar, Sanghee Kim, Sunghyun Kang, Anas Alazzam, Tae-Sung Yoon, Hyung Jin Sung

Abstract: Tardigrades are microscopic animals widely known for their survival capabilities under extreme conditions. They are the focus of current research in the fields of taxonomy, biogeography, genomics, proteomics, development, space biology, evolution, and ecology. Tardigrades, such as Hypsibius exemplaris, are being advocated as a next-generation model organism for genomic and developmental studies. The raw culture of H. exemplaris usually contains tardigrades themselves, their eggs, and algal food and feces. Experimentation with tardigrades often requires the demanding and laborious separation of tardigrades from raw samples to prepare pure and contamination-free tardigrade samples. In this paper, we propose a two-step acousto-microfluidic separation method to isolate tardigrades from raw samples. In the first step, a passive microfluidic filter composed of an array of traps is used to remove large algal clusters in the raw sample. In the second step, a surface acoustic wave-based active microfluidic separation device is used to continuously deflect tardigrades from their original streamlines inside the microchannel and thus selectively isolate them from algae and eggs. The experimental results demonstrated the efficient tardigrade separation with a recovery rate of 96% and an algae impurity of 4% on average in a continuous, contactless, automated, rapid, biocompatible manner.

Submitted 9 September, 2019; originally announced September 2019.

arXiv:1907.09738 [pdf]

doi 10.1109/ACCESS.2019.2952098

Robust Nucleus Detection with Partially Labeled Exemplars

Authors: Linqing Feng, Jun Ho Song, Jiwon Kim, Soomin Jeong, Jin Sung Park, Jinhyun Kim

Abstract: Quantitative analysis of cell nuclei in microscopic images is an essential yet challenging source of biological and pathological information. The major challenge is accurate detection and segmentation of densely packed nuclei in images acquired under a variety of conditions. Mask R-CNN-based methods have achieved state-of-the-art nucleus segmentation. However, the current pipeline requires fully annotated training images, which are time consuming to create and sometimes noisy. Importantly, nuclei often appear similar within the same image. This similarity could be utilized to segment nuclei with only partially labeled training examples. We propose a simple yet effective region-proposal module for the current Mask R-CNN pipeline to perform few-exemplar learning. To capture the similarities between unlabeled regions and labeled nuclei, we apply decomposed self-attention to learned features. On the self-attention map, we observe strong activation at the centers and edges of all nuclei, including unlabeled nuclei. On this basis, our region-proposal module propagates partial annotations to the whole image and proposes effective bounding boxes for the bounding box-regression and binary mask-generation modules. Our method effectively learns from unlabeled regions thereby improving detection performance. We test our method with various nuclear images. When trained with only 1/4 of the nuclei annotated, our approach retains a detection accuracy comparable to that from training with fully annotated data. Moreover, our method can serve as a bootstrapping step to create full annotations of datasets, iteratively generating and correcting annotations until a predetermined coverage and accuracy are reached. The source code is available at https://github.com/feng-lab/nuclei.

Submitted 13 November, 2019; v1 submitted 23 July, 2019; originally announced July 2019.

Journal ref: IEEE Access, vol. 7, pp. 162169-162178, 2019

arXiv:1907.02058 [pdf, other]

doi 10.1103/PhysRevE.100.042303

Evolution of cooperation driven by active information spreading

Authors: Bin Wu, Hye Jin Park, Lingshan Wu, Da Zhou

Abstract: Cooperators forgo their interest to benefit others. Thus cooperation should not be favored by natural selection. It challenges the evolutionists, since cooperation is widespread. As one of the resolutions, information spreading has been revealed to play a key role in the emergence of cooperation. Individuals, however, are typically assumed to be passive in the information spreading. Here we assume that individuals are active to spread the information via self-recommendation. Individuals with higher intensities of self-recommendation are likely to have more neighbors. We find that i) eloquent cooperators are necessary to promote cooperation; ii) individuals need to be open to the self-recommendation to enhance cooperation level; iii) the cost-to-benefit ratio should be smaller than one minus the ratio between self-recommendation intensities of defector and cooperator, which qualitatively measures the viscosity of the population. Our results highlight the importance of active information spreading on cooperation.

Submitted 3 July, 2019; originally announced July 2019.

Comments: 14 pages, 4 figures

Journal ref: Phys. Rev. E 100, 042303 (2019)

arXiv:1906.03722 [pdf, other]

doi 10.1111/biom.13141

Integrative Factorization of Bidimensionally Linked Matrices

Authors: Jun Young Park, Eric F. Lock

Abstract: Advances in molecular "omics" technologies have motivated new methodology for the integration of multiple sources of high-content biomedical data. However, most statistical methods for integrating multiple data matrices only consider data shared vertically (one cohort on multiple platforms) or horizontally (different cohorts on a single platform). This is limiting for data that take the form of bidimensionally linked matrices (e.g., multiple cohorts measured on multiple platforms), which are increasingly common in large-scale biomedical studies. In this paper, we propose BIDIFAC (Bidimensional Integrative Factorization) for integrative dimension reduction and signal approximation of bidimensionally linked data matrices. Our method factorizes the data into (i) globally shared, (ii) row-shared, (iii) column-shared, and (iv) single-matrix structural components, facilitating the investigation of shared and unique patterns of variability. For estimation we use a penalized objective function that extends the nuclear norm penalization for a single matrix. As an alternative to the complicated rank selection problem, we use results from random matrix theory to choose tuning parameters. We apply our method to integrate two genomics platforms (mRNA and miRNA expression) across two sample cohorts (tumor samples and normal tissue samples) using the breast cancer data from TCGA. We provide R code for fitting BIDIFAC, imputing missing values, and generating simulated data.

Submitted 9 June, 2019; originally announced June 2019.

Comments: 27 pages, 4 figures

Journal ref: Biometrics, 2019

arXiv:1903.06375 [pdf]

doi 10.1063/5.0091597

Exploiting product molecule number to consider reaction rate fluctuation in elementary reactions

Authors: Seong Jun Park

Abstract: In many chemical reactions, reaction rate fluctuation is inevitable. Reaction rates are different whenever chemical reaction occurs due to their dependence on the number of reaction events or the product number. As such, understanding the impact of rate fluctuation on product number counting statistics is of the utmost importance when developing a quantitative explanation of chemical reactions. In this work, we present a master equation that describes reaction rates as a function of product number and time. Our equation reveals the relationship between the reaction rate and product number fluctuation. Product number counting statistics uncovers a stochastic property of the product number; product number directly manipulates the reaction rate. Specifically, we find that product number shows super-Poisson characteristics when the product number increases, inducing an increase in the reaction rate. While, on the other hand, when the product number shows sub-Poisson characteristics with an increase in the product number, this is induced by a decrease in the reaction rate. Furthermore, our analysis exploits reaction rate fluctuation, enabling the quantification of the deviation of an elementary reaction process from a renewal process.

Submitted 18 March, 2019; v1 submitted 15 March, 2019; originally announced March 2019.

arXiv:1903.06370 [pdf]

Product number counting statistics from stochastic bursting birth-death processes

Authors: Seong Jun Park, Jaeyoung Sung

Abstract: Bursting and non-renewal processes are common phenomena in birth-death process, yet no theory can quantitatively describe a non-renewal birth process with bursting. Here, we present a theoretical model that yields the product number counting statistics of product creation occurring in bursts and of a non-renewal creation process. When product creation is a stationary process, our model confirms that product number fluctuation decreases with an increase in the product lifetime fluctuation, originating from the non-Poisson degradation dynamics, a result obtained in previous work. Our model additionally demonstrates that the dependence of product number fluctuation on product lifetime fluctuation varies with time, when product creation is a non-stationary process. We find that bursting increases product number fluctuation, compared to birth-processes without bursting. At time zero, in a burst-less birth process, product number fluctuation is unsurprisingly found to be zero, but we discover that, in a bulk creation process characterized by bursting, product number fluctuation is a finite value at time zero. The analytic expressions we obtain are applicable to many fields related to the study system population, such as queueing models and gene expression.

Submitted 27 August, 2019; v1 submitted 15 March, 2019; originally announced March 2019.

arXiv:1808.06047 [pdf]

Frequency spectrum of biological noise: a probe of reaction dynamics in living cells

Authors: Sanggeun Song, Gil-Suk Yang, Seong Jun Park, Ji-Hyun Kim, Jaeyoung Sung

Abstract: Even in the steady-state, the number of biomolecules in living cells fluctuates dynamically; and the frequency spectrum of this chemical fluctuation carries valuable information about the mechanism and the dynamics of the intracellular reactions creating these biomolecules. Although recent advances in single-cell experimental techniques enable the direct monitoring of the time-traces of the biological noise in each cell, the development of the theoretical tools needed to extract the information encoded in the stochastic dynamics of intracellular chemical fluctuation is still in its adolescence. Here, we present a simple and general equation that relates the power-spectrum of the product number fluctuation to the product lifetime and the reaction dynamics of the product creation process. By analyzing the time traces of the protein copy number using this theory, we can extract the power spectrum of the mRNA number, which cannot be directly measured by currently available experimental techniques. From the power spectrum of the mRNA number, we can further extract quantitative information about the transcriptional regulation dynamics. Our power spectrum analysis of gene expression noise is demonstrated for the gene network model of luciferase expression under the control of the Bmal 1a promoter in mouse fibroblast cells. Additionally, we investigate how the non-Poisson reaction dynamics and the cell-to-cell heterogeneity in transcription and translation affect the power-spectra of the mRNA and protein number.

Submitted 18 August, 2018; originally announced August 2018.

Comments: Main text: 29 pages, 4 figures Supporting Information: 42 pages, 4 supplementary figures

arXiv:1805.10422 [pdf, other]

doi 10.1088/1367-2630/aade6b

Generalized gravity model for human migration

Authors: Hye Jin Park, Woo Seong Jo, Sang Hoon Lee, Beom Jun Kim

Abstract: The gravity model (GM) analogous to Newton's law of universal gravitation has successfully described the flow between different spatial regions, such as human migration, traffic flows, international economic trades, etc. This simple but powerful approach relies only on the 'mass' factor represented by the scale of the regions and the 'geometrical' factor represented by the geographical distance. However, when the population has a subpopulation structure distinguished by different attributes, the estimation of the flow solely from the coarse-grained geographical factors in the GM causes the loss of differential geographical information for each attribute. To exploit the full information contained in the geographical information of subpopulation structure, we generalize the GM for population flow by explicitly harnessing the subpopulation properties characterized by both attributes and geography. As a concrete example, we examine the marriage patterns between the bride and the groom clans of Korea in the past. By exploiting more refined geographical and clan information, our generalized GM properly describes the real data, a part of which could not be explained by the conventional GM. Therefore, we would like to emphasize the necessity of using our generalized version of the GM, when the information on such nongeographical subpopulation structures is available.

Submitted 18 September, 2018; v1 submitted 25 May, 2018; originally announced May 2018.

Comments: 14 pages, 6 figures, 2 tables

Journal ref: New J. Phys. 20, 093018 (2018)

arXiv:1710.01339 [pdf, other]

doi 10.1103/PhysRevE.96.042412

Extinction dynamics from meta-stable coexistences in an evolutionary game

Authors: Hye Jin Park, Arne Traulsen

Abstract: Deterministic evolutionary game dynamics can lead to stable coexistences of different types. Stochasticity, however, drives the loss of such coexistences. This extinction is usually accompanied by population size fluctuations. We investigate the most probable extinction trajectory under such fluctuations by mapping a stochastic evolutionary model to a problem of classical mechanics using the Wentzel-Kramers-Brillouin (WKB) approximation. Our results show that more abundant types in a coexistence can be more likely to go extinct first, in good agreement with previous results, and that the distance between the coexistence and extinction point is not a good predictor of extinction. Instead, the WKB method correctly predicts the type going extinct first.

Submitted 3 October, 2017; originally announced October 2017.

Journal ref: Phys. Rev. E 96, 042412 (2017)

arXiv:1705.04660 [pdf]

doi 10.1038/s41567-018-0089-9

Universal geometric constraints during epithelial jamming

Authors: Lior Atia, Dapeng Bi, Yasha Sharma, Jennifer A. Mitchel, Bomi Gweon, Stephan Koehler, Stephen J. DeCamp, Bo Lan, Rebecca Hirsch, Adrian F. Pegoraro, Kyu Ha Lee, Jacqueline Starr, David A. Weitz, Adam C. Martin, Jin-Ah Park, James P. Butler, Jeffrey J. Fredberg

Abstract: As an injury heals, an embryo develops, or a carcinoma spreads, epithelial cells systematically change their shape. In each of these processes cell shape is studied extensively, whereas variation of shape from cell-to-cell is dismissed most often as biological noise. But where do cell shape and variation of cell shape come from? Here we report that cell shape and shape variation are mutually constrained through a relationship that is purely geometrical. That relationship is shown to govern maturation of the pseudostratified bronchial epithelial layer cultured from both non-asthmatic and asthmatic donors as well as formation of the ventral furrow in the epithelial monolayer of the Drosophila embryo in vivo. Across these and other vastly different epithelial systems, cell shape variation collapses to a family of distributions that is common to all and potentially universal. That distribution, in turn, is accounted for quantitatively by a mechanistic theory of cell-cell interaction showing that cell shape becomes progressively less elongated and less variable as the layer becomes progressively more jammed. These findings thus uncover a connection between jamming and geometry that is generic -spanning jammed living and inert systems alike- and demonstrate that proximity of the cell layer to the jammed state is the principal determinant of the most primitive features of epithelial cell shape and shape variation.

Submitted 12 May, 2017; originally announced May 2017.

Comments: First three authors had equal contribution | Video links are given in the Supplementary Videos section (pages 31-32)

arXiv:1704.01693 [pdf]

doi 10.1038/s41467-019-10729-5

Switch-like enhancement of epithelial-mesenchymal transition by YAP through feedback regulation of WT1 and small Rho-family GTPases

Authors: JinSeok Park, Deok-Ho Kim, Sagar R. Shah, Hong-Nam Kim, Kshitiz, David Ellison, Peter Kim, Kahp-Yang Suh, Alfredo Quiñones-Hinojosa, Andre Levchenko

Abstract: Collective cell migration is a hallmark of developmental and patho-physiological states, including wound healing and invasive cancer growth. The integrity of the expanding epithelial sheets can be influenced by extracellular cues, including cell-cell and cell-matrix interactions. We show the nano-scale topography of the extracellular matrix underlying epithelial cell layers can have a strong effect on the speed and morphology of the fronts of the expanding sheet triggering epithelial-mesenchymal transition (EMT). We further demonstrate that this behavior depends on the mechano-sensitivity of the transcription regulator YAP and two new feedback cross-regulation mechanisms: through Wilms Tumor-1 and E-cadherin, loosening cell-cell contacts, and through Rho GTPase family proteins, enhancing cell migration. These YAP-dependent regulatory feedback loops result in a switch-like change in the signaling and expression of EMT-related markers, leading to a robust enhancement in invasive epithelial sheet expansion, which might lead to a poorer clinical outcome in renal and other cancers.

Submitted 5 April, 2017; originally announced April 2017.

arXiv:1612.08948 [pdf]

doi 10.1073/pnas.1700054114

A mechano-chemical feedback underlies co-existence of qualitatively distinct cell polarity patterns within diverse cell populations

Authors: JinSeok Park, William R. Holmes, Sung-Hoon Lee, Hong-Nam Kim, Deok-Ho Kim, Moon Kyu Kwak, Chiaochun Joanne Wang, Kahp-Yang Suh, Leah Edelstein-Keshet, Andre Levchenko

Abstract: Cell polarization and directional cell migration can display random, persistent and oscillatory dynamic patterns. However, it is not clear if these polarity patterns can be explained by the same underlying regulatory mechanism. Here, we show that random, persistent and oscillatory migration accompanied by polarization can simultaneously occur in populations of melanoma cells derived from tumors with different degrees of aggressiveness. We demonstrate that all these patterns and the probabilities of their occurrence are quantitatively accounted for by a simple mechanism involving a spatially distributed, mechano-chemical feedback coupling the dynamically changing extracellular matrix (ECM)-cell contacts to the activation of signaling downstream of the Rho-family small GTPases. This mechanism is supported by a predictive mathematical model and extensive experimental validation, and can explain previously reported results for diverse cell types. In melanoma, this mechanism also accounts for the effects of genetic and environmental perturbations, including mutations linked to invasive cell spread. The resulting mechanistic understanding of cell polarity quantitatively captures the relationship between population variability and phenotypic plasticity, with the potential to account for a wide variety of cell migration states in diverse pathological and physiological conditions.

Submitted 28 December, 2016; originally announced December 2016.

arXiv:1611.00730 [pdf, other]

doi 10.1371/journal.pcbi.1005524

A mathematical model coupling polarity signaling to cell adhesion explains diverse cell migration patterns

Authors: William R. Holmes, JinSeok Park, Andre Levchenko, Leah Edelstein-Keshet

Abstract: Cells crawling through tissues migrate inside a complex fibrous environment called the extracellular matrix (ECM), which provides signals regulating motility. Here we investigate one such well-known pathway, involving mutually antagonistic signalling molecules (small GTPases Rac and Rho) that control the protrusion and contraction of the cell edges (lamellipodia). Invasive melanoma cells were observed migrating on surfaces with topography (array of posts), coated with adhesive molecules (fibronectin, FN) by Park et al., 2016. Several distinct qualitative behaviors they observed included persistent polarity, oscillation between the cell front and back, and random dynamics. To gain insight into the link between intracellular and ECM signaling, we compared experimental observations to a sequence of mathematical models encoding distinct hypotheses. The successful model required several critical factors. (1) Competition of lamellipodia for limited pools of GTPases. (2) Protrusion / contraction of lamellipodia influence ECM signaling. (3) ECM-mediated activation of Rho. A model combining these elements explains all three cellular behaviors and correctly predicts the results of experimental perturbations. This study yields new insight into how the dynamic interactions between intracellular signaling and the cell's environment influence cell behavior.

Submitted 2 November, 2016; originally announced November 2016.

arXiv:1509.02377 [pdf]

ERK/p38 MAPK inhibition reduces radio-resistance to pulsed proton beam in breast cancer stem cells cells

Authors: Myung-Hwan Jung, Jeong Chan Park

Abstract: Recent studies have identified highly tumorigenic cells with stem cell-like characteristics in human cancers, termed cancer stem cells (CSCs). CSCs are resistant to conventional radiotherapy and chemotherapy owing to their high DNA repair ability and oncogene overexpression. However, the mechanisms regulating CSC radio-resistance, particularly proton beam resistance, remain unclear. We isolated CSCs from the breast cancer cell lines MCF-7 and MDA-MB-231, which expressed the characteristic breast CSC membrane protein markers CD44+/CD24-/low, and irradiated the CSCs with pulsed proton beams. We confirmed that CSCs are resistant to pulsed proton beams and showed that treatment with p38 and ERK inhibitors reduced CSC radioresistance. Based on these results, BCSC radio-resistance can be reduced during proton beam therapy by co-treatment with ERK1/2 or p38 inhibitors, representing a novel approach for breast cancer therapy.

Submitted 14 July, 2015; originally announced September 2015.

arXiv:1507.04863 [pdf]

Study of the Effects of High-Energy Proton Beams on Escherichia Coli

Authors: Jeong Chan Park, Myung-Hwan Jung

Abstract: Antibiotic-resistant bacterial infection has become one of the most serious risks to public health today. Discouragingly, however, the development of new antibiotics has made little progress over the last decade. There is an urgent need for alternative approaches to treating antibiotic-resistant bacteria. Novel methods, including photothermal therapy based on gold nanomaterials and ionizing radiation such as X-rays and gamma rays, have been reported. Studies of the effects of high-energy proton radiation on bacteria have focused mainly on Bacillus species and their spores; the effect of proton beams on Escherichia coli (E. coli) has rarely been reported. E. coli is an important biological tool for obtaining metabolic and genetic information and a common model microorganism for studying toxicity and antimicrobial activity. In addition, E. coli is a common bacterium in the intestinal tract of mammals. Herein, the morphological and physiological changes of E. coli after proton irradiation were investigated. Diluted solutions of the cells were used for proton beam irradiation. LB agar plates were used to count the number of colonies formed. The growth profile of the cells was monitored by optical density at 600 nm. The morphology of the irradiated cells was analyzed with an optical microscope. Microarray analysis was performed to examine the gene expression changes between irradiated samples and non-irradiated control samples.

Submitted 17 July, 2015; originally announced July 2015.

arXiv:1506.02685 [pdf, other]

Quantifying Spatio-Temporal Variation of Invasion Spread

Authors: Joshua Goldstein, Jaewoo Park, Murali Haran, Andrew Liebhold, Ottar N. Bjornstad

Abstract: The spread of invasive species can have far-reaching environmental and ecological consequences. Understanding invasion spread patterns and the underlying processes driving invasions is key to predicting and managing invasions. We combine a set of statistical methods in a novel way to characterize local spread properties and demonstrate their application using simulated and historical data on invasive insects. Our method uses a Gaussian process fit to the surface of waiting times to invasion in order to characterize the vector field of spread. Using this method we estimate, with statistical uncertainties, the speed and direction of spread at each location. Simulations from a stratified diffusion model verify the accuracy of our method. We show how local rates of spread may be linked to environmental covariates for two case studies: the spread of the gypsy moth (Lymantria dispar) and the hemlock woolly adelgid (Adelges tsugae) in North America. We provide an R package that automates the calculations for any spatially referenced waiting time data.

Submitted 10 October, 2018; v1 submitted 8 June, 2015; originally announced June 2015.

arXiv:1503.00385 [pdf, other]

Atomic Scale Design and Three-Dimensional Simulation of Ionic Diffusive Nanofluidic Channels

Authors: ** Kyoung Park, Kelin Xia, Guo-Wei We

Abstract: Recent advances in nanotechnology have led to rapid advances in nanofluidics, which has been established as a reliable means for a wide variety of applications, including molecular separation, detection, crystallization and biosynthesis. Although atomic and molecular level consideration is a key ingredient in the experimental design and fabrication of nanofluidic systems, atomic and molecular modeling of nanofluidics is rare, and to the best of our knowledge most nanoscale simulations in the literature are restricted to one or two dimensions. The present work introduces atomic scale design and three-dimensional (3D) simulation of ionic diffusive nanofluidic systems. We propose a variational multiscale framework that represents the nanochannel in discrete atomic and/or molecular detail while describing the ionic solution as a continuum. Apart from the major electrostatic and entropic effects, the non-electrostatic interactions between the channel and solution, and among solvent molecules, are accounted for in our modeling. We derive generalized Poisson-Nernst-Planck (PNP) equations for nanofluidic systems. Mathematical algorithms, such as Dirichlet to Neumann mapping and the matched interface and boundary (MIB) methods, are developed to rigorously solve the aforementioned equations to second-order accuracy in realistic 3D settings. Three ionic diffusive nanofluidic systems, including a negatively charged nanochannel, a bipolar nanochannel and a double-well nanochannel, are designed to investigate the impact of atomic charges on channel current, density distribution and electrostatic potential. Numerical findings, such as gating, ion depletion and inversion, are in good agreement with those from experimental measurements and numerical simulations in the literature.

Submitted 1 March, 2015; originally announced March 2015.

Comments: 20 figures. arXiv admin note: text overlap with arXiv:1412.0176 by other authors

Showing 1–50 of 62 results for author: Park, J