Peptide Vaccine Design by Evolutionary Multi-Objective Optimization

Authors: Dan-Xuan Liu, Yi-Heng Xu, Chao Qian

Abstract: Peptide vaccines are growing in significance for fighting diverse diseases. Machine learning has improved the identification of peptides that can trigger immune responses, and the main challenge of peptide vaccine design now lies in selecting an effective subset of peptides due to the allelic diversity among individuals. Previous works mainly formulated this task as a constrained optimization problem, aiming to maximize the expected number of peptide-Major Histocompatibility Complex (peptide-MHC) bindings across a broad range of populations by selecting a subset of diverse peptides with limited size; and employed a greedy algorithm, whose performance, however, may be limited due to the greedy nature. In this paper, we propose a new framework PVD-EMO based on Evolutionary Multi-objective Optimization, which reformulates Peptide Vaccine Design as a bi-objective optimization problem that maximizes the expected number of peptide-MHC bindings and minimizes the number of selected peptides simultaneously, and employs a Multi-Objective Evolutionary Algorithm (MOEA) to solve it. We also incorporate warm-start and repair strategies into MOEAs to improve efficiency and performance. We prove that the warm-start strategy ensures that PVD-EMO maintains the same worst-case approximation guarantee as the previous greedy algorithm, and meanwhile, the EMO framework can help avoid local optima. Experiments on a peptide vaccine design for COVID-19, caused by the SARS-CoV-2 virus, demonstrate the superiority of PVD-EMO.

Submitted 9 June, 2024; originally announced June 2024.

Comments: This paper has appeared at IJCAI'24

arXiv:2401.04323 [pdf, other]

Divergent Characteristics of Biomedical Research across Publication Types: A Quantitative Analysis on the Aging-related Research

Authors: Chenxing Qian, Qingyue Guo

Abstract: This paper investigates differences in characteristics across publication types for aging-related genetic research. We utilized bibliometric data for five model species retrieved from authoritative databases including PubMed. Publications are classified into types according to PubMed. Results indicate substantial divergence across publication types in attention paid to aging-related research, scopes of studied genes, and topical preferences. For instance, comparative studies and meta-analyses show a greater focus on aging than validation studies. Reviews concentrate more on cell biology while clinical studies emphasize translational topics. Publication types also manifest variations in highly studied genes, like APOE for reviews versus GH1 for clinical studies. Despite differences, top genes like insulin are universally emphasized. Publication types demonstrate similar levels of imbalance in research efforts to genes. Differences also exist in bibliometrics like authorship numbers, citation counts, etc. Publication types show distinct preferences for journals of certain topical specialties and scope of readership. Overall, findings showcase distinct characteristics of publication types in studying aging-related genetics, owing to their unique nature and objectives. This study is the first endeavor to systematically depict the inherent structure of a biomedical research field from the perspective of publication types and provides insights into knowledge production and evaluation patterns across biomedical communities.

Submitted 8 January, 2024; originally announced January 2024.

Comments: 43 pages, 3 tables, 9 figures, supplementary figures and tables attached in latex code

ACM Class: H.3.3

arXiv:2307.07443 [pdf, other]

Can Large Language Models Empower Molecular Property Prediction?

Authors: Chen Qian, Huayi Tang, Zhirui Yang, Hong Liang, Yong Liu

Abstract: Molecular property prediction has gained significant attention due to its transformative potential in multiple scientific disciplines. Conventionally, a molecule can be represented either as graph-structured data or as SMILES text. Recently, the rapid development of Large Language Models (LLMs) has revolutionized the field of NLP. Although it is natural to utilize LLMs to assist in understanding molecules represented by SMILES, the exploration of how LLMs will impact molecular property prediction is still in its early stage. In this work, we advance towards this objective through two perspectives: zero/few-shot molecular classification, and using the new explanations generated by LLMs as representations of molecules. To be specific, we first prompt LLMs to do in-context molecular classification and evaluate their performance. After that, we employ LLMs to generate semantically enriched explanations for the original SMILES and then leverage that to fine-tune a small-scale LM model for multiple downstream tasks. The experimental results highlight the superiority of text explanations as molecular representations across multiple benchmark datasets, and confirm the immense potential of LLMs in molecular property prediction tasks. Codes are available at https://github.com/ChnQ/LLM4Mol.

Submitted 14 July, 2023; originally announced July 2023.

arXiv:2305.01770 [pdf, other]

DeCom: Deep Coupled-Factorization Machine for Post COVID-19 Respiratory Syncytial Virus Prediction with Nonpharmaceutical Interventions Awareness

Authors: Xinyan Li, Cheng Qian, Lucas Glass

Abstract: Respiratory syncytial virus (RSV) is one of the most dangerous respiratory diseases for infants and young children. Due to the nonpharmaceutical intervention (NPI) imposed in the COVID-19 outbreak, the seasonal transmission pattern of RSV has been discontinued in 2020 and then shifted months ahead in 2021 in the northern hemisphere. It is critical to understand how COVID-19 impacts RSV and build predictive algorithms to forecast the timing and intensity of RSV reemergence in post-COVID-19 seasons. In this paper, we propose a deep coupled tensor factorization machine, dubbed as DeCom, for post COVID-19 RSV prediction. DeCom leverages tensor factorization and residual modeling. It enables us to learn the disrupted RSV transmission reliably under COVID-19 by taking both the regular seasonal RSV transmission pattern and the NPI into consideration. Experimental results on a real RSV dataset show that DeCom is more accurate than the state-of-the-art RSV prediction algorithms and achieves up to 46% lower root mean square error and 49% lower mean absolute error for country-level prediction compared to the baselines.

Submitted 2 May, 2023; originally announced May 2023.

arXiv:2106.05181 [pdf]

Condition Integration Memory Network: An Interpretation of the Meaning of the Neuronal Design

Authors: Cheng Qian

Abstract: Understanding the basic operational logics of the nervous system is essential to advancing neuroscientific research. However, theoretical efforts to tackle this fundamental problem are lacking, despite the abundant empirical data about the brain that has been collected in the past few decades. To address this shortcoming, this document introduces a hypothetical framework for the functional nature of primitive neural networks. It analyzes the idea that the activity of neurons and synapses can symbolically reenact the dynamic changes in the world and thus enable an adaptive system of behavior. More significantly, the network achieves this without participating in an algorithmic structure. When a neuron's activation represents some symbolic element in the environment, each of its synapses can indicate a potential change to the element and its future state. The efficacy of a synaptic connection further specifies the element's particular probability for, or contribution to, such a change. As it fires, a neuron's activation is transformed to its postsynaptic targets, resulting in a chronological shift of the represented elements. As the inherent function of summation in a neuron integrates the various presynaptic contributions, the neural network mimics the collective causal relationship of events in the observed environment.

Submitted 6 September, 2021; v1 submitted 21 May, 2021; originally announced June 2021.

Comments: 40 pages, 6 figures; added a section

arXiv:2012.04747 [pdf, other]

STELAR: Spatio-temporal Tensor Factorization with Latent Epidemiological Regularization

Authors: Nikos Kargas, Cheng Qian, Nicholas D. Sidiropoulos, Cao Xiao, Lucas M. Glass, Jimeng Sun

Abstract: Accurate prediction of the transmission of epidemic diseases such as COVID-19 is crucial for implementing effective mitigation measures. In this work, we develop a tensor method to predict the evolution of epidemic trends for many regions simultaneously. We construct a 3-way spatio-temporal tensor (location, attribute, time) of case counts and propose a nonnegative tensor factorization with latent epidemiological model regularization named STELAR. Unlike standard tensor factorization methods which cannot predict slabs ahead, STELAR enables long-term prediction by incorporating latent temporal regularization through a system of discrete-time difference equations of a widely adopted epidemiological model. We use latent instead of location/attribute-level epidemiological dynamics to capture common epidemic profile sub-types and improve collaborative learning and prediction. We conduct experiments using both county- and state-level COVID-19 data and show that our model can identify interesting latent patterns of the epidemic. Finally, we evaluate the predictive ability of our method and show superior performance compared to the baselines, achieving up to 21% lower root mean square error and 25% lower mean absolute error for county-level prediction.

Submitted 17 March, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

Comments: AAAI 2021

arXiv:2008.04215 [pdf]

STAN: Spatio-Temporal Attention Network for Pandemic Prediction Using Real World Evidence

Authors: Junyi Gao, Rakshith Sharma, Cheng Qian, Lucas M. Glass, Jeffrey Spaeder, Justin Romberg, Jimeng Sun, Cao Xiao

Abstract: Objective: The COVID-19 pandemic has created many challenges that need immediate attention. Various epidemiological and deep learning models have been developed to predict the COVID-19 outbreak, but all have limitations that affect the accuracy and robustness of the predictions. Our method aims at addressing these limitations and making earlier and more accurate pandemic outbreak predictions by (1) using patients' EHR data from different counties and states that encode local disease status and medical resource utilization condition; (2) considering demographic similarity and geographical proximity between locations; and (3) integrating pandemic transmission dynamics into deep learning models. Materials and Methods: We proposed a spatio-temporal attention network (STAN) for pandemic prediction. It uses an attention-based graph convolutional network to capture geographical and temporal trends and predict the number of cases for a fixed number of days into the future. We also designed a physical law-based loss term for enhancing long-term prediction. STAN was tested using both massive real-world patient data and open source COVID-19 statistics provided by Johns Hopkins University across all U.S. counties. Results: STAN outperforms epidemiological modeling methods such as SIR and SEIR and deep learning models on both long-term and short-term predictions, achieving up to 87% lower mean squared error compared to the best baseline prediction model. Conclusions: By using information from real-world patient data and geographical data, STAN can better capture the disease status and medical resource utilization information and thus provides more accurate pandemic modeling. With pandemic transmission law based regularization, STAN also achieves good long-term prediction performance.

Submitted 7 December, 2020; v1 submitted 23 July, 2020; originally announced August 2020.

arXiv:1810.12758 [pdf, ps, other]

From Gene Expression to Drug Response: A Collaborative Filtering Approach

Authors: Cheng Qian, Nicholas D. Sidiropoulos, Magda Amiridi, Amin Emad

Abstract: Predicting the response of cancer cells to drugs is an important problem in pharmacogenomics. Recent efforts in generation of large scale datasets profiling gene expression and drug sensitivity in cell lines have provided a unique opportunity to study this problem. However, one major challenge is the small number of samples (cell lines) compared to the number of features (genes) even in these large datasets. We propose a collaborative filtering (CF) like algorithm for modeling gene-drug relationship to identify patients most likely to benefit from a treatment. Due to the correlation of gene expressions in different cell lines, the gene expression matrix is approximately low-rank, which suggests that drug responses could be estimated from a reduced dimension latent space of the gene expression. Towards this end, we propose a joint low-rank matrix factorization and latent linear regression approach. Experiments with data from the Genomics of Drug Sensitivity in Cancer database are included to show that the proposed method can predict drug-gene associations better than the state-of-the-art methods.

Submitted 30 October, 2018; v1 submitted 29 October, 2018; originally announced October 2018.

arXiv:1612.01413 [pdf, other]

Multi-stage Clustering of Breast Cancer for Precision Medicine

Authors: Chenzhe Qian

Abstract: Cancer has become one of the most widespread diseases in the world. Specifically, breast cancer is diagnosed more often than any other type of cancer. However, breast cancer patients and their individual tumors are often unique. Identifying the underlying genetic phenotype can lead to precision (personalized) medicine. Tailoring medical treatment strategies to best fit the needs of individual patients can dramatically improve their health. Such an approach requires sufficient knowledge of the patients and the diseases, which is currently unavailable to practitioners. This study focuses on breast cancer and proposes a novel two-stage clustering method to partition patients into hierarchical groups. The first stage is broad grouping, which is based on phenotypes such as demographic information and clinical features. The second stage is fine grouping based on genomic characteristics, such as copy number variation and somatic mutation, of patients in a subgroup resulting from the first stage. Generally, this framework offers a mechanism to mix multiple forms of data, both phenotypic and genomic, to most effectively define individual patients for personalized predictions. This method provides the ability to detect correlation among all factors.

Submitted 2 December, 2016; originally announced December 2016.

Showing 1–9 of 9 results for author: Qian, C