-
On the Robustness of AlphaFold: A COVID-19 Case Study
Authors:
Ismail Alkhouri,
Sumit Jha,
Andre Beckus,
George Atia,
Alvaro Velasquez,
Rickard Ewetz,
Arvind Ramanathan,
Susmit Jha
Abstract:
Protein folding neural networks (PFNNs) such as AlphaFold predict remarkably accurate structures of proteins compared to other approaches. However, the robustness of such networks has heretofore not been explored. This is particularly relevant given the broad social implications of such technologies and the fact that biologically small perturbations in the protein sequence do not generally lead to drastic changes in the protein structure. In this paper, we demonstrate that AlphaFold does not exhibit such robustness despite its high accuracy. This raises the challenge of detecting and quantifying the extent to which these predicted protein structures can be trusted. To measure the robustness of the predicted structures, we utilize (i) the root-mean-square deviation (RMSD) and (ii) the Global Distance Test (GDT) similarity measure between the predicted structure of the original sequence and the structure of its adversarially perturbed version. We prove that the problem of minimally perturbing protein sequences to fool protein folding neural networks is NP-complete. Based on the well-established BLOSUM62 sequence alignment scoring matrix, we generate adversarial protein sequences and show that the RMSD between the predicted structure of the adversarial sequence and that of the original sequence is very large when the adversarial changes are bounded by (i) 20 units in the BLOSUM62 distance, and (ii) five residues (out of hundreds or thousands of residues) in the given protein sequence. In our experimental evaluation, we consider 111 COVID-19 proteins in the Universal Protein Resource (UniProt), a central resource for protein data managed by the European Bioinformatics Institute, the Swiss Institute of Bioinformatics, and the US Protein Information Resource. These experiments result in an overall average GDT similarity score of around 34%, demonstrating a substantial drop in the performance of AlphaFold.
Submitted 12 January, 2023; v1 submitted 10 January, 2023;
originally announced January 2023.
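The two structure-comparison measures named in the abstract above can be illustrated with a minimal sketch. It assumes both structures are given as equal-length lists of (x, y, z) coordinates in Å that are already superimposed; a full GDT computation searches over many superpositions, so the `gdt_ts` function here is a simplified stand-in, and both function names are hypothetical rather than taken from the paper.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-aligned coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

def gdt_ts(coords_a, coords_b, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS in percent, under one fixed superposition (assumption).

    Averages, over the cutoffs, the fraction of residues whose paired
    positions lie within that cutoff distance of each other.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    dists = [math.dist(p, q) for p, q in zip(coords_a, coords_b)]
    fractions = [sum(d <= c for d in dists) / len(dists) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)
```

A low RMSD together with a high GDT score indicates that the two predicted structures agree; the roughly 34% average reported in the abstract corresponds to the opposite situation.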
-
Protein Folding Neural Networks Are Not Robust
Authors:
Sumit Kumar Jha,
Arvind Ramanathan,
Rickard Ewetz,
Alvaro Velasquez,
Susmit Jha
Abstract:
Deep neural networks such as AlphaFold and RoseTTAFold predict remarkably accurate structures of proteins compared to other algorithmic approaches. It is known that biologically small perturbations in the protein sequence do not lead to drastic changes in the protein structure. In this paper, we demonstrate that RoseTTAFold does not exhibit such robustness despite its high accuracy, and that biologically small perturbations for some input sequences result in radically different predicted protein structures. This raises the challenge of detecting when these predicted protein structures cannot be trusted. We define the robustness measure for the predicted structure of a protein sequence to be the inverse of the root-mean-square distance (RMSD) between the predicted structure and the structure of its adversarially perturbed sequence. We use adversarial attack methods to create adversarial protein sequences, and show that the RMSD between the predicted structures ranges from 0.119Å to 34.162Å when the adversarial perturbations are bounded by 20 units in the BLOSUM62 distance. This demonstrates very high variance in the robustness measure of the predicted structures. We show that the magnitude of the correlation (0.917) between our robustness measure and the RMSD between the predicted structure and the ground truth is high; that is, predictions with a low robustness measure cannot be trusted. This is the first paper demonstrating the susceptibility of RoseTTAFold to adversarial attacks.
Submitted 19 September, 2021; v1 submitted 9 September, 2021;
originally announced September 2021.
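The perturbation budget and the robustness measure in the abstract above can be sketched as follows. The abstract does not define the BLOSUM62 distance, so this sketch assumes one plausible definition: for each position, the self-substitution score of the original residue minus the score of the substitution actually made, summed over the sequence. Only a small excerpt of the BLOSUM62 matrix is included, and all function names are hypothetical.

```python
# Excerpt of the symmetric BLOSUM62 matrix; real sequences need the full table.
BLOSUM62_EXCERPT = {
    ("I", "I"): 4, ("L", "L"): 4, ("V", "V"): 4,
    ("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1,
    ("D", "D"): 6, ("E", "E"): 5, ("D", "E"): 2,
}

def score(a, b):
    """Look up a substitution score, using the symmetry of the matrix."""
    if (a, b) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(a, b)]
    return BLOSUM62_EXCERPT[(b, a)]

def blosum_distance(original, perturbed):
    """Assumed distance: sum over positions of score(a, a) - score(a, b)."""
    if len(original) != len(perturbed):
        raise ValueError("sequences must have equal length (substitutions only)")
    return sum(score(a, a) - score(a, b) for a, b in zip(original, perturbed))

def within_budget(original, perturbed, max_distance=20):
    """True if the perturbation stays within the 20-unit bound from the abstract."""
    return blosum_distance(original, perturbed) <= max_distance

def robustness(rmsd_angstrom):
    """Robustness as the inverse of the RMSD (in Å) between the two predictions."""
    return 1.0 / rmsd_angstrom
```

Under this definition, the extremes quoted in the abstract give `robustness(0.119)` far above `robustness(34.162)`, which is the high variance in robustness that the authors report.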
-
A computational tool for trend analysis and forecast of the COVID-19 pandemic
Authors:
Henrique Mohallem Paiva,
Rubens Junqueira Magalhaes Afonso,
Fabiana Mara Scarpelli de Lima Alvarenga Caldeira,
Ester de Andrade Velasquez
Abstract:
Purpose: This paper proposes a methodology and a computational tool to study the COVID-19 pandemic throughout the world and to perform a trend analysis to assess its local dynamics.
Methods: Mathematical functions are employed to describe the number of cases and deaths in each region and to predict their final numbers, as well as the dates of maximum daily occurrences and the local stabilization date. The model parameters are calibrated using a computational methodology for numerical optimization. Trend analyses are run, allowing the effects of public policies to be assessed. Easy-to-interpret metrics of the quality of the fitted curves are provided. Country-wise data from the European Centre for Disease Prevention and Control (ECDC) concerning the daily number of cases and deaths around the world are used, as well as detailed data from Johns Hopkins University and from the Brasil.io project describing individually the occurrences in United States counties and in Brazilian states and cities, respectively. The U.S. and Brazil were chosen for a more detailed analysis because they are the current foci of the pandemic.
Results: Illustrative results for different countries, U.S. counties and Brazilian states and cities are presented and discussed.
Conclusion: The main contributions of this work lie in (i) a straightforward model of the curves to represent the data, which allows automation of the process without requiring interventions from experts; (ii) an innovative approach for trend analysis, whose results provide important information to support authorities in their decision-making process; and (iii) the developed computational tool, which is freely available and allows the user to quickly update the COVID-19 analyses and forecasts for any country, United States county or Brazilian state or city present in the periodic reports from the authorities.
Submitted 20 October, 2020;
originally announced October 2020.
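The curve-fitting idea in the abstract above can be illustrated with a minimal sketch. The abstract does not say which mathematical functions or which optimizer the authors use, so this example assumes a plain logistic curve for cumulative cases and substitutes a coarse grid search for the numerical optimization; `logistic`, `fit_logistic`, and the parameter names `K`, `r`, and `t0` are all hypothetical.

```python
import math

def logistic(t, K, r, t0):
    """Assumed cumulative-case model: K is the final count, t0 the inflection day."""
    return K / (1.0 + math.exp(-r * (t - t0)))

def fit_logistic(days, cumulative, K_grid, r_grid, t0_grid):
    """Grid search (a stand-in optimizer) minimizing the sum of squared errors."""
    best = None
    for K in K_grid:
        for r in r_grid:
            for t0 in t0_grid:
                err = sum(
                    (logistic(t, K, r, t0) - y) ** 2
                    for t, y in zip(days, cumulative)
                )
                if best is None or err < best[0]:
                    best = (err, K, r, t0)
    return best[1:]  # (K, r, t0)
```

For a logistic curve, the fitted `K` is the predicted final number of cases and `t0` is the day on which daily new cases peak, which correspond to two of the quantities the abstract says the tool predicts.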