-
Standardised workflow for mass spectrometry-based single-cell proteomics data processing and analysis using the scp package
Authors:
Samuel Grégoire,
Christophe Vanderaa,
Sébastien Pyr dit Ruys,
Gabriel Mazzucchelli,
Christopher Kune,
Didier Vertommen,
Laurent Gatto
Abstract:
Mass spectrometry (MS) based single-cell proteomics (SCP) explores cellular heterogeneity by focusing on the functional effectors of the cells - proteins. However, extracting meaningful biological information from MS data is far from trivial, especially with single cells. Currently, data analysis workflows are substantially different from one research team to another. Moreover, it is difficult to evaluate pipelines as ground truths are missing. Our team has developed the R/Bioconductor package called scp to provide a standardised framework for SCP data analysis. It relies on the widely used QFeatures and SingleCellExperiment data structures. In addition, we used a design containing cell lines mixed in known proportions to generate controlled variability for data analysis benchmarking. In this work, we provide a flexible data analysis protocol for SCP data using the scp package together with comprehensive explanations at each step of the processing. Our main steps are quality control on the feature and cell level, aggregation of the raw data into peptides and proteins, normalisation and batch correction. We validate our workflow using our ground truth data set. We illustrate how to use this modular, standardised framework and highlight some crucial steps.
Submitted 13 December, 2023; v1 submitted 20 October, 2023;
originally announced October 2023.
-
Revisiting the thorny issue of missing values in single-cell proteomics
Authors:
Christophe Vanderaa,
Laurent Gatto
Abstract:
Missing values are a notable challenge when analysing mass spectrometry-based proteomics data. While the field is still actively debating best practices, the challenge increased with the emergence of mass spectrometry-based single-cell proteomics and the dramatic increase in missing values. A popular approach to deal with missing values is to perform imputation. Imputation has several drawbacks for which alternatives exist, but currently imputation is still a practical solution widely adopted in single-cell proteomics data analysis. This perspective discusses the advantages and drawbacks of imputation. We also highlight five main challenges linked to missing value management in single-cell proteomics. Future developments should aim to solve these challenges, whether it is through imputation or data modelling. The perspective concludes with recommendations for reporting missing values, for reporting methods that deal with missing values and for proper encoding of missing values.
Submitted 11 July, 2023; v1 submitted 13 April, 2023;
originally announced April 2023.
-
The current state of single-cell proteomics data analysis
Authors:
Christophe Vanderaa,
Laurent Gatto
Abstract:
Sound data analysis is essential to retrieve meaningful biological information from single-cell proteomics experiments. This analysis is carried out by computational methods that are assembled into workflows, and their implementations influence the conclusions that can be drawn from the data. In this work, we explore and compare the computational workflows that have been used over the last four years and identify a profound lack of consensus on how to analyze single-cell proteomics data. We highlight the need for benchmarking of computational workflows, standardization of computational tools and data, as well as carefully designed experiments. Finally, we cover the current standardization efforts that aim to fill the gap and list the remaining missing pieces, and conclude with lessons learned from the replication of published single-cell proteomics analyses.
Submitted 1 December, 2022; v1 submitted 3 October, 2022;
originally announced October 2022.
-
Initial recommendations for performing, benchmarking, and reporting single-cell proteomics experiments
Authors:
Laurent Gatto,
Ruedi Aebersold,
Juergen Cox,
Vadim Demichev,
Jason Derks,
Edward Emmott,
Alexander M. Franks,
Alexander R. Ivanov,
Ryan T. Kelly,
Luke Khoury,
Andrew Leduc,
Michael J. MacCoss,
Peter Nemes,
David H. Perlman,
Aleksandra A. Petelski,
Christopher M. Rose,
Erwin M. Schoof,
Jennifer Van Eyk,
Christophe Vanderaa,
John R. Yates III,
Nikolai Slavov
Abstract:
Analyzing proteins from single cells by tandem mass spectrometry (MS) has become technically feasible. While such analysis has the potential to accurately quantify thousands of proteins across thousands of single cells, the accuracy and reproducibility of the results may be undermined by numerous factors affecting experimental design, sample preparation, data acquisition, and data analysis. Broadly accepted community guidelines and standardized metrics will enhance rigor, data quality, and alignment between laboratories. Here we propose best practices, quality controls, and data reporting recommendations to assist in the broad adoption of reliable quantitative workflows for single-cell proteomics.
Submitted 12 September, 2022; v1 submitted 19 July, 2022;
originally announced July 2022.