-
Impact of phylogeny on the inference of functional sectors from protein sequence data
Authors:
Nicola Dietler,
Alia Abbara,
Subham Choudhury,
Anne-Florence Bitbol
Abstract:
Statistical analysis of multiple sequence alignments of homologous proteins has revealed groups of coevolving amino acids called sectors. These groups of amino-acid sites feature collective correlations in their amino-acid usage, and they are associated with functional properties. Modeling showed that natural selection on an additive functional trait of a protein is generically expected to give rise to a functional sector. These modeling results motivated a principled method, called ICOD, which is designed to identify functional sectors, as well as mutational effects, from sequence data. However, a challenge for all methods aiming to identify sectors from multiple sequence alignments is that correlations in amino-acid usage can also arise from the mere fact that homologous sequences share common ancestry, i.e. from phylogeny. Here, we generate controlled synthetic data from a minimal model comprising both phylogeny and functional sectors. We use this data to dissect the impact of phylogeny on sector identification and on mutational effect inference by different methods. We find that ICOD is most robust to phylogeny, but that conservation is also quite robust. Next, we consider natural multiple sequence alignments of protein families for which deep mutational scan experimental data is available. We show that in this natural data, conservation and ICOD best identify sites with strong functional roles, in agreement with our results on synthetic data. Importantly, these two methods have different premises, since they respectively focus on conservation and on correlations. Thus, their joint use can reveal complementary information.
Submitted 8 May, 2024;
originally announced May 2024.
-
Impact of phylogeny on structural contact inference from protein sequence data
Authors:
Nicola Dietler,
Umberto Lupo,
Anne-Florence Bitbol
Abstract:
Local and global inference methods have been developed to infer structural contacts from multiple sequence alignments of homologous proteins. They rely on correlations in amino-acid usage at contacting sites. Because homologous proteins share a common ancestry, their sequences also feature phylogenetic correlations, which can impair contact inference. We investigate this effect by generating controlled synthetic data from a minimal model where the importance of contacts and of phylogeny can be tuned. We demonstrate that global inference methods, specifically Potts models, are more resilient to phylogenetic correlations than local methods, based on covariance or mutual information. This holds whether or not phylogenetic corrections are used, and may explain the success of global methods. We analyse the roles of selection strength and of phylogenetic relatedness. We show that sites that mutate early in the phylogeny yield false positive contacts. We consider natural data and realistic synthetic data, and our findings generalise to these cases. Our results highlight the impact of phylogeny on contact prediction from protein sequences and illustrate the interplay between the rich structure of biological data and inference.
Submitted 5 January, 2023; v1 submitted 26 September, 2022;
originally announced September 2022.
-
Correlations from structure and phylogeny combine constructively in the inference of protein partners from sequences
Authors:
Andonis Gerardos,
Nicola Dietler,
Anne-Florence Bitbol
Abstract:
Inferring protein-protein interactions from sequences is an important task in computational biology. Recent methods based on Direct Coupling Analysis (DCA) or Mutual Information (MI) make it possible to find interaction partners among paralogs of two protein families. Does successful inference mainly rely on correlations from structural contacts or from phylogeny, or both? Do these two types of signal combine constructively or hinder each other? To address these questions, we generate and analyze synthetic data produced using a minimal model that allows us to control the amounts of structural constraints and phylogeny. We show that correlations from these two sources combine constructively to increase the performance of partner inference by DCA or MI. Furthermore, signal from phylogeny can rescue partner inference when signal from contacts becomes less informative, including in the realistic case where inter-protein contacts are restricted to a small subset of sites. We also demonstrate that DCA-inferred couplings between non-contact pairs of sites improve partner inference in the presence of strong phylogeny, while deteriorating it otherwise. Moreover, restricting to non-contact pairs of sites preserves inference performance in the presence of strong phylogeny. In a natural data set, as well as in realistic synthetic data based on it, we find that non-contact pairs of sites contribute positively to partner inference performance, and that restricting to them preserves performance, evidencing an important role of phylogeny.
Submitted 17 May, 2022; v1 submitted 22 November, 2021;
originally announced November 2021.
-
Shubnikov-de Haas oscillations in optical conductivity of monolayer MoSe$_2$
Authors:
T. Smoleński,
O. Cotlet,
A. Popert,
P. Back,
Y. Shimazaki,
P. Knüppel,
N. Dietler,
T. Taniguchi,
K. Watanabe,
M. Kroner,
A. Imamoglu
Abstract:
We report polarization-resolved resonant reflection spectroscopy of a charge-tunable atomically-thin valley semiconductor hosting tightly bound excitons coupled to a dilute system of fully spin- and valley-polarized holes in the presence of a strong magnetic field. We find that exciton-hole interactions manifest themselves in hole-density dependent, Shubnikov-de Haas-like oscillations in the energy and line broadening of the excitonic resonances. These oscillations are evidenced to be precisely correlated with the occupation of Landau levels, thus demonstrating that strong interactions between the excitons and Landau-quantized itinerant carriers enable optical investigation of quantum-Hall physics in transition metal dichalcogenides.
Submitted 20 December, 2018;
originally announced December 2018.