-
igraph enables fast and robust network analysis across programming languages
Authors:
Michael Antonov,
Gábor Csárdi,
Szabolcs Horvát,
Kirill Müller,
Tamás Nepusz,
Daniel Noom,
Maëlle Salmon,
Vincent Traag,
Brooke Foucault Welles,
Fabio Zanini
Abstract:
Networks or graphs are widely used across the sciences to represent relationships of many kinds. igraph (https://igraph.org) is a general-purpose software library for graph construction, analysis, and visualisation, combining fast and robust performance with a low entry barrier. igraph pairs a fast core written in C with beginner-friendly interfaces in Python, R, and Mathematica. Over the last two decades, igraph has expanded substantially. It now scales to billions of edges, supports Mathematica and interactive plotting, integrates with Jupyter notebooks and other network libraries, includes new graph layouts and community detection algorithms, and has streamlined the documentation with examples and Spanish translations. Modern testing features such as continuous integration, address sanitizers, stricter typing, and memory-managed vectors have also increased robustness. Hundreds of bug reports have been fixed and a community forum has been opened to connect users and developers. Specific effort has been made to broaden use and community participation by women, non-binary people, and other demographic groups typically underrepresented in open source software.
Submitted 16 November, 2023;
originally announced November 2023.
-
Managing Temporal Resolution in Continuous Value Estimation: A Fundamental Trade-off
Authors:
Zichen Zhang,
Johannes Kirschner,
Junxi Zhang,
Francesco Zanini,
Alex Ayoub,
Masood Dehghan,
Dale Schuurmans
Abstract:
A default assumption in reinforcement learning (RL) and optimal control is that observations arrive at discrete time points on a fixed clock cycle. Yet, many applications involve continuous-time systems where the time discretization, in principle, can be managed. The impact of time discretization on RL methods has not been fully characterized in existing theory, but a more detailed analysis of its effect could reveal opportunities for improving data-efficiency. We address this gap by analyzing Monte-Carlo policy evaluation for LQR systems and uncover a fundamental trade-off between approximation and statistical error in value estimation. Importantly, these two errors behave differently to time discretization, leading to an optimal choice of temporal resolution for a given data budget. These findings show that managing the temporal resolution can provably improve policy evaluation efficiency in LQR systems with finite data. Empirically, we demonstrate the trade-off in numerical simulations of LQR instances and standard RL benchmarks for non-linear continuous control.
Submitted 16 January, 2024; v1 submitted 17 December, 2022;
originally announced December 2022.
-
IGraph/M: graph theory and network analysis for Mathematica
Authors:
Szabolcs Horvát,
Jakub Podkalicki,
Gábor Csárdi,
Tamás Nepusz,
Vincent Traag,
Fabio Zanini,
Daniel Noom
Abstract:
IGraph/M is an efficient general purpose graph theory and network analysis package for Mathematica. IGraph/M serves as the Wolfram Language interface to the igraph C library, and also provides several unique pieces of functionality not yet present in igraph, but made possible by combining its capabilities with Mathematica's. The package is designed to support both graph theoretical research and the analysis of large-scale empirical networks.
Submitted 19 September, 2022;
originally announced September 2022.
-
To Compute or not to Compute? Adaptive Smart Sensing in Resource-Constrained Edge Computing
Authors:
Luca Ballotta,
Giovanni Peserico,
Francesco Zanini,
Paolo Dini
Abstract:
We consider a network of smart sensors for an edge computing application that sample a time-varying signal and send updates to a base station for remote global monitoring. Sensors are equipped with sensing and compute, and can either send raw data or process them on-board before transmission. Limited hardware resources at the edge generate a fundamental latency-accuracy trade-off: raw measurements are inaccurate but timely, whereas accurate processed updates are available after processing delay. Hence, one needs to decide when sensors should transmit raw measurements or rely on local processing to maximize network monitoring performance. To tackle this sensing design problem, we model an estimation-theoretic optimization framework that embeds both computation and communication latency, and propose a Reinforcement Learning-based approach that dynamically allocates computational resources at each sensor. Effectiveness of our proposed approach is validated through numerical experiments motivated by smart sensing for the Internet of Drones and self-driving vehicles. In particular, we show that, under constrained computation at the base station, monitoring performance can be further improved by an online sensor selection.
Submitted 18 August, 2023; v1 submitted 5 September, 2022;
originally announced September 2022.
-
A Reinforcement Learning Approach to Sensing Design in Resource-Constrained Wireless Networked Control Systems
Authors:
Luca Ballotta,
Giovanni Peserico,
Francesco Zanini
Abstract:
In this paper, we consider a wireless network of smart sensors (agents) that monitor a dynamical process and send measurements to a base station that performs global monitoring and decision-making. Smart sensors are equipped with both sensing and computation, and can either send raw measurements or process them prior to transmission. Constrained agent resources raise a fundamental latency-accuracy trade-off. On the one hand, raw measurements are inaccurate but fast to produce. On the other hand, data processing on resource-constrained platforms generates accurate measurements at the cost of non-negligible computation latency. Further, if processed data are also compressed, latency caused by wireless communication might be higher for raw measurements. Hence, it is challenging to decide when and where sensors in the network should transmit raw measurements or leverage time-consuming local processing. To tackle this design problem, we propose a Reinforcement Learning approach to learn an efficient policy that dynamically decides when measurements are to be processed at each sensor. Effectiveness of our proposed approach is validated through a numerical simulation with a case study on smart sensing motivated by the Internet of Drones.
Submitted 10 January, 2024; v1 submitted 1 April, 2022;
originally announced April 2022.
-
Analysing high-throughput sequencing data in Python with HTSeq 2.0
Authors:
Givanna H Putri,
Simon Anders,
Paul Theodor Pyl,
John E Pimanda,
Fabio Zanini
Abstract:
Summary: HTSeq 2.0 provides a more extensive API including a new representation for sparse genomic data, enhancements in htseq-count to suit single cell omics, a new script for data using cell and molecular barcodes, improved documentation, testing and deployment, bug fixes, and Python 3 support. Availability and implementation: HTSeq 2.0 is released as open-source software under the GNU General Public Licence and is available from the Python Package Index at https://pypi.python.org/pypi/HTSeq. The source code is available on GitHub at https://github.com/htseq/htseq. Contact: [email protected]
Submitted 1 December, 2021;
originally announced December 2021.
-
Estimating Koopman operators for nonlinear dynamical systems: a nonparametric approach
Authors:
Francesco Zanini,
Alessandro Chiuso
Abstract:
The Koopman operator is a mathematical tool that allows for a linear description of non-linear systems, but works in infinite-dimensional spaces. Dynamic Mode Decomposition and Extended Dynamic Mode Decomposition are amongst the most popular finite-dimensional approximations. In this paper we capture their core essence as a dual version of the same framework, incorporating them into the Kernel framework. To do so, we leverage the RKHS as a suitable space for learning the Koopman dynamics, thanks to its intrinsic finite-dimensional nature, shaped by data. We finally establish a strong link between kernel methods and Koopman operators, leading to the estimation of the latter through Kernel functions. We also provide simulations for comparison with standard procedures.
Submitted 25 March, 2021;
originally announced March 2021.
-
Pore-Scale Transport and Two-Phase Fluid Structures in Fibrous Porous Layers: Application to Fuel Cells and Beyond
Authors:
Meisam Farzaneh,
Henrik Ström,
Filippo Zanini,
Simone Carmignato,
Srdjan Sasic,
Dario Maggiolo
Abstract:
We present pore-scale simulations of two-phase flows in a reconstructed fibrous porous layer. The three dimensional microstructure of the material, a fuel cell gas diffusion layer, is acquired via X-ray computed tomography and used as input for lattice Boltzmann simulations. We perform a quantitative analysis of the multiphase pore-scale dynamics and we identify the dominant fluid structures governing mass transport. The results show the existence of three different regimes of transport: a fast inertial dynamics at short times, characterised by a compact uniform front, a viscous-capillary regime at intermediate times, where liquid is transported along a gradually increasing number of preferential flow paths of the size of one to two pores, and a third regime at longer times, where liquid, after having reached the outlet, is exclusively flowing along such flow paths and the two-phase fluid structures are stabilised. We observe that the fibrous layer presents significant variations in its microscopic morphology, which have an important effect on the pore invasion dynamics, and counteract the stabilising viscous force. Liquid transport is indeed affected by the presence of microstructure-induced capillary pressures acting adversely to the flow, leading to a capillary fingering transport mechanism and unstable front displacement, even in the absence of hydrophobic treatments of the porous material. We propose a macroscopic model based on an effective contact angle that mimics the effects of such a dynamic capillary pressure. Finally, we underline the significance of the results for the optimal design of face masks in an effort to mitigate the current COVID-19 pandemic.
Submitted 10 November, 2020; v1 submitted 22 June, 2020;
originally announced June 2020.
-
Establishment and stability of the latent HIV-1 DNA reservoir
Authors:
Johanna Brodin,
Fabio Zanini,
Lina Thebo,
Christa Lanz,
Göran Bratt,
Richard A. Neher,
Jan Albert
Abstract:
HIV-1 infection currently cannot be cured because the virus persists as integrated proviral DNA in long-lived cells despite years of suppressive antiretroviral therapy (ART). To characterize establishment, turnover, and evolution of viral DNA reservoirs we deep-sequenced the p17gag region of the HIV-1 genome from samples obtained after 3-18 years of suppressive ART from 10 patients. For each of these patients, whole genome deep-sequencing data of HIV-1 RNA populations before onset of ART were available from 6-12 longitudinal plasma samples spanning 5-8 years of untreated infection. This enabled a detailed analysis of the dynamics and origin of proviral DNA during ART. A median of 14% (range 0-42%) of the p17gag DNA sequences were overtly defective due to G-to-A hypermutation. The remaining sequences were remarkably similar to previously observed RNA sequences and showed no evidence of evolution over many years of suppressive ART. Most sequences from the DNA reservoirs were very similar to viruses actively replicating in plasma (RNA sequences) shortly before start of ART. The results do not support persistent HIV-1 replication as a mechanism to maintain the HIV-1 reservoir during suppressive therapy. Rather, the data indicate that viral DNA variants are turning over as long as patients are untreated and that suppressive ART halts this turnover.
Submitted 24 May, 2016; v1 submitted 17 May, 2016;
originally announced May 2016.
-
In-vivo mutation rates and fitness landscape of HIV-1
Authors:
Fabio Zanini,
Vadim Puller,
Johanna Brodin,
Jan Albert,
Richard Neher
Abstract:
Mutation rates and fitness costs of deleterious mutations are difficult to measure in vivo but essential for a quantitative understanding of evolution. Using whole genome deep sequencing data from longitudinal samples during untreated HIV-1 infection, we estimated mutation rates and fitness costs in HIV-1 from the temporal dynamics of genetic variation. At approximately neutral sites, mutations accumulate with a rate of 1.2 x 10^-5 per site per day, in agreement with the rate measured in cell cultures. The rate from G to A is largest, followed by the other transitions C to T, T to C, and A to G, while transversions are rarer. At non-neutral sites, most mutations reduce virus replication; using a model of mutation selection balance, we estimated the fitness cost of mutations at every site in the HIV-1 genome. About half of all nonsynonymous mutations have large fitness costs (greater than 10%), while most synonymous mutations have costs below 1%. The cost of synonymous mutations is especially low in most of gag and pol, while much higher costs are observed in important RNA structures and regulatory regions. The intrapatient fitness cost estimates are consistent across multiple patients, suggesting that the deleterious part of the fitness landscape is universal and explains a large fraction of global HIV-1 group M diversity.
Submitted 1 July, 2016; v1 submitted 21 March, 2016;
originally announced March 2016.
-
Population genomics of intrapatient HIV-1 evolution
Authors:
Fabio Zanini,
Johanna Brodin,
Lina Thebo,
Christa Lanz,
Göran Bratt,
Jan Albert,
Richard A. Neher
Abstract:
Many microbial populations rapidly adapt to changing environments with multiple variants competing for survival. To quantify such complex evolutionary dynamics in vivo, time-resolved and genome-wide data including rare variants are essential. We performed whole-genome deep sequencing of HIV-1 populations in 9 untreated patients, with 6-12 longitudinal samples per patient spanning 5-8 years of infection. We show that patterns of minor diversity are reproducible between patients and mirror global HIV-1 diversity, suggesting a universal landscape of fitness costs that control diversity. Reversions towards the ancestral HIV-1 sequence are observed throughout infection and account for almost one third of all sequence changes. Reversion rates depend strongly on conservation. Frequent recombination limits linkage disequilibrium to about 100 bp in most of the genome, but strong hitch-hiking due to short-range linkage limits diversity.
Submitted 8 September, 2015;
originally announced September 2015.
-
Deleterious synonymous mutations hitchhike to high frequency in HIV-1 env evolution
Authors:
Fabio Zanini,
Richard A. Neher
Abstract:
Intrapatient HIV-1 evolution is dominated by selection on the protein level in the arms race with the adaptive immune system. When cytotoxic CD8+ T-cells or neutralizing antibodies target a new epitope, the virus often escapes via nonsynonymous mutations that impair recognition. Synonymous mutations do not affect this interplay and are often assumed to be neutral. We analyze longitudinal intrapatient data from the C2-V5 part of the envelope gene (env) and observe that synonymous derived alleles rarely fix even though they often reach high frequencies in the viral population. We find that synonymous mutations that disrupt base pairs in RNA stems flanking the variable loops of gp120 are more likely to be lost than other synonymous changes, hinting at a direct fitness effect of these stem-loop structures in the HIV-1 RNA. Computational modeling indicates that these synonymous mutations have a (Malthusian) selection coefficient of the order of -0.002 and that they are brought up to high frequency by hitchhiking on neighboring beneficial nonsynonymous alleles. The patterns of fixation of nonsynonymous mutations estimated from the longitudinal data and comparisons with computer models suggest that escape mutations in C2-V5 are only transiently beneficial, either because the immune system is catching up or because of competition between equivalent escapes.
Submitted 4 March, 2013;
originally announced March 2013.
-
FFPopSim: An efficient forward simulation package for the evolution of large populations
Authors:
Fabio Zanini,
Richard A. Neher
Abstract:
The analysis of the evolutionary dynamics of a population with many polymorphic loci is challenging since a large number of possible genotypes needs to be tracked. In the absence of analytical solutions, forward computer simulations are an important tool in multi-locus population genetics. The run time of standard algorithms to simulate sexual populations increases as 8^L with the number L of loci, or with the square of the population size N. We have developed algorithms that allow simulating large populations with a run time that scales as 3^L. The algorithm is based on an analog of the Fast-Fourier Transform (FFT) and allows for arbitrary fitness functions (i.e. any epistasis) and genetic maps. The algorithm is implemented as a collection of C++ classes and a Python interface.
Submitted 30 July, 2012;
originally announced July 2012.
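To give a sense of the scaling quoted in the FFPopSim abstract above, the following sketch compares 8^L and 3^L for a few values of L. It is plain arithmetic on the two stated exponents and does not use FFPopSim itself; the function name `scaling_ratio` is hypothetical and chosen only for illustration.

```python
def scaling_ratio(n_loci: int) -> float:
    """Ratio of the 8^L cost to the 3^L cost for L = n_loci loci.

    Both costs are taken up to constant factors, which the abstract
    does not specify, so only the growth of the ratio is meaningful.
    """
    return 8 ** n_loci / 3 ** n_loci


if __name__ == "__main__":
    # Print both cost terms and their ratio for a few locus counts.
    for n_loci in (5, 10, 15):
        standard = 8 ** n_loci
        improved = 3 ** n_loci
        print(
            f"L={n_loci}: 8^L={standard:.3e}, 3^L={improved:.3e}, "
            f"ratio={scaling_ratio(n_loci):.3e}"
        )
```

The ratio grows as (8/3)^L, so the gap between the two scalings widens quickly as more loci are tracked.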
-
Viscosity and Diffusion: Crowding and Salt Effects in Protein Solutions
Authors:
Marco Heinen,
Fabio Zanini,
Felix Roosen-Runge,
Diana Fedunová,
Fajun Zhang,
Marcus Hennig,
Tilo Seydel,
Ralf Schweins,
Michael Sztucki,
Marián Antalík,
Frank Schreiber,
Gerhard Nägele
Abstract:
We report on a joint experimental-theoretical study of collective diffusion in, and static shear viscosity of, solutions of bovine serum albumin (BSA) proteins, focusing on the dependence on protein and salt concentration. Data obtained from dynamic light scattering and rheometric measurements are compared to theoretical calculations based on an analytically treatable spheroid model of BSA with isotropic screened Coulomb plus hard-sphere interactions. The only input to the dynamics calculations is the static structure factor obtained from a consistent theoretical fit to a concentration series of small-angle X-ray scattering (SAXS) data. This fit is based on an integral equation scheme that combines high accuracy with low computational cost. All experimentally probed dynamic and static properties are reproduced theoretically with at least semi-quantitative accuracy. For lower protein concentration and low salinity, both theory and experiment show a maximum in the reduced viscosity, caused by the electrostatic repulsion of proteins. The validity range of a generalized Stokes-Einstein (GSE) relation connecting viscosity, collective diffusion coefficient, and osmotic compressibility, proposed by Kholodenko and Douglas [PRE 51, 1081 (1995)], is examined. Significant violation of the GSE relation is found, both in experimental data and in theoretical models, in semi-dilute systems at physiological salinity, and under low-salt conditions for arbitrary protein concentrations.
Submitted 14 September, 2011;
originally announced September 2011.