-
Geodesic statistics for random network families
Authors:
Sahil Loomba,
Nick S. Jones
Abstract:
A key task in the study of networked systems is to derive local and global properties that impact connectivity, synchronizability, and robustness. Computing shortest paths or geodesics in the network yields measures of node centrality and network connectivity that can contribute to explain such phenomena. We derive an analytic distribution of shortest path lengths, on the giant component in the supercritical regime or on small components in the subcritical regime, of any sparse (possibly directed) graph with conditionally independent edges, in the infinite-size limit. We provide specific results for widely used network families like stochastic block models, dot-product graphs, random geometric graphs, and graphons. The survival function of the shortest path length distribution possesses a simple closed-form lower bound which is asymptotically tight for finite lengths, has a natural interpretation of traversing independent geodesics in the network, and delivers novel insight in the above network families. Notably, the shortest path length distribution allows us to derive, for the network families above, important graph properties like the bond percolation threshold, size of the giant component, average shortest path length, and closeness and betweenness centralities. We also provide a corroborative analysis of a set of 20 empirical networks. This unifying framework demonstrates how geodesic statistics for a rich family of random graphs can be computed cheaply without having access to true or simulated networks, especially when they are sparse but prohibitively large.
Submitted 3 November, 2021;
originally announced November 2021.
-
Modularity maximisation for graphons
Authors:
Florian Klimm,
Nick S. Jones,
Michael T. Schaub
Abstract:
Networks are a widely-used tool to investigate the large-scale connectivity structure in complex systems and graphons have been proposed as an infinite size limit of dense networks. The detection of communities or other meso-scale structures is a prominent topic in network science as it allows the identification of functional building blocks in complex systems. When such building blocks may be present in graphons is an open question. In this paper, we define a graphon-modularity and demonstrate that it can be maximised to detect communities in graphons. We then investigate specific synthetic graphons and show that they may show a wide range of different community structures. We also reformulate the graphon-modularity maximisation as a continuous optimisation problem and so prove the optimal community structure or lack thereof for some graphons, something that is usually not possible for networks. Furthermore, we demonstrate that estimating a graphon from network data as an intermediate step can improve the detection of communities, in comparison with exclusively maximising the modularity of the network. While the choice of graphon-estimator may strongly influence the accord between the community structure of a network and its estimated graphon, we find that there is a substantial overlap if an appropriate estimator is used. Our study demonstrates that community detection for graphons is possible and may serve as a privacy-preserving way to cluster network data.
Submitted 2 January, 2021;
originally announced January 2021.
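For reference, the network-side quantity that the abstract compares against can be sketched as follows. This is the standard Newman-Girvan modularity of a fixed partition of an undirected graph, not the graphon-modularity defined in the paper; the function name `modularity` and the toy graph (two triangles joined by one edge) are illustrative assumptions.

```python
import numpy as np


def modularity(adj, labels):
    """Newman-Girvan modularity Q of a partition of an undirected graph.

    adj    : symmetric 0/1 adjacency matrix (n x n)
    labels : community label for each of the n nodes
    """
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    degrees = adj.sum(axis=1)
    two_m = degrees.sum()  # twice the number of edges
    # same[i, j] is True when nodes i and j share a community
    same = labels[:, None] == labels[None, :]
    # expected edge weight between i and j under the configuration null model
    expected = np.outer(degrees, degrees) / two_m
    return float(((adj - expected) * same).sum() / two_m)


# Hypothetical toy graph: two triangles joined by a single edge.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0

q_split = modularity(adj, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_single = modularity(adj, [0, 0, 0, 0, 0, 0])  # everything in one community
```

Placing every node in a single community gives zero modularity by construction, so the split into the two triangles scores strictly higher on this toy graph.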
-
Influencing dynamics on social networks without knowledge of network microstructure
Authors:
Matthew Garrod,
Nick S. Jones
Abstract:
Social network based information campaigns can be used for promoting beneficial health behaviours and mitigating polarisation (e.g. regarding climate change or vaccines). Network-based intervention strategies typically rely on full knowledge of network structure. It is largely not possible or desirable to obtain population-level social network data due to availability and privacy issues. It is easier to obtain information about individuals' attributes (e.g. age, income), which are jointly informative of an individual's opinions and their social network position. We investigate strategies for influencing the system state in a statistical mechanics based model of opinion formation. Using synthetic and data based examples we illustrate the advantages of implementing coarse-grained influence strategies on Ising models with modular structure in the presence of external fields. Our work provides a scalable methodology for influencing Ising systems on large graphs and the first exploration of the Ising influence problem in the presence of ambient (social) fields. By exploiting the observation that strong ambient fields can simplify control of networked dynamics, our findings open the possibility of efficiently computing and implementing public information campaigns using insights from social network theory without costly or invasive levels of data collection.
Submitted 27 July, 2021; v1 submitted 11 November, 2020;
originally announced November 2020.
-
Inference of a universal social scale and segregation measures using social connectivity kernels
Authors:
Till Hoffmann,
Nick S. Jones
Abstract:
How people connect with one another is a fundamental question in the social sciences, and the resulting social networks can have a profound impact on our daily lives. Blau offered a powerful explanation: people connect with one another based on their positions in a social space. Yet a principled measure of social distance, allowing comparison within and between societies, remains elusive. We use the connectivity kernel of conditionally-independent edge models to develop a family of segregation statistics with desirable properties: they offer an intuitive and universal characteristic scale on social space (facilitating comparison across datasets and societies), are applicable to multivariate and mixed node attributes, and capture segregation at the level of individuals, pairs of individuals, and society as a whole. We show that the segregation statistics can induce a metric on Blau space (a space spanned by the attributes of the members of society) and provide maps of two societies. Under a Bayesian paradigm, we infer the parameters of the connectivity kernel from eleven ego-network datasets collected in four surveys in the United Kingdom and United States. The importance of different dimensions of Blau space is similar across time and location, suggesting a macroscopically stable social fabric. Physical separation and age differences have the most significant impact on segregation within friendship networks with implications for intergenerational mixing and isolation in later stages of life.
Submitted 28 October, 2020; v1 submitted 12 August, 2020;
originally announced August 2020.
-
Democratizing University Research
Authors:
Nick S. Jones,
Oscar Ces
Abstract:
We detail an experimental programme we have been testing in our university. Our Advanced Hackspace attempts to give all members of the university, from students to technicians, free access to the means to develop their own interdisciplinary research ideas, with resources including access to specialized fellows and biological and chemical hacklabs. We assess the aspects of our programme that led to our community being one of the largest collectives in our university and critically examine the successes and failures of our trial programmes. We supply metrics for assessing progress and outline challenges. We conclude with future directions that advance interdisciplinary research empowerment for all university members.
Submitted 29 June, 2020;
originally announced June 2020.
-
Inference and Influence of Large-Scale Social Networks Using Snapshot Population Behaviour without Network Data
Authors:
Antonia Godoy-Lorite,
Nick S. Jones
Abstract:
Population behaviours, such as voting and vaccination, depend on social networks. Social networks can differ depending on behaviour type and are typically hidden. However, we do often have large-scale behavioural data, albeit only snapshots taken at one timepoint. We present a method that jointly infers large-scale network structure and a networked model of human behaviour using only snapshot population behavioural data. This exploits the simplicity of a few parameter, geometric socio-demographic network model and a spin based model of behaviour. We illustrate, for the EU Referendum and two London Mayoral elections, how the model offers both prediction and the interpretation of our homophilic inclinations. Beyond offering the extraction of behaviour specific network structure from large-scale behavioural datasets, our approach yields a crude calculus linking inequalities and social preferences to behavioural outcomes. We give examples of potential network sensitive policies: how changes to income inequality, a social temperature and homophilic preferences might have reduced polarisation in a recent election.
Submitted 23 March, 2020; v1 submitted 16 March, 2020;
originally announced March 2020.
-
CompEngine: a self-organizing, living library of time-series data
Authors:
Ben D. Fulcher,
Carl H. Lubba,
Sarab S. Sethi,
Nick S. Jones
Abstract:
Modern biomedical applications often involve time-series data, from high-throughput phenotyping of model organisms, through to individual disease diagnosis and treatment using biomedical data streams. Data and tools for time-series analysis are developed and applied across the sciences and in industry, but meaningful cross-disciplinary interactions are limited by the challenge of identifying fruitful connections. Here we introduce the web platform, CompEngine, a self-organizing, living library of time-series data that lowers the barrier to forming meaningful interdisciplinary connections between time series. Using a canonical feature-based representation, CompEngine places all time series in a common space, regardless of their origin, allowing users to upload their data and immediately explore interdisciplinary connections to other data with similar properties, and be alerted when similar data is uploaded in the future. In contrast to conventional databases, which are organized by assigned metadata, CompEngine incentivizes data sharing by automatically connecting experimental and theoretical scientists across disciplines based on the empirical structure of their data. CompEngine's growing library of interdisciplinary time-series data also facilitates comprehensive characterization of algorithm performance across diverse types of data, and can be used to empirically motivate the development of new time-series analysis algorithms.
Submitted 3 May, 2019;
originally announced May 2019.
-
Biochemical Szilard engines for memory-limited inference
Authors:
Rory A. Brittain,
Nick S. Jones,
Thomas E. Ouldridge
Abstract:
By developing and leveraging an explicit molecular realisation of a measurement-and-feedback-powered Szilard engine, we investigate the extraction of work from complex environments by minimal machines with finite capacity for memory and decision-making. Living systems perform inference to exploit complex structure, or correlations, in their environment, but the physical limits and underlying cost/benefit trade-offs involved in doing so remain unclear. To probe these questions, we consider a minimal model for a structured environment - a correlated sequence of molecules - and explore mechanisms based on extended Szilard engines for extracting the work stored in these non-equilibrium correlations. We consider systems limited to a single bit of memory making binary 'choices' at each step. We demonstrate that increasingly complex environments allow increasingly sophisticated inference strategies to extract more energy than simpler alternatives, and argue that optimal design of such machines should also consider the energy reserves required to ensure robustness against fluctuations due to mistakes.
Submitted 17 May, 2019; v1 submitted 20 December, 2018;
originally announced December 2018.
-
Community detection in networks without observing edges
Authors:
Till Hoffmann,
Leto Peel,
Renaud Lambiotte,
Nick S. Jones
Abstract:
We develop a Bayesian hierarchical model to identify communities in networks for which we do not observe the edges directly, but instead observe a series of interdependent signals for each of the nodes. Fitting the model provides an end-to-end community detection algorithm that does not extract information as a sequence of point estimates but propagates uncertainties from the raw data to the community labels. Our approach naturally supports multiscale community detection as well as the selection of an optimal scale using model comparison. We study the properties of the algorithm using synthetic data and apply it to daily returns of constituents of the S&P100 index as well as climate data from US cities.
Submitted 11 February, 2020; v1 submitted 18 August, 2018;
originally announced August 2018.
-
Large algebraic connectivity fluctuations in spatial network ensembles imply a predictive advantage from node location information
Authors:
Matthew Garrod,
Nick S. Jones
Abstract:
A Random Geometric Graph (RGG) ensemble is defined by the disordered distribution of its node locations. We investigate how this randomness drives sample-to-sample fluctuations in the dynamical properties of these graphs. We study the distributional properties of the algebraic connectivity which is informative of diffusion and synchronization timescales in graphs. We use numerical simulations to provide the first characterisation of the algebraic connectivity distribution for RGG ensembles. We find that the algebraic connectivity can show fluctuations relative to its mean on the order of $30 \%$, even for relatively large RGG ensembles ($N=10^5$). We explore the factors driving these fluctuations for RGG ensembles with different choices of dimensionality, boundary conditions and node distributions. Within a given ensemble, the algebraic connectivity can covary with the minimum degree and can also be affected by the presence of density inhomogeneities in the nodal distribution. We also derive a closed-form expression for the expected algebraic connectivity for RGGs with periodic boundary conditions for general dimension.
Submitted 2 November, 2018; v1 submitted 17 May, 2018;
originally announced May 2018.
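The kind of numerical experiment described in this abstract can be sketched on a small scale: sample node locations, build the random geometric graph, and take the second-smallest eigenvalue of the graph Laplacian as the algebraic connectivity. The sketch assumes uniform points in the unit square with hard (non-periodic) boundaries; the ensemble size, node count, and connection radius are illustrative choices, far smaller than the $N=10^5$ quoted above, and the function names are hypothetical.

```python
import numpy as np


def rgg_adjacency(points, radius):
    """Adjacency matrix of a random geometric graph with hard boundaries.

    Two nodes are linked when their Euclidean distance is at most `radius`.
    """
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adj = (dist <= radius).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj


def algebraic_connectivity(adj):
    """Second-smallest eigenvalue of the combinatorial Laplacian L = D - A."""
    laplacian = np.diag(adj.sum(axis=1)) - adj
    eigenvalues = np.linalg.eigvalsh(laplacian)  # returned in ascending order
    return float(eigenvalues[1])


# Small illustrative ensemble (assumed parameters, not those of the paper).
rng = np.random.default_rng(0)
samples = np.array([
    algebraic_connectivity(rgg_adjacency(rng.random((200, 2)), 0.2))
    for _ in range(20)
])
# Sample-to-sample fluctuation relative to the ensemble mean.
relative_fluctuation = samples.std() / samples.mean()
```

A dense eigendecomposition is adequate at this size; for graphs approaching the sizes studied in the paper a sparse eigensolver would be needed instead.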
-
Co-occurrence simplicial complexes in mathematics: identifying the holes of knowledge
Authors:
Vsevolod Salnikov,
Daniele Cassese,
Renaud Lambiotte,
Nick S. Jones
Abstract:
In the last years complex networks tools contributed to provide insights on the structure of research, through the study of collaboration, citation and co-occurrence networks. The network approach focuses on pairwise relationships, often compressing multidimensional data structures and inevitably losing information. In this paper we propose for the first time a simplicial complex approach to word co-occurrences, providing a natural framework for the study of higher-order relations in the space of scientific knowledge. Using topological methods we explore the conceptual landscape of mathematical research, focusing on homological holes, regions with low connectivity in the simplicial structure. We find that homological holes are ubiquitous, which suggests that they capture some essential feature of research practice in mathematics. Holes die when a subset of their concepts appear in the same article, hence their death may be a sign of the creation of new knowledge, as we show with some examples. We find a positive relation between the dimension of a hole and the time it takes to be closed: larger holes may represent potential for important advances in the field because they separate conceptually distant areas. We also show that authors' conceptual entropy is positively related with their contribution to homological holes, suggesting that polymaths tend to be on the frontier of research.
Submitted 11 March, 2018;
originally announced March 2018.
-
Automatic time-series phenotyping using massive feature extraction
Authors:
Ben D Fulcher,
Nick S Jones
Abstract:
Across a far-reaching diversity of scientific and industrial applications, a general key problem involves relating the structure of time-series data to a meaningful outcome, such as detecting anomalous events from sensor recordings, or diagnosing patients from physiological time-series measurements like heart rate or brain activity. Currently, researchers must devote considerable effort manually devising, or searching for, properties of their time series that are suitable for the particular analysis problem at hand. Addressing this non-systematic and time-consuming procedure, here we introduce a new tool, hctsa, that selects interpretable and useful properties of time series automatically, by comparing implementations over 7700 time-series features drawn from diverse scientific literatures. Using two exemplar biological applications, we show how hctsa allows researchers to leverage decades of time-series research to quantify and understand informative structure in their time-series data.
Submitted 15 December, 2016;
originally announced December 2016.
-
Looplessness in networks is linked to trophic coherence
Authors:
Samuel Johnson,
Nick S. Jones
Abstract:
Many natural, complex systems are remarkably stable thanks to an absence of feedback acting on their elements. When described as networks, these exhibit few or no cycles, and associated matrices have small leading eigenvalues. It has been suggested that this architecture can confer advantages to the system as a whole, such as `qualitative stability', but this observation does not in itself explain how a loopless structure might arise. We show here that the number of feedback loops in a network, as well as the eigenvalues of associated matrices, are determined by a structural property called trophic coherence, a measure of how neatly nodes fall into distinct levels. Our theory correctly classifies a variety of networks -- including those derived from genes, metabolites, species, neurons, words, computers and trading nations -- into two distinct regimes of high and low feedback, and provides a null model to gauge the significance of related magnitudes. Since trophic coherence suppresses feedback, whereas an absence of feedback alone does not lead to coherence, our work suggests that the reasons for `looplessness' in nature should be sought in coherence-inducing mechanisms.
Submitted 30 May, 2017; v1 submitted 20 May, 2015;
originally announced May 2015.
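The abstract describes trophic coherence only informally. The sketch below uses the definition that is common in the trophic-coherence literature, which is an assumption here since the abstract does not state it: basal nodes (no incoming edges) sit at level 1, every other node sits one level above the mean level of its in-neighbours, and the incoherence parameter q is the standard deviation of the level differences across edges (q = 0 for a perfectly layered network). Function names and the toy graphs are illustrative.

```python
import numpy as np


def trophic_levels(adj):
    """Trophic levels s, with adj[i, j] = 1 for an edge from node j to node i.

    Solves s_i = 1 + (1 / k_in_i) * sum_j adj[i, j] * s_j, with s_i = 1 for
    basal nodes. Assumes every node can be reached from a basal node, which
    keeps the linear system non-singular.
    """
    adj = np.asarray(adj, dtype=float)
    k_in = adj.sum(axis=1)
    v = np.maximum(k_in, 1.0)
    return np.linalg.solve(np.diag(v) - adj, v)


def incoherence(adj):
    """Incoherence q = sqrt(mean(x^2) - 1), x = level difference over edges."""
    adj = np.asarray(adj, dtype=float)
    levels = trophic_levels(adj)
    targets, sources = np.nonzero(adj)
    x = levels[targets] - levels[sources]
    # Clip at zero to guard against tiny negative values from round-off.
    return float(np.sqrt(max(np.mean(x ** 2) - 1.0, 0.0)))


# Directed chain 0 -> 1 -> 2: perfectly layered.
chain = np.zeros((3, 3))
chain[1, 0] = chain[2, 1] = 1.0

# Feed-forward triangle 0 -> 1, 1 -> 2, 0 -> 2: the shortcut blurs the levels.
triangle = chain.copy()
triangle[2, 0] = 1.0

q_chain = incoherence(chain)
q_triangle = incoherence(triangle)
```

On the chain every edge spans exactly one level, so q vanishes; the extra edge in the triangle puts node 2 at level 2.5 and makes q positive.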
-
Highly comparative feature-based time-series classification
Authors:
Ben D. Fulcher,
Nick S. Jones
Abstract:
A highly comparative, feature-based approach to time series classification is introduced that uses an extensive database of algorithms to extract thousands of interpretable features from time series. These features are derived from across the scientific time-series analysis literature, and include summaries of time series in terms of their correlation structure, distribution, entropy, stationarity, scaling properties, and fits to a range of time-series models. After computing thousands of features for each time series in a training set, those that are most informative of the class structure are selected using greedy forward feature selection with a linear classifier. The resulting feature-based classifiers automatically learn the differences between classes using a reduced number of time-series properties, and circumvent the need to calculate distances between time series. Representing time series in this way results in orders of magnitude of dimensionality reduction, allowing the method to perform well on very large datasets containing long time series or time series of different lengths. For many of the datasets studied, classification performance exceeded that of conventional instance-based classifiers, including one nearest neighbor classifiers using Euclidean distances and dynamic time warping and, most importantly, the features selected provide an understanding of the properties of the dataset, insight that can guide further scientific investigation.
Submitted 8 May, 2014; v1 submitted 15 January, 2014;
originally announced January 2014.
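Greedy forward feature selection with a linear classifier, as named in the abstract, can be sketched as below. This is a minimal illustration under stated assumptions rather than the authors' implementation: the linear classifier is a least-squares fit to two class labels, candidate features are scored by training accuracy (a held-out set or cross-validation would be the more careful choice), and the synthetic feature matrix stands in for features already extracted from time series. All names are hypothetical.

```python
import numpy as np


def linear_accuracy(X, y):
    """Training accuracy of a least-squares linear classifier; y takes values 0/1."""
    design = np.column_stack([X, np.ones(len(X))])  # add an intercept column
    targets = 2.0 * y - 1.0                         # map labels to -1 / +1
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    predictions = (design @ weights > 0).astype(int)
    return float((predictions == y).mean())


def greedy_forward_selection(X, y, n_select):
    """Add, one at a time, the feature that most improves the classifier."""
    selected = []
    remaining = list(range(X.shape[1]))
    for _ in range(n_select):
        scores = [linear_accuracy(X[:, selected + [j]], y) for j in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic stand-in for a feature matrix: 100 series, 5 features, 2 classes.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 5))
X[:, 3] += 6.0 * y  # only feature 3 carries class information
chosen = greedy_forward_selection(X, y, 2)
```

Because each step only compares single-feature additions, the cost grows linearly with the number of candidate features per step, which is what makes the approach workable with thousands of features.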
-
How modular structure can simplify tasks on networks
Authors:
Binh-Minh Bui-Xuan,
Nick S. Jones
Abstract:
By considering the task of finding the shortest walk through a network we find an algorithm for which the run time does not scale as O(2^n), with n being the number of nodes, but instead scales with the number of nodes in a coarsened network. This coarsened network has a number of nodes related to the number of dense regions in the original graph. Since we exploit a form of local community detection as a preprocessing step, this work gives support to the project of developing heuristic algorithms for detecting dense regions in networks: preprocessing of this kind can accelerate optimization tasks on networks. Our work also suggests a class of empirical conjectures for how structural features of efficient networked systems might scale with system size.
Submitted 21 May, 2013;
originally announced May 2013.
-
Highly comparative time-series analysis: The empirical structure of time series and their methods
Authors:
Ben D. Fulcher,
Max A. Little,
Nick S. Jones
Abstract:
The process of collecting and organizing sets of observations represents a common theme throughout the history of science. However, despite the ubiquity of scientists measuring, recording, and analyzing the dynamics of different processes, an extensive organization of scientific time-series data and analysis methods has never been performed. Addressing this, annotated collections of over 35 000 real-world and model-generated time series and over 9000 time-series analysis algorithms are analyzed in this work. We introduce reduced representations of both time series, in terms of their properties measured by diverse scientific methods, and of time-series analysis methods, in terms of their behaviour on empirical time series, and use them to organize these interdisciplinary resources. This new approach to comparing across diverse scientific data and methods allows us to organize time-series datasets automatically according to their properties, retrieve alternatives to particular analysis methods developed in other scientific disciplines, and automate the selection of useful methods for time-series classification and regression tasks. The broad scientific utility of these tools is demonstrated on datasets of electroencephalograms, self-affine time series, heart beat intervals, speech signals, and others, in each case contributing novel analysis techniques to the existing literature. Highly comparative techniques that compare across an interdisciplinary literature can thus be used to guide more focused research in time-series analysis for applications across the scientific disciplines.
Submitted 3 April, 2013;
originally announced April 2013.
-
Mitochondrial Variability as a Source of Extrinsic Cellular Noise
Authors:
Iain G. Johnston,
Bernadett Gaal,
Ricardo Pires das Neves,
Tariq Enver,
Francisco J. Iborra,
Nick S. Jones
Abstract:
We present a study investigating the role of mitochondrial variability in generating noise in eukaryotic cells. Noise in cellular physiology plays an important role in many fundamental cellular processes, including transcription, translation, stem cell differentiation and response to medication, but the specific random influences that affect these processes have yet to be clearly elucidated. Here we present a mechanism by which variability in mitochondrial volume and functionality, along with cell cycle dynamics, is linked to variability in transcription rate and hence has a profound effect on downstream cellular processes. Our model mechanism is supported by an appreciable volume of recent experimental evidence, and we present the results of several new experiments with which our model is also consistent. We find that noise due to mitochondrial variability can sometimes dominate over other extrinsic noise sources (such as cell cycle asynchronicity) and can significantly affect large-scale observable properties such as cell cycle length and gene expression levels. We also explore two recent regulatory network-based models for stem cell differentiation, and find that extrinsic noise in transcription rate causes appreciable variability in the behaviour of these model systems. These results suggest that mitochondrial and transcriptional variability may be an important mechanism influencing a large variety of cellular processes and properties.
Submitted 1 December, 2011; v1 submitted 22 July, 2011;
originally announced July 2011.
-
Advection, diffusion and delivery over a network
Authors:
Luke L. M. Heaton,
Eduardo Lopez,
Philip K. Maini,
Mark D. Fricker,
Nick S. Jones
Abstract:
Many biological, geophysical and technological systems involve the transport of resource over a network. In this paper we present an algorithm for calculating the exact concentration of resource at any point in space or time, given that the resource in the network is lost or delivered out of the network at a given rate, while being subject to advection and diffusion. We consider the implications of advection, diffusion and delivery for simple models of glucose delivery through a vascular network, and conclude that in certain circumstances, increasing the volume of blood and the number of glucose transporters can actually decrease the total rate of glucose delivery. We also consider the case of empirically determined fungal networks, and analyze the distribution of resource that emerges as such networks grow over time. Fungal growth involves the expansion of fluid filled vessels, which necessarily involves the movement of fluid. In three empirically determined fungal networks we found that the minimum currents consistent with the observed growth would effectively transport resource throughout the network over the time-scale of growth. This suggests that in foraging fungi, the active transport mechanisms observed in the growing tips may not be required for long range transport.
Submitted 10 May, 2011; v1 submitted 9 May, 2011;
originally announced May 2011.
-
Generalized Methods and Solvers for Noise Removal from Piecewise Constant Signals
Authors:
Max A. Little,
Nick S. Jones
Abstract:
Removing noise from piecewise constant (PWC) signals is a challenging signal processing problem arising in many practical contexts. For example, in exploration geosciences, noisy drill hole records need separating into stratigraphic zones, and in biophysics, jumps between molecular dwell states need extracting from noisy fluorescence microscopy signals. Many PWC denoising methods exist, including total variation regularization, mean shift clustering, stepwise jump placement, running medians, convex clustering shrinkage and bilateral filtering; conventional linear signal processing methods, however, are fundamentally unsuited. This paper shows that most of these methods are associated with a special case of a generalized functional, minimized to achieve PWC denoising. The minimizer can be obtained by diverse solver algorithms, including stepwise jump placement, convex programming, finite differences, iterated running medians, least angle regression, regularization path following, and coordinate descent. We introduce novel PWC denoising methods, which, for example, combine global mean shift clustering with local total variation smoothing. Head-to-head comparisons between these methods are performed on synthetic data, revealing that our new methods have a useful role to play. Finally, overlaps between the methods of this paper and others such as wavelet shrinkage, hidden Markov models, and piecewise smooth filtering are touched on.
Submitted 4 January, 2011; v1 submitted 22 December, 2010;
originally announced December 2010.
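Illustrative note: the abstract above names iterated running medians as one solver for PWC denoising. The following is a minimal sketch of that classical idea only, not the paper's generalized functional or its new methods. The function names, the `half_width` parameter, and the shrinking-window treatment of the signal edges are assumptions made for illustration.

```python
from statistics import median


def running_median(signal, half_width=2):
    """Replace each sample by the median of its neighbourhood.

    The window shrinks at the two ends of the signal (an assumed edge rule).
    """
    n = len(signal)
    return [
        median(signal[max(0, i - half_width): min(n, i + half_width + 1)])
        for i in range(n)
    ]


def iterated_running_median(signal, half_width=2, max_iter=100):
    """Apply the running median repeatedly until the signal stops changing."""
    current = list(signal)
    for _ in range(max_iter):
        smoothed = running_median(current, half_width)
        if smoothed == current:
            break  # fixed point reached
        current = smoothed
    return current


# A step from 0 to 1 with one impulsive outlier on the lower level.
noisy = [0, 0, 0, 5, 0, 0, 0, 1, 1, 1, 1, 1, 1]
denoised = iterated_running_median(noisy)
```

In this toy input the outlier is removed while the jump between the two levels stays sharp, which is the property that linear smoothing filters generally do not share.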
-
Temporal Evolution of Financial Market Correlations
Authors:
Daniel J. Fenn,
Mason A. Porter,
Stacy Williams,
Mark McDonald,
Neil F. Johnson,
Nick S. Jones
Abstract:
We investigate financial market correlations using random matrix theory and principal component analysis. We use random matrix theory to demonstrate that correlation matrices of asset price changes contain structure that is incompatible with uncorrelated random price changes. We then identify the principal components of these correlation matrices and demonstrate that a small number of components accounts for a large proportion of the variability of the markets that we consider. We then characterize the time-evolving relationships between the different assets by investigating the correlations between the asset price time series and principal components. Using this approach, we uncover notable changes that occurred in financial markets and identify the assets that were significantly affected by these changes. We show in particular that there was an increase in the strength of the relationships between several different markets following the 2007–2008 credit and liquidity crisis.
Submitted 23 May, 2011; v1 submitted 14 November, 2010;
originally announced November 2010.
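Illustrative note: a common way to test a correlation matrix against the uncorrelated null, assumed here rather than taken from the abstract above, is to compare its eigenvalues with the Marchenko-Pastur bounds for a random correlation matrix. The sketch below computes those bounds for unit-variance, uncorrelated series; the function name and the example sizes are hypothetical.

```python
import math


def marchenko_pastur_bounds(n_assets, n_obs):
    """Eigenvalue range expected for a correlation matrix of uncorrelated,
    unit-variance series, in the limit of large n_assets and n_obs.

    Bounds are (1 -/+ sqrt(n_assets / n_obs)) ** 2.
    """
    ratio = n_assets / n_obs
    lower = (1 - math.sqrt(ratio)) ** 2
    upper = (1 + math.sqrt(ratio)) ** 2
    return lower, upper


# Hypothetical example: 100 assets observed over 400 time steps.
lower, upper = marchenko_pastur_bounds(100, 400)
```

Eigenvalues of an empirical correlation matrix that lie well above `upper` are the usual candidates for structure that uncorrelated price changes would not produce.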
-
Taxonomies of Networks
Authors:
Jukka-Pekka Onnela,
Daniel J. Fenn,
Stephen Reid,
Mason A. Porter,
Peter J. Mucha,
Mark D. Fricker,
Nick S. Jones
Abstract:
The study of networks has grown into a substantial interdisciplinary endeavour that encompasses myriad disciplines in the natural, social, and information sciences. Here we introduce a framework for constructing taxonomies of networks based on their structural similarities. These networks can arise from any of numerous sources: they can be empirical or synthetic, they can arise from multiple realizations of a single process, empirical or synthetic, or they can represent entirely different systems in different disciplines. Since the mesoscopic properties of networks are hypothesized to be important for network function, we base our comparisons on summaries of network community structures. While we use a specific method for uncovering network communities, much of the introduced framework is independent of that choice. After introducing the framework, we apply it to construct a taxonomy for 746 individual networks and demonstrate that our approach usefully identifies similar networks. We also construct taxonomies within individual categories of networks, and in each case we expose non-trivial structure. For example we create taxonomies for similarity networks constructed from both political voting data and financial data. We also construct network taxonomies to compare the social structures of 100 Facebook networks and the growth structures produced by different types of fungi.
Submitted 18 May, 2012; v1 submitted 29 June, 2010;
originally announced June 2010.
-
Growth-induced mass flows in fungal networks
Authors:
Luke Heaton,
Eduardo Lopez,
Philip K. Maini,
Mark D. Fricker,
Nick S. Jones
Abstract:
Cord-forming fungi form extensive networks that continuously adapt to maintain an efficient transport system. As osmotically driven water uptake is often distal from the tips, and aqueous fluids are incompressible, we propose that growth induces mass flows across the mycelium, whether or not there are intrahyphal concentration gradients. We imaged the temporal evolution of networks formed by Phanerochaete velutina, and at each stage calculated the unique set of currents that account for the observed changes in cord volume, while minimising the work required to overcome viscous drag. Predicted speeds were in reasonable agreement with experimental data, and the pressure gradients needed to produce these flows are small. Furthermore, cords that were predicted to carry fast-moving or large currents were significantly more likely to increase in size than cords with slow-moving or small currents. The incompressibility of the fluids within fungi means there is a rapid global response to local fluid movements. Hence velocity of fluid flow is a local signal that conveys quasi-global information about the role of a cord within the mycelium. We suggest that fluid incompressibility and the coupling of growth and mass flow are critical physical features that enable the development of efficient, adaptive, biological transport networks.
Submitted 28 May, 2010;
originally announced May 2010.
-
Evolutionary Inference for Function-valued Traits: Gaussian Process Regression on Phylogenies
Authors:
Nick S. Jones,
John Moriarty
Abstract:
Biological data objects often have both of the following features: (i) they are functions rather than single numbers or vectors, and (ii) they are correlated due to phylogenetic relationships. In this paper we give a flexible statistical model for such data, by combining assumptions from phylogenetics with Gaussian processes. We describe its use as a nonparametric Bayesian prior distribution, both for prediction (placing posterior distributions on ancestral functions) and model selection (comparing rates of evolution across a phylogeny, or identifying the most likely phylogenies consistent with the observed data). Our work is integrative, extending the popular phylogenetic Brownian Motion and Ornstein-Uhlenbeck models to functional data and Bayesian inference, and extending Gaussian Process regression to phylogenies. We provide a brief illustration of the application of our method.
Submitted 3 August, 2012; v1 submitted 26 April, 2010;
originally announced April 2010.
-
Steps and bumps: precision extraction of discrete states of molecular machines using physically-based, high-throughput time series analysis
Authors:
Max A. Little,
Bradley C. Steel,
Fan Bai,
Yoshiyuki Sowa,
Thomas Bilyard,
David M. Mueller,
Richard M. Berry,
Nick S. Jones
Abstract:
We report new statistical time-series analysis tools providing significant improvements in the rapid, precision extraction of discrete state dynamics from large databases of experimental observations of molecular machines. By building physical knowledge and statistical innovations into analysis tools, we demonstrate new techniques for recovering discrete state transitions buried in highly correlated molecular noise. We demonstrate the effectiveness of our approach on simulated and real examples of step-like rotation of the bacterial flagellar motor and the F1-ATPase enzyme. We show that our method can clearly identify molecular steps, symmetries and cascaded processes that are too weak for existing algorithms to detect, and can do so much faster than existing algorithms. Our techniques represent a major advance in the drive towards automated, precision, high-throughput studies of molecular machine dynamics. Modular, open-source software that implements these techniques is provided at http://www.eng.ox.ac.uk/samp/members/max/software/
Submitted 7 April, 2010;
originally announced April 2010.
-
Revisiting Date and Party Hubs: Novel Approaches to Role Assignment in Protein Interaction Networks
Authors:
Sumeet Agarwal,
Charlotte M. Deane,
Mason A. Porter,
Nick S. Jones
Abstract:
The idea of 'date' and 'party' hubs has been influential in the study of protein-protein interaction networks. Date hubs display low co-expression with their partners, whilst party hubs have high co-expression. It was proposed that party hubs are local coordinators whereas date hubs are global connectors. Here we show that the reported importance of date hubs to network connectivity can in fact be attributed to a tiny subset of them. Crucially, these few, extremely central, hubs do not display particularly low expression correlation, undermining the idea of a link between this quantity and hub function. The date/party distinction was originally motivated by an approximately bimodal distribution of hub co-expression; we show that this feature is not always robust to methodological changes. Additionally, topological properties of hubs do not in general correlate with co-expression. Thus, we suggest that a date/party dichotomy is not meaningful and it might be more useful to conceive of roles for protein-protein interactions rather than individual proteins. We find significant correlations between interaction centrality and the functional similarity of the interacting proteins.
Submitted 5 May, 2010; v1 submitted 2 November, 2009;
originally announced November 2009.
-
Dynamical Clustering of Exchange Rates
Authors:
Daniel J. Fenn,
Mason A. Porter,
Peter J. Mucha,
Mark McDonald,
Stacy Williams,
Neil F. Johnson,
Nick S. Jones
Abstract:
We use techniques from network science to study correlations in the foreign exchange (FX) market over the period 1991–2008. We consider an FX market network in which each node represents an exchange rate and each weighted edge represents a time-dependent correlation between the rates. To provide insights into the clustering of the exchange rate time series, we investigate dynamic communities in the network. We show that there is a relationship between an exchange rate's functional role within the market and its position within its community and use a node-centric community analysis to track the time dynamics of this role. This reveals which exchange rates dominate the market at particular times and also identifies exchange rates that experienced significant changes in market role. We also use the community dynamics to uncover major structural changes that occurred in the FX market. Our techniques are general and will be similarly useful for investigating correlations in other markets.
Submitted 12 April, 2010; v1 submitted 29 May, 2009;
originally announced May 2009.
-
The Function of Communities in Protein Interaction Networks at Multiple Scales
Authors:
Anna C. F. Lewis,
Nick S. Jones,
Mason A. Porter,
Charlotte M. Deane
Abstract:
Background: If biology is modular then clusters, or communities, of proteins derived using only protein interaction network structure should define protein modules with similar biological roles. We investigate the link between biological modules and network communities in yeast and its relationship to the scale at which we probe the network.
Results: Our results demonstrate that the functional homogeneity of communities depends on the scale selected, and that almost all proteins lie in a functionally homogeneous community at some scale. We judge functional homogeneity using a novel test and three independent characterizations of protein function, and find a high degree of overlap between these measures. We show that a high mean clustering coefficient of a community can be used to identify those that are functionally homogeneous. By tracing the community membership of a protein through multiple scales we demonstrate how our approach could be useful to biologists focusing on a particular protein.
Conclusions: We show that there is no one scale of interest in the community structure of the yeast protein interaction network, but we can identify the range of resolution parameters that yield the most functionally coherent communities, and predict which communities are most likely to be functionally homogeneous.
Submitted 12 March, 2010; v1 submitted 6 April, 2009;
originally announced April 2009.
-
Using the Memories of Multiscale Machines to Characterize Complex Systems
Authors:
Nick S. Jones
Abstract:
A scheme is presented to extract detailed dynamical signatures from successive measurements of complex systems. Relative entropy based time series tools are used to quantify the gain in predictive power of increasing past knowledge. By lossy compression, data is represented by increasingly coarsened symbolic strings. Each compression resolution is modeled by a machine: a finite memory transition matrix. Applying the relative entropy tools to each machine's memory exposes correlations within many time scales. Examples are given for cardiac arrhythmias and different heart conditions are distinguished.
Submitted 30 December, 2008;
originally announced December 2008.
-
Dynamic communities in multichannel data: An application to the foreign exchange market during the 2007–2008 credit crisis
Authors:
Daniel J. Fenn,
Mason A. Porter,
Mark McDonald,
Stacy Williams,
Neil F. Johnson,
Nick S. Jones
Abstract:
We study the cluster dynamics of multichannel (multivariate) time series by representing their correlations as time-dependent networks and investigating the evolution of network communities. We employ a node-centric approach that allows us to track the effects of the community evolution on the functional roles of individual nodes without having to track entire communities. As an example, we consider a foreign exchange market network in which each node represents an exchange rate and each edge represents a time-dependent correlation between the rates. We study the period 2005-2008, which includes the recent credit and liquidity crisis. Using dynamical community detection, we find that exchange rates that are strongly attached to their community are persistently grouped with the same set of rates, whereas exchange rates that are important for the transfer of information tend to be positioned on the edges of communities. Our analysis successfully uncovers major trading changes that occurred in the market during the credit crisis.
Submitted 1 July, 2009; v1 submitted 24 November, 2008;
originally announced November 2008.
-
Master-equation analysis of accelerating networks
Authors:
David M. D. Smith,
Jukka-Pekka Onnela,
Nick S. Jones
Abstract:
In many real-world networks, the rates of node and link addition are time dependent. This observation motivates the definition of accelerating networks. There has been relatively little investigation of accelerating networks and previous efforts at analyzing their degree distributions have employed mean-field techniques. By contrast, we show that it is possible to apply a master-equation approach to such network development. We provide full time-dependent expressions for the evolution of the degree distributions for the canonical situations of random and preferential attachment in networks undergoing constant acceleration. These results are in excellent agreement with results obtained from simulations. We note that a growing, non-equilibrium network undergoing constant acceleration with random attachment is equivalent to a classical random graph, bridging the gap between non-equilibrium and classical equilibrium networks.
Submitted 6 April, 2009; v1 submitted 22 October, 2008;
originally announced October 2008.
-
Rapidly detecting disorder in rhythmic biological signals: A spectral entropy measure to identify cardiac arrhythmias
Authors:
Phillip P. A. Staniczenko,
Chiu Fan Lee,
Nick S. Jones
Abstract:
We consider the use of a running measure of power spectrum disorder to distinguish between the normal sinus rhythm of the heart and two forms of cardiac arrhythmia: atrial fibrillation and atrial flutter. This spectral entropy measure is motivated by characteristic differences in the spectra of beat timings during the three rhythms. We plot patient data derived from ten-beat windows on a "disorder map" and identify rhythm-defining ranges in the level and variance of spectral entropy values. Employing the spectral entropy within an automatic arrhythmia detection algorithm enables the classification of periods of atrial fibrillation from the time series of patients' beats. When the algorithm is set to identify abnormal rhythms within 6 s it agrees with 85.7% of the annotations of professional rhythm assessors; for a response time of 30 s this becomes 89.5%, and with 60 s it is 90.3%. The algorithm provides a rapid way to detect atrial fibrillation, demonstrating usable response times as low as 6 s. Measures of disorder in the frequency domain have practical significance in a range of biological signals: the techniques described in this paper have potential application for the rapid identification of disorder in other rhythmic signals.
Submitted 16 February, 2009; v1 submitted 8 October, 2008;
originally announced October 2008.
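Illustrative note: the spectral entropy measure described in the abstract above can be sketched as the Shannon entropy of a normalised power spectrum computed over a short window of beat intervals. This is a minimal sketch under stated assumptions, not the authors' algorithm: the naive DFT, the removal of the mean, the use of bits as the unit, and the example interval values are all choices made here for illustration.

```python
import cmath
import math


def spectral_entropy(window):
    """Shannon entropy (in bits) of the normalised power spectrum of `window`."""
    n = len(window)
    mean = sum(window) / n
    centred = [x - mean for x in window]  # drop the zero-frequency component
    # Power at the positive frequencies k = 1 .. n // 2 of a naive DFT.
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centred)
        )
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total == 0:
        return 0.0  # constant window: no spectral content to measure
    probs = [p / total for p in power]
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical ten-beat windows of inter-beat intervals (seconds).
regular = [1.0 + 0.1 * math.cos(2 * math.pi * t / 10) for t in range(10)]
irregular = [0.8, 1.1, 0.7, 1.3, 0.9, 0.6, 1.2, 1.0, 0.75, 1.15]
h_regular = spectral_entropy(regular)
h_irregular = spectral_entropy(irregular)
```

A window whose variation sits at a single frequency gives an entropy near zero, while an erratic window spreads power across frequencies and scores higher. Sliding the window along a recording would give a running measure of the kind the abstract describes.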