-
A Comparative Analysis of Gene Expression Profiling by Statistical and Machine Learning Approaches
Authors:
Myriam Bontonou,
Anaïs Haget,
Maria Boulougouri,
Benjamin Audit,
Pierre Borgnat,
Jean-Michel Arbona
Abstract:
Many machine learning models have been proposed to classify phenotypes from gene expression data. In addition to their good performance, these models can potentially provide some understanding of phenotypes by extracting explanations for their decisions. These explanations often take the form of a list of genes ranked in order of importance for the predictions, the highest-ranked genes being interpreted as linked to the phenotype. We discuss the biological and the methodological limitations of such explanations. Experiments are performed on several datasets gathering cancer and healthy tissue samples from the TCGA, GTEx and TARGET databases. A collection of machine learning models, including logistic regression, a multilayer perceptron, and a graph neural network, is trained to classify samples according to their cancer type. Gene rankings are obtained from explainability methods adapted to these models, and compared to those from classical statistical feature selection methods such as mutual information, DESeq2, and edgeR. Interestingly, on simple tasks, we observe that the information learned by black-box neural networks is related to the notion of differential expression. In all cases, a small set containing the best-ranked genes is sufficient to achieve a good classification. However, these genes differ significantly between the methods, and similar classification performance can be achieved with numerous lower-ranked genes. In conclusion, although these methods enable the identification of biomarkers characteristic of certain pathologies, our results question the completeness of the selected gene sets and thus of explainability by the identification of the underlying biological processes.
Submitted 1 February, 2024;
originally announced February 2024.
-
Mosaic benchmark networks: Modular link streams for testing dynamic community detection algorithms
Authors:
Yasaman Asgari,
Remy Cazabet,
Pierre Borgnat
Abstract:
Community structure is a critical feature of real networks, providing insights into nodes' internal organization. Nowadays, with the availability of highly detailed temporal networks such as link streams, studying community structures becomes more complex due to increased data precision and time sensitivity. Despite numerous algorithms developed in the past decade for dynamic community discovery, assessing their performance on link streams remains a challenge. Synthetic benchmark graphs are a well-accepted approach for evaluating static community detection algorithms. Additionally, there have been some proposals for slowly evolving communities in low-resolution temporal networks like snapshots. Nevertheless, this approach is not yet suitable for link streams. To bridge this gap, we introduce a novel framework that generates synthetic modular link streams with predefined communities. Subsequently, we evaluate established dynamic community detection methods to uncover limitations that may not be evident in snapshots with slowly evolving communities. While no method emerges as a clear winner, we observe notable differences among them.
Submitted 4 October, 2023;
originally announced October 2023.
-
Temporal network compression via network hashing
Authors:
Rémi Vaudaine,
Pierre Borgnat,
Paulo Goncalves,
Rémi Gribonval,
Márton Karsai
Abstract:
Pairwise temporal interactions between entities can be represented as temporal networks, which encode the propagation of processes, such as epidemic spreading or information cascades, evolving on top of them. The largest outcome of these processes is directly linked to the structure of the underlying network. Indeed, a node of a network at a given time cannot affect more nodes in the future than it can reach via time-respecting paths. The set of nodes reachable from a source defines an out-component, whose identification is costly. In this paper, we propose an efficient matrix algorithm to tackle this issue and show that it outperforms other state-of-the-art methods. Secondly, we propose a hashing framework to coarsen large temporal networks into smaller proxies on which out-components are easier to estimate; these estimates are then recombined to obtain the initial components. Our graph hashing solution has implications for privacy-respecting representations of temporal networks.
Submitted 10 July, 2023;
originally announced July 2023.
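
To make the notion of an out-component concrete, here is a minimal Python sketch of a plain chronological scan over the contacts. It is a simple baseline for illustration only, not the matrix algorithm or the hashing framework of the paper. The event format `(u, v, t)` (a directed contact from `u` to `v` at time `t`), the function name, and the rule that a path must use strictly increasing timestamps are all assumptions made here.

```python
import math

def out_component(events, source):
    """Nodes reachable from `source` via time-respecting paths.

    `events` is an iterable of directed contacts (u, v, t). A contact can be
    used only if its origin was reached strictly before time t (assumption).
    """
    # Earliest known arrival time at each reached node.
    arrival = {source: -math.inf}
    for u, v, t in sorted(events, key=lambda e: e[2]):
        reached_in_time = u in arrival and arrival[u] < t
        if reached_in_time and t < arrival.get(v, math.inf):
            arrival[v] = t
    return set(arrival)

# The contact 2 -> 3 happens at t=1, before node 2 is reached at t=2.
events = [(0, 1, 1), (1, 2, 2), (2, 3, 1)]
print(out_component(events, 0))
```

Because contacts are scanned in time order, the first arrival recorded for a node is also its earliest, which is all that reachability needs.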
-
Studying Limits of Explainability by Integrated Gradients for Gene Expression Models
Authors:
Myriam Bontonou,
Anaïs Haget,
Maria Boulougouri,
Jean-Michel Arbona,
Benjamin Audit,
Pierre Borgnat
Abstract:
Understanding the molecular processes that drive cellular life is a fundamental question in biological research. Ambitious programs have gathered a number of molecular datasets on large populations. To decipher the complex cellular interactions, recent work has turned to supervised machine learning methods. The scientific questions are formulated as classical learning problems on tabular data or on graphs, e.g. phenotype prediction from gene expression data. In these works, the input features on which the individual predictions are predominantly based are often interpreted as indicative of the cause of the phenotype, such as cancer identification. Here, we propose to explore the relevance of the biomarkers identified by Integrated Gradients, an explainability method for feature attribution in machine learning. Through a motivating example on The Cancer Genome Atlas, we show that ranking features by importance is not enough to robustly identify biomarkers. As it is difficult to evaluate whether biomarkers reflect relevant causes without known ground truth, we simulate gene expression data by proposing a hierarchical model based on Latent Dirichlet Allocation models. We also highlight good practices for evaluating explanations for genomics data and propose a direction to derive more insights from these explanations.
Submitted 19 March, 2023;
originally announced March 2023.
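
As a small illustration of the feature-attribution method named in this abstract, the sketch below computes Integrated Gradients for a logistic-regression classifier with NumPy, using a midpoint Riemann sum along the straight path from a baseline to the input. The weights, the input, the all-zero baseline and the number of steps are made-up values for illustration; the models and data used in the paper are different.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def integrated_gradients(w, b, x, baseline, n_steps=200):
    """Integrated Gradients of f(x) = sigmoid(w @ x + b) along a straight path."""
    alphas = (np.arange(n_steps) + 0.5) / n_steps        # midpoints in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)   # (n_steps, n_features)
    p = sigmoid(path @ w + b)
    grads = (p * (1.0 - p))[:, None] * w                 # gradient of f at each path point
    return (x - baseline) * grads.mean(axis=0)

# Hypothetical model and sample (assumptions, not taken from the paper).
w = np.array([1.5, -2.0, 0.0, 0.5])
b = -0.25
x = np.array([1.0, 0.5, 3.0, 2.0])
baseline = np.zeros_like(x)

attr = integrated_gradients(w, b, x, baseline)
gap = sigmoid(x @ w + b) - sigmoid(baseline @ w + b)
print(attr, attr.sum(), gap)
```

The attributions sum (up to discretization error) to the difference between the prediction at the input and at the baseline, and a feature with zero weight receives zero attribution even when its value is large.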
-
Clustering with Simplicial Complexes
Authors:
Thummaluru Siddartha Reddy,
Sundeep Prabhakar Chepuri,
Pierre Borgnat
Abstract:
In this work, we propose a new clustering algorithm to group nodes in networks based on second-order simplices (aka filled triangles) to leverage higher-order network interactions. We define a simplicial conductance function whose minimization yields an optimal partition with a high density of filled triangles within each set and a low density of filled triangles across sets. To this end, we propose a simplicial adjacency operator that captures the relation between the nodes through second-order simplices. This allows us to extend the well-known Cheeger inequality to cluster a simplicial complex. Then, leveraging the Cheeger inequality, we propose the simplicial spectral clustering algorithm. We report results from numerical experiments on synthetic and real-world network data to demonstrate the efficacy of the proposed approach.
Submitted 14 March, 2023;
originally announced March 2023.
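
One way to picture triangle-based clustering is sketched below in Python. It weights each pair of nodes by the number of filled triangles containing both, then splits the nodes by the sign of the second eigenvector of the resulting normalized Laplacian. This weighting is an assumption chosen for illustration and may differ from the simplicial adjacency operator defined in the paper; the toy complex and function names are also hypothetical. Every node is assumed to belong to at least one filled triangle.

```python
import itertools
import numpy as np

def triangle_adjacency(n_nodes, triangles):
    """W[i, j] = number of filled triangles containing both i and j."""
    W = np.zeros((n_nodes, n_nodes))
    for tri in triangles:
        for i, j in itertools.combinations(tri, 2):
            W[i, j] += 1.0
            W[j, i] += 1.0
    return W

def spectral_bipartition(W):
    """Two-way split from the second eigenvector of the normalized Laplacian."""
    d = W.sum(axis=1)                      # assumes every degree is positive
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)        # eigenvalues in ascending order
    fiedler = vecs[:, 1] * d_inv_sqrt
    return (fiedler > 0).astype(int)

# Toy complex: two groups of four nodes with all triangles filled,
# joined by a single bridging triangle (3, 4, 5).
triangles = (list(itertools.combinations(range(0, 4), 3))
             + list(itertools.combinations(range(4, 8), 3))
             + [(3, 4, 5)])
labels = spectral_bipartition(triangle_adjacency(8, triangles))
print(labels)
```

Which group receives label 0 or 1 depends on the arbitrary sign of the eigenvector, so only the grouping itself is meaningful.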
-
A Simple Way to Learn Metrics Between Attributed Graphs
Authors:
Yacouba Kaloga,
Pierre Borgnat,
Amaury Habrard
Abstract:
The choice of good distances and similarity measures between objects is important for many machine learning methods. Therefore, many metric learning algorithms have been developed in recent years, mainly for Euclidean data in order to improve performance of classification or clustering methods. However, due to difficulties in establishing computable, efficient and differentiable distances between attributed graphs, few metric learning algorithms adapted to graphs have been developed despite the strong interest of the community. In this paper, we address this issue by proposing a new Simple Graph Metric Learning - SGML - model with few trainable parameters based on Simple Graph Convolutional Neural Networks - SGCN - and elements of Optimal Transport theory. This model allows us to build an appropriate distance from a database of labeled (attributed) graphs to improve the performance of simple classification algorithms such as $k$-NN. This distance can be quickly trained while maintaining good performances as illustrated by the experimental study presented in this paper.
Submitted 21 December, 2022; v1 submitted 26 September, 2022;
originally announced September 2022.
-
Probabilistic forecasts of extreme heatwaves using convolutional neural networks in a regime of lack of data
Authors:
George Miloshevich,
Bastien Cozian,
Patrice Abry,
Pierre Borgnat,
Freddy Bouchet
Abstract:
Understanding extreme events and their probability is key for the study of climate change impacts, risk assessment, adaptation, and the protection of living beings. Forecasting the occurrence probability of extreme heatwaves is a primary challenge for risk assessment and attribution, but also for fundamental studies about processes, dataset and model validation, and climate change studies. In this work we develop a methodology to build forecasting models which are based on convolutional neural networks, trained on extremely long climate model outputs. We demonstrate that neural networks have positive predictive skills, with respect to random climatological forecasts, for the occurrence of long-lasting 14-day heatwaves over France, up to 15 days ahead of time for fast dynamical drivers (500 hPa geopotential height fields), and also at much longer lead times for slow physical drivers (soil moisture). This forecast is made seamlessly in time and space, for fast hemispheric and slow local drivers. We find that the neural network selects extreme heatwaves associated with a Northern Hemisphere wavenumber-3 pattern. The main scientific message is that most of the time, training neural networks for predicting extreme heatwaves occurs in a regime of lack of data. We suggest that this is likely to be the case for most other applications to large scale atmosphere and climate phenomena. For instance, using training sets only one hundred years long, a regime of drastic lack of data, leads to severely lower predictive skills and a general inability to extract the useful information available in the 500 hPa geopotential height field at a hemispheric scale, in contrast to a dataset several thousand years long. We discuss perspectives for dealing with the lack of data regime, for instance rare event simulations and how transfer learning may play a role in this latter task.
Submitted 17 February, 2023; v1 submitted 1 August, 2022;
originally announced August 2022.
-
Fast Multiscale Diffusion on Graphs
Authors:
Sibylle Marcotte,
Amélie Barbe,
Rémi Gribonval,
Titouan Vayer,
Marc Sebban,
Pierre Borgnat,
Paulo Gonçalves
Abstract:
Diffusing a graph signal at multiple scales requires computing the action of the exponential of several multiples of the Laplacian matrix. We tighten a bound on the approximation error of truncated Chebyshev polynomial approximations of the exponential, hence significantly improving a priori estimates of the polynomial order for a prescribed error. We further exploit properties of these approximations to factorize the computation of the action of the diffusion operator over multiple scales, thus reducing drastically its computational cost.
Submitted 29 April, 2021;
originally announced April 2021.
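
The computation described in this abstract, applying exp(-tau L) to a signal for several scales tau, can be sketched with a truncated Chebyshev series as follows. This is an illustrative Python sketch, not the paper's algorithm: the coefficients are obtained with NumPy's `chebinterpolate` rather than from the paper's error bound, the order is fixed by hand, and the largest eigenvalue is computed exactly (in practice an upper bound would be used). Sharing the vectors T_k(L~) x across all scales is one natural reading of factorizing the computation over multiple scales.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def cheb_diffuse(L, x, taus, order=30):
    """Approximate exp(-tau * L) @ x for every tau in `taus`."""
    n = L.shape[0]
    lam_max = np.linalg.eigvalsh(L)[-1]          # exact here; a bound also works
    L_tilde = (2.0 / lam_max) * L - np.eye(n)    # spectrum mapped to [-1, 1]
    # Three-term recurrence: T_k(L_tilde) @ x, computed once for all scales.
    T = [x, L_tilde @ x]
    for _ in range(2, order + 1):
        T.append(2.0 * (L_tilde @ T[-1]) - T[-2])
    T = np.stack(T)                              # shape (order + 1, n)
    out = []
    for tau in taus:
        # Chebyshev coefficients of s -> exp(-tau * lam_max * (s + 1) / 2).
        coeffs = C.chebinterpolate(
            lambda s, tau=tau: np.exp(-tau * lam_max * (s + 1.0) / 2.0), order)
        out.append(coeffs @ T)
    return np.array(out)

# Path graph on 6 nodes (toy example).
A = np.diag(np.ones(5), 1) + np.diag(np.ones(5), -1)
L = np.diag(A.sum(axis=1)) - A
x = np.random.default_rng(0).standard_normal(6)
taus = [0.5, 2.0]
approx = cheb_diffuse(L, x, taus)

# Reference via full eigendecomposition (only feasible for small graphs).
lam, U = np.linalg.eigh(L)
exact = np.array([U @ (np.exp(-tau * lam) * (U.T @ x)) for tau in taus])
print(np.abs(approx - exact).max())
```

Only matrix-vector products with L are needed inside the recurrence, which is what makes the polynomial approach attractive for large sparse graphs.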
-
Deep Learning-based Extreme Heatwave Forecast
Authors:
Valérian Jacques-Dumas,
Francesco Ragone,
Pierre Borgnat,
Patrice Abry,
Freddy Bouchet
Abstract:
Because of the impact of extreme heat waves and heat domes on society and biodiversity, their study is a key challenge. We specifically study long-lasting extreme heat waves, which are among the most important for climate impacts. Physics driven weather forecast systems or climate models can be used to forecast their occurrence or predict their probability. The present work explores the use of deep learning architectures, trained using outputs of a climate model, as an alternative strategy to forecast the occurrence of extreme long-lasting heatwaves. This new approach will be useful for several key scientific goals, which include studying climate model statistics, building a quantitative proxy for resampling rare events in climate models, and studying the impact of climate change, and should eventually be useful for forecasting. Fulfilling these important goals implies addressing issues such as the class-size imbalance that is intrinsically associated with rare event prediction, and assessing the potential benefits of transfer learning to address the nested nature of extreme events (naturally included in less extreme ones). We train a Convolutional Neural Network, using 1000 years of climate model outputs, with large-class undersampling and transfer learning. From the observed snapshots of the surface temperature and the 500 hPa geopotential height fields, the trained network achieves significant performance in forecasting the occurrence of long-lasting extreme heatwaves. We are able to predict them at three different levels of intensity, and as early as 15 days ahead of the start of the event (30 days ahead of the end of the event).
Submitted 13 January, 2022; v1 submitted 17 March, 2021;
originally announced March 2021.
-
Multiview Variational Graph Autoencoders for Canonical Correlation Analysis
Authors:
Yacouba Kaloga,
Pierre Borgnat,
Sundeep Prabhakar Chepuri,
Patrice Abry,
Amaury Habrard
Abstract:
We present a novel multiview canonical correlation analysis model based on a variational approach. This is the first nonlinear model that takes into account the available graph-based geometric constraints while being scalable for processing large scale datasets with multiple views. It is based on an autoencoder architecture with graph convolutional neural network layers. We experiment with our approach on classification, clustering, and recommendation tasks on real datasets. The algorithm is competitive with state-of-the-art multiview representation learning techniques.
Submitted 4 October, 2021; v1 submitted 30 October, 2020;
originally announced October 2020.
-
Hierarchical and Unsupervised Graph Representation Learning with Loukas's Coarsening
Authors:
Louis Béthune,
Yacouba Kaloga,
Pierre Borgnat,
Aurélien Garivier,
Amaury Habrard
Abstract:
We propose a novel algorithm for unsupervised graph representation learning with attributed graphs. It combines three advantages addressing some current limitations of the literature: i) The model is inductive: it can embed new graphs without re-training in the presence of new data; ii) The method takes into account both micro-structures and macro-structures by looking at the attributed graphs at different scales; iii) The model is end-to-end differentiable: it is a building block that can be plugged into deep learning pipelines and allows for back-propagation. We show that combining a coarsening method having strong theoretical guarantees with mutual information maximization suffices to produce high quality embeddings. We evaluate them on classification tasks with common benchmarks of the literature. We show that our algorithm is competitive with state of the art among unsupervised graph representation learning methods.
Submitted 17 August, 2020; v1 submitted 7 July, 2020;
originally announced July 2020.
-
Asymptotic control of FWER under Gaussian assumption: application to correlation tests
Authors:
Sophie Achard,
Pierre Borgnat,
Irène Gannaz
Abstract:
In many applications, hypothesis testing is based on an asymptotic distribution of statistics. The aim of this paper is to clarify and extend multiple correction procedures when the statistics are asymptotically Gaussian. We propose a unified framework to prove their asymptotic behavior, which is valid in the case of highly correlated tests. We focus on correlation tests, for which several test statistics are proposed. All these multiple testing procedures on correlations are shown to control the FWER. An extensive simulation study on correlation-based graph estimation highlights the finite-sample behavior, the independence from the sparsity of graphs and the dependence on the values of correlations. Empirical evaluation of power provides comparisons of the proposed methods. Finally, the procedures are validated on a real dataset of rat brain connectivity measured by fMRI. We confirm our theoretical findings by applying our procedures to a full null hypothesis with data from dead rats. Data from live rats show the ability of the proposed procedures to correctly identify brain connectivity graphs with controlled errors.
Submitted 2 July, 2020;
originally announced July 2020.
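
As a concrete instance of a multiple correction procedure on correlation tests, the Python sketch below combines the classical Fisher z-statistic (asymptotically standard Gaussian under zero correlation) with a Bonferroni threshold to estimate a correlation graph. It shows only one textbook combination and is not claimed to be any specific procedure studied in the paper; the function name and the simulated data are assumptions.

```python
import math
import numpy as np

def bonferroni_correlation_graph(X, alpha=0.05):
    """Edges (i, j) whose correlation is significant after Bonferroni correction.

    X has shape (n_samples, n_variables). Returns the boolean adjacency
    matrix and the two-sided p-values of the upper-triangular pairs.
    """
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    iu = np.triu_indices(p, k=1)
    r = np.clip(R[iu], -1 + 1e-12, 1 - 1e-12)   # keep arctanh finite
    z = np.sqrt(n - 3) * np.arctanh(r)          # Fisher z-statistic
    pvals = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])
    reject = pvals < alpha / len(pvals)         # Bonferroni threshold
    A = np.zeros((p, p), dtype=bool)
    A[iu] = reject
    return A | A.T, pvals

# Simulated data (assumption): variables 0 and 1 are strongly correlated.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
X[:, 1] = X[:, 0] + 0.1 * rng.standard_normal(200)
A, pvals = bonferroni_correlation_graph(X)
print(A.astype(int))
```

Bonferroni is conservative when the tests are strongly dependent, which is one reason less conservative corrections are of interest in this setting.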
-
Solving NMF with smoothness and sparsity constraints using PALM
Authors:
Raimon Fabregat,
Nelly Pustelnik,
Paulo Gonçalves,
Pierre Borgnat
Abstract:
Non-negative matrix factorization (NMF) is a problem of dimensionality reduction and source separation of data. Since it was studied in depth in 1999 by Lee and Seung, it has been widely used in many fields, including data compression, document clustering, processing of audio spectrograms and astronomy. In this work we adapt PALM, a minimization scheme for convex functions with non-differentiable constraints, to solve the NMF problem with solutions that can be smooth and/or sparse, two frequently desired properties.
Submitted 18 March, 2021; v1 submitted 31 October, 2019;
originally announced October 2019.
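
A PALM-style iteration for NMF can be sketched in a few lines of Python: each factor is updated in turn by a gradient step on the data-fit term followed by a proximal step for its constraint, with a step size set from the Lipschitz constant of the partial gradient. The sketch below handles non-negativity on both factors and an L1 sparsity penalty on H only. The objective, the penalty weight `lam`, the factor `gamma` and the absence of a smoothness term are simplifying assumptions, so this is not the exact formulation of the paper.

```python
import numpy as np

def palm_nmf(X, rank, lam=0.1, n_iter=200, gamma=1.1, seed=0):
    """Minimize 0.5 * ||X - W H||_F^2 + lam * ||H||_1 subject to W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    history = []
    for _ in range(n_iter):
        # W-step: gradient step, then projection onto the non-negative orthant.
        c = gamma * max(np.linalg.norm(H @ H.T, 2), 1e-12)
        W = np.maximum(W - ((W @ H - X) @ H.T) / c, 0.0)
        # H-step: gradient step, then prox of lam * ||.||_1 plus non-negativity,
        # which reduces to a shifted projection.
        d = gamma * max(np.linalg.norm(W.T @ W, 2), 1e-12)
        H = np.maximum(H - (W.T @ (W @ H - X) + lam) / d, 0.0)
        history.append(0.5 * np.linalg.norm(X - W @ H) ** 2 + lam * H.sum())
    return W, H, history

# Synthetic non-negative data of rank 3 (assumption for the example).
rng = np.random.default_rng(1)
X = rng.random((8, 3)) @ rng.random((3, 10))
W, H, history = palm_nmf(X, rank=3)
print(history[0], history[-1])
```

With step sizes of this form the objective does not increase from one iteration to the next, which gives a simple sanity check on an implementation.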
-
Combining traffic counts and Bluetooth data for link-origin-destination matrix estimation in large urban networks: The Brisbane case study
Authors:
Gabriel Michau,
Nelly Pustelnik,
Pierre Borgnat,
Patrice Abry,
Ashish Bhaskar,
Edward Chung
Abstract:
Origin-Destination matrix estimation is a keystone for traffic representation and analysis. These matrices are traditionally estimated from traffic counts, surveys and socio-economic models, but recent technological advances make it possible to rethink the estimation problem. Road user identification technologies, such as connected GPS, Bluetooth or Wifi detectors, bring additional information: for a fraction of the users, the origin, the destination and to some extent the itinerary taken. In the present work, this additional information is used for the estimation of a more comprehensive traffic representation tool: the link-origin-destination matrix. Such three-dimensional matrices extend the concept of traditional origin-destination matrices by also giving information on the traffic assignment. Their estimation is solved as an inverse problem whose objective function represents a trade-off between important properties the traffic has to satisfy. This article presents the theory and how to implement such a method on a real dataset. With the case study of Brisbane City, where over 600 Bluetooth detectors have been installed, it also illustrates the opportunities such link-origin-destination matrices create for traffic analysis.
Submitted 17 July, 2019;
originally announced July 2019.
-
Graph-based era segmentation of international financial integration
Authors:
Cécile Bastidon,
Antoine Parent,
Pablo Jensen,
Patrice Abry,
Pierre Borgnat
Abstract:
Assessing world-wide financial integration constitutes a recurrent challenge in macroeconometrics, often addressed by visual inspections searching for data patterns. Econophysics literature enables us to build complementary, data-driven measures of financial integration using graphs. The present contribution investigates the potential and interests of a novel 3-step approach that combines several state-of-the-art procedures to i) compute graph-based representations of the multivariate dependence structure of asset prices time series representing the financial states of 32 countries world-wide (1955-2015); ii) compute time series of 5 graph-based indices that characterize the time evolution of the topologies of the graph; iii) segment these time evolutions in piece-wise constant eras, using an optimization framework constructed on a multivariate multi-norm total variation penalized functional. The method shows first that it is possible to find endogenous stable eras of world-wide financial integration. Then, our results suggest that the most relevant globalization eras would be based on the historical patterns of global capital flows, while the major regulatory events of the 1970s would only appear as a cause of sub-segmentation.
Submitted 28 May, 2019;
originally announced May 2019.
-
Harmonic analysis on directed graphs and applications: from Fourier analysis to wavelets
Authors:
Harry Sevi,
Gabriel Rilling,
Pierre Borgnat
Abstract:
We introduce a novel harmonic analysis, with the random walk operator as its cornerstone, for functions defined on the vertices of a strongly connected directed graph. As a first step, we consider the set of eigenvectors of the random walk operator as a non-orthogonal Fourier-type basis for functions over directed graphs. We find a frequency interpretation by linking the variation of the eigenvectors of the random walk operator, obtained from their Dirichlet energy, to the real part of their associated eigenvalues. From this Fourier basis, we can proceed further and build multi-scale analyses on directed graphs. We propose both a redundant wavelet transform and a decimated wavelet transform by extending the diffusion wavelets framework by Coifman and Maggioni to directed graphs. The development of our harmonic analysis on directed graphs thus leads us to consider both semi-supervised learning problems and signal modeling problems on directed graphs, highlighting the efficiency of our framework.
Submitted 1 November, 2021; v1 submitted 28 November, 2018;
originally announced November 2018.
-
Evolutions of Individuals Use of Lyon's Bike Sharing System
Authors:
Jordan Cambe,
Patrice Abry,
Julien Barnier,
Pierre Borgnat,
Marie Vogel,
Pablo Jensen
Abstract:
Bike sharing systems (BSS) have been growing fast all over the world, along with the number of articles analyzing such systems. However, the lack of trip databases covering long time spans has limited the analysis of BSS users' behavior in the long term. This article studies the long-term commitment of subscribers to Vélo'v, a BSS located in Lyon, France, and the evolution of their usage over time. Using a 5-year dataset covering 121,000 long-term distinct users, we show the heterogeneous individual trajectories masked by the overall system stability. Users follow two main trajectories: about 60% remain in the system for at most one year, showing a low median activity (47 trips); the remaining 40% correspond to more active users (median activity of 96 trips in their first year) that remain continuously active for several years (mean time = 2.9 years). This latter class exhibits a relatively stable activity, decreasing slightly over the years. We show that middle-aged, male and urban users are overrepresented among the 'stable' users.
Submitted 2 September, 2018; v1 submitted 30 March, 2018;
originally announced March 2018.
-
Analytic signal in many dimensions
Authors:
Mikhail Tsitsvero,
Pierre Borgnat,
Paulo Gonçalves
Abstract:
In this work we extend analytic signal theory to the multidimensional case when oscillations are observed in the $d$ orthogonal directions. First, it is shown how to obtain separate phase-shifted components and how to combine them into instantaneous amplitude and phases. Second, the proper hypercomplex analytic signal is defined as a holomorphic hypercomplex function on the boundary of a certain upper half-space. Next, it is shown that correct phase-shifted components can be obtained by positive-frequency restriction of the hypercomplex Fourier transform. Necessary and sufficient conditions for analytic extension of the hypercomplex analytic signal into the upper hypercomplex half-space by means of the holomorphic Fourier transform are given by the corresponding Paley-Wiener theorem. Moreover, it is demonstrated that for $d>2$ there is no corresponding non-commutative hypercomplex Fourier transform (including Clifford and Cayley-Dickson based) that allows one to recover the phase-shifted components correctly.
Submitted 24 April, 2019; v1 submitted 25 December, 2017;
originally announced December 2017.
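
For orientation, the classical one-dimensional construction that this work generalizes can be sketched in Python: the analytic signal is obtained by restricting the discrete Fourier transform to non-negative frequencies, and its modulus and argument give the instantaneous amplitude and phase. This covers only the standard 1D case with the ordinary complex FFT; the hypercomplex transforms of the paper are not implemented here, and the test tone is an assumption.

```python
import numpy as np

def analytic_signal(x):
    """1D analytic signal via positive-frequency restriction of the DFT."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0                      # keep the DC component
    if n % 2 == 0:
        h[n // 2] = 1.0             # keep the Nyquist component
        h[1:n // 2] = 2.0           # double the positive frequencies
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(X * h)       # negative frequencies are zeroed

# A cosine with exactly 8 periods over the window.
t = np.arange(256) / 256.0
z = analytic_signal(np.cos(2 * np.pi * 8 * t))
amplitude = np.abs(z)
phase = np.unwrap(np.angle(z))
print(amplitude[:4], phase[:4])
```

For this input the imaginary part of `z` is the sine at the same frequency, that is, the phase-shifted component of the cosine.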
-
Design of graph filters and filterbanks
Authors:
Nicolas Tremblay,
Paulo Gonçalves,
Pierre Borgnat
Abstract:
Basic operations in graph signal processing consist in processing signals indexed on graphs either by filtering them, to extract specific parts of them, or by changing their domain of representation, using some transformation or dictionary better adapted to represent the information contained in them. The aim of this chapter is to review general concepts for the introduction of filters and representations of graph signals. We first recall the general framework to achieve that, which puts the emphasis on introducing some spectral domain that is relevant for graph signals to define a Graph Fourier Transform. We show how to introduce a notion of frequency analysis for graph signals by looking at their variations. Then, we move to the introduction of graph filters, which are defined like their classical equivalents for 1D signals or 2D images, as linear systems that operate on each frequency of a signal. Some examples of filters and of their implementations are given. Finally, as alternate representations of graph signals, we focus on multiscale transforms that are defined from filters. Continuous multiscale transforms such as spectral wavelets on graphs are reviewed, as well as the versatile approaches of filterbanks on graphs. Several variants of graph filterbanks are discussed, for structured as well as arbitrary graphs, with a focus on the central point of the choice of the decimation or aggregation operators.
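As a rough illustration of a graph filter acting "on each frequency of a signal", the following minimal sketch assumes the combinatorial Laplacian as the operator defining the Graph Fourier Transform. The path graph, the function names, and the low-pass kernel are illustrative assumptions, not taken from the chapter.

```python
import numpy as np

def path_laplacian(n):
    """Combinatorial Laplacian L = D - A of an unweighted path graph on n nodes."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A

def graph_filter(L, x, h):
    """Filter x by scaling each graph-frequency component by h(lambda)."""
    lam, U = np.linalg.eigh(L)   # eigenvalues = graph frequencies, U = Fourier basis
    x_hat = U.T @ x              # graph Fourier transform of x
    return U @ (h(lam) * x_hat)  # inverse transform of the filtered spectrum

# Hypothetical low-pass kernel: passes lambda = 0, attenuates larger frequencies.
def low_pass(lam):
    return 1.0 / (1.0 + 5.0 * lam)

L = path_laplacian(8)
x = np.random.default_rng(0).standard_normal(8)
y = graph_filter(L, x, low_pass)
```

The full eigendecomposition is only practical for small graphs; polynomial approximations of the kernel are one common way to avoid it on large ones.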
Submitted 3 November, 2017;
originally announced November 2017.
-
Scaling in Internet Traffic: a 14 year and 3 day longitudinal study, with multiscale analyses and random projections
Authors:
Romain Fontugne,
Patrice Abry,
Kensuke Fukuda,
Darryl Veitch,
Kenjiro Cho,
Pierre Borgnat,
Herwig Wendt
Abstract:
In the mid-90's, it was shown that the statistics of aggregated time series from Internet traffic departed from those of traditional short range dependent models, and were instead characterized by asymptotic self-similarity. Following this seminal contribution, over the years, many studies have investigated the existence and form of scaling in Internet traffic. This contribution aims first at presenting a methodology, combining multiscale analysis (wavelet and wavelet leaders) and random projections (or sketches), permitting a precise, efficient and robust characterization of scaling which is capable of seeing through non-stationary anomalies. Second, we apply the methodology to a data set spanning an unusually long period: 14 years, from the MAWI traffic archive, thereby allowing an in-depth longitudinal analysis of the form, nature and evolutions of scaling in Internet traffic, as well as network mechanisms producing them. We also study a separate 3-day long trace to obtain complementary insight into intra-day behavior. We find that a biscaling (two ranges of independent scaling phenomena) regime is systematically observed: long-range dependence over the large scales, and multifractal-like scaling over the fine scales. We quantify the actual scaling ranges precisely, verify to high accuracy the expected relationship between the long range dependent parameter and the heavy tail parameter of the flow size distribution, and relate fine scale multifractal scaling to typical IP packet inter-arrival and to round-trip time distributions.
Submitted 6 March, 2017;
originally announced March 2017.
-
A Primal-Dual Algorithm for Link Dependent Origin Destination Matrix Estimation
Authors:
Gabriel Michau,
Nelly Pustelnik,
Pierre Borgnat,
Patrice Abry,
Alfredo Nantes,
Ashish Bhaskar,
Edward Chung
Abstract:
Origin-Destination Matrix (ODM) estimation is a classical problem in transport engineering aiming to recover flows from every Origin to every Destination from measured traffic counts and a priori model information. In addition to traffic counts, the present contribution takes advantage of probe trajectories, whose capture is made possible by new measurement technologies. It extends the concept of ODM to that of Link dependent ODM (LODM), keeping the information about the flow distribution on links and containing inherently the ODM assignment. Further, an original formulation of LODM estimation from traffic counts and probe trajectories is presented as an optimisation problem, where the functional to be minimized consists of five convex functions, each modelling a constraint or property of the transport problem: consistency with traffic counts, consistency with sampled probe trajectories, consistency with traffic conservation (Kirchhoff's law), similarity of flows having close origins and destinations, positivity of traffic flows. A primal-dual algorithm is devised to minimize the designed functional, as the corresponding objective functions are not necessarily differentiable. A case study, on a simulated network and traffic, validates the feasibility of the procedure and details its benefits for the estimation of an LODM matching real-network constraints and observations.
Submitted 1 April, 2016;
originally announced April 2016.
-
Accelerated Spectral Clustering Using Graph Filtering Of Random Signals
Authors:
Nicolas Tremblay,
Gilles Puy,
Pierre Borgnat,
Remi Gribonval,
Pierre Vandergheynst
Abstract:
We build upon recent advances in graph signal processing to propose a faster spectral clustering algorithm. Indeed, classical spectral clustering is based on the computation of the first k eigenvectors of the Laplacian of the similarity matrix, whose computation cost, even for sparse matrices, becomes prohibitive for large datasets. We show that we can estimate the spectral clustering distance matrix without computing these eigenvectors: by graph filtering random signals. Also, we take advantage of the stochasticity of these random vectors to estimate the number of clusters k. We compare our method to classical spectral clustering on synthetic data, and show that it reaches equal performance while being faster by a factor of at least two for large datasets.
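The idea of estimating spectral clustering distances by filtering random signals can be sketched as follows. This is only an illustration under stated assumptions: it uses the combinatorial Laplacian and, for clarity, builds the ideal low-pass filter from an explicit eigendecomposition, which is exactly the step a fast method would replace with an approximate filter. The function name and the toy graph are hypothetical.

```python
import numpy as np

def filtered_random_features(A, k, d, seed=0):
    """Rows are node features obtained by low-pass filtering d random signals.

    The ideal low-pass filter keeps the k lowest graph frequencies. It is built
    here from eigenvectors only for illustration; a fast method would
    approximate the filter instead of computing them.
    """
    L = np.diag(A.sum(axis=1)) - A            # combinatorial Laplacian (assumption)
    _, U = np.linalg.eigh(L)
    Uk = U[:, :k]                             # first k eigenvectors
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((A.shape[0], d))  # d Gaussian random signals
    return Uk @ (Uk.T @ R) / np.sqrt(d)       # filtered signals, one row per node

# Toy graph: two disconnected 4-cliques, so k = 2 clusters.
clique = np.ones((4, 4)) - np.eye(4)
zeros = np.zeros((4, 4))
A = np.block([[clique, zeros], [zeros, clique]])

F = filtered_random_features(A, k=2, d=50)
same_cluster = np.linalg.norm(F[0] - F[1])    # nodes 0 and 1 share a clique
cross_cluster = np.linalg.norm(F[0] - F[4])   # nodes 0 and 4 do not
```

Distances between rows of `F` then play the role of distances between rows of the eigenvector matrix, so a standard k-means step could be run on `F` directly.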
Submitted 29 September, 2015;
originally announced September 2015.
-
Subgraph-based filterbanks for graph signals
Authors:
Nicolas Tremblay,
Pierre Borgnat
Abstract:
We design a critically-sampled compact-support biorthogonal transform for graph signals, via graph filterbanks. Instead of partitioning the nodes in two sets so as to remove one every two nodes in the filterbank downsampling operations, the design is based on a partition of the graph in connected subgraphs. Coarsening is achieved by defining one "supernode" for each subgraph, and the edges of this coarsened graph derive from the connectivity between the subgraphs. Unlike the "one every two nodes" downsampling on bipartite graphs, this coarsening operation does not have an exact formulation in the graph Fourier domain. Instead, we rely on the local Fourier bases of each subgraph to define filtering operations. We successfully apply this method to decompose graph signals, and show promising performance on compression and denoising.
Submitted 11 February, 2016; v1 submitted 18 September, 2015;
originally announced September 2015.
-
Duality between Temporal Networks and Signals: Extraction of the Temporal Network Structures
Authors:
Ronan Hamon,
Pierre Borgnat,
Patrick Flandrin,
Céline Robardet
Abstract:
We develop a framework to track the structure of temporal networks with a signal processing approach. The method is based on the duality between networks and signals using a multidimensional scaling technique. This enables a study of the network structure using frequency patterns of the corresponding signals. An extension is proposed for temporal networks, thereby enabling a tracking of the network structure over time. A method to automatically extract the most significant frequency patterns and their activation coefficients over time is then introduced, using nonnegative matrix factorization of the temporal spectra. The framework, inspired by audio decomposition, allows transforming these frequency patterns back into networks, to highlight the evolution of the underlying structure of the network over time. The effectiveness of the method is first evidenced on a toy example, prior to being used to study a temporal network of face-to-face contacts. The extraction of sub-networks highlights significant structures decomposed on time intervals.
Submitted 12 May, 2015;
originally announced May 2015.
-
From graphs to signals and back: Identification of network structures using spectral analysis
Authors:
Ronan Hamon,
Pierre Borgnat,
Patrick Flandrin,
Céline Robardet
Abstract:
Many systems comprising entities in interactions can be represented as graphs, whose structure gives significant insights about how these systems work. Network theory has undergone further developments, in particular in relation to detection of communities in graphs, to catch this structure. Recently, an approach has been proposed to transform a graph into a collection of signals: Using a multidimensional scaling technique on a distance matrix representing relations between vertices of the graph, points in a Euclidean space are obtained and interpreted as signals, indexed by the vertices. In this article, we propose several extensions to this approach, developing a framework to study graph structures using signal processing tools. We first extend the current methodology, enabling us to highlight connections between properties of signals and graph structures, such as communities, regularity or randomness, as well as combinations of those. A robust inverse transformation method is next described, taking into account possible changes in the signals compared to original ones. This technique uses, in addition to the relationships between the points in the Euclidean space, the energy of each signal, coding the different scales of the graph structure. These contributions open up new perspectives in the study of graphs, by enabling processing of graphs through the processing of the corresponding collection of signals, using reliable tools from signal processing. A technique of denoising of a graph by filtering of the corresponding signals is then described, suggesting considerable potential of the approach.
Submitted 10 June, 2016; v1 submitted 16 February, 2015;
originally announced February 2015.
-
Discovering the structure of complex networks by minimizing cyclic bandwidth sum
Authors:
Ronan Hamon,
Pierre Borgnat,
Patrick Flandrin,
Céline Robardet
Abstract:
Getting a labeling of vertices close to the structure of the graph has proved to be of interest in many applications, e.g., to follow smooth signals indexed by the vertices of the network. This question can be related to a graph labeling problem known as the cyclic bandwidth sum problem. It consists in finding a labeling of the vertices of an undirected and unweighted graph with distinct integers such that the sum of the (cyclic) differences of labels of adjacent vertices is minimized. Although theoretical results exist that give the optimal value of the cyclic bandwidth sum for standard graphs, there are neither results in the general case, nor explicit methods to reach this optimal result. In addition to this lack of theoretical knowledge, only a few methods have been proposed to approximately solve this problem. In this paper, we introduce a new heuristic to find an approximate solution for the cyclic bandwidth sum problem, by following the structure of the graph. The heuristic is a two-step algorithm: the first step consists of traversing the graph to find a set of paths which follow the structure of the graph, using a similarity criterion based on the Jaccard index to jump from one vertex to the next one. The second step is the merging of all obtained paths, based on a greedy approach that extends a partial solution by inserting a new path at the position that minimizes the cyclic bandwidth sum. The effectiveness of the proposed heuristic, both in terms of performance and execution time, is shown through experiments on graphs whose optimal value of CBS is known as well as on real-world networks, where the consistency between labeling and topology is highlighted. An extension to weighted graphs is also proposed.
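For concreteness, the objective and the similarity measure named above can be sketched in a few lines. This assumes labels are the integers 0 to n-1 and that the cyclic difference of two labels is the shorter way around a ring of n positions. The function names are illustrative, and the two-step heuristic itself is not reproduced here.

```python
def cyclic_bandwidth_sum(edges, labels):
    """Sum over edges of the cyclic distance between the endpoint labels."""
    n = len(labels)
    total = 0
    for u, v in edges:
        diff = abs(labels[u] - labels[v])
        total += min(diff, n - diff)   # shorter way around the ring of n labels
    return total

def jaccard_index(neigh_u, neigh_v):
    """Jaccard index of two neighbor sets: |intersection| / |union|."""
    union = neigh_u | neigh_v
    return len(neigh_u & neigh_v) / len(union) if union else 0.0

# Cycle graph on 5 vertices, with a natural and a scrambled labeling.
cycle_edges = [(i, (i + 1) % 5) for i in range(5)]
natural = {i: i for i in range(5)}
scrambled = {0: 0, 1: 2, 2: 4, 3: 1, 4: 3}

cbs_natural = cyclic_bandwidth_sum(cycle_edges, natural)      # every edge costs 1
cbs_scrambled = cyclic_bandwidth_sum(cycle_edges, scrambled)  # every edge costs 2
```

On this toy cycle the natural labeling follows the topology and gives the smaller sum, which is the behavior a labeling "close to the structure of the graph" is meant to capture.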
Submitted 16 February, 2015; v1 submitted 22 October, 2014;
originally announced October 2014.
-
Sensitivity of predictions in an effective model -- application to the chiral critical end point position in the Nambu--Jona-Lasinio model
Authors:
Alexandre Biguet,
Hubert Hansen,
Pedro Costa,
Pierre Borgnat,
Timothée Brugière
Abstract:
The measurement of the position of the chiral critical end point (CEP) in the QCD phase diagram is under debate. While it is possible to predict its position by using effective models specifically built to reproduce some of the features of the underlying theory (QCD), the quality of the predictions (e.g., the CEP position) obtained by such effective models depends on whether solving the model equations constitutes a well- or ill-posed inverse problem.
Considering these predictions as being inverse problems provides tools to evaluate if the problem is ill-conditioned, meaning that infinitesimal variations of the inputs of the model can cause comparatively large variations of the predictions. If it is ill-conditioned, it has major consequences because of finite variations that could come from experimental and/or theoretical errors.
In the following, we shall apply this reasoning to the predictions of a particular Nambu--Jona-Lasinio model within the mean field + ring approximations, with special attention to the prediction of the chiral CEP position in the $(T-μ)$ plane. We find that the problem is ill-conditioned (i.e., very sensitive to input variations) for the $T$-coordinate of the CEP, whereas it is well-posed for the $μ$-coordinate of the CEP. As a consequence, when the chiral condensate varies in a $10$ MeV range, $μ_{CEP}$ varies far less.
As an illustration of how problematic this could be, we show that the main consequence of taking into account finite variations of the inputs is that the existence of the CEP itself can no longer be predicted: for a deviation as low as 0.6% with respect to vacuum phenomenology (well within the estimation of the first correction to the ring approximation) the CEP may or may not exist.
Submitted 14 September, 2015; v1 submitted 3 September, 2014;
originally announced September 2014.
-
2-D Prony-Huang Transform: A New Tool for 2-D Spectral Analysis
Authors:
Jérémy Schmitt,
Nelly Pustelnik,
Pierre Borgnat,
Patrick Flandrin,
Laurent Condat
Abstract:
This work proposes an extension of the 1-D Hilbert-Huang transform for the analysis of images. The proposed method consists in (i) adaptively decomposing an image into oscillating parts called intrinsic mode functions (IMFs) using a mode decomposition procedure, and (ii) providing a local spectral analysis of the obtained IMFs in order to get the local amplitudes, frequencies, and orientations. For the decomposition step, we propose two robust 2-D mode decompositions based on non-smooth convex optimization: a "Genuine 2-D" approach, which constrains the local extrema of the IMFs, and a "Pseudo 2-D" approach, which constrains separately the extrema of lines, columns, and diagonals. The spectral analysis step is based on the Prony annihilation property, which is applied on small square patches of the IMFs. The resulting 2-D Prony-Huang transform is validated on simulated and real data.
Submitted 30 April, 2014;
originally announced April 2014.
-
Bootstrapping under constraint for the assessment of group behavior in human contact networks
Authors:
Nicolas Tremblay,
Alain Barrat,
Cary Forest,
Mark Nornberg,
Jean-François Pinton,
Pierre Borgnat
Abstract:
The increasing availability of time- and space-resolved data describing human activities and interactions gives insights into both static and dynamic properties of human behavior. In practice, nevertheless, real-world datasets can often be considered as only one realisation of a particular event. This highlights a key issue in social network analysis: the statistical significance of estimated properties. In this context, we focus here on the assessment of quantitative features of a specific subset of nodes in empirical networks. We present a method of statistical resampling based on bootstrapping groups of nodes under constraints within the empirical network. The method enables us to define acceptance intervals for various Null Hypotheses concerning relevant properties of the subset of nodes under consideration, in order to characterize by a statistical test its behavior as "normal" or not. We apply this method to a high resolution dataset describing the face-to-face proximity of individuals during two co-located scientific conferences. As a case study, we show how to probe whether co-locating the two conferences succeeded in bringing together the two corresponding groups of scientists.
Submitted 8 November, 2013; v1 submitted 14 December, 2012;
originally announced December 2012.
-
Multiscale Community Mining in Networks Using Spectral Graph Wavelets
Authors:
Nicolas Tremblay,
Pierre Borgnat
Abstract:
For data represented by networks, the community structure of the underlying graph is of great interest. A classical clustering problem is to uncover the overall "best" partition of nodes into communities. Here, a more elaborate description is proposed in which community structures are identified at different scales. To this end, we take advantage of the local and scale-dependent information encoded in graph wavelets. After new developments for the practical use of graph wavelets, studying proper scale boundaries and parameters and introducing scaling functions, we propose a method to mine for communities in complex networks in a scale-dependent manner. It relies on classifying nodes according to their wavelets or scaling functions, using a scale-dependent modularity function. An example on a graph benchmark having hierarchical communities shows that we successfully estimate its multiscale structure.
Submitted 8 November, 2013; v1 submitted 4 December, 2012;
originally announced December 2012.