Search | arXiv e-print repository

A Comparative Analysis of Gene Expression Profiling by Statistical and Machine Learning Approaches

Authors: Myriam Bontonou, Anaïs Haget, Maria Boulougouri, Benjamin Audit, Pierre Borgnat, Jean-Michel Arbona

Abstract: Many machine learning models have been proposed to classify phenotypes from gene expression data. In addition to their good performance, these models can potentially provide some understanding of phenotypes by extracting explanations for their decisions. These explanations often take the form of a list of genes ranked in order of importance for the predictions, the highest-ranked genes being interpreted as linked to the phenotype. We discuss the biological and the methodological limitations of such explanations. Experiments are performed on several datasets gathering cancer and healthy tissue samples from the TCGA, GTEx and TARGET databases. A collection of machine learning models including logistic regression, multilayer perceptron, and graph neural network are trained to classify samples according to their cancer type. Gene rankings are obtained from explainability methods adapted to these models, and compared to the ones from classical statistical feature selection methods such as mutual information, DESeq2, and EdgeR. Interestingly, on simple tasks, we observe that the information learned by black-box neural networks is related to the notion of differential expression. In all cases, a small set containing the best-ranked genes is sufficient to achieve a good classification. However, these genes differ significantly between the methods and similar classification performance can be achieved with numerous lower ranked genes. In conclusion, although these methods enable the identification of biomarkers characteristic of certain pathologies, our results question the completeness of the selected gene sets and thus of explainability by the identification of the underlying biological processes.

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2310.02840 [pdf, ps, other]

Mosaic benchmark networks: Modular link streams for testing dynamic community detection algorithms

Authors: Yasaman Asgari, Remy Cazabet, Pierre Borgnat

Abstract: Community structure is a critical feature of real networks, providing insights into nodes' internal organization. Nowadays, with the availability of highly detailed temporal networks such as link streams, studying community structures becomes more complex due to increased data precision and time sensitivity. Despite numerous algorithms developed in the past decade for dynamic community discovery, assessing their performance on link streams remains a challenge. Synthetic benchmark graphs are a well-accepted approach for evaluating static community detection algorithms. Additionally, there have been some proposals for slowly evolving communities in low-resolution temporal networks like snapshots. Nevertheless, this approach is not yet suitable for link streams. To bridge this gap, we introduce a novel framework that generates synthetic modular link streams with predefined communities. Subsequently, we evaluate established dynamic community detection methods to uncover limitations that may not be evident in snapshots with slowly evolving communities. While no method emerges as a clear winner, we observe notable differences among them.

Submitted 4 October, 2023; originally announced October 2023.

arXiv:2307.04890 [pdf, other]

Temporal network compression via network hashing

Authors: Rémi Vaudaine, Pierre Borgnat, Paulo Goncalves, Rémi Gribonval, Márton Karsai

Abstract: Pairwise temporal interactions between entities can be represented as temporal networks, which code the propagation of processes such as epidemic spreading or information cascades, evolving on top of them. The largest outcome of these processes is directly linked to the structure of the underlying network. Indeed, a node of a network at a given time cannot affect more nodes in the future than it can reach via time-respecting paths. This set of nodes reachable from a source defines an out-component, whose identification is costly. In this paper, we propose an efficient matrix algorithm to tackle this issue and show that it outperforms other state-of-the-art methods. Secondly, we propose a hashing framework to coarsen large temporal networks into smaller proxies on which out-components are easier to estimate, and then recombined to obtain the initial components. Our graph hashing solution has implications for privacy-respecting representation of temporal networks.

Submitted 10 July, 2023; originally announced July 2023.

Comments: 17 pages, 8 figures
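
To make the notion of an out-component concrete, here is a minimal sketch of a naive single-pass baseline, not the matrix algorithm or the hashing framework proposed in the paper. The `(u, v, t)` event format, the function name `out_component`, and the requirement that timestamps strictly increase along a path are all assumptions made for illustration.

```python
def out_component(events, source, t_start=float("-inf")):
    """Nodes reachable from `source` via time-respecting paths.

    events: iterable of directed contacts (u, v, t)  [assumed format].
    A contact (u, v, t) can be used only if u was reached strictly
    before t (assumption: timestamps strictly increase along a path).
    """
    arrival = {source: t_start}  # earliest known arrival time per node
    # Scanning contacts in time order means the first time a node is
    # reached is also its earliest possible arrival time.
    for u, v, t in sorted(events, key=lambda e: e[2]):
        if u in arrival and arrival[u] < t and v not in arrival:
            arrival[v] = t
    return set(arrival)


# Hypothetical toy link stream: c -> d happens before c is reached,
# so d and e stay outside the out-component of a.
toy_events = [("a", "b", 1), ("b", "c", 2), ("c", "d", 1), ("d", "e", 3)]
reachable = out_component(toy_events, "a")
```

The pass costs one sort plus one scan per source, which is why computing out-components for every source and start time becomes expensive on large temporal networks.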

arXiv:2303.11336 [pdf, other]

Studying Limits of Explainability by Integrated Gradients for Gene Expression Models

Authors: Myriam Bontonou, Anaïs Haget, Maria Boulougouri, Jean-Michel Arbona, Benjamin Audit, Pierre Borgnat

Abstract: Understanding the molecular processes that drive cellular life is a fundamental question in biological research. Ambitious programs have gathered a number of molecular datasets on large populations. To decipher the complex cellular interactions, recent work has turned to supervised machine learning methods. The scientific questions are formulated as classical learning problems on tabular data or on graphs, e.g. phenotype prediction from gene expression data. In these works, the input features on which the individual predictions are predominantly based are often interpreted as indicative of the cause of the phenotype, such as cancer identification. Here, we propose to explore the relevance of the biomarkers identified by Integrated Gradients, an explainability method for feature attribution in machine learning. Through a motivating example on The Cancer Genome Atlas, we show that ranking features by importance is not enough to robustly identify biomarkers. As it is difficult to evaluate whether biomarkers reflect relevant causes without known ground truth, we simulate gene expression data by proposing a hierarchical model based on Latent Dirichlet Allocation models. We also highlight good practices for evaluating explanations for genomics data and propose a direction to derive more insights from these explanations.

Submitted 19 March, 2023; originally announced March 2023.
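
As a rough illustration of the attribution method named in this abstract, the sketch below computes Integrated Gradients for a hand-written logistic model and ranks features by absolute attribution. The model, the weights, the all-zero baseline, and the midpoint Riemann sum are assumptions chosen for the example; they are not the models or settings used in the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Logistic model f(x) = sigmoid(w . x + b) (illustrative model)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def integrated_gradients(w, b, x, baseline, steps=200):
    """Midpoint Riemann-sum approximation of Integrated Gradients.

    IG_i = (x_i - x'_i) * integral over alpha in [0, 1] of
           df/dx_i evaluated at x' + alpha * (x - x').
    """
    grad_sum = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b0 + alpha * (xi - b0) for xi, b0 in zip(x, baseline)]
        p = predict(w, b, point)
        for i, wi in enumerate(w):
            grad_sum[i] += p * (1.0 - p) * wi  # gradient of the logistic model
    return [(xi - b0) * g / steps for xi, b0, g in zip(x, baseline, grad_sum)]


# Hypothetical weights and sample, standing in for three gene features.
weights, bias = [2.0, -1.0, 0.5], 0.1
sample, zero_baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
attributions = integrated_gradients(weights, bias, sample, zero_baseline)
ranking = sorted(range(len(sample)), key=lambda i: -abs(attributions[i]))
```

The attributions sum (up to discretization error) to the difference between the prediction at the sample and at the baseline, so the resulting ranking depends on the chosen baseline as well as on the model.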

arXiv:2303.07646 [pdf, other]

Clustering with Simplicial Complexes

Authors: Thummaluru Siddartha Reddy, Sundeep Prabhakar Chepuri, Pierre Borgnat

Abstract: In this work, we propose a new clustering algorithm to group nodes in networks based on second-order simplices (aka filled triangles) to leverage higher-order network interactions. We define a simplicial conductance function whose minimization yields an optimal partition with a higher density of filled triangles within the sets and a lower density of filled triangles across the sets. To this end, we propose a simplicial adjacency operator that captures the relation between the nodes through second-order simplices. This allows us to extend the well-known Cheeger inequality to cluster a simplicial complex. Then, leveraging the Cheeger inequality, we propose the simplicial spectral clustering algorithm. We report results from numerical experiments on synthetic and real-world network data to demonstrate the efficacy of the proposed approach.

Submitted 14 March, 2023; originally announced March 2023.

arXiv:2209.12727 [pdf, other]

A Simple Way to Learn Metrics Between Attributed Graphs

Authors: Yacouba Kaloga, Pierre Borgnat, Amaury Habrard

Abstract: The choice of good distances and similarity measures between objects is important for many machine learning methods. Therefore, many metric learning algorithms have been developed in recent years, mainly for Euclidean data, in order to improve the performance of classification or clustering methods. However, due to difficulties in establishing computable, efficient and differentiable distances between attributed graphs, few metric learning algorithms adapted to graphs have been developed despite the strong interest of the community. In this paper, we address this issue by proposing a new Simple Graph Metric Learning - SGML - model with few trainable parameters based on Simple Graph Convolutional Neural Networks - SGCN - and elements of Optimal Transport theory. This model allows us to build an appropriate distance from a database of labeled (attributed) graphs to improve the performance of simple classification algorithms such as $k$-NN. This distance can be quickly trained while maintaining good performance, as illustrated by the experimental study presented in this paper.

Submitted 21 December, 2022; v1 submitted 26 September, 2022; originally announced September 2022.

arXiv:2208.00971 [pdf, other]

Probabilistic forecasts of extreme heatwaves using convolutional neural networks in a regime of lack of data

Authors: George Miloshevich, Bastien Cozian, Patrice Abry, Pierre Borgnat, Freddy Bouchet

Abstract: Understanding extreme events and their probability is key for the study of climate change impacts, risk assessment, adaptation, and the protection of living beings. Forecasting the occurrence probability of extreme heatwaves is a primary challenge for risk assessment and attribution, but also for fundamental studies about processes, dataset and model validation, and climate change studies. In this work we develop a methodology to build forecasting models which are based on convolutional neural networks, trained on extremely long climate model outputs. We demonstrate that neural networks have positive predictive skills, with respect to random climatological forecasts, for the occurrence of long-lasting 14-day heatwaves over France, up to 15 days ahead of time for fast dynamical drivers (500 hPa geopotential height fields), and also at much longer lead times for slow physical drivers (soil moisture). This forecast is made seamlessly in time and space, for fast hemispheric and slow local drivers. We find that the neural network selects extreme heatwaves associated with a North-Hemisphere wavenumber-3 pattern. The main scientific message is that most of the time, training neural networks for predicting extreme heatwaves occurs in a regime of lack of data. We suggest that this is likely to be the case for most other applications to large scale atmosphere and climate phenomena. For instance, using training sets one hundred years long, a regime of drastic lack of data, leads to severely lower predictive skills and a general inability to extract useful information available in the 500 hPa geopotential height field at a hemispheric scale, in contrast to the dataset several thousand years long. We discuss perspectives for dealing with the lack of data regime, for instance rare event simulations and how transfer learning may play a role in this latter task.

Submitted 17 February, 2023; v1 submitted 1 August, 2022; originally announced August 2022.

Comments: 34 pages, 12 figures

arXiv:2104.14652 [pdf, other]

Fast Multiscale Diffusion on Graphs

Authors: Sibylle Marcotte, Amélie Barbe, Rémi Gribonval, Titouan Vayer, Marc Sebban, Pierre Borgnat, Paulo Gonçalves

Abstract: Diffusing a graph signal at multiple scales requires computing the action of the exponential of several multiples of the Laplacian matrix. We tighten a bound on the approximation error of truncated Chebyshev polynomial approximations of the exponential, hence significantly improving a priori estimates of the polynomial order for a prescribed error. We further exploit properties of these approximations to factorize the computation of the action of the diffusion operator over multiple scales, thus reducing drastically its computational cost.

Submitted 29 April, 2021; originally announced April 2021.
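
The computation described in this abstract, applying exp(-tau L) to a signal for several scales tau, can be sketched with a textbook truncated Chebyshev expansion. This is a generic illustration under stated assumptions, not the tightened bound or the factorization from the paper: the function names, the fixed polynomial order, the quadrature used for the coefficients, and the spectral bound `lmax` supplied by the caller are all choices made for the example.

```python
import math


def cheb_coeffs(f, order, n_quad=200):
    """Chebyshev coefficients c_0..c_order of f on [-1, 1], computed by
    Gauss-Chebyshev quadrature, so that f(y) ~ c_0/2 + sum_k c_k T_k(y)."""
    thetas = [math.pi * (j + 0.5) / n_quad for j in range(n_quad)]
    return [2.0 / n_quad * sum(f(math.cos(t)) * math.cos(k * t) for t in thetas)
            for k in range(order + 1)]


def matvec(L, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in L]


def diffuse_multiscale(L, x, taus, lmax, order=30):
    """Approximate exp(-tau * L) x for each tau in `taus` (order >= 1).

    lmax must upper-bound the largest eigenvalue of L (assumption: the
    caller provides it, e.g. twice the maximum degree for a combinatorial
    Laplacian). The vectors T_k(L~) x are built once and shared by all scales.
    """
    half = lmax / 2.0

    def shifted(v):  # applies L~ = L / half - I, whose spectrum lies in [-1, 1]
        return [lv / half - vi for lv, vi in zip(matvec(L, v), v)]

    basis = [list(x), shifted(x)]
    for _ in range(2, order + 1):  # three-term Chebyshev recurrence
        basis.append([2.0 * s - p for s, p in zip(shifted(basis[-1]), basis[-2])])

    results = []
    for tau in taus:
        c = cheb_coeffs(lambda y, tau=tau: math.exp(-tau * half * (y + 1.0)), order)
        c[0] *= 0.5
        results.append([sum(ck * tk[i] for ck, tk in zip(c, basis))
                        for i in range(len(x))])
    return results


# Path graph on 4 nodes; eigenvalues of its Laplacian are below 4.
path_laplacian = [[1, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 1]]
impulse = [1.0, 0.0, 0.0, 0.0]
short_scale, long_scale = diffuse_multiscale(path_laplacian, impulse, [0.5, 2.0], lmax=4.0)
```

Only the scalar coefficients depend on tau, so the matrix-vector products are paid once for all scales; the polynomial order needed for a given accuracy grows with tau times lmax, which is the quantity the paper's error bound addresses.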

arXiv:2103.09743 [pdf, ps, other]

doi 10.3389/fclim.2022.789641

Deep Learning-based Extreme Heatwave Forecast

Authors: Valérian Jacques-Dumas, Francesco Ragone, Pierre Borgnat, Patrice Abry, Freddy Bouchet

Abstract: Because of the impact of extreme heat waves and heat domes on society and biodiversity, their study is a key challenge. We specifically study long-lasting extreme heat waves, which are among the most important for climate impacts. Physics-driven weather forecast systems or climate models can be used to forecast their occurrence or predict their probability. The present work explores the use of deep learning architectures, trained using outputs of a climate model, as an alternative strategy to forecast the occurrence of extreme long-lasting heatwaves. This new approach will be useful for several key scientific goals, which include the study of climate model statistics, building a quantitative proxy for resampling rare events in climate models, and studying the impact of climate change, and should eventually be useful for forecasting. Fulfilling these important goals implies addressing issues such as the class-size imbalance that is intrinsically associated with rare event prediction, and assessing the potential benefits of transfer learning to address the nested nature of extreme events (naturally included in less extreme ones). We train a Convolutional Neural Network, using 1000 years of climate model outputs, with large-class undersampling and transfer learning. From the observed snapshots of the surface temperature and the 500 hPa geopotential height fields, the trained network achieves significant performance in forecasting the occurrence of long-lasting extreme heatwaves. We are able to predict them at three different levels of intensity, and as early as 15 days ahead of the start of the event (30 days ahead of the end of the event).

Submitted 13 January, 2022; v1 submitted 17 March, 2021; originally announced March 2021.

Comments: 21 pages, 5 figures

arXiv:2010.16132 [pdf, other]

Multiview Variational Graph Autoencoders for Canonical Correlation Analysis

Authors: Yacouba Kaloga, Pierre Borgnat, Sundeep Prabhakar Chepuri, Patrice Abry, Amaury Habrard

Abstract: We present a novel multiview canonical correlation analysis model based on a variational approach. This is the first nonlinear model that takes into account the available graph-based geometric constraints while being scalable for processing large scale datasets with multiple views. It is based on an autoencoder architecture with graph convolutional neural network layers. We experiment with our approach on classification, clustering, and recommendation tasks on real datasets. The algorithm is competitive with state-of-the-art multiview representation learning techniques.

Submitted 4 October, 2021; v1 submitted 30 October, 2020; originally announced October 2020.

Comments: 4 pages, 3 figures, submitted

arXiv:2007.03373 [pdf, other]

Hierarchical and Unsupervised Graph Representation Learning with Loukas's Coarsening

Authors: Louis Béthune, Yacouba Kaloga, Pierre Borgnat, Aurélien Garivier, Amaury Habrard

Abstract: We propose a novel algorithm for unsupervised graph representation learning with attributed graphs. It combines three advantages addressing some current limitations of the literature: i) The model is inductive: it can embed new graphs without re-training in the presence of new data; ii) The method takes into account both micro-structures and macro-structures by looking at the attributed graphs at different scales; iii) The model is end-to-end differentiable: it is a building block that can be plugged into deep learning pipelines and allows for back-propagation. We show that combining a coarsening method having strong theoretical guarantees with mutual information maximization suffices to produce high quality embeddings. We evaluate them on classification tasks with common benchmarks of the literature. We show that our algorithm is competitive with state of the art among unsupervised graph representation learning methods.

Submitted 17 August, 2020; v1 submitted 7 July, 2020; originally announced July 2020.

Comments: 19 pages, 15 figures, submitted

arXiv:2007.00909 [pdf, other]

Asymptotic control of FWER under Gaussian assumption: application to correlation tests

Authors: Sophie Achard, Pierre Borgnat, Irène Gannaz

Abstract: In many applications, hypothesis testing is based on an asymptotic distribution of statistics. The aim of this paper is to clarify and extend multiple correction procedures when the statistics are asymptotically Gaussian. We propose a unified framework to prove their asymptotic behavior which is valid in the case of highly correlated tests. We focus on correlation tests where several test statistics are proposed. All these multiple testing procedures on correlations are shown to control FWER. An extensive simulation study on correlation-based graph estimation highlights finite sample behavior, independence from the sparsity of graphs and dependence on the values of correlations. Empirical evaluation of power provides comparisons of the proposed methods. Finally, validation of our procedures is proposed on a real dataset of rat brain connectivity measured by fMRI. We confirm our theoretical findings by applying our procedures on a full null hypothesis with data from dead rats. Data on live rats show the performance of the proposed procedures to correctly identify brain connectivity graphs with controlled errors.

Submitted 2 July, 2020; originally announced July 2020.
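
To illustrate what FWER-controlled graph estimation from correlations looks like in its simplest form, the sketch below pairs the Fisher z-transform (an asymptotically Gaussian statistic) with a Bonferroni correction. This pairing is an assumption chosen for the example and is not claimed to be one of the procedures studied in the paper; the function names and the toy correlations are hypothetical.

```python
import math


def correlation_pvalue(r, n):
    """Two-sided p-value for H0: rho = 0 from a sample correlation r
    over n observations, via Fisher's z-transform (requires |r| < 1, n > 3)."""
    z = math.atanh(r) * math.sqrt(n - 3)      # approximately N(0, 1) under H0
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


def bonferroni_edges(correlations, n, alpha=0.05):
    """Keep an edge only if its p-value is below alpha / m, with m the
    number of tested pairs (Bonferroni bound on the family-wise error rate)."""
    m = len(correlations)
    return {pair for pair, r in correlations.items()
            if correlation_pvalue(r, n) < alpha / m}


# Hypothetical sample correlations between three regions, n = 103 observations.
toy_correlations = {("a", "b"): 0.6, ("a", "c"): 0.05, ("b", "c"): 0.2}
graph_edges = bonferroni_edges(toy_correlations, n=103)
```

In this toy case the pair ("b", "c") would pass an uncorrected 5% test but is dropped once the threshold is divided by the number of tests, which shows how the correction trades power for control of false edges.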

arXiv:1910.14576 [pdf]

Solving NMF with smoothness and sparsity constraints using PALM

Authors: Raimon Fabregat, Nelly Pustelnik, Paulo Gonçalves, Pierre Borgnat

Abstract: Non-negative matrix factorization is a problem of dimensionality reduction and source separation of data that has been widely used in many fields since it was studied in depth in 1999 by Lee and Seung, including in compression of data, document clustering, processing of audio spectrograms and astronomy. In this work we have adapted a minimization scheme for convex functions with non-differentiable constraints called PALM to solve the NMF problem with solutions that can be smooth and/or sparse, two properties frequently desired.

Submitted 18 March, 2021; v1 submitted 31 October, 2019; originally announced October 2019.

arXiv:1907.07495 [pdf, other]

Combining traffic counts and Bluetooth data for link-origin-destination matrix estimation in large urban networks: The Brisbane case study

Authors: Gabriel Michau, Nelly Pustelnik, Pierre Borgnat, Patrice Abry, Ashish Bhaskar, Edward Chung

Abstract: Origin-Destination matrix estimation is a keystone for traffic representation and analysis. Traditionally, it has been estimated from traffic counts, surveys and socio-economic models, but recent technological advances make it possible to rethink the estimation problem. Road user identification technologies, such as connected GPS, Bluetooth or Wifi detectors, bring additional information, that is, for a fraction of the users, the origin, the destination and to some extent the itinerary taken. In the present work, this additional information is used for the estimation of a more comprehensive traffic representation tool: the link-origin-destination matrix. Such three-dimensional matrices extend the concept of traditional origin-destination matrices by also giving information on the traffic assignment. Their estimation is solved as an inverse problem whose objective function represents a trade-off between important properties the traffic has to satisfy. This article presents the theory and how to implement such a method on a real dataset. With the case study of Brisbane City, where over 600 hundreds Bluetooth detectors have been installed, it also illustrates the opportunities such link-origin-destination matrices create for traffic analysis.

Submitted 17 July, 2019; originally announced July 2019.

arXiv:1905.11842 [pdf, other]

doi 10.1016/j.physa.2019.122877

Graph-based era segmentation of international financial integration

Authors: Cécile Bastidon, Antoine Parent, Pablo Jensen, Patrice Abry, Pierre Borgnat

Abstract: Assessing world-wide financial integration constitutes a recurrent challenge in macroeconometrics, often addressed by visual inspections searching for data patterns. Econophysics literature enables us to build complementary, data-driven measures of financial integration using graphs. The present contribution investigates the potential and interests of a novel 3-step approach that combines several state-of-the-art procedures to i) compute graph-based representations of the multivariate dependence structure of asset prices time series representing the financial states of 32 countries world-wide (1955-2015); ii) compute time series of 5 graph-based indices that characterize the time evolution of the topologies of the graph; iii) segment these time evolutions in piece-wise constant eras, using an optimization framework constructed on a multivariate multi-norm total variation penalized functional. The method shows first that it is possible to find endogenous stable eras of world-wide financial integration. Then, our results suggest that the most relevant globalization eras would be based on the historical patterns of global capital flows, while the major regulatory events of the 1970s would only appear as a cause of sub-segmentation.

Submitted 28 May, 2019; originally announced May 2019.

Comments: 18 pages, 3 figures

arXiv:1811.11636 [pdf, other]

Harmonic analysis on directed graphs and applications: from Fourier analysis to wavelets

Authors: Harry Sevi, Gabriel Rilling, Pierre Borgnat

Abstract: We introduce a novel harmonic analysis for functions defined on the vertices of a strongly connected directed graph of which the random walk operator is the cornerstone. As a first step, we consider the set of eigenvectors of the random walk operator as a non-orthogonal Fourier-type basis for functions over directed graphs. We found a frequency interpretation by linking the variation of the eigenvectors of the random walk operator obtained from their Dirichlet energy to the real part of their associated eigenvalues. From this Fourier basis, we can proceed further and build multi-scale analyses on directed graphs. We propose both a redundant wavelet transform and a decimated wavelet transform by extending the diffusion wavelets framework by Coifman and Maggioni for directed graphs. The development of our harmonic analysis on directed graphs thus leads us to consider both semi-supervised learning problems and signal modeling problems on graphs applied to directed graphs highlighting the efficiency of our framework.

Submitted 1 November, 2021; v1 submitted 28 November, 2018; originally announced November 2018.

Comments: Under review at ACHA

arXiv:1803.11505 [pdf, other]

Evolutions of Individuals Use of Lyon's Bike Sharing System

Authors: Jordan Cambe, Patrice Abry, Julien Barnier, Pierre Borgnat, Marie Vogel, Pablo Jensen

Abstract: Bike sharing systems (BSS) have been growing fast all over the world, along with the number of articles analyzing such systems. However, the lack of temporally large trip databases has limited the analysis of BSS users' behavior in the long term. This article studies the long-term commitment of subscribers to Vélo'v, a BSS located in Lyon, France, and the evolution of their usage over time. Using a 5-year dataset covering 121,000 long-term distinct users, we show the heterogeneous individual trajectories masked by the overall system stability. Users follow two main trajectories: about 60% remain in the system for at most one year, showing a low median activity (47 trips); the remaining 40% correspond to more active users (median activity of 96 trips in their first year) that remain continuously active for several years (mean time = 2.9 years). This latter class exhibits a relatively stable activity, decreasing slightly over the years. We show that middle-aged, male and urban users are overrepresented among the 'stable' users.

Submitted 2 September, 2018; v1 submitted 30 March, 2018; originally announced March 2018.

Comments: 11 pages, 7 figures

arXiv:1712.09350 [pdf, other]

Analytic signal in many dimensions

Authors: Mikhail Tsitsvero, Pierre Borgnat, Paulo Gonçalves

Abstract: In this work we extend analytic signal theory to the multidimensional case when oscillations are observed in the $d$ orthogonal directions. First it is shown how to obtain separate phase-shifted components and how to combine them into instantaneous amplitude and phases. Second, the proper hypercomplex analytic signal is defined as a holomorphic hypercomplex function on the boundary of a certain upper half-space. Next it is shown that correct phase-shifted components can be obtained by positive frequency restriction of the hypercomplex Fourier transform. Necessary and sufficient conditions for analytic extension of the hypercomplex analytic signal into the upper hypercomplex half-space by means of the holomorphic Fourier transform are given by the corresponding Paley-Wiener theorem. Moreover it is demonstrated that for $d>2$ there is no corresponding non-commutative hypercomplex Fourier transform (including Clifford and Cayley-Dickson based) that allows one to recover phase-shifted components correctly.

Submitted 24 April, 2019; v1 submitted 25 December, 2017; originally announced December 2017.

Comments: 47 pages, under review in Applied and Computational Harmonic Analysis Journal

arXiv:1711.02046 [pdf, other]

Design of graph filters and filterbanks

Authors: Nicolas Tremblay, Paulo Gonçalves, Pierre Borgnat

Abstract: Basic operations in graph signal processing consist in processing signals indexed on graphs either by filtering them, to extract specific parts of them, or by changing their domain of representation, using some transformation or dictionary better adapted to represent the information contained in them. The aim of this chapter is to review general concepts for the introduction of filters and representations of graph signals. We begin by recalling the general framework to achieve that, which puts the emphasis on introducing some spectral domain that is relevant for graph signals to define a Graph Fourier Transform. We show how to introduce a notion of frequency analysis for graph signals by looking at their variations. Then, we move to the introduction of graph filters, which are defined like the classical equivalent for 1D signals or 2D images, as linear systems which operate on each frequency of a signal. Some examples of filters and of their implementations are given. Finally, as alternate representations of graph signals, we focus on multiscale transforms that are defined from filters. Continuous multiscale transforms such as spectral wavelets on graphs are reviewed, as well as the versatile approaches of filterbanks on graphs. Several variants of graph filterbanks are discussed, for structured as well as arbitrary graphs, with a focus on the central point of the choice of the decimation or aggregation operators.

Submitted 3 November, 2017; originally announced November 2017.

Comments: chapter in collective book

arXiv:1703.02005 [pdf, other]

Scaling in Internet Traffic: a 14 year and 3 day longitudinal study, with multiscale analyses and random projections

Authors: Romain Fontugne, Patrice Abry, Kensuke Fukuda, Darryl Veitch, Kenjiro Cho, Pierre Borgnat, Herwig Wendt

Abstract: In the mid-90's, it was shown that the statistics of aggregated time series from Internet traffic departed from those of traditional short range dependent models, and were instead characterized by asymptotic self-similarity. Following this seminal contribution, over the years, many studies have investigated the existence and form of scaling in Internet traffic. This contribution aims first at presenting a methodology, combining multiscale analysis (wavelet and wavelet leaders) and random projections (or sketches), permitting a precise, efficient and robust characterization of scaling which is capable of seeing through non-stationary anomalies. Second, we apply the methodology to a data set spanning an unusually long period: 14 years, from the MAWI traffic archive, thereby allowing an in-depth longitudinal analysis of the form, nature and evolutions of scaling in Internet traffic, as well as network mechanisms producing them. We also study a separate 3-day long trace to obtain complementary insight into intra-day behavior. We find that a biscaling (two ranges of independent scaling phenomena) regime is systematically observed: long-range dependence over the large scales, and multifractal-like scaling over the fine scales. We quantify the actual scaling ranges precisely, verify to high accuracy the expected relationship between the long range dependent parameter and the heavy tail parameter of the flow size distribution, and relate fine scale multifractal scaling to typical IP packet inter-arrival and to round-trip time distributions.

Submitted 6 March, 2017; originally announced March 2017.

arXiv:1604.00391 [pdf, other]

doi 10.1109/TSIPN.2016.2623094

A Primal-Dual Algorithm for Link Dependent Origin Destination Matrix Estimation

Authors: Gabriel Michau, Nelly Pustelnik, Pierre Borgnat, Patrice Abry, Alfredo Nantes, Ashish Bhaskar, Edward Chung

Abstract: Origin-Destination Matrix (ODM) estimation is a classical problem in transport engineering aiming to recover flows from every Origin to every Destination from measured traffic counts and a priori model information. In addition to traffic counts, the present contribution takes advantage of probe trajectories, whose capture is made possible by new measurement technologies. It extends the concept of ODM to that of Link dependent ODM (LODM), keeping the information about the flow distribution on links and containing inherently the ODM assignment. Further, an original formulation of LODM estimation from traffic counts and probe trajectories is presented as an optimisation problem, where the functional to be minimized consists of five convex functions, each modelling a constraint or property of the transport problem: consistency with traffic counts, consistency with sampled probe trajectories, consistency with traffic conservation (Kirchhoff's law), similarity of flows having close origins and destinations, and positivity of traffic flows. A primal-dual algorithm is devised to minimize the designed functional, as the corresponding objective functions are not necessarily differentiable. A case study, on a simulated network and traffic, validates the feasibility of the procedure and details its benefits for the estimation of an LODM matching real-network constraints and observations.

Submitted 1 April, 2016; originally announced April 2016.

Journal ref: 2016 IEEE Transactions on Signal and Information Processing over Networks

arXiv:1509.08863 [pdf, other]

Accelerated Spectral Clustering Using Graph Filtering Of Random Signals

Authors: Nicolas Tremblay, Gilles Puy, Pierre Borgnat, Remi Gribonval, Pierre Vandergheynst

Abstract: We build upon recent advances in graph signal processing to propose a faster spectral clustering algorithm. Indeed, classical spectral clustering is based on the computation of the first k eigenvectors of the similarity matrix's Laplacian, whose computation cost, even for sparse matrices, becomes prohibitive for large datasets. We show that we can estimate the spectral clustering distance matrix without computing these eigenvectors, by graph filtering random signals. Also, we take advantage of the stochasticity of these random vectors to estimate the number of clusters k. We compare our method to classical spectral clustering on synthetic data, and show that it reaches equal performance while being faster by a factor of at least two for large datasets.

Submitted 29 September, 2015; originally announced September 2015.

arXiv:1509.05642 [pdf, other]

doi 10.1109/TSP.2016.2544747

Subgraph-based filterbanks for graph signals

Authors: Nicolas Tremblay, Pierre Borgnat

Abstract: We design a critically-sampled compact-support biorthogonal transform for graph signals, via graph filterbanks. Instead of partitioning the nodes in two sets so as to remove one every two nodes in the filterbank downsampling operations, the design is based on a partition of the graph in connected subgraphs. Coarsening is achieved by defining one "supernode" for each subgraph, and the edges of this coarsened graph derive from the connectivity between the subgraphs. Unlike the "one every two nodes" downsampling on bipartite graphs, this coarsening operation does not have an exact formulation in the graph Fourier domain. Instead, we rely on the local Fourier bases of each subgraph to define filtering operations. We successfully apply this method to decompose graph signals, and show promising performance on compression and denoising.

Submitted 11 February, 2016; v1 submitted 18 September, 2015; originally announced September 2015.

arXiv:1505.03044 [pdf, other]

Duality between Temporal Networks and Signals: Extraction of the Temporal Network Structures

Authors: Ronan Hamon, Pierre Borgnat, Patrick Flandrin, Céline Robardet

Abstract: We develop a framework to track the structure of temporal networks with a signal processing approach. The method is based on the duality between networks and signals using a multidimensional scaling technique. This enables a study of the network structure using frequency patterns of the corresponding signals. An extension is proposed for temporal networks, thereby enabling a tracking of the network structure over time. A method to automatically extract the most significant frequency patterns and their activation coefficients over time is then introduced, using nonnegative matrix factorization of the temporal spectra. The framework, inspired by audio decomposition, allows transforming these frequency patterns back into networks, to highlight the evolution of the underlying structure of the network over time. The effectiveness of the method is first evidenced on a toy example, prior to being used to study a temporal network of face-to-face contacts. The extraction of sub-networks highlights significant structures decomposed on time intervals.

Submitted 12 May, 2015; originally announced May 2015.

arXiv:1502.04697 [pdf, other]

From graphs to signals and back: Identification of network structures using spectral analysis

Authors: Ronan Hamon, Pierre Borgnat, Patrick Flandrin, Céline Robardet

Abstract: Many systems comprising entities in interactions can be represented as graphs, whose structure gives significant insights about how these systems work. Network theory has undergone further developments, in particular in relation to detection of communities in graphs, to catch this structure. Recently, an approach has been proposed to transform a graph into a collection of signals: using a multidimensional scaling technique on a distance matrix representing relations between vertices of the graph, points in a Euclidean space are obtained and interpreted as signals, indexed by the vertices. In this article, we propose several extensions to this approach, developing a framework to study graph structures using signal processing tools. We first extend the current methodology, enabling us to highlight connections between properties of signals and graph structures, such as communities, regularity or randomness, as well as combinations of those. A robust inverse transformation method is next described, taking into account possible changes in the signals compared to the original ones. This technique uses, in addition to the relationships between the points in the Euclidean space, the energy of each signal, coding the different scales of the graph structure. These contributions open up new perspectives in the study of graphs, by enabling processing of graphs through the processing of the corresponding collection of signals, using reliable tools from signal processing. A technique for denoising a graph by filtering the corresponding signals is then described, suggesting considerable potential of the approach.

Submitted 10 June, 2016; v1 submitted 16 February, 2015; originally announced February 2015.

arXiv:1410.6108 [pdf, other]

Discovering the structure of complex networks by minimizing cyclic bandwidth sum

Authors: Ronan Hamon, Pierre Borgnat, Patrick Flandrin, Céline Robardet

Abstract: Getting a labeling of vertices close to the structure of the graph has proved to be of interest in many applications, e.g., to follow smooth signals indexed by the vertices of the network. This question can be related to a graph labeling problem known as the cyclic bandwidth sum problem. It consists in finding a labeling of the vertices of an undirected and unweighted graph with distinct integers such that the sum of (cyclic) differences of labels of adjacent vertices is minimized. Although theoretical results exist that give the optimal value of the cyclic bandwidth sum for standard graphs, there are neither results in the general case, nor explicit methods to reach this optimal result. In addition to this lack of theoretical knowledge, only a few methods have been proposed to approximately solve this problem. In this paper, we introduce a new heuristic to find an approximate solution for the cyclic bandwidth sum problem, by following the structure of the graph. The heuristic is a two-step algorithm: the first step consists of traversing the graph to find a set of paths which follow the structure of the graph, using a similarity criterion based on the Jaccard index to jump from one vertex to the next one. The second step is the merging of all obtained paths, based on a greedy approach that extends a partial solution by inserting a new path at the position that minimizes the cyclic bandwidth sum. The effectiveness of the proposed heuristic, both in terms of performance and execution time, is shown through experiments on graphs whose optimal value of CBS is known as well as on real-world networks, where the consistency between labeling and topology is highlighted. An extension to weighted graphs is also proposed.
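
The objective described in this abstract can be made concrete with a short sketch. It is not the authors' heuristic: it only evaluates the cyclic bandwidth sum of a given labeling, assuming the usual definition in which the cyclic difference between labels a and b on n vertices is min(|a - b|, n - |a - b|). The function name `cyclic_bandwidth_sum` and the edge-list and dictionary inputs are hypothetical choices made for illustration.

```python
def cyclic_bandwidth_sum(edges, labels):
    """Evaluate the cyclic bandwidth sum of a vertex labeling.

    edges  : iterable of (u, v) pairs of an undirected, unweighted graph
    labels : dict mapping each vertex to a distinct integer among
             n consecutive integers (assumption: e.g. 0..n-1)
    """
    n = len(labels)
    total = 0
    for u, v in edges:
        d = abs(labels[u] - labels[v])
        # Cyclic difference: labels are placed on a ring of n positions.
        total += min(d, n - d)
    return total


# A 4-cycle labeled in traversal order: every edge has cyclic difference 1.
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
in_order = {0: 0, 1: 1, 2: 2, 3: 3}
# Swapping the labels of two vertices breaks the alignment with the topology.
swapped = {0: 0, 1: 2, 2: 1, 3: 3}

print(cyclic_bandwidth_sum(cycle_edges, in_order))
print(cyclic_bandwidth_sum(cycle_edges, swapped))
```

A labeling that follows the graph structure yields the smaller value, which is the quantity a heuristic for this problem tries to minimize.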

Submitted 16 February, 2015; v1 submitted 22 October, 2014; originally announced October 2014.

arXiv:1409.0990 [pdf, other]

doi 10.1140/epja/i2015-15121-1

Sensitivity of predictions in an effective model -- application to the chiral critical end point position in the Nambu--Jona-Lasinio model

Authors: Alexandre Biguet, Hubert Hansen, Pedro Costa, Pierre Borgnat, Timothée Brugière

Abstract: The measurement of the position of the chiral critical end point (CEP) in the QCD phase diagram is under debate. While it is possible to predict its position by using effective models specifically built to reproduce some of the features of the underlying theory (QCD), the quality of the predictions (e.g., the CEP position) obtained by such effective models depends on whether solving the model equations constitutes a well- or ill-posed inverse problem. Considering these predictions as being inverse problems provides tools to evaluate if the problem is ill-conditioned, meaning that infinitesimal variations of the inputs of the model can cause comparatively large variations of the predictions. If it is ill-conditioned, it has major consequences because of finite variations that could come from experimental and/or theoretical errors. In the following, we apply this reasoning to the predictions of a particular Nambu--Jona-Lasinio model within the mean field + ring approximations, with special attention to the prediction of the chiral CEP position in the $(T-μ)$ plane. We find that the problem is ill-conditioned (i.e., very sensitive to input variations) for the $T$-coordinate of the CEP, whereas it is well-posed for the $μ$-coordinate of the CEP. As a consequence, when the chiral condensate varies in a $10$ MeV range, $μ_{CEP}$ varies far less. As an illustration of how problematic this could be, we show that the main consequence of taking into account finite variation of the inputs is that the existence of the CEP itself cannot be predicted anymore: for a deviation as low as 0.6% with respect to vacuum phenomenology (well within the estimation of the first correction to the ring approximation) the CEP may or may not exist.

Submitted 14 September, 2015; v1 submitted 3 September, 2014; originally announced September 2014.

Comments: Improved version: slightly modified title, the introduction of the sensitivity was improved; reformulation of the sections; v2 with 20 pages and only 9 figures. Version accepted for publication in EPJA

Journal ref: Eur.Phys.J. A51 (2015) 9, 121

arXiv:1404.7680 [pdf, other]

doi 10.1109/TIP.2014.2363000

2-D Prony-Huang Transform: A New Tool for 2-D Spectral Analysis

Authors: Jérémy Schmitt, Nelly Pustelnik, Pierre Borgnat, Patrick Flandrin, Laurent Condat

Abstract: This work proposes an extension of the 1-D Hilbert-Huang transform for the analysis of images. The proposed method consists in (i) adaptively decomposing an image into oscillating parts called intrinsic mode functions (IMFs) using a mode decomposition procedure, and (ii) providing a local spectral analysis of the obtained IMFs in order to get the local amplitudes, frequencies, and orientations. For the decomposition step, we propose two robust 2-D mode decompositions based on non-smooth convex optimization: a "Genuine 2-D" approach, which constrains the local extrema of the IMFs, and a "Pseudo 2-D" approach, which constrains separately the extrema of lines, columns, and diagonals. The spectral analysis step is based on the Prony annihilation property, which is applied on small square patches of the IMFs. The resulting 2-D Prony-Huang transform is validated on simulated and real data.

Submitted 30 April, 2014; originally announced April 2014.

Comments: 24 pages, 7 figures

arXiv:1212.3524 [pdf, other]

doi 10.1103/PhysRevE.88.052812

Bootstrapping under constraint for the assessment of group behavior in human contact networks

Authors: Nicolas Tremblay, Alain Barrat, Cary Forest, Mark Nornberg, Jean-François Pinton, Pierre Borgnat

Abstract: The increasing availability of time- and space-resolved data describing human activities and interactions gives insights into both static and dynamic properties of human behavior. In practice, nevertheless, real-world datasets can often be considered as only one realisation of a particular event. This highlights a key issue in social network analysis: the statistical significance of estimated properties. In this context, we focus here on the assessment of quantitative features of a specific subset of nodes in empirical networks. We present a method of statistical resampling based on bootstrapping groups of nodes under constraints within the empirical network. The method enables us to define acceptance intervals for various Null Hypotheses concerning relevant properties of the subset of nodes under consideration, in order to characterize by a statistical test its behavior as "normal" or not. We apply this method to a high resolution dataset describing the face-to-face proximity of individuals during two co-located scientific conferences. As a case study, we show how to probe whether co-locating the two conferences succeeded in bringing together the two corresponding groups of scientists.

Submitted 8 November, 2013; v1 submitted 14 December, 2012; originally announced December 2012.

Journal ref: Phys. Rev. E 88, 052812 (2013)

arXiv:1212.0689 [pdf, other]

Multiscale Community Mining in Networks Using Spectral Graph Wavelets

Authors: Nicolas Tremblay, Pierre Borgnat

Abstract: For data represented by networks, the community structure of the underlying graph is of great interest. A classical clustering problem is to uncover the overall "best" partition of nodes in communities. Here, a more elaborate description is proposed in which community structures are identified at different scales. To this end, we take advantage of the local and scale-dependent information encoded in graph wavelets. After new developments for the practical use of graph wavelets, studying proper scale boundaries and parameters and introducing scaling functions, we propose a method to mine for communities in complex networks in a scale-dependent manner. It relies on classifying nodes according to their wavelets or scaling functions, using a scale-dependent modularity function. An example on a graph benchmark having hierarchical communities shows that we successfully estimate its multiscale structure.

Submitted 8 November, 2013; v1 submitted 4 December, 2012; originally announced December 2012.

Comments: Proceedings of the European Signal Processing Conference (EUSIPCO 2013)

Showing 1–30 of 30 results for author: Borgnat, P