-
Spikyball sampling: Exploring large networks via an inhomogeneous filtered diffusion
Authors:
Benjamin Ricaud,
Nicolas Aspert,
Volodymyr Miz
Abstract:
Studying real-world networks such as social networks or web networks is a challenge. These networks often combine a complex, highly connected structure with a large size. We propose a new approach for large-scale networks that is able to automatically sample user-defined relevant parts of a network. Starting from a few selected places in the network and a reduced set of expansion rules, the method adopts a filtered breadth-first search approach that expands through edges and nodes matching these rules. Moreover, the expansion is performed over a random subset of neighbors at each step to further mitigate the overwhelming number of connections that may exist in large graphs. This carries the image of a "spiky" expansion. We show that this approach generalizes and extends previous exploration sampling methods, such as Snowball or Forest Fire. We demonstrate its ability to capture groups of nodes with high interactions while discarding weakly connected nodes that are often numerous in social networks and may hide important structures.
Submitted 22 October, 2020;
originally announced October 2020.
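The expansion described in the Spikyball abstract above (a breadth-first search that only follows edges and nodes passing a filter, and that keeps a random subset of the candidates at each step) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `filtered_random_bfs`, its parameters, the dictionary-based adjacency structure, and the uniform random choice of neighbors are all assumptions made for illustration.

```python
import random


def filtered_random_bfs(adjacency, seeds, keep_node, keep_edge,
                        max_per_step, n_steps, rng=None):
    """Illustrative filtered BFS with random subsampling (hypothetical API).

    adjacency    : dict mapping a node to an iterable of its neighbors
    seeds        : starting nodes
    keep_node    : predicate node -> bool (node filter, assumed)
    keep_edge    : predicate (u, v) -> bool (edge filter, assumed)
    max_per_step : maximum number of new nodes kept at each step
    n_steps      : number of expansion steps
    """
    rng = rng or random.Random(0)
    visited = set(seeds)
    frontier = list(seeds)
    for _ in range(n_steps):
        # Collect unvisited neighbors reachable through accepted edges and nodes.
        candidates = set()
        for u in frontier:
            for v in adjacency.get(u, ()):
                if v not in visited and keep_edge(u, v) and keep_node(v):
                    candidates.add(v)
        if not candidates:
            break
        # Keep only a uniform random subset of the candidates (assumption:
        # uniform sampling; nodes are assumed sortable for reproducibility).
        k = min(max_per_step, len(candidates))
        frontier = rng.sample(sorted(candidates), k)
        visited.update(frontier)
    return visited


# Tiny example graph: node 3 is rejected by the node filter.
graph = {0: [1, 2, 3], 1: [4], 2: [5], 3: [6], 4: [], 5: [], 6: []}
sampled = filtered_random_bfs(
    graph, seeds=[0],
    keep_node=lambda v: v != 3,
    keep_edge=lambda u, v: True,
    max_per_step=10, n_steps=2,
)
```

With a large `max_per_step` the sketch reduces to a plain filtered breadth-first (Snowball-like) expansion; lowering it makes each step follow only a few randomly chosen branches.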
-
Fast accuracy estimation of deep learning based multi-class musical source separation
Authors:
Alexandru Mocanu,
Benjamin Ricaud,
Milos Cernak
Abstract:
Music source separation is the task of extracting all the instruments from a given song. Recent breakthroughs on this challenge have gravitated around a single dataset, MUSDB, which is limited to four instrument classes. Larger datasets with more instruments make collecting data and training deep neural networks (DNNs) costly and time-consuming. In this work, we propose a fast method to evaluate the separability of instruments in any dataset without training and tuning a DNN. This separability measure helps to select appropriate samples for the efficient training of neural networks. Based on the oracle principle with an ideal ratio mask, our approach is an excellent proxy to estimate the separation performance of state-of-the-art deep learning approaches such as TasNet or Open-Unmix. Our results contribute to revealing two essential points for audio source separation: 1) the ideal ratio mask, although light and straightforward, provides an accurate measure of the audio separability performance of recent neural nets, and 2) new end-to-end learning methods such as TasNet, which operate directly on waveforms, are in fact internally building a Time-Frequency (TF) representation, so that they encounter the same limitations as TF-based methods when separating audio patterns overlapping in the TF plane.
Submitted 1 December, 2021; v1 submitted 19 October, 2020;
originally announced October 2020.
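The oracle principle with an ideal ratio mask mentioned in the abstract above can be sketched as follows. This is a hedged illustration rather than the authors' evaluation pipeline: it assumes one common magnitude-based definition of the mask (each source's magnitude divided by the sum of all source magnitudes in every time-frequency bin), it assumes the complex spectrograms of the true sources are already available as a NumPy array, and the function names are hypothetical.

```python
import numpy as np


def ideal_ratio_masks(source_specs, eps=1e-12):
    """Magnitude-based ideal ratio masks (one common definition, assumed here).

    source_specs : complex array of shape (n_sources, n_freq, n_frames)
    Returns real masks of the same shape, with values in [0, 1].
    """
    mags = np.abs(source_specs)
    # eps avoids division by zero in bins where every source is silent.
    return mags / (mags.sum(axis=0, keepdims=True) + eps)


def oracle_separation(source_specs, eps=1e-12):
    """Apply each ideal mask to the mixture spectrogram (sum of the sources)."""
    mixture = source_specs.sum(axis=0)
    return ideal_ratio_masks(source_specs, eps) * mixture


# Two toy "instruments" occupying different time-frequency bins.
s1 = np.array([[1.0, 0.0], [0.0, 0.0]], dtype=complex)
s2 = np.array([[0.0, 0.0], [0.0, 2.0]], dtype=complex)
estimates = oracle_separation(np.stack([s1, s2]))
```

When the sources do not overlap in the time-frequency plane, as in this toy example, the masked mixture matches each source; overlapping patterns are where the mask, and according to the abstract the TF-based methods, run into their limits.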
-
What is Trending on Wikipedia? Capturing Trends and Language Biases Across Wikipedia Editions
Authors:
Volodymyr Miz,
Joëlle Hanna,
Nicolas Aspert,
Benjamin Ricaud,
Pierre Vandergheynst
Abstract:
In this work, we propose an automatic evaluation and comparison of the browsing behavior of Wikipedia readers that can be applied to any language edition of Wikipedia. As an example, we focus on the English, French, and Russian editions during the last four months of 2018. The proposed method has three steps. Firstly, it extracts the most trending articles over a chosen period of time. Secondly, it performs a semi-supervised topic extraction. Thirdly, it compares topics across languages. The automated processing works with data that combines Wikipedia's graph of hyperlinks, pageview statistics, and summaries of the pages.
The results show that people share a common interest and curiosity for entertainment (e.g. movies, music, and sports), independently of their language. Differences appear in topics related to local events or cultural particularities. Interactive visualizations showing clusters of trending pages in each language edition are available online at https://wiki-insights.epfl.ch/wikitrends
Submitted 17 February, 2020;
originally announced February 2020.
-
A Graph-structured Dataset for Wikipedia Research
Authors:
Nicolas Aspert,
Volodymyr Miz,
Benjamin Ricaud,
Pierre Vandergheynst
Abstract:
Wikipedia is a rich and invaluable source of information. Its central place on the Web makes it a particularly interesting object of study for scientists. Researchers from different domains have used various complex datasets related to Wikipedia to study language, social behavior, knowledge organization, and network theory. Although it is a scientific treasure, the large size of the dataset hinders pre-processing and may be a challenging obstacle for potential new studies. This issue is particularly acute in scientific domains where researchers may lack technical and data-processing expertise. On the one hand, the size of Wikipedia dumps is large, which makes the parsing and extraction of relevant information cumbersome. On the other hand, the API is straightforward to use but restricted to a relatively small number of requests. The middle ground is the mesoscopic scale, where researchers need a subset of Wikipedia ranging from thousands to hundreds of thousands of pages, but no efficient solution exists at this scale.
In this work, we propose an efficient data structure to make requests and access subnetworks of Wikipedia pages and categories. We provide convenient tools for accessing and filtering viewership statistics or "pagecounts" of Wikipedia web pages. The dataset organization leverages principles of graph databases that allow rapid and intuitive access to subgraphs of Wikipedia articles and categories. The dataset and deployment guidelines are available on the LTS2 website: https://lts2.epfl.ch/Datasets/Wikipedia/
Submitted 20 March, 2019;
originally announced March 2019.
-
Anomaly detection in the dynamics of web and social networks
Authors:
Volodymyr Miz,
Benjamin Ricaud,
Kirell Benzi,
Pierre Vandergheynst
Abstract:
In this work, we propose a new, fast and scalable method for anomaly detection in large time-evolving graphs. The input may be a static graph with dynamic node attributes (e.g. time-series), or a graph evolving in time, such as a temporal network. We define an anomaly as a localized increase in temporal activity in a cluster of nodes. The algorithm is unsupervised. It is able to detect and track anomalous activity in a dynamic network despite the noise from multiple interfering sources. We use the Hopfield network model of memory to combine the graph and time information. We show that anomalies can be spotted with good precision using a memory network. The presented approach is scalable and we provide a distributed implementation of the algorithm. To demonstrate its efficiency, we apply it to two datasets: the Enron Email dataset and Wikipedia page views. We show that the anomalous spikes are triggered by real-world events that impact the network dynamics. Moreover, the structure of the clusters and the analysis of the time evolution associated with the detected events reveal interesting facts on how humans interact, exchange and search for information, opening the door to new quantitative studies on collective and social behavior on large and dynamic datasets.
Submitted 22 January, 2019;
originally announced January 2019.
-
Wikipedia graph mining: dynamic structure of collective memory
Authors:
Volodymyr Miz,
Kirell Benzi,
Benjamin Ricaud,
Pierre Vandergheynst
Abstract:
Wikipedia is the biggest encyclopedia ever created and the fifth most visited website in the world. Tens of millions of people surf it every day, seeking answers to various questions. Collective user activity on its pages leaves publicly available footprints of human behavior, making Wikipedia an excellent source for the analysis of collective behavior. In this work, we propose a distributed graph-based event extraction model, inspired by the Hebbian learning theory. The model exploits the collective effect of the dynamics to discover events. We focus on data streams with an underlying graph structure and perform several large-scale experiments on the Wikipedia visitor activity data. We show that the presented model is scalable with respect to time-series length and graph density, and we provide a distributed implementation of the proposed algorithm. We extract dynamical patterns of collective activity and demonstrate that they correspond to meaningful clusters of associated events, reflected in the Wikipedia articles. We also illustrate the evolutionary dynamics of the graphs over time to highlight the changing nature of visitors' interests. Finally, we discuss clusters of events that model the collective recall process and represent collective memories, that is, common memories shared by a group of people.
Submitted 14 February, 2018; v1 submitted 1 October, 2017;
originally announced October 2017.
-
A Time-Vertex Signal Processing Framework
Authors:
Francesco Grassi,
Andreas Loukas,
Nathanaël Perraudin,
Benjamin Ricaud
Abstract:
An emerging way to deal with high-dimensional non-Euclidean data is to assume that the underlying structure can be captured by a graph. Recently, ideas have begun to emerge related to the analysis of time-varying graph signals. This work aims to elevate the notion of joint harmonic analysis to a full-fledged framework denoted as Time-Vertex Signal Processing, which links time-domain signal processing techniques with the new tools of graph signal processing. This entails three main contributions: (a) We provide a formal motivation for harmonic time-vertex analysis as an analysis tool for the state evolution of simple Partial Differential Equations on graphs. (b) We improve the accuracy of joint filtering operators by up to two orders of magnitude. (c) Using our joint filters, we construct time-vertex dictionaries analyzing the different scales and the local time-frequency content of a signal. The utility of our tools is illustrated in numerous applications and datasets, such as dynamic mesh denoising and classification, still-video inpainting, and source localization in seismic events. Our results suggest that joint analysis of time-vertex signals can bring benefits to regression and learning.
Submitted 5 May, 2017;
originally announced May 2017.
-
Tracking Time-Vertex Propagation using Dynamic Graph Wavelets
Authors:
Francesco Grassi,
Nathanael Perraudin,
Benjamin Ricaud
Abstract:
Graph Signal Processing generalizes classical signal processing to signals or data indexed by the vertices of a weighted graph. So far, research efforts have focused on static graph signals. However, numerous applications involve graph signals evolving in time, such as the spreading or propagation of waves on a network. The analysis of this type of data requires a new set of methods that fully take into account the time and graph dimensions. We propose a novel class of wavelet frames named Dynamic Graph Wavelets, whose time-vertex evolution follows a dynamic process. We demonstrate that this set of functions can be combined with sparsity-based approaches such as compressive sensing to reveal information on the dynamic processes occurring on a graph. Experiments on real seismological data show the efficiency of the technique, which makes it possible to estimate the epicenter of earthquake events recorded by a seismic network.
Submitted 21 June, 2016;
originally announced June 2016.
-
Principal Patterns on Graphs: Discovering Coherent Structures in Datasets
Authors:
Kirell Benzi,
Benjamin Ricaud,
Pierre Vandergheynst
Abstract:
Graphs are now ubiquitous in almost every field of research. Recently, new research areas devoted to the analysis of graphs and data associated with their vertices have emerged. Focusing on dynamical processes, we propose a fast, robust and scalable framework for retrieving and analyzing recurring patterns of activity on graphs. Our method relies on a novel type of multilayer graph that encodes the spreading or propagation of events between successive time steps. We demonstrate the versatility of our method by applying it to three different real-world examples. Firstly, we study how a rumor spreads on a social network. Secondly, we reveal congestion patterns of pedestrians in a train station. Finally, we show how patterns of audio playlists can be used in a recommender system. In each example, relevant information previously hidden in the data is extracted in a very efficient manner, emphasizing the scalability of our method. With a parallel implementation scaling linearly with the size of the dataset, our framework easily handles millions of nodes on a single commodity server.
Submitted 1 February, 2016; v1 submitted 30 April, 2015;
originally announced April 2015.
-
Optimal Window and Lattice in Gabor Transform Application to Audio Analysis
Authors:
Helene Lachambre,
Benjamin Ricaud,
Guillaume Stempfel,
Bruno Torresani,
Christoph Wiesmeyr,
Darian M. Onchis
Abstract:
This article deals with the use of an optimal lattice and an optimal window in Discrete Gabor Transform computation. In the case of a generalized Gaussian window, extending earlier contributions, we introduce an additional local window adaptation technique for non-stationary signals. We illustrate our approach and the earlier one by addressing three time-frequency analysis problems to show the improvements achieved by the use of an optimal lattice and window: distinguishing close frequencies, frequency estimation, and SNR estimation. The results are presented, when possible, with real-world audio signals.
Submitted 18 December, 2014; v1 submitted 10 March, 2014;
originally announced March 2014.
-
Vertex-Frequency Analysis on Graphs
Authors:
David I Shuman,
Benjamin Ricaud,
Pierre Vandergheynst
Abstract:
One of the key challenges in the area of signal processing on graphs is to design dictionaries and transform methods to identify and exploit structure in signals on weighted graphs. To do so, we need to account for the intrinsic geometric structure of the underlying graph data domain. In this paper, we generalize one of the most important signal processing tools - windowed Fourier analysis - to the graph setting. Our approach is to first define generalized convolution, translation, and modulation operators for signals on graphs, and explore related properties such as the localization of translated and modulated graph kernels. We then use these operators to define a windowed graph Fourier transform, enabling vertex-frequency analysis. When we apply this transform to a signal with frequency components that vary along a path graph, the resulting spectrogram matches our intuition from classical discrete-time signal processing. Yet, our construction is fully generalized and can be applied to analyze signals on any undirected, connected, weighted graph.
Submitted 22 July, 2013;
originally announced July 2013.
-
A survey of uncertainty principles and some signal processing applications
Authors:
Benjamin Ricaud,
Bruno Torresani
Abstract:
The goal of this paper is to review the main trends in the domain of uncertainty principles and localization, emphasize their mutual connections, and investigate practical consequences. The discussion is strongly oriented towards, and motivated by, signal processing problems, in which significant advances have been made recently. Relations with sparse approximation and coding problems are emphasized.
Submitted 20 September, 2013; v1 submitted 26 November, 2012;
originally announced November 2012.
-
Refined support and entropic uncertainty inequalities
Authors:
Benjamin Ricaud,
Bruno Torrésani
Abstract:
Generalized versions of the entropic (Hirschman-Beckner) and support (Elad-Bruckstein) uncertainty principles are presented for frame representations. Moreover, a sharpened version of the support inequality is obtained by introducing a generalization of the coherence. In the finite-dimensional case and under certain conditions, minimizers of these inequalities are given as constant functions on their support. In addition, $\ell^p$-norm inequalities are introduced as byproducts of the entropic inequalities.
Submitted 29 October, 2012;
originally announced October 2012.