-
Query-Efficient Correlation Clustering with Noisy Oracle
Authors:
Yuko Kuroki,
Atsushi Miyauchi,
Francesco Bonchi,
Wei Chen
Abstract:
We study a general clustering setting in which we have $n$ elements to be clustered, and we aim to perform as few queries as possible to an oracle that returns a noisy sample of the similarity between two elements. Our setting encompasses many application domains in which the similarity function is costly to compute and inherently noisy. We propose two novel formulations of online learning problem…
▽ More
We study a general clustering setting in which we have $n$ elements to be clustered, and we aim to perform as few queries as possible to an oracle that returns a noisy sample of the similarity between two elements. Our setting encompasses many application domains in which the similarity function is costly to compute and inherently noisy. We propose two novel formulations of online learning problems rooted in the paradigm of Pure Exploration in Combinatorial Multi-Armed Bandits (PE-CMAB): fixed confidence and fixed budget settings. For both settings, we design algorithms that combine a sampling strategy with a classic approximation algorithm for correlation clustering and study their theoretical guarantees. Our results are the first examples of polynomial-time algorithms that work for the case of PE-CMAB in which the underlying offline optimization problem is NP-hard.
△ Less
Submitted 2 February, 2024;
originally announced February 2024.
-
Learning Multi-Frequency Partial Correlation Graphs
Authors:
Gabriele D'Acunto,
Paolo Di Lorenzo,
Francesco Bonchi,
Stefania Sardellitti,
Sergio Barbarossa
Abstract:
Despite the large research effort devoted to learning dependencies between time series, the state of the art still faces a major limitation: existing methods learn partial correlations but fail to discriminate across distinct frequency bands. Motivated by many applications in which this differentiation is pivotal, we overcome this limitation by learning a block-sparse, frequency-dependent, partial…
▽ More
Despite the large research effort devoted to learning dependencies between time series, the state of the art still faces a major limitation: existing methods learn partial correlations but fail to discriminate across distinct frequency bands. Motivated by many applications in which this differentiation is pivotal, we overcome this limitation by learning a block-sparse, frequency-dependent, partial correlation graph, in which layers correspond to different frequency bands, and partial correlations can occur over just a few layers. To this aim, we formulate and solve two nonconvex learning problems: the first has a closed-form solution and is suitable when there is prior knowledge about the number of partial correlations; the second hinges on an iterative solution based on successive convex approximation, and is effective for the general case where no prior knowledge is available. Numerical results on synthetic data show that the proposed methods outperform the current state of the art. Finally, the analysis of financial time series confirms that partial correlations exist only within a few frequency bands, underscoring how our methods enable the gaining of valuable insights that would be undetected without discriminating along the frequency domain.
△ Less
Submitted 12 May, 2024; v1 submitted 27 November, 2023;
originally announced November 2023.
-
Extracting the Multiscale Causal Backbone of Brain Dynamics
Authors:
Gabriele D'Acunto,
Francesco Bonchi,
Gianmarco De Francisci Morales,
Giovanni Petri
Abstract:
The bulk of the research effort on brain connectivity revolves around statistical associations among brain regions, which do not directly relate to the causal mechanisms governing brain dynamics. Here we propose the multiscale causal backbone (MCB) of brain dynamics, shared by a set of individuals across multiple temporal scales, and devise a principled methodology to extract it.
Our approach le…
▽ More
The bulk of the research effort on brain connectivity revolves around statistical associations among brain regions, which do not directly relate to the causal mechanisms governing brain dynamics. Here we propose the multiscale causal backbone (MCB) of brain dynamics, shared by a set of individuals across multiple temporal scales, and devise a principled methodology to extract it.
Our approach leverages recent advances in multiscale causal structure learning and optimizes the trade-off between the model fit and its complexity. Empirical assessment on synthetic data shows the superiority of our methodology over a baseline based on canonical functional connectivity networks. When applied to resting-state fMRI data, we find sparse MCBs for both the left and right brain hemispheres. Thanks to its multiscale nature, our approach shows that at low-frequency bands, causal dynamics are driven by brain regions associated with high-level cognitive functions; at higher frequencies instead, nodes related to sensory processing play a crucial role. Finally, our analysis of individual multiscale causal structures confirms the existence of a causal fingerprint of brain connectivity, thus supporting the existing extensive research in brain connectivity fingerprinting from a causal perspective.
△ Less
Submitted 19 March, 2024; v1 submitted 31 October, 2023;
originally announced November 2023.
-
Learning Multiscale Non-stationary Causal Structures
Authors:
Gabriele D'Acunto,
Gianmarco De Francisci Morales,
Paolo Bajardi,
Francesco Bonchi
Abstract:
This paper addresses a gap in the current state of the art by providing a solution for modeling causal relationships that evolve over time and occur at different time scales. Specifically, we introduce the multiscale non-stationary directed acyclic graph (MN-DAG), a framework for modeling multivariate time series data. Our contribution is twofold. Firstly, we expose a probabilistic generative mode…
▽ More
This paper addresses a gap in the current state of the art by providing a solution for modeling causal relationships that evolve over time and occur at different time scales. Specifically, we introduce the multiscale non-stationary directed acyclic graph (MN-DAG), a framework for modeling multivariate time series data. Our contribution is twofold. Firstly, we expose a probabilistic generative model by leveraging results from spectral and causality theories. Our model allows sampling an MN-DAG according to user-specified priors on the time-dependence and multiscale properties of the causal graph. Secondly, we devise a Bayesian method named Multiscale Non-stationary Causal Structure Learner (MN-CASTLE) that uses stochastic variational inference to estimate MN-DAGs. The method also exploits information from the local partial correlation between time series over different time resolutions. The data generated from an MN-DAG reproduces well-known features of time series in different domains, such as volatility clustering and serial correlation. Additionally, we show the superior performance of MN-CASTLE on synthetic data with different multiscale and non-stationary properties compared to baseline models. Finally, we apply MN-CASTLE to identify the drivers of the natural gas prices in the US market. Causal relationships have strengthened during the COVID-19 outbreak and the Russian invasion of Ukraine, a fact that baseline methods fail to capture. MN-CASTLE identifies the causal impact of critical economic drivers on natural gas prices, such as seasonal factors, economic uncertainty, oil prices, and gas storage deviations.
△ Less
Submitted 17 November, 2023; v1 submitted 31 August, 2022;
originally announced August 2022.
-
Probabilistic Causal Analysis of Social Influence
Authors:
Francesco Bonchi,
Francesco Gullo,
Bud Mishra,
Daniele Ramazzotti
Abstract:
Mastering the dynamics of social influence requires separating, in a database of information propagation traces, the genuine causal processes from temporal correlation, i.e., homophily and other spurious causes. However, most studies to characterize social influence, and, in general, most data-science analyses focus on correlations, statistical independence, or conditional independence. Only recen…
▽ More
Mastering the dynamics of social influence requires separating, in a database of information propagation traces, the genuine causal processes from temporal correlation, i.e., homophily and other spurious causes. However, most studies to characterize social influence, and, in general, most data-science analyses focus on correlations, statistical independence, or conditional independence. Only recently, there has been a resurgence of interest in "causal data science", e.g., grounded on causality theories. In this paper we adopt a principled causal approach to the analysis of social influence from information-propagation data, rooted in the theory of probabilistic causation.
Our approach consists of two phases. In the first one, in order to avoid the pitfalls of misinterpreting causation when the data spans a mixture of several subtypes ("Simpson's paradox"), we partition the set of propagation traces into groups, in such a way that each group is as less contradictory as possible in terms of the hierarchical structure of information propagation. To achieve this goal, we borrow the notion of "agony" and define the Agony-bounded Partitioning problem, which we prove being hard, and for which we develop two efficient algorithms with approximation guarantees. In the second phase, for each group from the first phase, we apply a constrained MLE approach to ultimately learn a minimal causal topology. Experiments on synthetic data show that our method is able to retrieve the genuine causal arcs w.r.t. a ground-truth generative model. Experiments on real data show that, by focusing only on the extracted causal structures instead of the whole social graph, the effectiveness of predicting influence spread is significantly improved.
△ Less
Submitted 29 August, 2018; v1 submitted 6 August, 2018;
originally announced August 2018.