-
Learning Decision Policies with Instrumental Variables through Double Machine Learning
Authors:
Daqian Shao,
Ashkan Soleymani,
Francesco Quinzan,
Marta Kwiatkowska
Abstract:
A common issue in learning decision-making policies in data-rich settings is spurious correlations in the offline dataset, which can be caused by hidden confounders. Instrumental variable (IV) regression, which utilises a key unconfounded variable known as the instrument, is a standard technique for learning causal relationships between confounded action, outcome, and context variables. Most recent IV regression algorithms use a two-stage approach, where a deep neural network (DNN) estimator learnt in the first stage is directly plugged into the second stage, in which another DNN is used to estimate the causal effect. Naively plugging the estimator can cause heavy bias in the second stage, especially when regularisation bias is present in the first stage estimator. We propose DML-IV, a non-linear IV regression method that reduces the bias in two-stage IV regressions and effectively learns high-performing policies. We derive a novel learning objective to reduce bias and design the DML-IV algorithm following the double/debiased machine learning (DML) framework. The learnt DML-IV estimator has strong convergence rate and $O(N^{-1/2})$ suboptimality guarantees that match those when the dataset is unconfounded. DML-IV outperforms state-of-the-art IV regression methods on IV regression benchmarks and learns high-performing policies in the presence of instruments.
Submitted 28 June, 2024; v1 submitted 14 May, 2024;
originally announced May 2024.
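The two-stage plug-in approach mentioned in the abstract can be illustrated with its simplest classical relative, linear two-stage least squares. The sketch below is not DML-IV and uses no neural networks; the data-generating process, the function names, and the true effect of 2.0 are all assumptions made for illustration. It shows how a hidden confounder biases a direct regression of the outcome on the action, while routing the regression through an instrument recovers the causal slope.

```python
import random


def slope(x, y):
    """Least-squares slope of y on x (intercept included implicitly)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def simulate(n=20000, true_effect=2.0, seed=0):
    """Hypothetical confounded data: u affects both action a and outcome y."""
    rng = random.Random(seed)
    z, a, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)    # hidden confounder (never observed)
        zi = rng.gauss(0, 1)   # instrument: affects y only through a
        ai = zi + u + rng.gauss(0, 1)
        yi = true_effect * ai - 2.0 * u + rng.gauss(0, 1)
        z.append(zi)
        a.append(ai)
        y.append(yi)
    return z, a, y


def two_stage_least_squares(z, a, y):
    """Stage 1 predicts the action from the instrument; stage 2 regresses y on that prediction."""
    b1 = slope(z, a)
    mz, ma = sum(z) / len(z), sum(a) / len(a)
    a_hat = [ma + b1 * (zi - mz) for zi in z]  # first-stage fitted values
    return slope(a_hat, y)                     # plug-in second stage


if __name__ == "__main__":
    z, a, y = simulate()
    print("naive regression of y on a:", round(slope(a, y), 3))
    print("two-stage estimate:", round(two_stage_least_squares(z, a, y), 3))
```

In this linear setting the plug-in step is harmless. The bias the abstract refers to arises when the first stage is a regularised non-linear estimator, which this sketch does not attempt to reproduce.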
-
Doubly Robust Structure Identification from Temporal Data
Authors:
Emmanouil Angelis,
Francesco Quinzan,
Ashkan Soleymani,
Patrick Jaillet,
Stefan Bauer
Abstract:
Learning the causes of time-series data is a fundamental task in many applications, spanning from finance to earth sciences or bio-medical applications. Common approaches for this task are based on vector auto-regression, and they do not take into account unknown confounding between potential causes. However, in settings with many potential causes and noisy data, these approaches may be substantially biased. Furthermore, potential causes may be correlated in practical applications. Moreover, existing algorithms often do not work with cyclic data. To address these challenges, we propose a new doubly robust method for Structure Identification from Temporal Data (SITD). We provide theoretical guarantees, showing that our method asymptotically recovers the true underlying causal structure. Our analysis extends to cases where the potential causes have cycles and may be confounded. We further perform extensive experiments to showcase the superior performance of our method.
Submitted 10 November, 2023;
originally announced November 2023.
-
Diffusion Based Causal Representation Learning
Authors:
Amir Mohammad Karimi Mamaghan,
Andrea Dittadi,
Stefan Bauer,
Karl Henrik Johansson,
Francesco Quinzan
Abstract:
Causal reasoning can be considered a cornerstone of intelligent systems. Having access to an underlying causal graph comes with the promise of cause-effect estimation and the identification of efficient and safe interventions. However, learning causal representations remains a major challenge, due to the complexity of many real-world systems. Previous works on causal representation learning have mostly focused on Variational Auto-Encoders (VAE). These methods only provide representations from a point estimate, and they are unsuitable for handling high dimensions. To overcome these problems, we propose a new Diffusion-based Causal Representation Learning (DCRL) algorithm. This algorithm uses diffusion-based representations for causal discovery. DCRL offers access to infinite-dimensional latent codes, which encode different levels of information. In a first proof of principle, we investigate the use of DCRL for causal representation learning. We further demonstrate experimentally that this approach performs comparably well in identifying the causal structure and causal variables.
Submitted 9 November, 2023;
originally announced November 2023.
-
DRCFS: Doubly Robust Causal Feature Selection
Authors:
Francesco Quinzan,
Ashkan Soleymani,
Patrick Jaillet,
Cristian R. Rojas,
Stefan Bauer
Abstract:
Knowing the features of a complex system that are highly relevant to a particular target variable is of fundamental interest in many areas of science. Existing approaches are often limited to linear settings, sometimes lack guarantees, and in most cases, do not scale to the problem at hand, in particular to images. We propose DRCFS, a doubly robust feature selection method for identifying the causal features even in nonlinear and high dimensional settings. We provide theoretical guarantees, illustrate necessary conditions for our assumptions, and perform extensive experiments across a wide range of simulated and semi-synthetic datasets. DRCFS significantly outperforms existing state-of-the-art methods, selecting robust features even in challenging highly non-linear and high-dimensional problems.
Submitted 5 July, 2023; v1 submitted 12 June, 2023;
originally announced June 2023.
-
Optimal Transport for Correctional Learning
Authors:
Rebecka Winqvist,
Inês Lourenco,
Francesco Quinzan,
Cristian R. Rojas,
Bo Wahlberg
Abstract:
The contribution of this paper is a generalized formulation of correctional learning using optimal transport, which is about how to optimally transport one mass distribution to another. Correctional learning is a framework developed to enhance the accuracy of parameter estimation processes by means of a teacher-student approach. In this framework, an expert agent, referred to as the teacher, modifies the data used by a learning agent, known as the student, to improve its estimation process. The objective of the teacher is to alter the data such that the student's estimation error is minimized, subject to a fixed intervention budget. Compared to existing formulations of correctional learning, our novel optimal transport approach provides several benefits. It allows for the estimation of more complex characteristics as well as the consideration of multiple intervention policies for the teacher. We evaluate our approach on two theoretical examples, and on a human-robot interaction application in which the teacher's role is to improve the robot's performance in an inverse reinforcement learning setting.
Submitted 4 April, 2023;
originally announced April 2023.
-
Learning Counterfactually Invariant Predictors
Authors:
Francesco Quinzan,
Cecilia Casolo,
Krikamol Muandet,
Yucen Luo,
Niki Kilbertus
Abstract:
Notions of counterfactual invariance (CI) have proven essential for predictors that are fair, robust, and generalizable in the real world. We propose graphical criteria that yield a sufficient condition for a predictor to be counterfactually invariant in terms of a conditional independence in the observational distribution. In order to learn such predictors, we propose a model-agnostic framework, called Counterfactually Invariant Prediction (CIP), building on the Hilbert-Schmidt Conditional Independence Criterion (HSCIC), a kernel-based conditional dependence measure. Our experimental results demonstrate the effectiveness of CIP in enforcing counterfactual invariance across various simulated and real-world datasets including scalar and multi-variate settings.
Submitted 13 October, 2023; v1 submitted 20 July, 2022;
originally announced July 2022.
-
Fast Feature Selection with Fairness Constraints
Authors:
Francesco Quinzan,
Rajiv Khanna,
Moshik Hershcovitch,
Sarel Cohen,
Daniel G. Waddington,
Tobias Friedrich,
Michael W. Mahoney
Abstract:
We study the fundamental problem of selecting optimal features for model construction. This problem is computationally challenging on large datasets, even with the use of greedy algorithm variants. To address this challenge, we extend the adaptive query model, recently proposed for the greedy forward selection for submodular functions, to the faster paradigm of Orthogonal Matching Pursuit for non-submodular functions. The proposed algorithm achieves exponentially fast parallel run time in the adaptive query model, scaling much better than prior work. Furthermore, our extension allows the use of downward-closed constraints, which can be used to encode certain fairness criteria into the feature selection process. We prove strong approximation guarantees for the algorithm based on standard assumptions. These guarantees are applicable to many parametric models, including Generalized Linear Models. Finally, we demonstrate empirically that the proposed algorithm competes favorably with state-of-the-art techniques for feature selection, on real-world and synthetic datasets.
Submitted 3 February, 2023; v1 submitted 28 February, 2022;
originally announced February 2022.
-
Adaptive Sampling for Fast Constrained Maximization of Submodular Function
Authors:
Francesco Quinzan,
Vanja Doskoč,
Andreas Göbel,
Tobias Friedrich
Abstract:
Several large-scale machine learning tasks, such as data summarization, can be approached by maximizing functions that satisfy submodularity. These optimization problems often involve complex side constraints, imposed by the underlying application. In this paper, we develop an algorithm with poly-logarithmic adaptivity for non-monotone submodular maximization under general side constraints. The adaptive complexity of a problem is the minimal number of sequential rounds required to achieve the objective.
Our algorithm is suitable to maximize a non-monotone submodular function under a $p$-system side constraint, and it achieves a $(p + O(\sqrt{p}))$-approximation for this problem, after only poly-logarithmic adaptive rounds and polynomial queries to the valuation oracle function. Furthermore, our algorithm achieves a $(p + O(1))$-approximation when the given side constraint is a $p$-extendible system.
This algorithm yields an exponential speed-up, with respect to the adaptivity, over any other known constant-factor approximation algorithm for this problem. It also competes with previously known results in terms of the query complexity. We perform experiments on various real-world applications. We find that, in comparison with commonly used heuristics, our algorithm performs better on these instances.
Submitted 12 February, 2021;
originally announced February 2021.
-
Non-Monotone Submodular Maximization with Multiple Knapsacks in Static and Dynamic Settings
Authors:
Vanja Doskoč,
Tobias Friedrich,
Andreas Göbel,
Frank Neumann,
Aneta Neumann,
Francesco Quinzan
Abstract:
We study the problem of maximizing a non-monotone submodular function under multiple knapsack constraints. We propose a simple discrete greedy algorithm to approach this problem, and prove that it yields strong approximation guarantees for functions with bounded curvature. In contrast to other heuristics, this requires no problem relaxation to continuous domains and it maintains a constant-factor approximation guarantee in the problem size. In the case of a single knapsack, our analysis suggests that the standard greedy can be used in non-monotone settings.
Additionally, we study this problem in a dynamic setting, in which knapsacks change during the optimization process. We modify our greedy algorithm to avoid a complete restart at each constraint update. This modification retains the approximation guarantees of the static case.
We evaluate our results experimentally on video summarization and sensor placement tasks. We show that our proposed algorithm competes with the state-of-the-art in static settings. Furthermore, we show that in dynamic settings with a tight computational time budget, our modified greedy yields significant improvements over starting the greedy from scratch, in terms of the solution quality achieved.
Submitted 18 February, 2020; v1 submitted 15 November, 2019;
originally announced November 2019.
-
Greedy Maximization of Functions with Bounded Curvature under Partition Matroid Constraints
Authors:
Tobias Friedrich,
Andreas Göbel,
Frank Neumann,
Francesco Quinzan,
Ralf Rothenberger
Abstract:
We investigate the performance of a deterministic GREEDY algorithm for the problem of maximizing functions under a partition matroid constraint. We consider non-monotone submodular functions and monotone subadditive functions. Even though constrained maximization problems of monotone submodular functions have been extensively studied, little is known about greedy maximization of non-monotone submodular functions or monotone subadditive functions.
We give approximation guarantees for GREEDY on these problems, in terms of the curvature. We find that this simple heuristic yields a strong approximation guarantee on a broad class of functions.
We discuss the applicability of our results to three real-world problems: Maximizing the determinant function of a positive semidefinite matrix, and related problems such as the maximum entropy sampling problem, the constrained maximum cut problem on directed graphs, and combinatorial auction games.
We conclude that GREEDY is well-suited to approach these problems. Overall, we present evidence to support the idea that, when dealing with constrained maximization problems with bounded curvature, one need not search for (approximate) monotonicity to get good approximate solutions.
Submitted 20 February, 2019; v1 submitted 13 November, 2018;
originally announced November 2018.
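A greedy procedure under a partition matroid constraint, as discussed in the abstract above, can be sketched as follows. This is a generic textbook-style greedy, not necessarily the exact GREEDY variant analysed in the paper. The stopping rule (halt when no feasible element has positive marginal gain), the coverage objective, and all names are assumptions chosen for illustration.

```python
def greedy_partition_matroid(f, blocks, capacities):
    """Greedily build a set, taking at most capacities[i] elements from blocks[i].

    f maps a set of elements to a number. The loop stops once every block is
    full or no remaining feasible element improves f (an assumed stopping rule).
    """
    chosen = set()
    used = [0] * len(blocks)
    while True:
        base = f(chosen)
        best, best_gain, best_block = None, 0.0, None
        for i, block in enumerate(blocks):
            if used[i] >= capacities[i]:
                continue  # this block has reached its capacity
            for e in block:
                if e in chosen:
                    continue
                gain = f(chosen | {e}) - base  # marginal gain of adding e
                if gain > best_gain:
                    best, best_gain, best_block = e, gain, i
        if best is None:
            return chosen
        chosen.add(best)
        used[best_block] += 1


# Hypothetical coverage instance: each element covers a set of items.
COVERS = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 7}}


def coverage(selected):
    """Number of distinct items covered by the selected elements."""
    return len(set().union(*(COVERS[e] for e in selected)))


if __name__ == "__main__":
    picked = greedy_partition_matroid(coverage, [["a", "b"], ["c", "d"]], [1, 1])
    print(sorted(picked), coverage(picked))
```

The coverage objective here is monotone submodular, which is the easy case. The abstract's point is that guarantees in terms of curvature extend to non-monotone submodular and monotone subadditive objectives, which can be passed in as `f` without changing the procedure.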
-
Evolutionary Algorithms and Submodular Functions: Benefits of Heavy-Tailed Mutations
Authors:
Tobias Friedrich,
Andreas Göbel,
Francesco Quinzan,
Markus Wagner
Abstract:
A core feature of evolutionary algorithms is their mutation operator. Recently, much attention has been devoted to the study of mutation operators with dynamic and non-uniform mutation rates. Following up on this line of work, we propose a new mutation operator and analyze its performance on the (1+1) Evolutionary Algorithm (EA).
Our analyses show that this mutation operator competes with pre-existing ones, when used by the (1+1) EA on classes of problems for which results on the other mutation operators are available. We show that the (1+1) EA using our mutation operator finds a (1/3)-approximation ratio on any non-negative submodular function in polynomial time. We also consider the problem of maximizing a symmetric submodular function under a single matroid constraint and show that the (1+1) EA using our operator finds a (1/3)-approximation within polynomial time. This performance matches that of combinatorial local search algorithms specifically designed to solve these problems and outperforms them with constant probability.
Finally, we evaluate the performance of the (1+1) EA using our operator experimentally by considering two applications: (a) the maximum directed cut problem on real-world graphs of different origins, with up to 6.6 million vertices and 56 million edges, and (b) the symmetric mutual information problem using a four-month air pollution data set. In comparison with uniform mutation and a recently proposed dynamic scheme, our operator comes out on top on these instances.
Submitted 21 November, 2018; v1 submitted 28 May, 2018;
originally announced May 2018.
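For readers unfamiliar with the (1+1) EA, the sketch below shows its basic loop with a generic heavy-tailed mutation: the number of flipped bits is drawn from a power law, so large jumps occur with non-negligible probability. The abstract does not specify the paper's operator, so this power-law rule, the exponent `beta`, the undirected cut objective, and all names are assumptions for illustration only.

```python
import random


def power_law_k(n, beta, rng):
    """Sample k in {1, ..., n} with probability proportional to k ** (-beta)."""
    ks = range(1, n + 1)
    weights = [k ** (-beta) for k in ks]
    return rng.choices(ks, weights=weights)[0]


def one_plus_one_ea(fitness, n, steps, beta=1.5, seed=0):
    """(1+1) EA: keep one parent, create one offspring per step, keep the better."""
    rng = random.Random(seed)
    x = [rng.random() < 0.5 for _ in range(n)]  # random initial bit string
    fx = fitness(x)
    for _ in range(steps):
        k = power_law_k(n, beta, rng)           # heavy-tailed number of flips
        y = list(x)
        for i in rng.sample(range(n), k):       # flip k distinct positions
            y[i] = not y[i]
        fy = fitness(y)
        if fy >= fx:                            # elitist acceptance
            x, fx = y, fy
    return x, fx


def cut_value(edges):
    """Return a fitness function counting edges that cross the bipartition x."""
    return lambda x: sum(1 for u, v in edges if x[u] != x[v])


if __name__ == "__main__":
    # Hypothetical instance: a cycle on six vertices.
    edges = [(i, (i + 1) % 6) for i in range(6)]
    solution, value = one_plus_one_ea(cut_value(edges), n=6, steps=2000)
    print(solution, value)
```

Because acceptance is elitist, the fitness of the kept solution never decreases. The heavy tail matters on objectives where escaping a local optimum requires flipping several bits at once.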
-
Approximating Optimization Problems using EAs on Scale-Free Networks
Authors:
Ankit Chauhan,
Tobias Friedrich,
Francesco Quinzan
Abstract:
It has been observed that many complex real-world networks have certain properties, such as a high clustering coefficient, a low diameter, and a power-law degree distribution. A network with a power-law degree distribution is known as a scale-free network. In order to study these networks, various random graph models have been proposed, e.g. Preferential Attachment, Chung-Lu, or Hyperbolic.
We look at the interplay between the power-law degree distribution and the run time of optimization techniques for well-known combinatorial problems. We observe that on scale-free networks, simple evolutionary algorithms (EAs) quickly reach a constant-factor approximation ratio on common covering problems.
We prove that the single-objective (1+1) EA reaches a constant-factor approximation ratio on the Minimum Dominating Set problem, the Minimum Vertex Cover problem, the Minimum Connected Dominating Set problem, and the Maximum Independent Set problem in an expected polynomial number of calls to the fitness function.
Furthermore, we prove that the multi-objective GSEMO algorithm reaches a better approximation ratio than the (1+1) EA on those problems, within polynomial fitness evaluations.
Submitted 26 November, 2018; v1 submitted 12 April, 2017;
originally announced April 2017.