
FiP: a Fixed-Point Approach for Causal Generative Modeling

Authors: Meyer Scetbon, Joel Jennings, Agrin Hilmkil, Cheng Zhang, Chao Ma

Abstract: Modeling true world data-generating processes lies at the heart of empirical science. Structural Causal Models (SCMs) and their associated Directed Acyclic Graphs (DAGs) provide an increasingly popular answer to such problems by defining the causal generative process that transforms random noise into observations. However, learning them from observational data poses an ill-posed and NP-hard inverse problem in general. In this work, we propose a new and equivalent formalism that does not require DAGs to describe them, viewed as fixed-point problems on the causally ordered variables, and we show three important cases where they can be uniquely recovered given the topological ordering (TO). To the best of our knowledge, we obtain the weakest conditions for their recovery when TO is known. Based on this, we design a two-stage causal generative model that first infers the causal order from observations in a zero-shot manner, thus by-passing the search, and then learns the generative fixed-point SCM on the ordered variables. To infer TOs from observations, we propose to amortize the learning of TOs on generated datasets by sequentially predicting the leaves of graphs seen during training. To learn fixed-point SCMs, we design a transformer-based architecture that exploits a new attention mechanism enabling the modeling of causal structures, and show that this parameterization is consistent with our formalism. Finally, we conduct an extensive evaluation of each method individually, and show that when combined, our model outperforms various baselines on generated out-of-distribution problems.

Submitted 14 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

arXiv:2402.06665 [pdf, other]

The Essential Role of Causality in Foundation World Models for Embodied AI

Authors: Tarun Gupta, Wenbo Gong, Chao Ma, Nick Pawlowski, Agrin Hilmkil, Meyer Scetbon, Marc Rigter, Ade Famoti, Ashley Juan Llorens, Jianfeng Gao, Stefan Bauer, Danica Kragic, Bernhard Schölkopf, Cheng Zhang

Abstract: Recent advances in foundation models, especially in large multi-modal models and conversational agents, have ignited interest in the potential of generally capable embodied agents. Such agents will require the ability to perform new tasks in many different real-world environments. However, current foundation models fail to accurately model physical interactions and are therefore insufficient for Embodied AI. The study of causality lends itself to the construction of veridical world models, which are crucial for accurately predicting the outcomes of possible interactions. This paper focuses on the prospects of building foundation world models for the upcoming generation of embodied agents and presents a novel viewpoint on the significance of causality within these. We posit that integrating causal considerations is vital to facilitating meaningful physical interactions with the world. Finally, we demystify misconceptions about causality in this context and present our outlook for future research.

Submitted 29 April, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

arXiv:2311.03989 [pdf, other]

Learned Causal Method Prediction

Authors: Shantanu Gupta, Cheng Zhang, Agrin Hilmkil

Abstract: For a given causal question, it is important to efficiently decide which causal inference method to use for a given dataset. This is challenging because causal methods typically rely on complex and difficult-to-verify assumptions, and cross-validation is not applicable since ground truth causal quantities are unobserved. In this work, we propose CAusal Method Predictor (CAMP), a framework for predicting the best method for a given dataset. To this end, we generate datasets from a diverse set of synthetic causal models, score the candidate methods, and train a model to directly predict the highest-scoring method for that dataset. Next, by formulating a self-supervised pre-training objective centered on dataset assumptions relevant for causal inference, we significantly reduce the need for costly labeled data and enhance training efficiency. Our strategy learns to map implicit dataset properties to the best method in a data-driven manner. In our experiments, we focus on method prediction for causal discovery. CAMP outperforms selecting any individual candidate method and demonstrates promising generalization to unseen semi-synthetic and real-world benchmarks.

Submitted 8 November, 2023; v1 submitted 7 November, 2023; originally announced November 2023.

arXiv:2310.00809 [pdf, other]

Towards Causal Foundation Model: on Duality between Causal Inference and Attention

Authors: Jiaqi Zhang, Joel Jennings, Agrin Hilmkil, Nick Pawlowski, Cheng Zhang, Chao Ma

Abstract: Foundation models have brought changes to the landscape of machine learning, demonstrating sparks of human-level intelligence across a diverse array of tasks. However, a gap persists in complex tasks such as causal inference, primarily due to challenges associated with intricate reasoning steps and high numerical precision requirements. In this work, we take a first step towards building causally-aware foundation models for treatment effect estimations. We propose a novel, theoretically justified method called Causal Inference with Attention (CInA), which utilizes multiple unlabeled datasets to perform self-supervised causal learning, and subsequently enables zero-shot causal inference on unseen tasks with new data. This is based on our theoretical results that demonstrate the primal-dual connection between optimal covariate balancing and self-attention, facilitating zero-shot causal inference through the final layer of a trained transformer-type architecture. We demonstrate empirically that CInA effectively generalizes to out-of-distribution datasets and various real-world datasets, matching or even surpassing traditional per-dataset methodologies. These results provide compelling evidence that our method has the potential to serve as a stepping stone for the development of causal foundation models.

Submitted 3 June, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

arXiv:2304.05524 [pdf, other]

Understanding Causality with Large Language Models: Feasibility and Opportunities

Authors: Cheng Zhang, Stefan Bauer, Paul Bennett, Jiangfeng Gao, Wenbo Gong, Agrin Hilmkil, Joel Jennings, Chao Ma, Tom Minka, Nick Pawlowski, James Vaughan

Abstract: We assess the ability of large language models (LLMs) to answer causal questions by analyzing their strengths and weaknesses against three types of causal question. We believe that current LLMs can answer causal questions with existing causal knowledge as combined domain experts. However, they are not yet able to provide satisfactory answers for discovering new knowledge or for high-stakes decision-making tasks with high precision. We discuss possible future directions and opportunities, such as enabling explicit and implicit causal modules as well as deep causal-aware LLMs. These will not only enable LLMs to answer many different types of causal questions for greater impact but also enable LLMs to be more trustworthy and efficient in general.

Submitted 11 April, 2023; originally announced April 2023.

arXiv:2303.12703 [pdf, other]

Causal Reasoning in the Presence of Latent Confounders via Neural ADMG Learning

Authors: Matthew Ashman, Chao Ma, Agrin Hilmkil, Joel Jennings, Cheng Zhang

Abstract: Latent confounding has been a long-standing obstacle for causal reasoning from observational data. One popular approach is to model the data using acyclic directed mixed graphs (ADMGs), which describe ancestral relations between variables using directed and bidirected edges. However, existing methods using ADMGs are based on either linear functional assumptions or a discrete search that is complicated to use and lacks computational tractability for large datasets. In this work, we further extend the existing body of work and develop a novel gradient-based approach to learning an ADMG with non-linear functional relations from observational data. We first show that the presence of latent confounding is identifiable under the assumptions of bow-free ADMGs with non-linear additive noise models. With this insight, we propose a novel neural causal model based on autoregressive flows for ADMG learning. This not only enables us to determine complex causal structural relationships behind the data in the presence of latent confounding, but also estimate their functional relationships (hence treatment effects) simultaneously. We further validate our approach via experiments on both synthetic and real-world datasets, and demonstrate the competitive performance against relevant baselines.

Submitted 22 March, 2023; originally announced March 2023.

Comments: Camera ready version for ICLR 2023

arXiv:2102.00875 [pdf, other]

Scaling Federated Learning for Fine-tuning of Large Language Models

Authors: Agrin Hilmkil, Sebastian Callh, Matteo Barbieri, Leon René Sütfeld, Edvin Listo Zec, Olof Mogren

Abstract: Federated learning (FL) is a promising approach to distributed compute, as well as distributed data, and provides a level of privacy and compliance to legal frameworks. This makes FL attractive for both consumer and healthcare applications. While the area is actively being explored, few studies have examined FL in the context of larger language models and there is a lack of comprehensive reviews of robustness across tasks, architectures, numbers of clients, and other relevant factors. In this paper, we explore the fine-tuning of Transformer-based language models in a federated learning setting. We evaluate three popular BERT-variants of different sizes (BERT, ALBERT, and DistilBERT) on a number of text classification tasks such as sentiment analysis and author identification. We perform an extensive sweep over the number of clients, ranging up to 32, to evaluate the impact of distributed compute on task performance in the federated averaging setting. While our findings suggest that the large sizes of the evaluated models are not generally prohibitive to federated training, we found that the different models handle federated averaging to a varying degree. Most notably, DistilBERT converges significantly slower with larger numbers of clients, and under some circumstances, even collapses to chance level performance. Investigating this issue presents an interesting perspective for future research.

Submitted 1 February, 2021; originally announced February 2021.

arXiv:2006.06287 [pdf, other]

Perceiving Music Quality with GANs

Authors: Agrin Hilmkil, Carl Thomé, Anders Arpteg

Abstract: Several methods have been developed to assess the perceptual quality of audio under transforms like lossy compression. However, they require paired reference signals of the unaltered content, limiting their use in applications where references are unavailable. This has hindered progress in audio generation and style transfer, where a no-reference quality assessment method would allow more reproducible comparisons across methods. We propose training a GAN on a large music library, and using its discriminator as a no-reference quality assessment measure of the perceived quality of music. This method is unsupervised, needs no access to degraded material and can be tuned for various domains of music. In a listening test with 448 human subjects, where participants rated professionally produced music tracks degraded with different levels and types of signal degradations such as waveshaping distortion and low-pass filtering, we establish a dataset of human rated material. By using the human rated dataset we show that the discriminator score correlates significantly with the subjective ratings, suggesting that the proposed method can be used to create a no-reference musical audio quality assessment measure.

Submitted 4 April, 2021; v1 submitted 11 June, 2020; originally announced June 2020.

Comments: Extended abstract (first version) accepted for the Northern Lights Deep Learning Workshop 2020

arXiv:1808.00198 [pdf]

Towards Machine Learning on data from Professional Cyclists

Authors: Agrin Hilmkil, Oscar Ivarsson, Moa Johansson, Dan Kuylenstierna, Teun van Erp

Abstract: Professional sports are developing towards increasingly scientific training methods with increasing amounts of data being collected from laboratory tests, training sessions and competitions. In cycling, it is standard to equip bicycles with small computers recording data from sensors such as power-meters, in addition to heart-rate, speed, altitude etc. Recently, machine learning techniques have provided huge success in a wide variety of areas where large amounts of data (big data) is available. In this paper, we perform a pilot experiment on machine learning to model physical response in elite cyclists. As a first experiment, we show that it is possible to train a LSTM machine learning algorithm to predict the heart-rate response of a cyclist during a training session. This work is a promising first step towards developing more elaborate models based on big data and machine learning to capture performance aspects of athletes.

Submitted 1 August, 2018; originally announced August 2018.

Comments: Accepted for the 12th World Congress on Performance Analysis of Sports, Opatija, Croatia, 2018

Showing 1–9 of 9 results for author: Hilmkil, A