-
A Notion of Complexity for Theory of Mind via Discrete World Models
Authors:
X. Angelo Huang,
Emanuele La Malfa,
Samuele Marro,
Andrea Asperti,
Anthony Cohn,
Michael Wooldridge
Abstract:
Theory of Mind (ToM) can be used to assess the capabilities of Large Language Models (LLMs) in complex scenarios where social reasoning is required. While the research community has proposed many ToM benchmarks, their hardness varies greatly, and their complexity is not well defined. This work proposes a framework to measure the complexity of ToM tasks. We quantify a problem's complexity as the number of states necessary to solve it correctly. Our complexity measure also accounts for spurious states that are added to a ToM problem to make it appear harder. We use our method to assess the complexity of five widely adopted ToM benchmarks. On top of this framework, we design a prompting technique that augments the information available to a model with a description of how the environment changes with the agents' interactions. We name this technique Discrete World Models (DWM) and show how it elicits superior performance on ToM tasks.
Submitted 16 June, 2024;
originally announced June 2024.
-
Code Simulation Challenges for Large Language Models
Authors:
Emanuele La Malfa,
Christoph Weinhuber,
Orazio Torre,
Fangru Lin,
Samuele Marro,
Anthony Cohn,
Nigel Shadbolt,
Michael Wooldridge
Abstract:
Many reasoning, planning, and problem-solving tasks share an intrinsic algorithmic nature: correctly simulating each step is a sufficient condition to solve them correctly. This work studies to what extent Large Language Models (LLMs) can simulate coding and algorithmic tasks to provide insights into general capabilities in such algorithmic reasoning tasks. We introduce benchmarks for straight-line programs, code that contains critical paths, and approximate and redundant instructions. We further assess the simulation capabilities of LLMs with sorting algorithms and nested loops and show that a routine's computational complexity directly affects an LLM's ability to simulate its execution. While the most powerful LLMs exhibit relatively strong simulation capabilities, the process is fragile, seems to rely heavily on pattern recognition, and is affected by memorisation. We propose a novel off-the-shelf prompting method, Chain of Simulation (CoSm), which instructs LLMs to simulate code execution line by line, following the computation pattern of compilers. CoSm efficiently helps LLMs reduce memorisation and shallow pattern recognition while improving simulation performance. We consider the success of CoSm in code simulation to be inspirational for other general routine-simulation reasoning tasks.
Submitted 12 June, 2024; v1 submitted 17 January, 2024;
originally announced January 2024.
-
Computational Asymmetries in Robust Classification
Authors:
Samuele Marro,
Michele Lombardi
Abstract:
In the context of adversarial robustness, we make three strongly related contributions. First, we prove that while attacking ReLU classifiers is $\mathit{NP}$-hard, ensuring their robustness at training time is $Σ_2^P$-hard (even on a single example). This asymmetry provides a rationale for the fact that robust classification approaches are frequently fooled in the literature. Second, we show that inference-time robustness certificates are not affected by this asymmetry, by introducing a proof-of-concept approach named Counter-Attack (CA). Indeed, CA displays a reversed asymmetry: running the defense is $\mathit{NP}$-hard, while attacking it is $Σ_2^P$-hard. Finally, motivated by our previous result, we argue that adversarial attacks can be used in the context of robustness certification, and provide an empirical evaluation of their effectiveness. As a byproduct of this process, we also release UG100, a benchmark dataset for adversarial attacks.
Submitted 25 June, 2023;
originally announced June 2023.
-
Image Embedding for Denoising Generative Models
Authors:
Andrea Asperti,
Davide Evangelista,
Samuele Marro,
Fabio Merizzi
Abstract:
Denoising Diffusion models are gaining increasing popularity in the field of generative modeling for several reasons, including their simple and stable training, excellent generative quality, and solid probabilistic foundation. In this article, we address the problem of embedding an image into the latent space of Denoising Diffusion Models, that is, finding a suitable "noisy" image whose denoising results in the original image. We particularly focus on Denoising Diffusion Implicit Models due to the deterministic nature of their reverse diffusion process. As a side result of our investigation, we gain a deeper insight into the structure of the latent space of diffusion models, opening interesting perspectives on its exploration, the definition of semantic trajectories, and the manipulation/conditioning of encodings for editing purposes. A particularly interesting property highlighted by our research, which is also characteristic of this class of generative models, is the independence of the latent representation from the networks implementing the reverse diffusion process. In other words, a common seed passed to different networks (each trained on the same dataset) eventually results in identical images.
Submitted 30 December, 2022;
originally announced January 2023.
-
Green NFTs: A Study on the Environmental Impact of Cryptoart Technologies
Authors:
Samuele Marro,
Luca Donno
Abstract:
We introduce a model of greenhouse gas emissions due to on-chain activity on Ethereum, focusing on cryptoart. We also estimate the impact of individual transactions on the environment, both before and after the London hard fork. We find that with the current fee mechanism, spending one dollar on transaction fees corresponds to emitting at least the equivalent of 1.305 kilograms of CO2. We also describe several techniques to reduce cryptoart emissions, both in the short and long term.
Submitted 29 August, 2022; v1 submitted 29 January, 2022;
originally announced February 2022.