Search | arXiv e-print repository

arXiv:2406.11911

A Notion of Complexity for Theory of Mind via Discrete World Models

Authors: X. Angelo Huang, Emanuele La Malfa, Samuele Marro, Andrea Asperti, Anthony Cohn, Michael Wooldridge

Abstract: Theory of Mind (ToM) can be used to assess the capabilities of Large Language Models (LLMs) in complex scenarios where social reasoning is required. While the research community has proposed many ToM benchmarks, their hardness varies greatly, and their complexity is not well defined. This work proposes a framework to measure the complexity of ToM tasks. We quantify a problem's complexity as the number of states necessary to solve it correctly. Our complexity measure also accounts for spurious states of a ToM problem designed to make it apparently harder. We use our method to assess the complexity of five widely adopted ToM benchmarks. On top of this framework, we design a prompting technique that augments the information available to a model with a description of how the environment changes with the agents' interactions. We name this technique Discrete World Models (DWM) and show how it elicits superior performance on ToM tasks.

Submitted 16 June, 2024; originally announced June 2024.

Comments: https://flecart.github.com/complexity-tom-dwm

arXiv:2401.09074

Code Simulation Challenges for Large Language Models

Authors: Emanuele La Malfa, Christoph Weinhuber, Orazio Torre, Fangru Lin, Samuele Marro, Anthony Cohn, Nigel Shadbolt, Michael Wooldridge

Abstract: Many reasoning, planning, and problem-solving tasks share an intrinsic algorithmic nature: correctly simulating each step is a sufficient condition to solve them correctly. This work studies to what extent Large Language Models (LLMs) can simulate coding and algorithmic tasks to provide insights into general capabilities in such algorithmic reasoning tasks. We introduce benchmarks for straight-line programs, code that contains critical paths, and approximate and redundant instructions. We further assess the simulation capabilities of LLMs with sorting algorithms and nested loops and show that a routine's computational complexity directly affects an LLM's ability to simulate its execution. While the most powerful LLMs exhibit relatively strong simulation capabilities, the process is fragile, seems to rely heavily on pattern recognition, and is affected by memorisation. We propose a novel off-the-shelf prompting method, Chain of Simulation (CoSm), which instructs LLMs to simulate code execution line by line/follow the computation pattern of compilers. CoSm efficiently helps LLMs reduce memorisation and shallow pattern recognition while improving simulation performance. We consider the success of CoSm in code simulation to be inspirational for other general routine simulation reasoning tasks.

Submitted 12 June, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

Comments: Code: https://github.com/EmanueleLM/CodeSimulation

arXiv:2306.14326

Computational Asymmetries in Robust Classification

Authors: Samuele Marro, Michele Lombardi

Abstract: In the context of adversarial robustness, we make three strongly related contributions. First, we prove that while attacking ReLU classifiers is $\mathit{NP}$-hard, ensuring their robustness at training time is $\Sigma_2^P$-hard (even on a single example). This asymmetry provides a rationale for the fact that robust classification approaches are frequently fooled in the literature. Second, we show that inference-time robustness certificates are not affected by this asymmetry, by introducing a proof-of-concept approach named Counter-Attack (CA). Indeed, CA displays a reversed asymmetry: running the defense is $\mathit{NP}$-hard, while attacking it is $\Sigma_2^P$-hard. Finally, motivated by our previous result, we argue that adversarial attacks can be used in the context of robustness certification, and provide an empirical evaluation of their effectiveness. As a byproduct of this process, we also release UG100, a benchmark dataset for adversarial attacks.

Submitted 25 June, 2023; originally announced June 2023.

MSC Class: 68T07

Journal ref: 40th International Conference on Machine Learning (ICML 2023)

arXiv:2301.07485

Image Embedding for Denoising Generative Models

Authors: Andrea Asperti, Davide Evangelista, Samuele Marro, Fabio Merizzi

Abstract: Denoising Diffusion models are gaining increasing popularity in the field of generative modeling for several reasons, including the simple and stable training, the excellent generative quality, and the solid probabilistic foundation. In this article, we address the problem of embedding an image into the latent space of Denoising Diffusion Models, that is, finding a suitable "noisy" image whose denoising results in the original image. We particularly focus on Denoising Diffusion Implicit Models due to the deterministic nature of their reverse diffusion process. As a side result of our investigation, we gain a deeper insight into the structure of the latent space of diffusion models, opening interesting perspectives on its exploration, the definition of semantic trajectories, and the manipulation/conditioning of encodings for editing purposes. A particularly interesting property highlighted by our research, which is also characteristic of this class of generative models, is the independence of the latent representation from the networks implementing the reverse diffusion process. In other words, a common seed passed to different networks (each trained on the same dataset) eventually results in identical images.

Submitted 30 December, 2022; originally announced January 2023.

MSC Class: 68T07

ACM Class: I.3.3

arXiv:2202.00003

Green NFTs: A Study on the Environmental Impact of Cryptoart Technologies

Authors: Samuele Marro, Luca Donno

Abstract: We introduce a model of greenhouse gas emissions due to on-chain activity on Ethereum, focusing on cryptoart. We also estimate the impact of individual transactions on the environment, both before and after the London hard fork. We find that with the current fee mechanism, spending one dollar on transaction fees corresponds to emitting at least the equivalent of 1.305 kilograms of CO2. We also describe several techniques to reduce cryptoart emissions, both in the short and long term.

Submitted 29 August, 2022; v1 submitted 29 January, 2022; originally announced February 2022.

Comments: This draft was written in May 2021 and might be subject to modifications. August 29th 2022: removed references to old emission figure

MSC Class: 68-11

ACM Class: J.5; J.4

Showing 1–5 of 5 results for author: Marro, S