
Showing 1–14 of 14 results for author: Paolini, G

Searching in archive cs.
  1. arXiv:2406.00741

    cs.AI cs.LG

    Learning to Play 7 Wonders Duel Without Human Supervision

    Authors: Giovanni Paolini, Lorenzo Moreschini, Francesco Veneziano, Alessandro Iraci

    Abstract: This paper introduces ZeusAI, an artificial intelligence system developed to play the board game 7 Wonders Duel. Inspired by the AlphaZero reinforcement learning algorithm, ZeusAI relies on a combination of Monte Carlo Tree Search and a Transformer Neural Network to learn the game without human supervision. ZeusAI competes at the level of top human players, develops both known and novel strategies…

    Submitted 2 June, 2024; originally announced June 2024.

  2. arXiv:2405.00204

    cs.CL cs.AI

    General Purpose Verification for Chain of Thought Prompting

    Authors: Robert Vacareanu, Anurag Pratik, Evangelia Spiliopoulou, Zheng Qi, Giovanni Paolini, Neha Anna John, Jie Ma, Yassine Benajiba, Miguel Ballesteros

    Abstract: Many of the recent capabilities demonstrated by Large Language Models (LLMs) arise primarily from their ability to exploit contextual information. In this paper, we explore ways to improve reasoning capabilities of LLMs through (1) exploration of different chains of thought and (2) validation of the individual steps of the reasoning process. We propose three general principles that a model should…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: 22 pages, preprint

  3. arXiv:2404.10830

    cs.CL cs.AI cs.LG

    Fewer Truncations Improve Language Modeling

    Authors: Hantian Ding, Zijian Wang, Giovanni Paolini, Varun Kumar, Anoop Deoras, Dan Roth, Stefano Soatto

    Abstract: In large language model training, input documents are typically concatenated together and then split into sequences of equal length to avoid padding tokens. Despite its efficiency, the concatenation approach compromises data integrity -- it inevitably breaks many documents into incomplete pieces, leading to excessive truncations that hinder the model from learning to compose logically coherent and…

    Submitted 2 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: ICML 2024

  4. arXiv:2305.13191

    cs.CL cs.AI cs.LG

    Taxonomy Expansion for Named Entity Recognition

    Authors: Karthikeyan K, Yogarshi Vyas, Jie Ma, Giovanni Paolini, Neha Anna John, Shuai Wang, Yassine Benajiba, Vittorio Castelli, Dan Roth, Miguel Ballesteros

    Abstract: Training a Named Entity Recognition (NER) model often involves fixing a taxonomy of entity types. However, requirements evolve and we might need the NER model to recognize additional entity types. A simple approach is to re-annotate the entire dataset with both existing and additional entity types and then train the model on the re-annotated dataset. However, this is an extremely laborious task. To re…

    Submitted 22 May, 2023; originally announced May 2023.

  5. arXiv:2305.11979

    cs.CL

    A Weak Supervision Approach for Few-Shot Aspect Based Sentiment

    Authors: Robert Vacareanu, Siddharth Varia, Kishaloy Halder, Shuai Wang, Giovanni Paolini, Neha Anna John, Miguel Ballesteros, Smaranda Muresan

    Abstract: We explore how weak supervision on abundant unlabeled data can be leveraged to improve few-shot performance in aspect-based sentiment analysis (ABSA) tasks. We propose a pipeline approach to construct a noisy ABSA dataset, and we use it to adapt a pre-trained sequence-to-sequence model to the ABSA tasks. We test the resulting model on three widely used ABSA datasets, before and after fine-tuning.…

    Submitted 19 May, 2023; originally announced May 2023.

  6. arXiv:2302.07994

    cs.LG cs.AI cs.CL cs.CV

    À-la-carte Prompt Tuning (APT): Combining Distinct Data Via Composable Prompting

    Authors: Benjamin Bowman, Alessandro Achille, Luca Zancato, Matthew Trager, Pramuditha Perera, Giovanni Paolini, Stefano Soatto

    Abstract: We introduce À-la-carte Prompt Tuning (APT), a transformer-based scheme to tune prompts on distinct data so that they can be arbitrarily composed at inference time. The individual prompts can be trained in isolation, possibly on different devices, at different times, and on different distributions or domains. Furthermore each prompt only contains information about the subset of data it was exposed…

    Submitted 15 February, 2023; originally announced February 2023.

    Comments: 13 pages, 4 figures, 8 tables

  7. arXiv:2202.12457

    cs.LG eess.SY stat.ML

    Stacked Residuals of Dynamic Layers for Time Series Anomaly Detection

    Authors: L. Zancato, A. Achille, G. Paolini, A. Chiuso, S. Soatto

    Abstract: We present an end-to-end differentiable neural network architecture to perform anomaly detection in multivariate time series by incorporating a Sequential Probability Ratio Test on the prediction residual. The architecture is a cascade of dynamical systems designed to separate linearly predictable components of the signal such as trends and seasonality, from the non-linear ones. The former are mod…

    Submitted 24 February, 2022; originally announced February 2022.

  8. arXiv:2111.09785

    cs.LG

    DIVA: Dataset Derivative of a Learning Task

    Authors: Yonatan Dukler, Alessandro Achille, Giovanni Paolini, Avinash Ravichandran, Marzia Polito, Stefano Soatto

    Abstract: We present a method to compute the derivative of a learning task with respect to a dataset. A learning task is a function from a training set to the validation error, which can be represented by a trained deep neural network (DNN). The "dataset derivative" is a linear operator, computed around the trained model, that informs how perturbations of the weight of each training sample affect the valida…

    Submitted 18 November, 2021; originally announced November 2021.

  9. arXiv:2101.06640

    cs.LG stat.ML

    Estimating informativeness of samples with Smooth Unique Information

    Authors: Hrayr Harutyunyan, Alessandro Achille, Giovanni Paolini, Orchid Majumder, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto

    Abstract: We define a notion of information that an individual sample provides to the training of a neural network, and we specialize it to measure both how much a sample informs the final weights and how much it informs the function computed by the weights. Though related, we show that these quantities have a qualitatively different behavior. We give efficient approximations of these quantities using a lin…

    Submitted 28 March, 2021; v1 submitted 17 January, 2021; originally announced January 2021.

    Comments: ICLR 2021, 22 pages

  10. arXiv:2101.05779

    cs.LG cs.CL

    Structured Prediction as Translation between Augmented Natural Languages

    Authors: Giovanni Paolini, Ben Athiwaratkun, Jason Krone, Jie Ma, Alessandro Achille, Rishita Anubhai, Cicero Nogueira dos Santos, Bing Xiang, Stefano Soatto

    Abstract: We propose a new framework, Translation between Augmented Natural Languages (TANL), to solve many structured prediction language tasks including joint entity and relation extraction, nested named entity recognition, relation classification, semantic role labeling, event extraction, coreference resolution, and dialogue state tracking. Instead of tackling the problem by training task-specific discri…

    Submitted 2 December, 2021; v1 submitted 14 January, 2021; originally announced January 2021.

    Journal ref: International Conference on Learning Representations (ICLR) 2021

  11. arXiv:1905.12213

    cs.LG cs.AI cs.IT stat.ML

    Where is the Information in a Deep Neural Network?

    Authors: Alessandro Achille, Giovanni Paolini, Stefano Soatto

    Abstract: Whatever information a deep neural network has gleaned from training data is encoded in its weights. How this information affects the response of the network to future data remains largely an open question. Indeed, even defining and measuring information entails some subtleties, since a trained network is a deterministic map, so standard information measures can be degenerate. We measure informati…

    Submitted 21 June, 2020; v1 submitted 29 May, 2019; originally announced May 2019.

    Report number: UCLA-TR:190005

  12. arXiv:1904.03292

    cs.LG cs.IT stat.ML

    The Information Complexity of Learning Tasks, their Structure and their Distance

    Authors: Alessandro Achille, Giovanni Paolini, Glen Mbeng, Stefano Soatto

    Abstract: We introduce an asymmetric distance in the space of learning tasks, and a framework to compute their complexity. These concepts are foundational for the practice of transfer learning, whereby a parametric model is pre-trained for a task, and then fine-tuned for another. The framework we develop is non-asymptotic, captures the finite nature of the training dataset, and allows distinguishing learnin…

    Submitted 14 July, 2020; v1 submitted 5 April, 2019; originally announced April 2019.

    Report number: UCLA CSD180003

  13. arXiv:1703.06983

    cs.CG cs.CC math.GT

    Collapsibility to a subcomplex of a given dimension is NP-complete

    Authors: Giovanni Paolini

    Abstract: In this paper we extend the works of Tancer and of Malgouyres and Francés, showing that $(d,k)$-collapsibility is NP-complete for $d\geq k+2$ except $(2,0)$. By $(d,k)$-collapsibility we mean the following problem: determine whether a given $d$-dimensional simplicial complex can be collapsed to some $k$-dimensional subcomplex. The question of establishing the complexity status of $(d,k)$-collapsib…

    Submitted 5 April, 2019; v1 submitted 20 March, 2017; originally announced March 2017.

    Journal ref: Discrete & Computational Geometry 59 (1), pp. 246-251 (2018)

  14. arXiv:1408.3310

    cs.DS cs.DM math.GR

    An algorithm for canonical forms of finite subsets of $\mathbb{Z}^d$ up to affinities

    Authors: Giovanni Paolini

    Abstract: In this paper we describe an algorithm for the computation of canonical forms of finite subsets of $\mathbb{Z}^d$, up to affinities over $\mathbb{Z}$. For fixed dimension $d$, this algorithm has worst-case asymptotic complexity $O(n \log^2 n \, s\,\mu(s))$, where $n$ is the number of points in the given subset, $s$ is an upper bound to the size of the binary representation of any of the $n$ points,…

    Submitted 27 September, 2018; v1 submitted 14 August, 2014; originally announced August 2014.

    MSC Class: 52C07

    Journal ref: Discrete & Computational Geometry 58 (2), pp. 293-312 (2017)