Showing 1–13 of 13 results for author: Pagnoni, A

  1. arXiv:2407.02446  [pdf, other]

    cs.CL cs.AI

    Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling

    Authors: Margaret Li, Weijia Shi, Artidoro Pagnoni, Peter West, Ari Holtzman

    Abstract: RLHF-aligned LMs have shown unprecedented ability on both benchmarks and long-form text generation, yet they struggle with one foundational task: next-token prediction. As RLHF models become agent models aimed at interacting with humans, they seem to lose their world modeling -- the ability to predict what comes next in arbitrary documents, which is the foundational training objective of the Base…

    Submitted 2 July, 2024; originally announced July 2024.

  2. arXiv:2305.14314  [pdf, other]

    cs.LG

    QLoRA: Efficient Finetuning of Quantized LLMs

    Authors: Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer

    Abstract: We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. QLoRA backpropagates gradients through a frozen, 4-bit quantized pretrained language model into Low Rank Adapters (LoRA). Our best model family, which we name Guanaco, outperforms all previous openly rel…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: Extended NeurIPS submission

  3. arXiv:2212.10449  [pdf, other]

    cs.CL

    Socratic Pretraining: Question-Driven Pretraining for Controllable Summarization

    Authors: Artidoro Pagnoni, Alexander R. Fabbri, Wojciech Kryściński, Chien-Sheng Wu

    Abstract: In long document controllable summarization, where labeled data is scarce, pretrained models struggle to adapt to the task and effectively respond to user queries. In this paper, we introduce Socratic pretraining, a question-driven, unsupervised pretraining objective specifically designed to improve controllability in summarization tasks. By training a model to generate and answer relevant questio…

    Submitted 8 June, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: To appear at ACL 2023

  4. arXiv:2211.05392  [pdf, other]

    cs.CL

    EvEntS ReaLM: Event Reasoning of Entity States via Language Models

    Authors: Evangelia Spiliopoulou, Artidoro Pagnoni, Yonatan Bisk, Eduard Hovy

    Abstract: This paper investigates models of event implications. Specifically, how well models predict entity state-changes, by targeting their understanding of physical attributes. Nominally, Large Language models (LLM) have been exposed to procedural knowledge about how objects interact, yet our benchmarking shows they fail to reason about the world. Conversely, we also demonstrate that existing approaches…

    Submitted 10 November, 2022; originally announced November 2022.

    Comments: EMNLP 2022

  5. arXiv:2104.13346  [pdf, other]

    cs.CL

    Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics

    Authors: Artidoro Pagnoni, Vidhisha Balachandran, Yulia Tsvetkov

    Abstract: Modern summarization models generate highly fluent but often factually unreliable outputs. This motivated a surge of metrics attempting to measure the factuality of automatically generated summaries. Due to the lack of common benchmarks, these metrics cannot be compared. Moreover, all these methods treat factuality as a binary concept and fail to provide deeper insights into the kinds of inconsist…

    Submitted 23 July, 2021; v1 submitted 27 April, 2021; originally announced April 2021.

    Comments: Accepted at NAACL 2021. Second version fixes bug with BERTScore results

  6. arXiv:2003.00576  [pdf, other]

    cs.CL

    StructSum: Summarization via Structured Representations

    Authors: Vidhisha Balachandran, Artidoro Pagnoni, Jay Yoon Lee, Dheeraj Rajagopal, Jaime Carbonell, Yulia Tsvetkov

    Abstract: Abstractive text summarization aims at compressing the information of a long source document into a rephrased, condensed summary. Despite advances in modeling techniques, abstractive summarization models still suffer from several key challenges: (i) layout bias: they overfit to the style of training corpora; (ii) limited abstractiveness: they are optimized to copying n-grams from the source rather…

    Submitted 16 February, 2021; v1 submitted 1 March, 2020; originally announced March 2020.

  7. arXiv:1909.04793  [pdf, other]

    cs.CL

    Definition Frames: Using Definitions for Hybrid Concept Representations

    Authors: Evangelia Spiliopoulou, Artidoro Pagnoni, Eduard Hovy

    Abstract: Advances in word representations have shown tremendous improvements in downstream NLP tasks, but lack semantic interpretability. In this paper, we introduce Definition Frames (DF), a matrix distributed representation extracted from definitions, where each dimension is semantically interpretable. DF dimensions correspond to the Qualia structure relations: a set of relations that uniquely define a t…

    Submitted 1 November, 2020; v1 submitted 10 September, 2019; originally announced September 2019.

    Comments: To appear in COLING 2020

  8. arXiv:1906.03822  [pdf, other]

    cs.LG stat.ML

    Making Classical Machine Learning Pipelines Differentiable: A Neural Translation Approach

    Authors: Gyeong-In Yu, Saeed Amizadeh, Sehoon Kim, Artidoro Pagnoni, Byung-Gon Chun, Markus Weimer, Matteo Interlandi

    Abstract: Classical Machine Learning (ML) pipelines often comprise of multiple ML models where models, within a pipeline, are trained in isolation. Conversely, when training neural network models, layers composing the neural models are simultaneously trained using backpropagation. We argue that the isolated training scheme of ML pipelines is sub-optimal, since it cannot jointly optimize multiple components.…

    Submitted 12 December, 2019; v1 submitted 10 June, 2019; originally announced June 2019.

  9. Machine Learning at Microsoft with ML .NET

    Authors: Zeeshan Ahmed, Saeed Amizadeh, Mikhail Bilenko, Rogan Carr, Wei-Sheng Chin, Yael Dekel, Xavier Dupre, Vadim Eksarevskiy, Eric Erhardt, Costin Eseanu, Senja Filipi, Tom Finley, Abhishek Goswami, Monte Hoover, Scott Inglis, Matteo Interlandi, Shon Katzenberger, Najeeb Kazmi, Gleb Krivosheev, Pete Luferenko, Ivan Matantsev, Sergiy Matusevych, Shahab Moradi, Gani Nazirov, Justin Ormont, et al. (9 additional authors not shown)

    Abstract: Machine Learning is transitioning from an art and science into a technology available to every developer. In the near future, every application on every platform will incorporate trained models to encode data-based decisions that would be impossible for developers to author. This presents a significant engineering challenge, since currently data science and modeling are largely decoupled from stan…

    Submitted 15 May, 2019; v1 submitted 14 May, 2019; originally announced May 2019.

  10. arXiv:1901.07070  [pdf, other]

    cs.DS

    Analyzing Branch-and-Bound Algorithms for the Multiprocessor Scheduling Problem

    Authors: Thomas Lively, William Long, Artidoro Pagnoni

    Abstract: The Multiprocessor Scheduling Problem (MSP) is an NP-Complete problem with significant applications in computer and operations systems. We provide a survey of the wide array of polynomial-time approximation, heuristic, and meta-heuristic based algorithms that exist for solving MSP. We also implement Fujita's state-of-the-art Branch-and-Bound algorithm and evaluate the benefit of using Fujita's bin…

    Submitted 21 January, 2019; originally announced January 2019.

  11. arXiv:1812.06393  [pdf, ps, other]

    cs.LG stat.ML

    PAC Learning Guarantees Under Covariate Shift

    Authors: Artidoro Pagnoni, Stefan Gramatovici, Samuel Liu

    Abstract: We consider the Domain Adaptation problem, also known as the covariate shift problem, where the distributions that generate the training and test data differ while retaining the same labeling function. This problem occurs across a large range of practical applications, and is related to the more general challenge of transfer learning. Most recent work on the topic focuses on optimization technique…

    Submitted 16 December, 2018; originally announced December 2018.

  12. arXiv:1812.04405  [pdf, other]

    cs.CL

    Conditional Variational Autoencoder for Neural Machine Translation

    Authors: Artidoro Pagnoni, Kevin Liu, Shangyan Li

    Abstract: We explore the performance of latent variable models for conditional text generation in the context of neural machine translation (NMT). Similar to Zhang et al., we augment the encoder-decoder NMT paradigm by introducing a continuous latent variable to model features of the translation process. We extend this model with a co-attention mechanism motivated by Parikh et al. in the inference network.…

    Submitted 11 December, 2018; originally announced December 2018.

  13. arXiv:1807.08349  [pdf, other]

    cs.CR

    Taint Tracking for WebAssembly

    Authors: Aron Szanto, Timothy Tamm, Artidoro Pagnoni

    Abstract: WebAssembly seeks to provide an alternative to running large and untrusted binaries within web browsers by implementing a portable, performant, and secure bytecode format for native web computation. However, WebAssembly is largely unstudied from a security perspective. In this work, we build the first WebAssembly virtual machine that runs in native JavaScript, and implement a novel taint tracking…

    Submitted 22 July, 2018; originally announced July 2018.