Showing 1–12 of 12 results for author: Vanzo, A

  1. arXiv:2404.09841  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Anatomy of Industrial Scale Multilingual ASR

    Authors: Francis McCann Ramirez, Luka Chkhetiani, Andrew Ehrenberg, Robert McHardy, Rami Botros, Yash Khare, Andrea Vanzo, Taufiquzzaman Peyash, Gabriel Oexle, Michael Liang, Ilya Sklyar, Enver Fakhan, Ahmed Etefy, Daniel McCrystal, Sam Flamini, Domenic Donato, Takuya Yoshioka

    Abstract: This paper describes AssemblyAI's industrial-scale automatic speech recognition (ASR) system, designed to meet the requirements of large-scale, multilingual ASR serving various application needs. Our system leverages a diverse training dataset comprising unsupervised (12.5M hours), supervised (188k hours), and pseudo-labeled (1.6M hours) data across four languages. We provide a detailed descriptio…

    Submitted 16 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  2. arXiv:2404.07341  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Conformer-1: Robust ASR via Large-Scale Semisupervised Bootstrap

    Authors: Kevin Zhang, Luka Chkhetiani, Francis McCann Ramirez, Yash Khare, Andrea Vanzo, Michael Liang, Sergio Ramirez Martin, Gabriel Oexle, Ruben Bousbib, Taufiquzzaman Peyash, Michael Nguyen, Dillon Pulliam, Domenic Donato

    Abstract: This paper presents Conformer-1, an end-to-end Automatic Speech Recognition (ASR) model trained on an extensive dataset of 570k hours of speech audio data, 91% of which was acquired from publicly available sources. To achieve this, we perform Noisy Student Training after generating pseudo-labels for the unlabeled public data using a strong Conformer RNN-T baseline model. The addition of these pseu…

    Submitted 12 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

  3. arXiv:2211.04534  [pdf, other]

    cs.CV cs.CL

    Going for GOAL: A Resource for Grounded Football Commentaries

    Authors: Alessandro Suglia, José Lopes, Emanuele Bastianelli, Andrea Vanzo, Shubham Agarwal, Malvina Nikandrou, Lu Yu, Ioannis Konstas, Verena Rieser

    Abstract: Recent video+language datasets cover domains where the interaction is highly structured, such as instructional videos, or where the interaction is scripted, such as TV shows. Both of these properties can lead to spurious cues to be exploited by models rather than learning to ground language. In this paper, we present GrOunded footbAlL commentaries (GOAL), a novel dataset of football (or `soccer')…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: Preprint formatted using the ACM Multimedia template (8 pages + appendix)

  4. arXiv:2106.03553  [pdf]

    cs.GT physics.soc-ph q-bio.PE

    Playing with words: Do people exploit loaded language to affect others' decisions for their own benefit?

    Authors: Valerio Capraro, Andrea Vanzo, Antonio Cabrales

    Abstract: We report on three pre-registered studies testing whether people in the position of describing a decision problem to decision-makers exploit this opportunity for their benefit, by choosing descriptions that may be potentially beneficial for themselves. In Study 1, recipients of an extreme dictator game (where dictators can either take the whole pie for themselves or give it entirely to the receive…

    Submitted 7 December, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

    Comments: in press at Judgment and Decision Making

  5. arXiv:2102.00424  [pdf, other]

    cs.CL cs.CV cs.LG

    An Empirical Study on the Generalization Power of Neural Representations Learned via Visual Guessing Games

    Authors: Alessandro Suglia, Yonatan Bisk, Ioannis Konstas, Antonio Vergari, Emanuele Bastianelli, Andrea Vanzo, Oliver Lemon

    Abstract: Guessing games are a prototypical instance of the "learning by interacting" paradigm. This work investigates how well an artificial agent can benefit from playing guessing games when later asked to perform on novel NLP downstream tasks such as Visual Question Answering (VQA). We propose two ways to exploit playing guessing games: 1) a supervised learning scenario in which the agent learns to mimic…

    Submitted 31 January, 2021; originally announced February 2021.

    Comments: Accepted paper for the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021)

  6. arXiv:2011.13210  [pdf, other]

    cs.CL cs.LG

    Encoding Syntactic Constituency Paths for Frame-Semantic Parsing with Graph Convolutional Networks

    Authors: Emanuele Bastianelli, Andrea Vanzo, Oliver Lemon

    Abstract: We study the problem of integrating syntactic information from constituency trees into a neural model in Frame-semantic parsing sub-tasks, namely Target Identification (TI), Frame Identification (FI), and Semantic Role Labeling (SRL). We use a Graph Convolutional Network to learn specific representations of constituents, such that each constituent is profiled as the production grammar rule it corre…

    Submitted 26 November, 2020; originally announced November 2020.

  7. arXiv:2011.13205  [pdf, other]

    cs.CL cs.LG

    SLURP: A Spoken Language Understanding Resource Package

    Authors: Emanuele Bastianelli, Andrea Vanzo, Pawel Swietojanski, Verena Rieser

    Abstract: Spoken Language Understanding infers semantic meaning directly from audio data, and thus promises to reduce error propagation and misunderstandings in end-user applications. However, publicly available SLU resources are limited. In this paper, we release SLURP, a new SLU package containing the following: (1) A new challenging dataset in English spanning 18 domains, which is substantially bigger an…

    Submitted 26 November, 2020; originally announced November 2020.

    Comments: Published at the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP-2020)

  8. arXiv:2011.02917  [pdf, other]

    cs.CL cs.CV cs.LG

    Imagining Grounded Conceptual Representations from Perceptual Information in Situated Guessing Games

    Authors: Alessandro Suglia, Antonio Vergari, Ioannis Konstas, Yonatan Bisk, Emanuele Bastianelli, Andrea Vanzo, Oliver Lemon

    Abstract: In visual guessing games, a Guesser has to identify a target object in a scene by asking questions to an Oracle. An effective strategy for the players is to learn conceptual representations of objects that are both discriminative and expressive enough to ask questions and guess correctly. However, as shown by Suglia et al. (2020), existing models fail to learn truly multi-modal representations, re…

    Submitted 5 November, 2020; originally announced November 2020.

    Comments: Accepted to the International Conference on Computational Linguistics (COLING) 2020

  9. arXiv:2006.02174  [pdf, other]

    cs.CL cs.AI cs.LG

    CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning

    Authors: Alessandro Suglia, Ioannis Konstas, Andrea Vanzo, Emanuele Bastianelli, Desmond Elliott, Stella Frank, Oliver Lemon

    Abstract: Approaches to Grounded Language Learning typically focus on a single task-based final performance measure that may not depend on desirable properties of the learned hidden representations, such as their ability to predict salient attributes or to generalise to unseen situations. To remedy this, we present GROLLA, an evaluation framework for Grounded Language Learning with Attributes with three sub…

    Submitted 3 June, 2020; originally announced June 2020.

    Comments: Accepted to the Annual Conference of the Association for Computational Linguistics (ACL) 2020

  10. arXiv:1910.00912  [pdf, other]

    cs.CL

    Hierarchical Multi-Task Natural Language Understanding for Cross-domain Conversational AI: HERMIT NLU

    Authors: Andrea Vanzo, Emanuele Bastianelli, Oliver Lemon

    Abstract: We present a new neural architecture for wide-coverage Natural Language Understanding in Spoken Dialogue Systems. We develop a hierarchical multi-task architecture, which delivers a multi-layer representation of sentence meaning (i.e., Dialogue Acts and Frame-like structures). The architecture is a hierarchy of self-attention mechanisms and BiLSTM encoders followed by CRF tagging layers. We descri…

    Submitted 2 October, 2019; originally announced October 2019.

    Comments: 10 pages

    Journal ref: SIGDial 2019

  11. arXiv:1909.06749  [pdf, other]

    cs.RO cs.AI

    MuMMER: Socially Intelligent Human-Robot Interaction in Public Spaces

    Authors: Mary Ellen Foster, Bart Craenen, Amol Deshmukh, Oliver Lemon, Emanuele Bastianelli, Christian Dondrup, Ioannis Papaioannou, Andrea Vanzo, Jean-Marc Odobez, Olivier Canévet, Yuanzhouhan Cao, Weipeng He, Angel Martínez-González, Petr Motlicek, Rémy Siegfried, Rachid Alami, Kathleen Belhassein, Guilhem Buisan, Aurélie Clodic, Amandine Mayima, Yoan Sallami, Guillaume Sarthou, Phani-Teja Singamaneni, Jules Waldhart, Alexandre Mazel, et al. (5 additional authors not shown)

    Abstract: In the EU-funded MuMMER project, we have developed a social robot designed to interact naturally and flexibly with users in public spaces such as a shopping mall. We present the latest version of the robot system developed during the project. This system encompasses audio-visual sensing, social signal processing, conversational interaction, perspective taking, geometric reasoning, and motion plann…

    Submitted 15 September, 2019; originally announced September 2019.

    Report number: AI-HRI/2019/14

  12. arXiv:1901.02314  [pdf, other]

    physics.soc-ph cs.GT q-bio.PE

    The power of moral words: Loaded language generates framing effects in the extreme dictator game

    Authors: Valerio Capraro, Andrea Vanzo

    Abstract: Understanding whether preferences are sensitive to the frame has been a major topic of debate in the last decades. For example, several works have explored whether the dictator game in the give frame gives rise to a different rate of pro-sociality than the same game in the take frame, leading to mixed results. Here we contribute to this debate with two experiments. In Study 1 ($N=567$) we implemen…

    Submitted 6 April, 2019; v1 submitted 8 January, 2019; originally announced January 2019.

    Comments: Forthcoming in Judgment and Decision Making

    Journal ref: Judgm. decis. mak. 14 (2019) 309-317