Showing 1–21 of 21 results for author: Ragni, A

  1. arXiv:2406.12937  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Self-Train Before You Transcribe

    Authors: Robert Flynn, Anton Ragni

    Abstract: When there is a mismatch between the training and test domains, current speech recognition systems show significant performance degradation. Self-training methods, such as noisy student teacher training, can help address this and enable the adaptation of models under such domain shifts. However, self-training typically requires a collection of unlabelled target domain data. For settings where this…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  2. arXiv:2406.08568  [pdf, ps, other]

    cs.SD eess.AS

    Training Data Augmentation for Dysarthric Automatic Speech Recognition by Text-to-Dysarthric-Speech Synthesis

    Authors: Wing-Zin Leung, Mattias Cross, Anton Ragni, Stefan Goetze

    Abstract: Automatic speech recognition (ASR) research has achieved impressive performance in recent years and has significant potential for enabling access for people with dysarthria (PwD) in augmentative and alternative communication (AAC) and home environment systems. However, progress in dysarthric ASR (DASR) has been limited by high variability in dysarthric speech and limited public availability of dys…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted for Interspeech 2024

  3. arXiv:2401.13611  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    Non-Intrusive Speech Intelligibility Prediction for Hearing-Impaired Users using Intermediate ASR Features and Human Memory Models

    Authors: Rhiannon Mogridge, George Close, Robert Sutherland, Thomas Hain, Jon Barker, Stefan Goetze, Anton Ragni

    Abstract: Neural networks have been successfully used for non-intrusive speech intelligibility prediction. Recently, the use of feature representations sourced from intermediate layers of pre-trained self-supervised and weakly-supervised models has been found to be particularly useful for this task. This work combines the use of Whisper ASR decoder layer representations as neural network input features with…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: Accepted paper. IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), Seoul, Korea, April 2024

  4. arXiv:2310.15672  [pdf, other]

    cs.CL cs.SD eess.AS

    How Much Context Does My Attention-Based ASR System Need?

    Authors: Robert Flynn, Anton Ragni

    Abstract: For the task of speech recognition, the use of more than 30 seconds of acoustic context during training is uncommon and under-investigated in literature. In this work, we conduct an empirical study on the effect of scaling the sequence length used to train/evaluate (dense-attention-based) acoustic models on speech recognition performance. For these experiments, a dataset of roughly 100,000 pseudo-…

    Submitted 17 June, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted at Interspeech 2024

  5. arXiv:2310.12765  [pdf, other]

    cs.SD cs.LG eess.AS

    Energy-Based Models For Speech Synthesis

    Authors: Wanli Sun, Zehai Tu, Anton Ragni

    Abstract: Recently there has been a lot of interest in non-autoregressive (non-AR) models for speech synthesis, such as FastSpeech 2 and diffusion models. Unlike AR models, these models do not have autoregressive dependencies among outputs which makes inference efficient. This paper expands the range of available non-AR models with another member called energy-based models (EBMs). The paper describes how no…

    Submitted 19 October, 2023; originally announced October 2023.

  6. arXiv:2307.05161  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    On the Effectiveness of Speech Self-supervised Learning for Music

    Authors: Yinghao Ma, Ruibin Yuan, Yizhi Li, Ge Zhang, Xingran Chen, Hanzhi Yin, Chenghua Lin, Emmanouil Benetos, Anton Ragni, Norbert Gyenge, Ruibo Liu, Gus Xia, Roger Dannenberg, Yike Guo, Jie Fu

    Abstract: Self-supervised learning (SSL) has shown promising results in various speech and natural language processing applications. However, its efficacy in music information retrieval (MIR) still remains largely unexplored. While previous SSL models pre-trained on music recordings may have been mostly closed-sourced, recent speech models such as wav2vec2.0 have shown promise in music modelling. Neverthele…

    Submitted 11 July, 2023; originally announced July 2023.

  7. Leveraging Cross-Utterance Context For ASR Decoding

    Authors: Robert Flynn, Anton Ragni

    Abstract: While external language models (LMs) are often incorporated into the decoding stage of automated speech recognition systems, these models usually operate with limited context. Cross utterance information has been shown to be beneficial during second pass re-scoring, however this limits the hypothesis space based on the local information available to the first pass LM. In this work, we investigate…

    Submitted 29 June, 2023; originally announced June 2023.

  8. arXiv:2306.10548  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    MARBLE: Music Audio Representation Benchmark for Universal Evaluation

    Authors: Ruibin Yuan, Yinghao Ma, Yizhi Li, Ge Zhang, Xingran Chen, Hanzhi Yin, Le Zhuo, Yiqi Liu, Jiawen Huang, Zeyue Tian, Binyue Deng, Ningzhi Wang, Chenghua Lin, Emmanouil Benetos, Anton Ragni, Norbert Gyenge, Roger Dannenberg, Wenhu Chen, Gus Xia, Wei Xue, Si Liu, Shi Wang, Ruibo Liu, Yike Guo, Jie Fu

    Abstract: In the era of extensive intersection between art and Artificial Intelligence (AI), such as image generation and fiction co-creation, AI for music remains relatively nascent, particularly in music understanding. This is evident in the limited work on deep music representations, the scarcity of large-scale datasets, and the absence of a universal and community-driven benchmark. To address this issue…

    Submitted 23 November, 2023; v1 submitted 18 June, 2023; originally announced June 2023.

    Comments: camera-ready version for NeurIPS 2023

  9. arXiv:2306.00107  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG eess.AS

    MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training

    Authors: Yizhi Li, Ruibin Yuan, Ge Zhang, Yinghao Ma, Xingran Chen, Hanzhi Yin, Chenghao Xiao, Chenghua Lin, Anton Ragni, Emmanouil Benetos, Norbert Gyenge, Roger Dannenberg, Ruibo Liu, Wenhu Chen, Gus Xia, Yemin Shi, Wenhao Huang, Zili Wang, Yike Guo, Jie Fu

    Abstract: Self-supervised learning (SSL) has recently emerged as a promising paradigm for training generalisable models on large-scale data in the fields of vision, text, and speech. Although SSL has been proven effective in speech and audio, its application to music audio has yet to be thoroughly explored. This is partially due to the distinctive challenges associated with modelling musical knowledge, part…

    Submitted 22 April, 2024; v1 submitted 31 May, 2023; originally announced June 2023.

    Comments: accepted by ICLR 2024

  10. arXiv:2302.12103  [pdf, other]

    stat.ME stat.AP stat.CO

    Clustering Hierarchies via a Semi-Parametric Generalized Linear Mixed Model: a statistical significance-based approach

    Authors: Alessandra Ragni, Chiara Masci, Francesca Ieva, Anna Maria Paganoni

    Abstract: We introduce a novel statistical significance-based approach for clustering hierarchical data using semi-parametric linear mixed-effects models designed for responses with laws in the exponential family (e.g., Poisson and Bernoulli). Within the family of semi-parametric mixed-effects models, a latent clustering structure of the highest-level units can be identified by assuming the random effects t…

    Submitted 23 February, 2023; originally announced February 2023.

  11. arXiv:2212.02508  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    MAP-Music2Vec: A Simple and Effective Baseline for Self-Supervised Music Audio Representation Learning

    Authors: Yizhi Li, Ruibin Yuan, Ge Zhang, Yinghao Ma, Chenghua Lin, Xingran Chen, Anton Ragni, Hanzhi Yin, Zhijie Hu, Haoyu He, Emmanouil Benetos, Norbert Gyenge, Ruibo Liu, Jie Fu

    Abstract: The deep learning community has witnessed an exponentially growing interest in self-supervised learning (SSL). However, it still remains unexplored how to build a framework for learning useful representations of raw music waveforms in a self-supervised manner. In this work, we design Music2Vec, a framework exploring different SSL algorithmic components and tricks for music audio recordings. Our mo…

    Submitted 5 December, 2022; originally announced December 2022.

  12. arXiv:2211.02882  [pdf, other]

    cs.CL

    HERB: Measuring Hierarchical Regional Bias in Pre-trained Language Models

    Authors: Yizhi Li, Ge Zhang, Bohao Yang, Chenghua Lin, Shi Wang, Anton Ragni, Jie Fu

    Abstract: Fairness has become a trending topic in natural language processing (NLP), which addresses biases targeting certain social groups such as genders and religions. However, regional bias in language models (LMs), a long-standing global discrimination problem, still remains unexplored. This paper bridges the gap by analysing the regional bias learned by the pre-trained language models that are broadly…

    Submitted 5 November, 2022; originally announced November 2022.

    Comments: Accepted at AACL 2022 as Long Findings

  13. arXiv:2208.04829  [pdf, other]

    stat.ME eess.IV q-bio.QM

    Imaging-based representation and stratification of intra-tumor Heterogeneity via tree-edit distance

    Authors: Lara Cavinato, Matteo Pegoraro, Alessandra Ragni, Martina Sollini, Anna Paola Erba, Francesca Ieva

    Abstract: Personalized medicine is the future of medical practice. In oncology, tumor heterogeneity assessment represents a pivotal step for effective treatment planning and prognosis prediction. Despite new procedures for DNA sequencing and analysis, non-invasive methods for tumor characterization are needed to impact on daily routine. On purpose, imaging texture analysis is rapidly scaling, holding the pr…

    Submitted 13 January, 2023; v1 submitted 9 August, 2022; originally announced August 2022.

  14. arXiv:2106.02417  [pdf, ps, other]

    eess.AS cs.CL

    Approximate Fixed-Points in Recurrent Neural Networks

    Authors: Zhengxiong Wang, Anton Ragni

    Abstract: Recurrent neural networks are widely used in speech and language processing. Due to dependency on the past, standard algorithms for training these models, such as back-propagation through time (BPTT), cannot be efficiently parallelised. Furthermore, applying these models to more complex structures than sequences requires inference time approximations, which introduce inconsistency between inferenc…

    Submitted 4 June, 2021; originally announced June 2021.

  15. arXiv:2105.03716  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Continuous representations of intents for dialogue systems

    Authors: Sindre André Jacobsen, Anton Ragni

    Abstract: Intent modelling has become an important part of modern dialogue systems. With the rapid expansion of practical dialogue systems and virtual assistants, such as Amazon Alexa, Apple Siri, and Google Assistant, the interest has only increased. However, up until recently the focus has been on detecting a fixed, discrete, number of seen intents. Recent years have seen some work done on unseen intent d…

    Submitted 8 May, 2021; originally announced May 2021.

  16. arXiv:1910.11933  [pdf, other]

    eess.AS cs.LG cs.SD

    Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks

    Authors: Alexandros Kastanos, Anton Ragni, Mark Gales

    Abstract: Recently, there has been growth in providers of speech transcription services enabling others to leverage technology they would not normally be able to use. As a result, speech-enabled solutions have become commonplace. Their success critically relies on the quality, accuracy, and reliability of the underlying speech transcription systems. Those black box systems, however, offer limited means for…

    Submitted 15 March, 2020; v1 submitted 25 October, 2019; originally announced October 2019.

    Comments: 5 pages, 8 figures, ICASSP submission

  17. arXiv:1905.12467  [pdf, other]

    eess.SP

    Multi-channel lock-in based differential front-end for broadband Raman spectroscopy

    Authors: A. Ragni, G. Sciortino, M. Sampietro, G. Ferrari, F. Crisafi, V. Kumar, G. Cerullo, D. Polli

    Abstract: In Broadband Stimulated Raman Spectroscopy, the intrinsic limit given by the laser shot noise is seldom reached due to the electronic noise of the front-end amplifier and the intensity fluctuations of the laser source. In this paper we present a low-noise multi-channel acquisition system, with an integration-oriented design, able to compensate the common-mode fluctuations of the laser output power…

    Submitted 25 June, 2019; v1 submitted 14 May, 2019; originally announced May 2019.

    Comments: Post review version

  18. arXiv:1810.13025  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Confidence Estimation and Deletion Prediction Using Bidirectional Recurrent Neural Networks

    Authors: Anton Ragni, Qiujia Li, Mark Gales, Yu Wang

    Abstract: The standard approach to assess reliability of automatic speech transcriptions is through the use of confidence scores. If accurate, these scores provide a flexible mechanism to flag transcription errors for upstream and downstream applications. One challenging type of errors that recognisers make are deletions. These errors are not accounted for by the standard confidence estimation schemes and a…

    Submitted 30 October, 2018; originally announced October 2018.

    Comments: Accepted as a conference paper at 2018 IEEE Workshop on Spoken Language Technology (SLT 2018)

  19. arXiv:1810.13024  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Bi-Directional Lattice Recurrent Neural Networks for Confidence Estimation

    Authors: Qiujia Li, Preben Ness, Anton Ragni, Mark Gales

    Abstract: The standard approach to mitigate errors made by an automatic speech recognition system is to use confidence scores associated with each predicted word. In the simplest case, these scores are word posterior probabilities whilst more complex schemes utilise bi-directional recurrent neural network (BiRNN) models. A number of upstream and downstream applications, however, rely on confidence scores as…

    Submitted 18 February, 2019; v1 submitted 30 October, 2018; originally announced October 2018.

    Comments: Accepted by ICASSP 2019

  20. arXiv:1802.00254  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Phonetic and Graphemic Systems for Multi-Genre Broadcast Transcription

    Authors: Yu Wang, Xie Chen, Mark Gales, Anton Ragni, Jeremy Wong

    Abstract: State-of-the-art English automatic speech recognition systems typically use phonetic rather than graphemic lexicons. Graphemic systems are known to perform less well for English as the mapping from the written form to the spoken form is complicated. However, in recent years the representational power of deep-learning based acoustic models has improved, raising interest in graphemic acoustic models…

    Submitted 1 February, 2018; originally announced February 2018.

    Comments: 5 pages, 6 tables, to appear in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018)

  21. arXiv:1708.05592  [pdf, other]

    cs.CL

    Future Word Contexts in Neural Network Language Models

    Authors: Xie Chen, Xunying Liu, Anton Ragni, Yu Wang, Mark Gales

    Abstract: Recently, bidirectional recurrent network language models (bi-RNNLMs) have been shown to outperform standard, unidirectional, recurrent neural network language models (uni-RNNLMs) on a range of speech recognition tasks. This indicates that future word context information beyond the word history can be useful. However, bi-RNNLMs pose a number of challenges as they make use of the complete previous…

    Submitted 18 August, 2017; originally announced August 2017.

    Comments: Submitted to ASRU2017