
Showing 1–11 of 11 results for author: Borgholt, L

  1. arXiv:2406.08958

    cs.LG

    An Unsupervised Approach to Achieve Supervised-Level Explainability in Healthcare Records

    Authors: Joakim Edin, Maria Maistro, Lars Maaløe, Lasse Borgholt, Jakob D. Havtorn, Tuukka Ruotsalo

    Abstract: Electronic healthcare records are vital for patient safety as they document conditions, plans, and procedures in both free text and medical codes. Language models have significantly enhanced the processing of such records, streamlining workflows and reducing manual data entry, thereby saving healthcare providers significant resources. However, the black-box nature of these models often leaves heal…

    Submitted 13 June, 2024; originally announced June 2024.

  2. Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review and Replicability Study

    Authors: Joakim Edin, Alexander Junge, Jakob D. Havtorn, Lasse Borgholt, Maria Maistro, Tuukka Ruotsalo, Lars Maaløe

    Abstract: Medical coding is the task of assigning medical codes to clinical free-text documentation. Healthcare professionals manually assign such codes to track patient diagnoses and treatments. Automated medical coding can considerably alleviate this administrative burden. In this paper, we reproduce, compare, and analyze state-of-the-art automated medical coding machine learning models. We show that seve…

    Submitted 21 April, 2023; originally announced April 2023.

    Comments: 11 pages, 6 figures, to be published in Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '23), July 23–27, 2023, Taipei, Taiwan

    ACM Class: H.3.0

  3. arXiv:2205.10643

    cs.CL cs.SD eess.AS

    Self-Supervised Speech Representation Learning: A Review

    Authors: Abdelrahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe

    Abstract: Although supervised deep learning has revolutionized speech and audio processing, it has necessitated the building of specialist models for individual tasks and application scenarios. It is likewise difficult to apply this to dialects and languages for which only limited labeled data is available. Self-supervised representation learning methods promise a single universal model that would benefit a…

    Submitted 27 October, 2022; v1 submitted 21 May, 2022; originally announced May 2022.

  4. arXiv:2203.01829

    eess.AS cs.LG cs.SD

    A Brief Overview of Unsupervised Neural Speech Representation Learning

    Authors: Lasse Borgholt, Jakob Drachmann Havtorn, Joakim Edin, Lars Maaløe, Christian Igel

    Abstract: Unsupervised representation learning for speech processing has matured greatly in the last few years. Work in computer vision and natural language processing has paved the way, but speech data offers unique challenges. As a result, methods from other domains rarely translate directly. We review the development of unsupervised representation learning for speech over the last decade. We identify two…

    Submitted 1 March, 2022; originally announced March 2022.

    Comments: The 2nd Workshop on Self-supervised Learning for Audio and Speech Processing (SAS) at AAAI

  5. arXiv:2202.12707

    eess.AS cs.AI cs.LG cs.SD stat.ML

    Benchmarking Generative Latent Variable Models for Speech

    Authors: Jakob D. Havtorn, Lasse Borgholt, Søren Hauberg, Jes Frellsen, Lars Maaløe

    Abstract: Stochastic latent variable models (LVMs) achieve state-of-the-art performance on natural image generation but are still inferior to deterministic models on speech. In this paper, we develop a speech benchmark of popular temporal LVMs and compare them against state-of-the-art deterministic models. We report the likelihood, which is a much used metric in the image domain, but rarely, or incomparably…

    Submitted 5 April, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

    Comments: Accepted at the 2022 ICLR workshop on Deep Generative Models for Highly Structured Data (https://deep-gen-struct.github.io)

  6. arXiv:2111.14842

    eess.AS cs.CL cs.LG

    Do We Still Need Automatic Speech Recognition for Spoken Language Understanding?

    Authors: Lasse Borgholt, Jakob Drachmann Havtorn, Mostafa Abdou, Joakim Edin, Lars Maaløe, Anders Søgaard, Christian Igel

    Abstract: Spoken language understanding (SLU) tasks are usually solved by first transcribing an utterance with automatic speech recognition (ASR) and then feeding the output to a text-based model. Recent advances in self-supervised representation learning for speech data have focused on improving the ASR component. We investigate whether representation learning for speech has matured enough to replace ASR i…

    Submitted 29 November, 2021; originally announced November 2021.

    Comments: Under review as a conference paper at ICASSP 2022

  7. arXiv:2102.09928

    eess.AS cs.CL cs.LG cs.SD

    Do End-to-End Speech Recognition Models Care About Context?

    Authors: Lasse Borgholt, Jakob Drachmann Havtorn, Željko Agić, Anders Søgaard, Lars Maaløe, Christian Igel

    Abstract: The two most common paradigms for end-to-end speech recognition are connectionist temporal classification (CTC) and attention-based encoder-decoder (AED) models. It has been argued that the latter is better suited for learning an implicit language model. We test this hypothesis by measuring temporal context sensitivity and evaluate how the models perform when we constrain the amount of contextual…

    Submitted 17 February, 2021; originally announced February 2021.

    Comments: Published in the proceedings of INTERSPEECH 2020, pp. 4352-4356

  8. arXiv:2102.00850

    eess.AS cs.LG cs.SD

    On Scaling Contrastive Representations for Low-Resource Speech Recognition

    Authors: Lasse Borgholt, Tycho Max Sylvester Tax, Jakob Drachmann Havtorn, Lars Maaløe, Christian Igel

    Abstract: Recent advances in self-supervised learning through contrastive training have shown that it is possible to learn a competitive speech recognition system with as little as 10 minutes of labeled data. However, these systems are computationally expensive since they require pre-training followed by fine-tuning in a large parameter space. We explore the performance of such systems without fine-tuning b…

    Submitted 1 February, 2021; originally announced February 2021.

    Comments: © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  9. arXiv:2005.00812

    cs.CL cs.SD eess.AS

    MultiQT: Multimodal Learning for Real-Time Question Tracking in Speech

    Authors: Jakob D. Havtorn, Jan Latko, Joakim Edin, Lasse Borgholt, Lars Maaløe, Lorenzo Belgrano, Nicolai F. Jacobsen, Regitze Sdun, Željko Agić

    Abstract: We address a challenging and practical task of labeling questions in speech in real time during telephone calls to emergency medical services in English, which embeds within a broader decision support system for emergency call-takers. We propose a novel multimodal approach to real-time sequence labeling in speech. Our model treats speech and its own textual representation as two separate modalitie…

    Submitted 12 May, 2020; v1 submitted 2 May, 2020; originally announced May 2020.

    Comments: Accepted at ACL 2020

  10. arXiv:1812.02308

    cs.CL cs.LG stat.ML

    On the Inductive Bias of Word-Character-Level Multi-Task Learning for Speech Recognition

    Authors: Jan Kremer, Lasse Borgholt, Lars Maaløe

    Abstract: End-to-end automatic speech recognition (ASR) commonly transcribes audio signals into sequences of characters while its performance is evaluated by measuring the word-error rate (WER). This suggests that predicting sequences of words directly may be helpful instead. However, training with word-level supervision can be more difficult due to the sparsity of examples per label class. In this paper we…

    Submitted 28 November, 2018; originally announced December 2018.

    Comments: Accepted at the IRASL workshop at NeurIPS 2018

  11. arXiv:1711.10271

    cs.SD eess.AS stat.ML

    Exploiting Nontrivial Connectivity for Automatic Speech Recognition

    Authors: Marius Paraschiv, Lasse Borgholt, Tycho Max Sylvester Tax, Marco Singh, Lars Maaløe

    Abstract: Nontrivial connectivity has allowed the training of very deep networks by addressing the problem of vanishing gradients and offering a more efficient method of reusing parameters. In this paper we make a comparison between residual networks, densely-connected networks and highway networks on an image classification task. Next, we show that these methodologies can easily be deployed into automatic…

    Submitted 28 November, 2017; originally announced November 2017.

    Comments: Accepted at the ML4Audio workshop at NIPS 2017