Showing 1–12 of 12 results for author: Havtorn, J D

  1. arXiv:2406.08958  [pdf, other]

    cs.LG

    An Unsupervised Approach to Achieve Supervised-Level Explainability in Healthcare Records

    Authors: Joakim Edin, Maria Maistro, Lars Maaløe, Lasse Borgholt, Jakob D. Havtorn, Tuukka Ruotsalo

    Abstract: Electronic healthcare records are vital for patient safety as they document conditions, plans, and procedures in both free text and medical codes. Language models have significantly enhanced the processing of such records, streamlining workflows and reducing manual data entry, thereby saving healthcare providers significant resources. However, the black-box nature of these models often leaves heal…

    Submitted 13 June, 2024; originally announced June 2024.

  2. arXiv:2307.02321  [pdf, other]

    cs.CV

    MSViT: Dynamic Mixed-Scale Tokenization for Vision Transformers

    Authors: Jakob Drachmann Havtorn, Amelie Royer, Tijmen Blankevoort, Babak Ehteshami Bejnordi

    Abstract: The input tokens to Vision Transformers carry little semantic meaning as they are defined as regular equal-sized patches of the input image, regardless of its content. However, processing uniform background areas of an image should not necessitate as much compute as dense, cluttered areas. To address this issue, we propose a dynamic mixed-scale tokenization scheme for ViT, MSViT. Our method introd…

    Submitted 7 September, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

    Comments: ICCV Workshops 2023; Code for the Generalized Batch-Shaping loss is available at https://github.com/Qualcomm-AI-research/batchsha**

  3. Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review and Replicability Study

    Authors: Joakim Edin, Alexander Junge, Jakob D. Havtorn, Lasse Borgholt, Maria Maistro, Tuukka Ruotsalo, Lars Maaløe

    Abstract: Medical coding is the task of assigning medical codes to clinical free-text documentation. Healthcare professionals manually assign such codes to track patient diagnoses and treatments. Automated medical coding can considerably alleviate this administrative burden. In this paper, we reproduce, compare, and analyze state-of-the-art automated medical coding machine learning models. We show that seve…

    Submitted 21 April, 2023; originally announced April 2023.

    Comments: 11 pages, 6 figures, to be published in Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '23), July 23–27, 2023, Taipei, Taiwan

    ACM Class: H.3.0

  4. arXiv:2205.10643  [pdf, other]

    cs.CL cs.SD eess.AS

    Self-Supervised Speech Representation Learning: A Review

    Authors: Abdelrahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe

    Abstract: Although supervised deep learning has revolutionized speech and audio processing, it has necessitated the building of specialist models for individual tasks and application scenarios. It is likewise difficult to apply this to dialects and languages for which only limited labeled data is available. Self-supervised representation learning methods promise a single universal model that would benefit a…

    Submitted 27 October, 2022; v1 submitted 21 May, 2022; originally announced May 2022.

  5. arXiv:2203.01829  [pdf, other]

    eess.AS cs.LG cs.SD

    A Brief Overview of Unsupervised Neural Speech Representation Learning

    Authors: Lasse Borgholt, Jakob Drachmann Havtorn, Joakim Edin, Lars Maaløe, Christian Igel

    Abstract: Unsupervised representation learning for speech processing has matured greatly in the last few years. Work in computer vision and natural language processing has paved the way, but speech data offers unique challenges. As a result, methods from other domains rarely translate directly. We review the development of unsupervised representation learning for speech over the last decade. We identify two…

    Submitted 1 March, 2022; originally announced March 2022.

    Comments: The 2nd Workshop on Self-supervised Learning for Audio and Speech Processing (SAS) at AAAI

  6. arXiv:2203.01097  [pdf, other]

    stat.ML cs.LG

    Model-agnostic out-of-distribution detection using combined statistical tests

    Authors: Federico Bergamin, Pierre-Alexandre Mattei, Jakob D. Havtorn, Hugo Senetaire, Hugo Schmutz, Lars Maaløe, Søren Hauberg, Jes Frellsen

    Abstract: We present simple methods for out-of-distribution detection using a trained generative model. These techniques, based on classical statistical tests, are model-agnostic in the sense that they can be applied to any differentiable generative model. The idea is to combine a classical parametric test (Rao's score test) with the recently introduced typicality test. These two test statistics are both th…

    Submitted 2 March, 2022; originally announced March 2022.

    Comments: Accepted at the 25th International Conference on Artificial Intelligence and Statistics (AISTATS), 2022

  7. arXiv:2202.12707  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD stat.ML

    Benchmarking Generative Latent Variable Models for Speech

    Authors: Jakob D. Havtorn, Lasse Borgholt, Søren Hauberg, Jes Frellsen, Lars Maaløe

    Abstract: Stochastic latent variable models (LVMs) achieve state-of-the-art performance on natural image generation but are still inferior to deterministic models on speech. In this paper, we develop a speech benchmark of popular temporal LVMs and compare them against state-of-the-art deterministic models. We report the likelihood, which is a much used metric in the image domain, but rarely, or incomparably…

    Submitted 5 April, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

    Comments: Accepted at the 2022 ICLR workshop on Deep Generative Models for Highly Structured Data (https://deep-gen-struct.github.io)

  8. arXiv:2111.14842  [pdf, other]

    eess.AS cs.CL cs.LG

    Do We Still Need Automatic Speech Recognition for Spoken Language Understanding?

    Authors: Lasse Borgholt, Jakob Drachmann Havtorn, Mostafa Abdou, Joakim Edin, Lars Maaløe, Anders Søgaard, Christian Igel

    Abstract: Spoken language understanding (SLU) tasks are usually solved by first transcribing an utterance with automatic speech recognition (ASR) and then feeding the output to a text-based model. Recent advances in self-supervised representation learning for speech data have focused on improving the ASR component. We investigate whether representation learning for speech has matured enough to replace ASR i…

    Submitted 29 November, 2021; originally announced November 2021.

    Comments: Under review as a conference paper at ICASSP 2022

  9. arXiv:2102.09928  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Do End-to-End Speech Recognition Models Care About Context?

    Authors: Lasse Borgholt, Jakob Drachmann Havtorn, Željko Agić, Anders Søgaard, Lars Maaløe, Christian Igel

    Abstract: The two most common paradigms for end-to-end speech recognition are connectionist temporal classification (CTC) and attention-based encoder-decoder (AED) models. It has been argued that the latter is better suited for learning an implicit language model. We test this hypothesis by measuring temporal context sensitivity and evaluate how the models perform when we constrain the amount of contextual…

    Submitted 17 February, 2021; originally announced February 2021.

    Comments: Published in the proceedings of INTERSPEECH 2020, pp. 4352-4356

  10. arXiv:2102.08248  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Hierarchical VAEs Know What They Don't Know

    Authors: Jakob D. Havtorn, Jes Frellsen, Søren Hauberg, Lars Maaløe

    Abstract: Deep generative models have been demonstrated as state-of-the-art density estimators. Yet, recent work has found that they often assign a higher likelihood to data from outside the training distribution. This seemingly paradoxical behavior has caused concerns over the quality of the attained density estimates. In the context of hierarchical variational autoencoders, we provide evidence to explain…

    Submitted 18 January, 2022; v1 submitted 16 February, 2021; originally announced February 2021.

    Comments: Appeared in Proceedings of the 38th International Conference on Machine Learning (ICML 2021). 18 pages, source code available at https://github.com/JakobHavtorn/hvae-oodd, https://github.com/vlievin/biva-pytorch and https://github.com/larsmaaloee/BIVA

  11. arXiv:2102.00850  [pdf, other]

    eess.AS cs.LG cs.SD

    On Scaling Contrastive Representations for Low-Resource Speech Recognition

    Authors: Lasse Borgholt, Tycho Max Sylvester Tax, Jakob Drachmann Havtorn, Lars Maaløe, Christian Igel

    Abstract: Recent advances in self-supervised learning through contrastive training have shown that it is possible to learn a competitive speech recognition system with as little as 10 minutes of labeled data. However, these systems are computationally expensive since they require pre-training followed by fine-tuning in a large parameter space. We explore the performance of such systems without fine-tuning b…

    Submitted 1 February, 2021; originally announced February 2021.

    Comments: © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  12. arXiv:2005.00812  [pdf, other]

    cs.CL cs.SD eess.AS

    MultiQT: Multimodal Learning for Real-Time Question Tracking in Speech

    Authors: Jakob D. Havtorn, Jan Latko, Joakim Edin, Lasse Borgholt, Lars Maaløe, Lorenzo Belgrano, Nicolai F. Jacobsen, Regitze Sdun, Željko Agić

    Abstract: We address a challenging and practical task of labeling questions in speech in real time during telephone calls to emergency medical services in English, which embeds within a broader decision support system for emergency call-takers. We propose a novel multimodal approach to real-time sequence labeling in speech. Our model treats speech and its own textual representation as two separate modalitie…

    Submitted 12 May, 2020; v1 submitted 2 May, 2020; originally announced May 2020.

    Comments: Accepted at ACL 2020