Showing 1–7 of 7 results for author: Park, A

Searching in archive eess.
  1. arXiv:2311.03419  [pdf, other]

    eess.AS cs.LG cs.SD

    Personalizing Keyword Spotting with Speaker Information

    Authors: Beltrán Labrador, Pai Zhu, Guanlong Zhao, Angelo Scorza Scarpati, Quan Wang, Alicia Lozano-Diez, Alex Park, Ignacio López Moreno

    Abstract: Keyword spotting systems often struggle to generalize to a diverse population with various accents and age groups. To address this challenge, we propose a novel approach that integrates speaker information into keyword spotting using Feature-wise Linear Modulation (FiLM), a recent method for learning from multiple sources of information. We explore both Text-Dependent and Text-Independent speaker…

    Submitted 6 November, 2023; originally announced November 2023.

  2. arXiv:2301.00765  [pdf, other]

    eess.IV cs.CV q-bio.CB

    Segmentation based tracking of cells in 2D+time microscopy images of macrophages

    Authors: Seol Ah Park, Tamara Sipka, Zuzana Kriva, George Lutfalla, Mai Nguyen-Chi, Karol Mikula

    Abstract: The automated segmentation and tracking of macrophages during their migration are challenging tasks due to their dynamically changing shapes and motions. This paper proposes a new algorithm to achieve automatic cell tracking in time-lapse microscopy macrophage data. First, we design a segmentation method employing space-time filtering, local Otsu's thresholding, and the SUBSURF (subjective surface…

    Submitted 2 January, 2023; originally announced January 2023.

    Comments: Computers in Biology and Medicine, Volume 153, 106499 (2023)

  3. arXiv:2205.03481  [pdf, other]

    eess.AS cs.SD eess.SP

    A Conformer-based Waveform-domain Neural Acoustic Echo Canceller Optimized for ASR Accuracy

    Authors: Sankaran Panchapagesan, Arun Narayanan, Turaj Zakizadeh Shabestary, Shuai Shao, Nathan Howard, Alex Park, James Walker, Alexander Gruenstein

    Abstract: Acoustic Echo Cancellation (AEC) is essential for accurate recognition of queries spoken to a smart speaker that is playing out audio. Previous work has shown that a neural AEC model operating on log-mel spectral features (denoted "logmel" hereafter) can greatly improve Automatic Speech Recognition (ASR) accuracy when optimized with an auxiliary loss utilizing a pre-trained ASR model encoder. In t…

    Submitted 6 May, 2022; originally announced May 2022.

    Comments: Submitted to Interspeech 2022

  4. arXiv:2204.06322  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Production federated keyword spotting via distillation, filtering, and joint federated-centralized training

    Authors: Andrew Hard, Kurt Partridge, Neng Chen, Sean Augenstein, Aishanee Shah, Hyun Jin Park, Alex Park, Sara Ng, Jessica Nguyen, Ignacio Lopez Moreno, Rajiv Mathews, Françoise Beaufays

    Abstract: We trained a keyword spotting model using federated learning on real user devices and observed significant improvements when the model was deployed for inference on phones. To compensate for data domains that are missing from on-device training caches, we employed joint federated-centralized training. And to learn in the absence of curated labels on-device, we formulated a confidence filtering str…

    Submitted 29 June, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: Accepted to Interspeech 2022

  5. arXiv:2111.09935  [pdf, other]

    eess.AS cs.SD

    A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement and Speech Separation

    Authors: Tom O'Malley, Arun Narayanan, Quan Wang, Alex Park, James Walker, Nathan Howard

    Abstract: We present a frontend for improving robustness of automatic speech recognition (ASR), that jointly implements three modules within a single model: acoustic echo cancellation, speech enhancement, and speech separation. This is achieved by using a contextual enhancement neural network that can optionally make use of different types of side inputs: (1) a reference signal of the playback audio, which…

    Submitted 18 November, 2021; originally announced November 2021.

    Comments: Will appear in IEEE-ASRU 2021

  6. arXiv:2106.00856  [pdf, other]

    eess.AS cs.SD

    A Neural Acoustic Echo Canceller Optimized Using An Automatic Speech Recognizer And Large Scale Synthetic Data

    Authors: Nathan Howard, Alex Park, Turaj Zakizadeh Shabestary, Alexander Gruenstein, Rohit Prabhavalkar

    Abstract: We consider the problem of recognizing speech utterances spoken to a device which is generating a known sound waveform; for example, recognizing queries issued to a digital assistant which is generating responses to previous user inputs. Previous work has proposed building acoustic echo cancellation (AEC) models for this task that optimize speech enhancement metrics using both neural network as we…

    Submitted 1 June, 2021; originally announced June 2021.

    Comments: To appear in ICASSP 2021

  7. arXiv:2011.02284  [pdf, other]

    cs.CY cs.CV cs.LG eess.IV

    Surgical Data Science -- from Concepts toward Clinical Translation

    Authors: Lena Maier-Hein, Matthias Eisenmann, Duygu Sarikaya, Keno März, Toby Collins, Anand Malpani, Johannes Fallert, Hubertus Feussner, Stamatia Giannarou, Pietro Mascagni, Hirenkumar Nakawala, Adrian Park, Carla Pugh, Danail Stoyanov, Swaroop S. Vedula, Kevin Cleary, Gabor Fichtinger, Germain Forestier, Bernard Gibaud, Teodor Grantcharov, Makoto Hashizume, Doreen Heckmann-Nötzel, Hannes G. Kenngott, Ron Kikinis, Lars Mündermann, et al. (25 additional authors not shown)

    Abstract: Recent developments in data science in general and machine learning in particular have transformed the way experts envision the future of surgery. Surgical Data Science (SDS) is a new research field that aims to improve the quality of interventional healthcare through the capture, organization, analysis and modeling of data. While an increasing number of data-driven approaches and clinical applica…

    Submitted 30 July, 2021; v1 submitted 30 October, 2020; originally announced November 2020.