Showing 1–18 of 18 results for author: Park, T

  1. arXiv:2404.07217  [pdf, other]

    eess.SP cs.AI cs.CV cs.LG

    Attention-aware Semantic Communications for Collaborative Inference

    Authors: Jiwoong Im, Nayoung Kwon, Taewoo Park, Jiheon Woo, Jaeho Lee, Yongjune Kim

    Abstract: We propose a communication-efficient collaborative inference framework in the domain of edge inference, focusing on the efficient use of vision transformer (ViT) models. The partitioning strategy of conventional collaborative inference fails to reduce communication cost because of the inherent architecture of ViTs maintaining consistent layer dimensions across the entire transformer encoder. There…

    Submitted 31 May, 2024; v1 submitted 23 February, 2024; originally announced April 2024.

  2. arXiv:2310.12378  [pdf, other]

    eess.AS cs.SD

    The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System

    Authors: Tae Jin Park, He Huang, Ante Jukic, Kunal Dhawan, Krishna C. Puvvada, Nithin Koluguri, Nikolay Karpov, Aleksandr Laptev, Jagadeesh Balam, Boris Ginsburg

    Abstract: We present the NVIDIA NeMo team's multi-channel speech recognition system for the 7th CHiME Challenge Distant Automatic Speech Recognition (DASR) Task, focusing on the development of a multi-channel, multi-speaker speech recognition system tailored to transcribe speech from distributed microphones and microphone arrays. The system predominantly comprises of the following integral modules: the Spea…

    Submitted 18 October, 2023; originally announced October 2023.

    Journal ref: CHiME-7 Workshop 2023

  3. arXiv:2310.12371  [pdf, other]

    eess.AS cs.SD

    Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation

    Authors: Tae Jin Park, He Huang, Coleman Hooper, Nithin Koluguri, Kunal Dhawan, Ante Jukic, Jagadeesh Balam, Boris Ginsburg

    Abstract: We introduce a sophisticated multi-speaker speech data simulator, specifically engineered to generate multi-speaker speech recordings. A notable feature of this simulator is its capacity to modulate the distribution of silence and overlap via the adjustment of statistical parameters. This capability offers a tailored training environment for developing neural models suited for speaker diarization…

    Submitted 18 October, 2023; originally announced October 2023.

    Journal ref: CHiME-7 Workshop 2023

  4. arXiv:2309.05248  [pdf, other]

    eess.AS cs.SD

    Enhancing Speaker Diarization with Large Language Models: A Contextual Beam Search Approach

    Authors: Tae Jin Park, Kunal Dhawan, Nithin Koluguri, Jagadeesh Balam

    Abstract: Large language models (LLMs) have shown great promise for capturing contextual information in natural language processing tasks. We propose a novel approach to speaker diarization that incorporates the prowess of LLMs to exploit contextual cues in human dialogues. Our method builds upon an acoustic-based speaker diarization system by adding lexical information from an LLM in the inference stage. W…

    Submitted 13 September, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: 4 pages 1 reference page, ICASSP format

  5. arXiv:2305.06459  [pdf, other]

    eess.SP cs.GR cs.HC eess.IV q-bio.NC

    SlicerTMS: Real-Time Visualization of Transcranial Magnetic Stimulation for Mental Health Treatment

    Authors: Loraine Franke, Tae Young Park, Jie Luo, Yogesh Rathi, Steve Pieper, Lipeng Ning, Daniel Haehn

    Abstract: We present a real-time visualization system for Transcranial Magnetic Stimulation (TMS), a non-invasive neuromodulation technique for treating various brain disorders and mental health diseases. Our solution targets the current challenges of slow and labor-intensive practices in treatment planning. Integrating Deep Learning (DL), our system rapidly predicts electric field (E-field) distributions i…

    Submitted 12 March, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: 11 pages, 4 figures, 2 tables, MICCAI

  6. arXiv:2206.03796  [pdf, other]

    cs.RO eess.SP

    Adaptive Neural Network-based Unscented Kalman Filter for Robust Pose Tracking of Noncooperative Spacecraft

    Authors: Tae Ha Park, Simone D'Amico

    Abstract: This paper presents a neural network-based Unscented Kalman Filter (UKF) to estimate and track the pose (i.e., position and orientation) of a known, noncooperative, tumbling target spacecraft in a close-proximity rendezvous scenario. The UKF estimates the target's orbit and attitude relative to the servicer based on the pose information provided by a multi-task Convolutional Neural Network (CNN) f…

    Submitted 8 May, 2023; v1 submitted 8 June, 2022; originally announced June 2022.

    Comments: Accepted to AIAA Journal of Guidance, Control, and Dynamics. Updated derivation of Section IV.B and experiments

  7. arXiv:2203.15974  [pdf, other]

    eess.AS cs.CL

    Multi-scale Speaker Diarization with Dynamic Scale Weighting

    Authors: Tae Jin Park, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg

    Abstract: Speaker diarization systems are challenged by a trade-off between the temporal resolution and the fidelity of the speaker representation. By obtaining a superior temporal resolution with an enhanced accuracy, a multi-scale approach is a way to cope with such a trade-off. In this paper, we propose a more advanced multi-scale diarization system based on a multi-scale diarization decoder. There are t…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: Submitted to Interspeech 2022

  8. arXiv:2110.04410  [pdf, other]

    eess.AS cs.SD

    TitaNet: Neural Model for speaker representation with 1D Depth-wise separable convolutions and global context

    Authors: Nithin Rao Koluguri, Taejin Park, Boris Ginsburg

    Abstract: In this paper, we propose TitaNet, a novel neural network architecture for extracting speaker representations. We employ 1D depth-wise separable convolutions with Squeeze-and-Excitation (SE) layers with global context followed by channel attention based statistics pooling layer to map variable-length utterances to a fixed-length embedding (t-vector). TitaNet is a scalable architecture and achieves…

    Submitted 8 October, 2021; originally announced October 2021.

    Comments: preprint. Submitted to ICASSP 2022

  9. arXiv:2101.09624  [pdf, other]

    eess.AS cs.CL cs.SD

    A Review of Speaker Diarization: Recent Advances with Deep Learning

    Authors: Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu J. Han, Shinji Watanabe, Shrikanth Narayanan

    Abstract: Speaker diarization is a task to label audio or video recordings with classes that correspond to speaker identity, or in short, a task to identify "who spoke when". In the early years, speaker diarization algorithms were developed for speech recognition on multispeaker audio recordings to enable speaker adaptive processing. These algorithms also gained their own value as a standalone application o…

    Submitted 26 November, 2021; v1 submitted 23 January, 2021; originally announced January 2021.

    Comments: This article is a preprint version of the article published in Computer Speech & Language, Volume 72, March 2022, 101317

  10. arXiv:2011.10527  [pdf, other]

    eess.AS

    Multi-Scale Speaker Diarization With Neural Affinity Score Fusion

    Authors: Tae Jin Park, Manoj Kumar, Shrikanth Narayanan

    Abstract: Identifying the identity of the speaker of short segments in human dialogue has been considered one of the most challenging problems in speech signal processing. Speaker representations of short speech segments tend to be unreliable, resulting in poor fidelity of speaker representations in tasks requiring speaker recognition. In this paper, we propose an unconventional method that tackles the trad…

    Submitted 20 November, 2020; originally announced November 2020.

    Comments: Submitted to ICASSP 2021

  11. arXiv:2007.09635  [pdf, other]

    eess.AS cs.SD

    Meta-learning with Latent Space Clustering in Generative Adversarial Network for Speaker Diarization

    Authors: Monisankha Pal, Manoj Kumar, Raghuveer Peri, Tae Jin Park, So Hyun Kim, Catherine Lord, Somer Bishop, Shrikanth Narayanan

    Abstract: The performance of most speaker diarization systems with x-vector embeddings is both vulnerable to noisy environments and lacks domain robustness. Earlier work on speaker diarization using generative adversarial network (GAN) with an encoder network (ClusterGAN) to project input x-vectors into a latent space has shown promising performance on meeting data. In this paper, we extend the ClusterGAN n…

    Submitted 19 July, 2020; originally announced July 2020.

    Comments: Submitted to IEEE/ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING

  12. Speaker Diarization with Lexical Information

    Authors: Tae Jin Park, Kyu J. Han, Jing Huang, Xiaodong He, Bowen Zhou, Panayiotis Georgiou, Shrikanth Narayanan

    Abstract: This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition. We propose a speaker diarization system that can incorporate word-level speaker turn probabilities with speaker embeddings into a speaker clustering process to improve the overall diarization accuracy. To integrate lexical and acoustic information in a comprehensive…

    Submitted 13 April, 2020; originally announced April 2020.

    Journal ref: Interspeech 2019, 391-395

  13. Auto-Tuning Spectral Clustering for Speaker Diarization Using Normalized Maximum Eigengap

    Authors: Tae Jin Park, Kyu J. Han, Manoj Kumar, Shrikanth Narayanan

    Abstract: In this study, we propose a new spectral clustering framework that can auto-tune the parameters of the clustering algorithm in the context of speaker diarization. The proposed framework uses normalized maximum eigengap (NME) values to estimate the number of clusters and the parameters for the threshold of the elements of each row in an affinity matrix during spectral clustering, without the use of…

    Submitted 4 March, 2020; originally announced March 2020.

    Comments: in IEEE Signal Processing Letters, 2020

  14. arXiv:2002.02520  [pdf, other]

    cs.SD cs.CL eess.AS

    Robust Multi-channel Speech Recognition using Frequency Aligned Network

    Authors: Taejin Park, Kenichi Kumatani, Minhua Wu, Shiva Sundaram

    Abstract: Conventional speech enhancement technique such as beamforming has known benefits for far-field speech recognition. Our own work in frequency-domain multi-channel acoustic modeling has shown additional improvements by training a spatial filtering layer jointly within an acoustic model. In this paper, we further develop this idea and use frequency aligned network for robust multi-channel automatic s…

    Submitted 6 February, 2020; originally announced February 2020.

  15. arXiv:1911.11927  [pdf, ps, other]

    eess.AS

    Automatic prediction of suicidal risk in military couples using multimodal interaction cues from couples conversations

    Authors: Sandeep Nallan Chakravarthula, Md Nasir, Shao-Yen Tseng, Haoqi Li, Tae Jin Park, Brian Baucom, Craig J. Bryan, Shrikanth Narayanan, Panayiotis Georgiou

    Abstract: Suicide is a major societal challenge globally, with a wide range of risk factors, from individual health, psychological and behavioral elements to socio-economic aspects. Military personnel, in particular, are at especially high risk. Crisis resources, while helpful, are often constrained by access to clinical visits or therapist availability, especially when needed in a timely manner. There have…

    Submitted 26 November, 2019; originally announced November 2019.

    Comments: submitted to ICASSP 2020

  16. arXiv:1910.11398  [pdf, ps, other]

    eess.AS cs.SD

    Speaker diarization using latent space clustering in generative adversarial network

    Authors: Monisankha Pal, Manoj Kumar, Raghuveer Peri, Tae Jin Park, So Hyun Kim, Catherine Lord, Somer Bishop, Shrikanth Narayanan

    Abstract: In this work, we propose deep latent space clustering for speaker diarization using generative adversarial network (GAN) backprojection with the help of an encoder network. The proposed diarization system is trained jointly with GAN loss, latent variable recovery loss, and a clustering-specific loss. It uses x-vector speaker embeddings at the input, while the latent variables are sampled from a co…

    Submitted 24 October, 2019; originally announced October 2019.

    Comments: Submitted to ICASSP 2020

  17. arXiv:1905.10488  [pdf, other]

    eess.IV cs.CV cs.LG

    GAN2GAN: Generative Noise Learning for Blind Denoising with Single Noisy Images

    Authors: Sungmin Cha, Taeeon Park, Byeongjoon Kim, Jongduk Baek, Taesup Moon

    Abstract: We tackle a challenging blind image denoising problem, in which only single distinct noisy images are available for training a denoiser, and no information about noise is known, except for it being zero-mean, additive, and independent of the clean image. In such a setting, which often occurs in practice, it is not possible to train a denoiser with the standard discriminative training or with the r…

    Submitted 4 July, 2021; v1 submitted 24 May, 2019; originally announced May 2019.

    Comments: ICLR 2021 camera ready version

  18. arXiv:1805.10731  [pdf, other]

    eess.AS cs.SD

    Multimodal Speaker Segmentation and Diarization using Lexical and Acoustic Cues via Sequence to Sequence Neural Networks

    Authors: Tae Jin Park, Panayiotis Georgiou

    Abstract: While there has been substantial amount of work in speaker diarization recently, there are few efforts in jointly employing lexical and acoustic information for speaker segmentation. Towards that, we investigate a speaker diarization system using a sequence-to-sequence neural network trained on both lexical and acoustic features. We also propose a loss function that allows for selecting not only t…

    Submitted 27 May, 2018; originally announced May 2018.