
Showing 1–10 of 10 results for author: Kong, J

Searching in archive eess.
  1. arXiv:2403.18134 [pdf, other]

    eess.IV cs.CV

    Integrative Graph-Transformer Framework for Histopathology Whole Slide Image Representation and Classification

    Authors: Zhan Shi, Jingwei Zhang, Jun Kong, Fusheng Wang

    Abstract: In digital pathology, the multiple instance learning (MIL) strategy is widely used in the weakly supervised histopathology whole slide image (WSI) classification task where giga-pixel WSIs are only labeled at the slide level. However, existing attention-based MIL approaches often overlook contextual information and intrinsic spatial relationships between neighboring tissue tiles, while graph-based…

    Submitted 26 March, 2024; originally announced March 2024.

  2. arXiv:2402.10087 [pdf, ps, other]

    cs.NI cs.LG eess.SP

    Decentralized Covert Routing in Heterogeneous Networks Using Reinforcement Learning

    Authors: Justin Kong, Terrence J. Moore, Fikadu T. Dagefu

    Abstract: This letter investigates covert routing communications in a heterogeneous network where a source transmits confidential data to a destination with the aid of relaying nodes where each transmitter judiciously chooses one modality among multiple communication modalities. We develop a novel reinforcement learning-based covert routing algorithm that finds a route from the source to the destination whe…

    Submitted 31 January, 2024; originally announced February 2024.

  3. arXiv:2311.11745 [pdf, other]

    cs.SD cs.CL eess.AS

    ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis

    Authors: Jungil Kong, Junmo Lee, Jeongmin Kim, Beomjeong Kim, Jihoon Park, Dohee Kong, Changheon Lee, Sangjin Kim

    Abstract: In this work, we propose a novel method for modeling numerous speakers, which enables expressing the overall characteristics of speakers in detail like a trained multi-speaker model without additional training on the target speaker's dataset. Although various works with similar purposes have been actively studied, their performance has not yet reached that of trained multi-speaker models due to th…

    Submitted 31 May, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

    Comments: ICML 2024

  4. arXiv:2310.17448 [pdf, other]

    cs.CL eess.AS

    Dialect Adaptation and Data Augmentation for Low-Resource ASR: TalTech Systems for the MADASR 2023 Challenge

    Authors: Tanel Alumäe, Jiaming Kong, Daniil Robnikov

    Abstract: This paper describes Tallinn University of Technology (TalTech) systems developed for the ASRU MADASR 2023 Challenge. The challenge focuses on automatic speech recognition of dialect-rich Indian languages with limited training audio and text data. TalTech participated in two tracks of the challenge: Track 1 that allowed using only the provided training data and Track 3 which allowed using addition…

    Submitted 26 October, 2023; originally announced October 2023.

    Journal ref: 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

  5. arXiv:2307.16430 [pdf, other]

    cs.SD cs.LG eess.AS

    VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design

    Authors: Jungil Kong, Jihoon Park, Beomjeong Kim, Jeongmin Kim, Dohee Kong, Sangjin Kim

    Abstract: Single-stage text-to-speech models have been actively studied recently, and their results have outperformed two-stage pipeline systems. Although the previous single-stage model has made great progress, there is room for improvement in terms of its intermittent unnaturalness, computational efficiency, and strong dependence on phoneme conversion. In this work, we introduce VITS2, a single-stage text…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: Interspeech 2023

  6. arXiv:2106.06103 [pdf, other]

    cs.SD eess.AS

    Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

    Authors: Jaehyeon Kim, Jungil Kong, Juhee Son

    Abstract: Several recent end-to-end text-to-speech (TTS) models enabling single-stage training and parallel sampling have been proposed, but their sample quality does not match that of two-stage TTS systems. In this work, we present a parallel end-to-end TTS method that generates more natural sounding audio than current two-stage models. Our method adopts variational inference augmented with normalizing flo…

    Submitted 10 June, 2021; originally announced June 2021.

    Comments: ICML 2021

  7. arXiv:2010.05646 [pdf, other]

    cs.SD cs.LG eess.AS

    HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

    Authors: Jungil Kong, Jaehyeon Kim, Jaekyoung Bae

    Abstract: Several recent works on speech synthesis have employed generative adversarial networks (GANs) to produce raw waveforms. Although such methods improve the sampling efficiency and memory usage, their sample quality has not yet reached that of autoregressive and flow-based generative models. In this work, we propose HiFi-GAN, which achieves both efficient and high-fidelity speech synthesis. As speech…

    Submitted 23 October, 2020; v1 submitted 12 October, 2020; originally announced October 2020.

    Comments: NeurIPS 2020. Code available at https://github.com/jik876/hifi-gan

  8. arXiv:2005.11129 [pdf, other]

    eess.AS cs.SD

    Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search

    Authors: Jaehyeon Kim, Sungwon Kim, Jungil Kong, Sungroh Yoon

    Abstract: Recently, text-to-speech (TTS) models such as FastSpeech and ParaNet have been proposed to generate mel-spectrograms from text in parallel. Despite the advantage, the parallel TTS models cannot be trained without guidance from autoregressive TTS models as their external aligners. In this work, we propose Glow-TTS, a flow-based generative model for parallel TTS that does not require any external al…

    Submitted 22 October, 2020; v1 submitted 22 May, 2020; originally announced May 2020.

    Comments: Accepted by NeurIPS 2020

  9. arXiv:1911.07088 [pdf, other]

    eess.IV cs.CV

    Liver Steatosis Segmentation with Deep Learning Methods

    Authors: Xiaoyuan Guo, Fusheng Wang, George Teodoro, Alton B. Farris, Jun Kong

    Abstract: Liver steatosis is known as the abnormal accumulation of lipids within cells. An accurate quantification of steatosis area within the liver histopathological microscopy images plays an important role in liver disease diagnosis and transplantation assessment. Such a quantification analysis often requires a precise steatosis segmentation that is challenging due to abundant presence of highly overla…

    Submitted 16 November, 2019; originally announced November 2019.

    Comments: 4 pages

    Journal ref: 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019) Venice, Italy, April 8-11, 2019

  10. arXiv:1209.3332 [pdf, other]

    cs.DC eess.SY

    High-throughput Execution of Hierarchical Analysis Pipelines on Hybrid Cluster Platforms

    Authors: George Teodoro, Tony Pan, Tahsin M. Kurc, Jun Kong, Lee A. D. Cooper, Joel H. Saltz

    Abstract: We propose, implement, and experimentally evaluate a runtime middleware to support high-throughput execution on hybrid cluster machines of large-scale analysis applications. A hybrid cluster machine consists of computation nodes which have multiple CPUs and general purpose graphics processing units (GPUs). Our work targets scientific analysis applications in which datasets are processed in applica…

    Submitted 14 September, 2012; originally announced September 2012.

    Comments: 12 pages, 14 figures