Showing 1–13 of 13 results for author: Yang, E

Searching in archive eess.
  1. arXiv:2404.01464  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Data-Efficient Unsupervised Interpolation Without Any Intermediate Frame for 4D Medical Images

    Authors: JungEun Kim, Hangyul Yoon, Geondo Park, Kyungsu Kim, Eunho Yang

    Abstract: 4D medical images, which represent 3D images with temporal information, are crucial in clinical practice for capturing dynamic changes and monitoring long-term disease progression. However, acquiring 4D medical images poses challenges due to factors such as radiation exposure and imaging duration, necessitating a balance between achieving high temporal resolution and minimizing adverse effects. Gi…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  2. arXiv:2311.05844  [pdf, other]

    cs.CV cs.AI cs.CL cs.MM cs.SD eess.AS

    Face-StyleSpeech: Improved Face-to-Voice latent mapping for Natural Zero-shot Speech Synthesis from a Face Image

    Authors: Minki Kang, Wooseok Han, Eunho Yang

    Abstract: Generating a voice from a face image is crucial for developing virtual humans capable of interacting using their unique voices, without relying on pre-recorded human speech. In this paper, we propose Face-StyleSpeech, a zero-shot Text-To-Speech (TTS) synthesis model that generates natural speech conditioned on a face image rather than reference speech. We hypothesize that learning both speaker ide…

    Submitted 25 September, 2023; originally announced November 2023.

    Comments: Submitted to ICASSP 2024

  3. arXiv:2309.03451  [pdf, other]

    cs.SD cs.LG eess.AS

    Cross-domain Sound Recognition for Efficient Underwater Data Analysis

    Authors: Jeongsoo Park, Dong-Gyun Han, Hyoung Sul La, Sangmin Lee, Yoonchang Han, Eun-Jin Yang

    Abstract: This paper presents a novel deep learning approach for analyzing massive underwater acoustic data by leveraging a model trained on a broad spectrum of non-underwater (aerial) sounds. Recognizing the challenge in labeling vast amounts of underwater data, we propose a two-fold methodology to accelerate this labor-intensive procedure. The first part of our approach involves PCA and UMAP visualizati…

    Submitted 21 February, 2024; v1 submitted 6 September, 2023; originally announced September 2023.

    Comments: Accepted to APSIPA 2023

  4. arXiv:2306.01981  [pdf, other]

    eess.AS cs.AI cs.LG

    SGEM: Test-Time Adaptation for Automatic Speech Recognition via Sequential-Level Generalized Entropy Minimization

    Authors: Changhun Kim, Joonhyung Park, Hajin Shim, Eunho Yang

    Abstract: Automatic speech recognition (ASR) models are frequently exposed to data distribution shifts in many real-world scenarios, leading to erroneous predictions. To tackle this issue, an existing test-time adaptation (TTA) method has recently been proposed to adapt the pre-trained ASR model on unlabeled test instances without source data. Despite decent performance gain, this work relies solely on naiv…

    Submitted 21 June, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: INTERSPEECH 2023 Oral Presentation; Code is available at https://github.com/drumpt/SGEM

  5. arXiv:2305.13831  [pdf, other]

    cs.SD cs.CL eess.AS

    ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models

    Authors: Minki Kang, Wooseok Han, Sung Ju Hwang, Eunho Yang

    Abstract: Emotional Text-To-Speech (TTS) is an important task in the development of systems (e.g., human-like dialogue agents) that require natural and emotional speech. Existing approaches, however, only aim to produce emotional TTS for seen speakers during training, without consideration of the generalization to unseen speakers. In this paper, we propose ZET-Speech, a zero-shot adaptive emotion-controllab…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted by INTERSPEECH 2023

  6. arXiv:2302.13231  [pdf]

    eess.SY stat.AP

    A Synthetic Texas Backbone Power System with Climate-Dependent Spatio-Temporal Correlated Profiles

    Authors: Jin Lu, Xingpeng Li, Hongyi Li, Taher Chegini, Carlos Gamarra, Y. C. Ethan Yang, Margaret Cook, Gavin Dillingham

    Abstract: Most power system test cases only have electrical parameters and can be used only for studies based on a snapshot of system profiles. To facilitate more comprehensive and practical studies, a synthetic power system including spatio-temporal correlated profiles for the entire year of 2019 at one-hour resolution has been created in this work. This system, referred to as the synthetic Texas 123-bus b…

    Submitted 25 February, 2023; originally announced February 2023.

    Comments: 10 pages, 14 figures, 12 tables

  7. arXiv:2302.09560  [pdf, other]

    eess.IV cs.CV cs.LG cs.MM

    Deep Selector-JPEG: Adaptive JPEG Image Compression for Computer Vision in Image classification with Human Vision Criteria

    Authors: Hossam Amer, Sepideh Shaterian, En-hui Yang

    Abstract: With limited storage/bandwidth resources, input images to Computer Vision (CV) applications that use Deep Neural Networks (DNNs) are often encoded with JPEG that is tailored to Human Vision (HV). This paper presents Deep Selector-JPEG, an adaptive JPEG compression method that targets image classification while satisfying HV criteria. For each image, Deep Selector-JPEG selects adaptively a Quality…

    Submitted 19 February, 2023; originally announced February 2023.

    Comments: 4 pages, 2 figures

  8. arXiv:2302.04143  [pdf, other]

    eess.IV cs.CV

    Predicting Thrombectomy Recanalization from CT Imaging Using Deep Learning Models

    Authors: Haoyue Zhang, Jennifer S. Polson, Eric J. Yang, Kambiz Nael, William Speier, Corey W. Arnold

    Abstract: For acute ischemic stroke (AIS) patients with large vessel occlusions, clinicians must decide if the benefit of mechanical thrombectomy (MTB) outweighs the risks and potential complications following an invasive procedure. Pre-treatment computed tomography (CT) and angiography (CTA) are widely used to characterize occlusions in the brain vasculature. If a patient is deemed eligible, a modified tre…

    Submitted 17 April, 2024; v1 submitted 8 February, 2023; originally announced February 2023.

    Comments: Medical Imaging with Deep Learning 2022 accepted short paper Jun 2022

    Journal ref: Medical Imaging with Deep Learning 2022

  9. arXiv:2110.15018  [pdf, other]

    eess.AS cs.SD

    TorchAudio: Building Blocks for Audio and Speech Processing

    Authors: Yao-Yuan Yang, Moto Hira, Zhaoheng Ni, Anjali Chourdia, Artyom Astafurov, Caroline Chen, Ching-Feng Yeh, Christian Puhrsch, David Pollack, Dmitriy Genzel, Donny Greenberg, Edward Z. Yang, Jason Lian, Jay Mahadeokar, Jeff Hwang, Ji Chen, Peter Goldsborough, Prabhat Roy, Sean Narenthiran, Shinji Watanabe, Soumith Chintala, Vincent Quenneville-Bélair, Yangyang Shi

    Abstract: This document describes version 0.10 of TorchAudio: building blocks for machine learning applications in the audio and speech processing domain. The objective of TorchAudio is to accelerate the development and deployment of machine learning applications for researchers and engineers by providing off-the-shelf building blocks. The building blocks are designed to be GPU-compatible, automatically dif…

    Submitted 16 February, 2022; v1 submitted 28 October, 2021; originally announced October 2021.

    Comments: Accepted by ICASSP 2022

  10. arXiv:2106.03153  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation

    Authors: Dongchan Min, Dong Bok Lee, Eunho Yang, Sung Ju Hwang

    Abstract: With rapid progress in neural text-to-speech (TTS) models, personalized speech generation is now in high demand for many applications. For practical applicability, a TTS model should generate high-quality speech with only a few audio samples from the given speaker, that are also short in length. However, existing methods either require to fine-tune the model or achieve low adaptation quality witho…

    Submitted 16 June, 2021; v1 submitted 6 June, 2021; originally announced June 2021.

    Comments: Accepted by ICML 2021

  11. arXiv:2103.14302  [pdf, other]

    cs.CL cs.SD eess.AS

    Mutually-Constrained Monotonic Multihead Attention for Online ASR

    Authors: Jaeyun Song, Hajin Shim, Eunho Yang

    Abstract: Despite the feature of real-time decoding, Monotonic Multihead Attention (MMA) shows comparable performance to the state-of-the-art offline methods in machine translation and automatic speech recognition (ASR) tasks. However, the latency of MMA is still a major issue in ASR and should be combined with a technique that can reduce the test latency at inference time, such as head-synchronous beam sea…

    Submitted 26 March, 2021; originally announced March 2021.

    Comments: Accepted at IEEE ICASSP 2021

  12. arXiv:2010.15269  [pdf, other]

    eess.IV cs.CV cs.LG

    GloFlow: Global Image Alignment for Creation of Whole Slide Images for Pathology from Video

    Authors: Viswesh Krishna, Anirudh Joshi, Philip L. Bulterys, Eric Yang, Andrew Y. Ng, Pranav Rajpurkar

    Abstract: The application of deep learning to pathology assumes the existence of digital whole slide images of pathology slides. However, slide digitization is bottlenecked by the high cost of precise motor stages in slide scanners that are needed for position information used for slide stitching. We propose GloFlow, a two-stage method for creating a whole slide image using optical flow-based image registra…

    Submitted 12 November, 2020; v1 submitted 28 October, 2020; originally announced October 2020.

    Comments: Machine Learning for Health (ML4H) at NeurIPS 2020 - Extended Abstract

  13. arXiv:1911.12990   

    cs.CV cs.LG eess.IV

    Semi-Relaxed Quantization with DropBits: Training Low-Bit Neural Networks via Bit-wise Regularization

    Authors: Jung Hyun Lee, Jihun Yun, Sung Ju Hwang, Eunho Yang

    Abstract: Network quantization, which aims to reduce the bit-lengths of the network weights and activations, has emerged as one of the key ingredients to reduce the size of neural networks for their deployments to resource-limited devices. In order to overcome the nature of transforming continuous activations and weights to discrete ones, recent study called Relaxed Quantization (RQ) [Louizos et al. 2019] s…

    Submitted 7 September, 2021; v1 submitted 29 November, 2019; originally announced November 2019.

    Comments: New submission with another link