Showing 1–16 of 16 results for author: Hsieh, T

  1. arXiv:2406.08328  [pdf, other]

    eess.AS

    Multimodal Representation Loss Between Timed Text and Audio for Regularized Speech Separation

    Authors: Tsun-An Hsieh, Heeyoul Choi, Minje Kim

    Abstract: Recent studies highlight the potential of textual modalities in conditioning the speech separation model's inference process. However, regularization-based methods remain underexplored despite their advantages of not requiring auxiliary text data during the test time. To address this gap, we introduce a timed text-based regularization (TTR) method that uses language model-derived semantics to impr…

    Submitted 12 June, 2024; originally announced June 2024.

  2. arXiv:2401.07882  [pdf, other]

    cs.SD eess.AS

    On the Importance of Neural Wiener Filter for Resource Efficient Multichannel Speech Enhancement

    Authors: Tsun-An Hsieh, Jacob Donley, Daniel Wong, Buye Xu, Ashutosh Pandey

    Abstract: We introduce a time-domain framework for efficient multichannel speech enhancement, emphasizing low latency and computational efficiency. This framework incorporates two compact deep neural networks (DNNs) surrounding a multichannel neural Wiener filter (NWF). The first DNN enhances the speech signal to estimate NWF coefficients, while the second DNN refines the output from the NWF. The NWF, while…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: Accepted for publication at ICASSP

  3. arXiv:2312.01644  [pdf]

    eess.IV cs.CV

    TMSR: Tiny Multi-path CNNs for Super Resolution

    Authors: Chia-Hung Liu, Tzu-Hsin Hsieh, Kuan-Yu Huang, Pei-Yin Chen

    Abstract: In this paper, we proposed a tiny multi-path CNN-based Super-Resolution (SR) method, called TMSR. We mainly refer to some tiny CNN-based SR methods, under 5k parameters. The main contribution of the proposed method is the improved multi-path learning and self-defined activated function. The experimental results show that TMSR obtains competitive image quality (i.e. PSNR and SSIM) compared to the r…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: 5 pages, 7 figures, published in the IEEE Eurasia Conference on IoT, Communication and Engineering proceedings 2023

  4. arXiv:2211.01189  [pdf, other]

    eess.AS cs.AI cs.LG cs.NE cs.SD

    Inference and Denoise: Causal Inference-based Neural Speech Enhancement

    Authors: Tsun-An Hsieh, Chao-Han Huck Yang, Pin-Yu Chen, Sabato Marco Siniscalchi, Yu Tsao

    Abstract: This study addresses the speech enhancement (SE) task within the causal inference paradigm by modeling the noise presence as an intervention. Based on the potential outcome framework, the proposed causal inference-based speech enhancement (CISE) separates clean and noisy frames in an intervened noisy speech using a noise detector and assigns both sets of frames to two mask-based enhancement module…

    Submitted 2 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2023

  5. arXiv:2202.09907  [pdf, other]

    cs.SD eess.AS

    Towards automatic transcription of polyphonic electric guitar music: a new dataset and a multi-loss transformer model

    Authors: Yu-Hua Chen, Wen-Yi Hsiao, Tsu-Kuang Hsieh, Jyh-Shing Roger Jang, Yi-Hsuan Yang

    Abstract: In this paper, we propose a new dataset named EGDB, that contains transcriptions of the electric guitar performance of 240 tablatures rendered with different tones. Moreover, we benchmark the performance of two well-known transcription models proposed originally for the piano on this dataset, along with a multi-loss Transformer model that we newly propose. Our evaluation on this dataset and a se…

    Submitted 20 February, 2022; originally announced February 2022.

    Comments: to be published at ICASSP 2022

  6. arXiv:2201.09208  [pdf]

    cs.CV eess.SP

    Design of Sensor Fusion Driver Assistance System for Active Pedestrian Safety

    Authors: I-Hsi Kao, Ya-Zhu Yian, Jian-An Su, Yi-Horng Lai, Jau-Woei Perng, Tung-Li Hsieh, Yi-Shueh Tsai, Min-Shiu Hsieh

    Abstract: In this paper, we present a parallel architecture for a sensor fusion detection system that combines a camera and 1D light detection and ranging (lidar) sensor for object detection. The system contains two object detection methods, one based on an optical flow, and the other using lidar. The two sensors can effectively complement the defects of the other. The accurate longitudinal accuracy of the…

    Submitted 23 January, 2022; originally announced January 2022.

    Comments: The 14th International Conference on Automation Technology (Automation 2017), December 8-10, 2017, Kaohsiung, Taiwan

  7. arXiv:2111.05703  [pdf, other]

    eess.AS cs.SD

    OSSEM: one-shot speaker adaptive speech enhancement using meta learning

    Authors: Cheng Yu, Szu-Wei Fu, Tsun-An Hsieh, Yu Tsao, Mirco Ravanelli

    Abstract: Although deep learning (DL) has achieved notable progress in speech enhancement (SE), further research is still required for a DL-based SE system to adapt effectively and efficiently to particular speakers. In this study, we propose a novel meta-learning-based speaker-adaptive SE approach (called OSSEM) that aims to achieve SE model adaptation in a one-shot manner. OSSEM consists of a modified tra…

    Submitted 10 November, 2021; originally announced November 2021.

  8. arXiv:2106.05229  [pdf, other]

    cs.SD cs.LG eess.AS

    Speech Recovery for Real-World Self-powered Intermittent Devices

    Authors: Yu-Chen Lin, Tsun-An Hsieh, Kuo-Hsuan Hung, Cheng Yu, Harinath Garudadri, Yu Tsao, Tei-Wei Kuo

    Abstract: The incompleteness of speech inputs severely degrades the performance of all the related speech signal processing applications. Although many researches have been proposed to address this issue, they controlled the data missing conditions by simulation with self-defined masking lengths or sizes. Besides, the masking definitions are different among all these experimental settings. This paper presen…

    Submitted 24 January, 2022; v1 submitted 9 June, 2021; originally announced June 2021.

  9. arXiv:2104.03538  [pdf]

    cs.SD cs.AI eess.AS

    MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement

    Authors: Szu-Wei Fu, Cheng Yu, Tsun-An Hsieh, Peter Plantinga, Mirco Ravanelli, Xugang Lu, Yu Tsao

    Abstract: The discrepancy between the cost function used for training a speech enhancement model and human auditory perception usually makes the quality of enhanced speech unsatisfactory. Objective evaluation metrics which consider human perception can hence serve as a bridge to reduce the gap. Our previously proposed MetricGAN was designed to optimize objective metrics by connecting the metric with a discr…

    Submitted 4 June, 2021; v1 submitted 8 April, 2021; originally announced April 2021.

    Comments: Accepted by Interspeech 2021

  10. arXiv:2010.15174  [pdf, other]

    cs.SD cs.LG eess.AS

    Improving Perceptual Quality by Phone-Fortified Perceptual Loss using Wasserstein Distance for Speech Enhancement

    Authors: Tsun-An Hsieh, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao

    Abstract: Speech enhancement (SE) aims to improve speech quality and intelligibility, which are both related to a smooth transition in speech segments that may carry linguistic information, e.g. phones and syllables. In this study, we propose a novel phone-fortified perceptual loss (PFPL) that takes phonetic information into account for training SE models. To effectively incorporate the phonetic information…

    Submitted 27 April, 2021; v1 submitted 28 October, 2020; originally announced October 2020.

  11. arXiv:2006.10296  [pdf]

    eess.AS cs.LG cs.SD

    Boosting Objective Scores of a Speech Enhancement Model by MetricGAN Post-processing

    Authors: Szu-Wei Fu, Chien-Feng Liao, Tsun-An Hsieh, Kuo-Hsuan Hung, Syu-Siang Wang, Cheng Yu, Heng-Cheng Kuo, Ryandhimas E. Zezario, You-** Li, Shang-Yi Chuang, Yen-Ju Lu, Yu Tsao

    Abstract: The Transformer architecture has demonstrated a superior ability compared to recurrent neural networks in many different natural language processing applications. Therefore, our study applies a modified Transformer in a speech enhancement task. Specifically, positional encoding in the Transformer may not be necessary for speech enhancement, and hence, it is replaced by convolutional layers. To fur…

    Submitted 3 March, 2021; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: Accepted by APSIPA 2020

  12. arXiv:2004.04098  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    WaveCRN: An Efficient Convolutional Recurrent Neural Network for End-to-end Speech Enhancement

    Authors: Tsun-An Hsieh, Hsin-Min Wang, Xugang Lu, Yu Tsao

    Abstract: Due to the simple design pipeline, end-to-end (E2E) neural models for speech enhancement (SE) have attracted great interest. In order to improve the performance of the E2E model, the locality and temporal sequential properties of speech should be efficiently taken into account when modelling. However, in most current E2E models for SE, these properties are either not fully considered or are too co…

    Submitted 26 November, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

  13. arXiv:2002.06817  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    Addressing the confounds of accompaniments in singer identification

    Authors: Tsung-Han Hsieh, Kai-Hsiang Cheng, Zhe-Cheng Fan, Yu-Ching Yang, Yi-Hsuan Yang

    Abstract: Identifying singers is an important task with many applications. However, the task remains challenging due to many issues. One major issue is related to the confounding factors from the background instrumental music that is mixed with the vocals in music production. A singer identification model may learn to extract non-vocal related features from the instrumental part of the songs, if a singer on…

    Submitted 17 February, 2020; originally announced February 2020.

  14. arXiv:1911.12529  [pdf, other]

    cs.CV cs.LG eess.IV

    One-Shot Object Detection with Co-Attention and Co-Excitation

    Authors: Ting-I Hsieh, Yi-Chen Lo, Hwann-Tzong Chen, Tyng-Luh Liu

    Abstract: This paper aims to tackle the challenging problem of one-shot object detection. Given a query image patch whose class label is not included in the training data, the goal of the task is to detect all instances of the same class in a target image. To this end, we develop a novel co-attention and co-excitation (CoAE) framework that makes contributions in three key technical aspects. First, we…

    Submitted 28 November, 2019; originally announced November 2019.

    Comments: NeurIPS 2019

  15. arXiv:1906.02772  [pdf]

    eess.SP cs.LG

    Adaptive Subspace Sampling for Class Imbalance Processing-Some clarifications, algorithm, and further investigation including applications to Brain Computer Interface

    Authors: Chin-Teng Lin, Kuan-Chih Huang, Yu-Ting Liu, Yang-Yin Lin, Tsung-Yu Hsieh, Nikhil R. Pal, Shang-Lin Wu, Chieh-Ning Fang, Zehong Cao

    Abstract: Kohonen's Adaptive Subspace Self-Organizing Map (ASSOM) learns several subspaces of the data where each subspace represents some invariant characteristics of the data. To deal with the imbalance classification problem, earlier we have proposed a method for oversampling the minority class using Kohonen's ASSOM. This investigation extends that study, clarifies some issues related to our earlier work…

    Submitted 7 October, 2020; v1 submitted 26 May, 2019; originally announced June 2019.

    Comments: The current version is accepted by iFuzzy 2020

  16. arXiv:1810.12947  [pdf, other]

    eess.AS cs.SD

    A Streamlined Encoder/Decoder Architecture for Melody Extraction

    Authors: Tsung-Han Hsieh, Li Su, Yi-Hsuan Yang

    Abstract: Melody extraction in polyphonic musical audio is important for music signal processing. In this paper, we propose a novel streamlined encoder/decoder network that is designed for the task. We make two technical contributions. First, drawing inspiration from a state-of-the-art model for semantic pixel-wise segmentation, we pass through the pooling indices between pooling and un-pooling layers to lo…

    Submitted 18 February, 2019; v1 submitted 30 October, 2018; originally announced October 2018.

    Comments: This is a pre-print version of an ICASSP 2019 paper