Showing 1–23 of 23 results for author: Tsai, C

  1. arXiv:2406.18009

    eess.AS cs.SD

    E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

    Authors: Sefik Emre Eskimez, Xiaofei Wang, Manthan Thakker, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Hemin Yang, Zirun Zhu, Min Tang, Xu Tan, Yanqing Liu, Sheng Zhao, Naoyuki Kanda

    Abstract: This paper introduces Embarrassingly Easy Text-to-Speech (E2 TTS), a fully non-autoregressive zero-shot text-to-speech system that offers human-level naturalness and state-of-the-art speaker similarity and intelligibility. In the E2 TTS framework, the text input is converted into a character sequence with filler tokens. The flow-matching-based mel spectrogram generator is then trained based on the…

    Submitted 25 June, 2024; originally announced June 2024.

  2. arXiv:2406.04281

    eess.AS

    Total-Duration-Aware Duration Modeling for Text-to-Speech Systems

    Authors: Sefik Emre Eskimez, Xiaofei Wang, Manthan Thakker, Chung-Hsien Tsai, Canrun Li, Zhen Xiao, Hemin Yang, Zirun Zhu, Min Tang, Jinyu Li, Sheng Zhao, Naoyuki Kanda

    Abstract: Accurate control of the total duration of generated speech by adjusting the speech rate is crucial for various text-to-speech (TTS) applications. However, the impact of adjusting the speech rate on speech quality, such as intelligibility and speaker characteristics, has been underexplored. In this work, we propose a novel total-duration-aware (TDA) duration model for TTS, where phoneme durations a…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  3. arXiv:2404.01643

    eess.IV cs.CV cs.LG

    A Closer Look at Spatial-Slice Features Learning for COVID-19 Detection

    Authors: Chih-Chung Hsu, Chia-Ming Lee, Yang Fan Chiang, Yi-Shiuan Chou, Chih-Yu Jiang, Shen-Chieh Tai, Chi-Han Tsai

    Abstract: Conventional Computed Tomography (CT) imaging recognition faces two significant challenges: (1) There is often considerable variability in the resolution and size of each CT scan, necessitating strict requirements for the input size and adaptability of models. (2) CT scans contain a large number of out-of-distribution (OOD) slices. The crucial features may only be present in specific spatial regions…

    Submitted 20 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: Camera-ready version, accepted by DEF-AI-MIA workshop, in conjunction with CVPR 2024

  4. arXiv:2403.18270

    cs.CV eess.IV

    Image Deraining via Self-supervised Reinforcement Learning

    Authors: He-Hao Liao, Yan-Tsung Peng, Wen-Tao Chu, Ping-Chun Hsieh, Chung-Chi Tsai

    Abstract: The quality of images captured outdoors is often affected by the weather. One factor that interferes with sight is rain, which can obstruct the view of observers and computer vision applications that rely on those images. The work aims to recover rain images by removing rain streaks via Self-supervised Reinforcement Learning (RL) for image deraining (SRL-Derain). We locate rain streak pixels from…

    Submitted 27 March, 2024; originally announced March 2024.

  5. arXiv:2403.11230

    eess.IV cs.CV cs.LG

    Simple 2D Convolutional Neural Network-based Approach for COVID-19 Detection

    Authors: Chih-Chung Hsu, Chia-Ming Lee, Yang Fan Chiang, Yi-Shiuan Chou, Chih-Yu Jiang, Shen-Chieh Tai, Chi-Han Tsai

    Abstract: This study explores the use of deep learning techniques for analyzing lung Computed Tomography (CT) images. Classic deep learning approaches face challenges with varying slice counts and resolutions in CT images, a diversity arising from the utilization of assorted scanning equipment. Typically, predictions are made on single slices which are then combined for a comprehensive outcome. Yet, this me…

    Submitted 17 March, 2024; originally announced March 2024.

  6. arXiv:2402.07383

    eess.AS cs.CL cs.LG cs.SD

    Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like

    Authors: Naoyuki Kanda, Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Hemin Yang, Zirun Zhu, Min Tang, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Yufei Xia, Jinzhu Li, Yanqing Liu, Sheng Zhao, Michael Zeng

    Abstract: Laughter is one of the most expressive and natural aspects of human speech, conveying emotions, social cues, and humor. However, most text-to-speech (TTS) systems lack the ability to produce realistic and appropriate laughter sounds, limiting their applications and user experience. While there have been prior works to generate natural laughter, they fell short in terms of controlling the timing an…

    Submitted 4 March, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

    Comments: See https://aka.ms/elate/ for demo samples, v2: subjective evaluation has been added

  7. arXiv:2312.09429

    eess.SP cs.LG

    Deep Learning-Enabled Swallowing Monitoring and Postoperative Recovery Biosensing System

    Authors: Chih-Ning Tsai, Pei-Wen Yang, Tzu-Yen Huang, Jung-Chih Chen, Hsin-Yi Tseng, Che-Wei Wu, Amrit Sarmah, Tzu-En Lin

    Abstract: This study introduces an innovative 3D printed dry electrode tailored for biosensing in postoperative recovery scenarios. Fabricated through a drop coating process, the electrode incorporates a novel 2D material.

    Submitted 24 November, 2023; originally announced December 2023.

    Comments: the abstract can't be uploaded fully

    MSC Class: NA ACM Class: A.0

  8. arXiv:2309.04651

    eess.IV cs.AI cs.CV

    Video and Synthetic MRI Pre-training of 3D Vision Architectures for Neuroimage Analysis

    Authors: Nikhil J. Dhinagar, Amit Singh, Saket Ozarkar, Ketaki Buwa, Sophia I. Thomopoulos, Conor Owens-Walton, Emily Laltoo, Yao-Liang Chen, Philip Cook, Corey McMillan, Chih-Chien Tsai, J-J Wang, Yih-Ru Wu, Paul M. Thompson

    Abstract: Transfer learning represents a recent paradigm shift in the way we build artificial intelligence (AI) systems. In contrast to training task-specific models, transfer learning involves pre-training deep learning models on a large corpus of data and minimally fine-tuning them for adaptation to specific tasks. Even so, for 3D medical imaging tasks, we do not know if it is best to pre-train models on…

    Submitted 8 September, 2023; originally announced September 2023.

  9. Interference-Aware Deployment for Maximizing User Satisfaction in Multi-UAV Wireless Networks

    Authors: Chuan-Chi Lai, Ang-Hsun Tsai, Chia-Wei Ting, Ko-Han Lin, Jing-Chi Ling, Chia-En Tsai

    Abstract: In this letter, we study the deployment of Unmanned Aerial Vehicle mounted Base Stations (UAV-BSs) in multi-UAV cellular networks. We model the multi-UAV deployment problem as a user satisfaction maximization problem, that is, maximizing the proportion of served ground users (GUs) that meet a given minimum data rate requirement. We propose an interference-aware deployment (IAD) algorithm for servi…

    Submitted 6 April, 2023; originally announced April 2023.

    Comments: 5 pages, 3 figures, to appear in IEEE Wireless Communications Letters

  10. arXiv:2303.08490

    eess.IV cs.CV

    Strong Baseline and Bag of Tricks for COVID-19 Detection of CT Scans

    Authors: Chih-Chung Hsu, Chih-Yu Jian, Chia-Ming Lee, Chi-Han Tsai, Sheng-Chieh Dai

    Abstract: This paper investigates the application of deep learning models for lung Computed Tomography (CT) image analysis. Traditional deep learning frameworks encounter compatibility issues due to variations in slice numbers and resolutions in CT images, which stem from the use of different machines. Commonly, individual slices are predicted and subsequently merged to obtain the final result; however, thi…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: technical report. Keywords: Spatial-Slice correlation, COVID-19 classification, convolutional neural networks, computed tomography

  11. arXiv:2302.13631

    eess.IV cs.AI cs.CV cs.LG q-bio.QM

    Curriculum Based Multi-Task Learning for Parkinson's Disease Detection

    Authors: Nikhil J. Dhinagar, Conor Owens-Walton, Emily Laltoo, Christina P. Boyle, Yao-Liang Chen, Philip Cook, Corey McMillan, Chih-Chien Tsai, J-J Wang, Yih-Ru Wu, Ysbrand van der Werf, Paul M. Thompson

    Abstract: There is great interest in developing radiological classifiers for diagnosis, staging, and predictive modeling in progressive diseases such as Parkinson's disease (PD), a neurodegenerative disease that is difficult to detect in its early stages. Here we leverage severity-based meta-data on the stages of disease to define a curriculum for training a deep convolutional neural network (CNN). Typicall…

    Submitted 27 February, 2023; originally announced February 2023.

    Comments: Accepted for publication at the 20th IEEE International Symposium on Biomedical Imaging, ISBI 2023

  12. arXiv:2210.08369

    physics.optics eess.IV

    Metasurface Smart Glass for Object Recognition

    Authors: Cheng-Chia Tsai, Xiaoyan Huang, Zhicheng Wu, Zongfu Yu, Nanfang Yu

    Abstract: Recent years have seen a considerable surge of research on developing heuristic approaches to realize analog computing using physical waves. Among these, neuromorphic computing using light waves is envisioned to feature performance metrics such as computational speed and energy efficiency exceeding those of conventional digital techniques by many orders of magnitude. Yet, neuromorphic computing ba…

    Submitted 15 October, 2022; originally announced October 2022.

    Comments: 30 pages, 6 figures

  13. arXiv:2207.03050

    eess.IV cs.CV

    Multi-Task Lung Nodule Detection in Chest Radiographs with a Dual Head Network

    Authors: Chen-Han Tsai, Yu-Shao Peng

    Abstract: Lung nodules can be an alarming precursor to potential lung cancer. Missed nodule detections during chest radiograph analysis remain a common challenge among thoracic radiologists. In this work, we present a multi-task lung nodule detection algorithm for chest radiograph analysis. Unlike past approaches, our algorithm predicts a global-level label indicating nodule presence along with local-level…

    Submitted 6 July, 2022; originally announced July 2022.

    Comments: 11 pages, 3 figures, Accepted to the MICCAI Conference 2022

  14. arXiv:2207.01579

    eess.IV cs.CV cs.LG

    Spatiotemporal Feature Learning Based on Two-Step LSTM and Transformer for CT Scans

    Authors: Chih-Chung Hsu, Chi-Han Tsai, Guan-Lin Chen, Sin-Di Ma, Shen-Chieh Tai

    Abstract: Computed tomography (CT) imaging could be very practical for diagnosing various diseases. However, the nature of the CT images is even more diverse since the resolution and number of the slices of a CT scan are determined by the machine and its settings. Conventional deep learning models struggle to tackle such diverse data since the essential requirement of the deep neural network is the consiste…

    Submitted 8 July, 2022; v1 submitted 4 July, 2022; originally announced July 2022.

    Comments: draft

  15. arXiv:2202.03430

    eess.IV cs.CV

    A Topology-Attention ConvLSTM Network and Its Application to EM Images

    Authors: Jiaqi Yang, Xiaoling Hu, Chao Chen, Chialing Tsai

    Abstract: Structural accuracy of segmentation is important for fine-scale structures in biomedical images. We propose a novel Topology-Attention ConvLSTM Network (TACNet) for 3D image segmentation in order to achieve high structural accuracy for 3D segmentation tasks. Specifically, we propose a Spatial Topology-Attention (STA) module to process a 3D image as a stack of 2D image slices and adopt ConvLSTM to le…

    Submitted 6 February, 2022; originally announced February 2022.

    Comments: 12 pages, 6 figures, Accepted by MICCAI'21

  16. Hybrid Controlled User Association and Resource Management for Energy-Efficient Green RANs with Limited Fronthaul

    Authors: Li-Hsiang Shen, Chia-Lin Tsai, Chia-Yu Wang, Kai-Ten Feng

    Abstract: To alleviate the greenhouse effect, high network energy efficiency (EE) has increasingly become an important research target in wireless green communications. Therefore, the investigation for resource management to mitigate the co-tier interference in the small cell network (SCN) is provided. Moreover, with the merits of cloud radio access network (C-RAN), small cell base stations (SBSs) can be decom…

    Submitted 7 December, 2021; originally announced December 2021.

    Journal ref: IEEE Access, 2022

  17. arXiv:2111.12925

    cs.CV cs.AI cs.LG eess.IV eess.SP

    ContourletNet: A Generalized Rain Removal Architecture Using Multi-Direction Hierarchical Representation

    Authors: Wei-Ting Chen, Cheng-Che Tsai, Hao-Yu Fang, I-Hsiang Chen, Jian-Jiun Ding, Sy-Yen Kuo

    Abstract: Images acquired from rainy scenes usually suffer from bad visibility which may damage the performance of computer vision applications. The rainy scenarios can be categorized into two classes: moderate rain and heavy rain scenes. Moderate rain scene mainly consists of rain streaks while heavy rain scene contains both rain streaks and the veiling effect (similar to haze). Although existing methods h…

    Submitted 25 November, 2021; originally announced November 2021.

    Comments: This paper is accepted by BMVC 2021

  18. arXiv:2102.00502

    cs.MM eess.IV

    A Machine Learning Approach to Optimal Inverse Discrete Cosine Transform (IDCT) Design

    Authors: Yifan Wang, Zhanxuan Mei, Chia-Yang Tsai, Ioannis Katsavounidis, C. -C. Jay Kuo

    Abstract: The design of the optimal inverse discrete cosine transform (IDCT) to compensate the quantization error is proposed for effective lossy image compression in this work. The forward and inverse DCTs are designed in pairs in current image/video coding standards without taking the quantization effect into account. Yet, the distribution of quantized DCT coefficients deviates from that of original DCT coe…

    Submitted 31 January, 2021; originally announced February 2021.

    Comments: conference

  19. Blind Monaural Source Separation on Heart and Lung Sounds Based on Periodic-Coded Deep Autoencoder

    Authors: Kun-Hsi Tsai, Wei-Chien Wang, Chui-Hsuan Cheng, Chan-Yen Tsai, Jou-Kou Wang, Tzu-Hao Lin, Shih-Hau Fang, Li-Chin Chen, Yu Tsao

    Abstract: Auscultation is the most efficient way to diagnose cardiovascular and respiratory diseases. To reach accurate diagnoses, a device must be able to recognize heart and lung sounds from various clinical situations. However, the recorded chest sounds are a mixture of heart and lung sounds. Thus, effectively separating these two sounds is critical in the pre-processing stage. Recent advances in machine lea…

    Submitted 11 December, 2020; originally announced December 2020.

    Comments: 13 pages, 11 figures, Accepted by IEEE Journal of Biomedical and Health Informatics

  20. arXiv:2005.02706

    eess.IV cs.CV

    Knee Injury Detection using MRI with Efficiently-Layered Network (ELNet)

    Authors: Chen-Han Tsai, Nahum Kiryati, Eli Konen, Iris Eshed, Arnaldo Mayer

    Abstract: Magnetic Resonance Imaging (MRI) is a widely-accepted imaging technique for knee injury analysis. Its advantage of capturing knee structure in three dimensions makes it the ideal tool for radiologists to locate potential tears in the knee. In order to better confront the ever growing workload of musculoskeletal (MSK) radiologists, automated tools for patients' triage are becoming a real need, redu…

    Submitted 30 September, 2020; v1 submitted 6 May, 2020; originally announced May 2020.

    Comments: 11 pages, 4 figures, Accepted to the Medical Imaging and Deep Learning (MIDL) Conference 2020

    Journal ref: Proceedings of Machine Learning Research 121 (2020) 784-794

  21. arXiv:1909.03434

    cs.LG cs.CL cs.SD eess.AS stat.ML

    Order-free Learning Alleviating Exposure Bias in Multi-label Classification

    Authors: Che-Ping Tsai, Hung-Yi Lee

    Abstract: Multi-label classification (MLC) assigns multiple labels to each sample. Prior studies show that MLC can be transformed to a sequence prediction problem with a recurrent neural network (RNN) decoder to model the label dependency. However, training an RNN decoder requires a predefined order of labels, which is not directly available in the MLC specification. Besides, RNN thus trained tends to overfi…

    Submitted 8 September, 2019; originally announced September 2019.

  22. arXiv:1904.04100

    cs.CL cs.SD eess.AS

    Completely Unsupervised Speech Recognition By A Generative Adversarial Network Harmonized With Iteratively Refined Hidden Markov Models

    Authors: Kuan-Yu Chen, Che-Ping Tsai, Da-Rong Liu, Hung-Yi Lee, Lin-shan Lee

    Abstract: Producing a large annotated speech corpus for training ASR systems remains difficult for more than 95% of languages all over the world which are low-resourced, but collecting a relatively big unlabeled data set for such languages is more achievable. This is why some initial efforts have been reported on completely unsupervised speech recognition learned from unlabeled data only, although with relat…

    Submitted 23 August, 2019; v1 submitted 8 April, 2019; originally announced April 2019.

    Comments: Accepted by Interspeech 2019

  23. arXiv:1804.05306

    cs.SD cs.CL eess.AS

    Transcribing Lyrics From Commercial Song Audio: The First Step Towards Singing Content Processing

    Authors: Che-Ping Tsai, Yi-Lin Tuan, Lin-shan Lee

    Abstract: Spoken content processing (such as retrieval and browsing) is maturing, but the singing content is still almost completely left out. Songs are human voice carrying plenty of semantic information just as speech, and may be considered as a special type of speech with highly flexible prosody. The various problems in song audio, for example the significantly changing phone duration over highly flexibl…

    Submitted 15 April, 2018; originally announced April 2018.

    Comments: Accepted as a conference paper at ICASSP 2018