Showing 1–27 of 27 results for author: Kao, C

Searching in archive eess.
  1. arXiv:2308.14763  [pdf, other]

    eess.AS cs.CL cs.SD

    VoiceBank-2023: A Multi-Speaker Mandarin Speech Corpus for Constructing Personalized TTS Systems for the Speech Impaired

    Authors: Jia-Jyu Su, Pang-Chen Liao, Yen-Ting Lin, Wu-Hao Li, Guan-Ting Liou, Cheng-Che Kao, Wei-Cheng Chen, Jen-Chieh Chiang, Wen-Yang Chang, Pin-Han Lin, Chen-Yu Chiang

    Abstract: Services of personalized TTS systems for the Mandarin-speaking speech impaired are rarely mentioned. Taiwan started the VoiceBanking project in 2020, aiming to build a complete set of services to deliver personalized Mandarin TTS systems to amyotrophic lateral sclerosis patients. This paper reports the corpus design, corpus recording, data purging and correction for the corpus, and evaluations of…

    Submitted 27 August, 2023; originally announced August 2023.

    Comments: submitted to 26th International Conference of the ORIENTAL-COCOSDA

  2. arXiv:2306.05085  [pdf, other]

    eess.IV

    TransTIC: Transferring Transformer-based Image Compression from Human Perception to Machine Perception

    Authors: Yi-Hsin Chen, Ying-Chieh Weng, Chia-Hao Kao, Cheng Chien, Wei-Chen Chiu, Wen-Hsiao Peng

    Abstract: This work aims for transferring a Transformer-based image compression codec from human perception to machine perception without fine-tuning the codec. We propose a transferable Transformer-based image compression framework, termed TransTIC. Inspired by visual prompt tuning, TransTIC adopts an instance-specific prompt generator to inject instance-specific prompts to the encoder and task-specific pr…

    Submitted 18 August, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: Accepted to ICCV 2023

  3. arXiv:2305.10807  [pdf, other]

    eess.IV cs.CV

    Transformer-based Variable-rate Image Compression with Region-of-interest Control

    Authors: Chia-Hao Kao, Ying-Chieh Weng, Yi-Hsin Chen, Wei-Chen Chiu, Wen-Hsiao Peng

    Abstract: This paper proposes a transformer-based learned image compression system. It is capable of achieving variable-rate compression with a single model while supporting the region-of-interest (ROI) functionality. Inspired by prompt tuning, we introduce prompt generation networks to condition the transformer-based autoencoder of compression. Our prompt generation networks generate content-adaptive token…

    Submitted 1 August, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted to IEEE ICIP 2023

  4. arXiv:2303.10351  [pdf, other]

    cs.SD eess.AS

    Weight-sharing Supernet for Searching Specialized Acoustic Event Classification Networks Across Device Constraints

    Authors: Guan-Ting Lin, Qingming Tang, Chieh-Chi Kao, Viktor Rozgic, Chao Wang

    Abstract: Acoustic Event Classification (AEC) has been widely used in devices such as smart speakers and mobile phones for home safety or accessibility support. As AEC models run on more and more devices with diverse computation resource constraints, it became increasingly expensive to develop models that are tuned to achieve optimal accuracy/computation trade-off for each given computation resource constra…

    Submitted 18 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  5. arXiv:2301.04775  [pdf, ps, other]

    eess.SY

    On Phase Change Rate Maximization with Practical Applications

    Authors: Chung-Yao Kao, Shinji Hara, Yutaka Hori, Tetsuya Iwasaki, Sei Zhen Khong

    Abstract: We recapitulate the notion of phase change rate maximization and demonstrate the usefulness of its solution on analyzing the robust instability of a cyclic network of multi-agent systems subject to a homogenous multiplicative perturbation. Subsequently, we apply the phase change rate maximization result to two practical applications. The first is a magnetic levitation system, while the second is a…

    Submitted 4 April, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

  6. arXiv:2209.13210  [pdf, other]

    eess.IV cs.CV

    Neural Frank-Wolfe Policy Optimization for Region-of-Interest Intra-Frame Coding with HEVC/H.265

    Authors: Yung-Han Ho, Chia-Hao Kao, Wen-Hsiao Peng, **-Chun Hsieh

    Abstract: This paper presents a reinforcement learning (RL) framework that utilizes Frank-Wolfe policy optimization to solve Coding-Tree-Unit (CTU) bit allocation for Region-of-Interest (ROI) intra-frame coding. Most previous RL-based methods employ the single-critic design, where the rewards for distortion minimization and rate regularization are weighted by an empirically chosen hyper-parameter. Recently,…

    Submitted 27 September, 2022; originally announced September 2022.

    Comments: Accepted by VCIP 2022. arXiv admin note: text overlap with arXiv:2203.05127

  7. arXiv:2205.09658  [pdf, other]

    cs.RO eess.SY

    Image-Based Conditioning for Action Policy Smoothness in Autonomous Miniature Car Racing with Reinforcement Learning

    Authors: Bo-Jiun Hsu, Hoang-Giang Cao, I Lee, Chih-Yu Kao, **-Bo Huang, I-Chen Wu

    Abstract: In recent years, deep reinforcement learning has achieved significant results in low-level controlling tasks. However, the problem of control smoothness has less attention. In autonomous driving, unstable control is inevitable since the vehicle might suddenly change its actions. This problem will lower the controlling system's efficiency, induces excessive mechanical wear, and causes uncontrollabl…

    Submitted 19 May, 2022; originally announced May 2022.

  8. arXiv:2203.11997  [pdf, other]

    cs.SD cs.LG eess.AS

    Federated Self-Supervised Learning for Acoustic Event Classification

    Authors: Meng Feng, Chieh-Chi Kao, Qingming Tang, Ming Sun, Viktor Rozgic, Spyros Matsoukas, Chao Wang

    Abstract: Standard acoustic event classification (AEC) solutions require large-scale collection of data from client devices for model optimization. Federated learning (FL) is a compelling framework that decouples data collection and model training to enhance customer privacy. In this work, we investigate the feasibility of applying FL to improve AEC performance while no customer data can be directly uploade…

    Submitted 22 March, 2022; originally announced March 2022.

  9. arXiv:2203.05127  [pdf, other]

    eess.IV cs.LG

    Action-Constrained Reinforcement Learning for Frame-Level Bit Allocation in HEVC/H.265 through Frank-Wolfe Policy Optimization

    Authors: Yung-Han Ho, Yun Liang, Chia-Hao Kao, Wen-Hsiao Peng

    Abstract: This paper presents a reinforcement learning (RL) framework that leverages Frank-Wolfe policy optimization to address frame-level bit allocation for HEVC/H.265. Most previous RL-based approaches adopt the single-critic design, which weights the rewards for distortion minimization and rate regularization by an empirically chosen hyper-parameter. More recently, the dual-critic design is proposed to…

    Submitted 9 March, 2022; originally announced March 2022.

  10. arXiv:2202.09500  [pdf, ps, other]

    eess.SY math.OC

    Exact Instability Margin Analysis and Minimum-Norm Strong Stabilization -- phase change rate maximization --

    Authors: Shinji Hara, Chung-Yao Kao, Sei Zhen Khong, Tetsuya Iwasaki, Yutaka Hori

    Abstract: This paper is concerned with a new optimization problem named "phase change rate maximization" for single-input-single-output linear time-invariant systems. The problem relates to two control problems, namely robust instability analysis against stable perturbations and minimum-norm strong stabilization. We define an index of the instability margin called "robust instability radius (RIR)" as the sm…

    Submitted 8 October, 2023; v1 submitted 18 February, 2022; originally announced February 2022.

  11. arXiv:2202.00673  [pdf, other]

    cs.LG cs.AI cs.CL cs.HC cs.SD eess.AS

    Visualizing Automatic Speech Recognition -- Means for a Better Understanding?

    Authors: Karla Markert, Romain Parracone, Mykhailo Kulakov, Philip Sperl, Ching-Yu Kao, Konstantin Böttinger

    Abstract: Automatic speech recognition (ASR) is improving ever more at mimicking human speech processing. The functioning of ASR, however, remains to a large extent obfuscated by the complex structure of the deep neural networks (DNNs) they are based on. In this paper, we show how so-called attribution methods, that we import from image recognition and suitably adapt to handle audio data, can help to clarif…

    Submitted 1 February, 2022; originally announced February 2022.

    Comments: Proc. 2021 ISCA Symposium on Security and Privacy in Speech Communication

  12. arXiv:2102.03229  [pdf, other]

    cs.SD cs.LG eess.AS

    Multi-Task Self-Supervised Pre-Training for Music Classification

    Authors: Ho-Hsiang Wu, Chieh-Chi Kao, Qingming Tang, Ming Sun, Brian McFee, Juan Pablo Bello, Chao Wang

    Abstract: Deep learning is very data hungry, and supervised learning especially requires massive labeled data to work well. Machine listening research often suffers from limited labeled data problem, as human annotations are costly to acquire, and annotations for audio are time consuming and less intuitive. Besides, models learned from labeled dataset often embed biases specific to that particular dataset.…

    Submitted 5 February, 2021; originally announced February 2021.

    Comments: Copyright 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  13. arXiv:2101.00796  [pdf, other]

    eess.IV

    A Reduced Codebook and Re-Interpolation Approach for Enhancing Quality in Chroma Subsampling

    Authors: Kuo-Liang Chung, Chen-Wei Kao

    Abstract: Prior to encoding RGB full-color images or Bayer color filter array (CFA) images, chroma subsampling is a necessary and crucial step at the server side. In this paper, we first propose a flow diagram approach to analyze the coordinate-inconsistency (CI) problem and the upsampling process-inconsistency (UPI) problem existing in the traditional and state-of-the-art chroma subsampling methods under t…

    Submitted 4 January, 2021; originally announced January 2021.

  14. arXiv:2010.06676  [pdf, other]

    eess.AS cs.LG cs.SD

    On Front-end Gain Invariant Modeling for Wake Word Spotting

    Authors: Yixin Gao, Noah D. Stein, Chieh-Chi Kao, Yunliang Cai, Ming Sun, Tao Zhang, Shiv Vitaladevuni

    Abstract: Wake word (WW) spotting is challenging in far-field due to the complexities and variations in acoustic conditions and the environmental interference in signal transmission. A suite of carefully designed and optimized audio front-end (AFE) algorithms help mitigate these challenges and provide better quality audio signals to the downstream modules such as WW spotter. Since the WW model is trained wi…

    Submitted 13 October, 2020; originally announced October 2020.

    Comments: In Proc. Interspeech 2020

  15. arXiv:2009.01759  [pdf, other]

    eess.AS cs.SD

    Intra-Utterance Similarity Preserving Knowledge Distillation for Audio Tagging

    Authors: Chun-Chieh Chang, Chieh-Chi Kao, Ming Sun, Chao Wang

    Abstract: Knowledge Distillation (KD) is a popular area of research for reducing the size of large models while still maintaining good performance. The outputs of larger teacher models are used to guide the training of smaller student models. Given the repetitive nature of acoustic events, we propose to leverage this information to regulate the KD training for Audio Tagging. This novel KD method, "Intra-Utt…

    Submitted 3 September, 2020; originally announced September 2020.

    Comments: Accepted to Interspeech 2020

  16. arXiv:2008.03350  [pdf, other]

    eess.AS cs.SD

    A Joint Framework for Audio Tagging and Weakly Supervised Acoustic Event Detection Using DenseNet with Global Average Pooling

    Authors: Chieh-Chi Kao, Bowen Shi, Ming Sun, Chao Wang

    Abstract: This paper proposes a network architecture mainly designed for audio tagging, which can also be used for weakly supervised acoustic event detection (AED). The proposed network consists of a modified DenseNet as the feature extractor, and a global average pooling (GAP) layer to predict frame-level labels at inference time. This architecture is inspired by the work proposed by Zhou et al., a well-kn…

    Submitted 7 August, 2020; originally announced August 2020.

    Comments: Accepted by Interspeech 2020

  17. arXiv:2002.09143  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    Few-shot acoustic event detection via meta-learning

    Authors: Bowen Shi, Ming Sun, Krishna C. Puvvada, Chieh-Chi Kao, Spyros Matsoukas, Chao Wang

    Abstract: We study few-shot acoustic event detection (AED) in this paper. Few-shot learning enables detection of new events with very limited labeled data. Compared to other research areas like computer vision, few-shot learning for audio recognition has been under-studied. We formulate few-shot AED problem and explore different ways of utilizing traditional supervised methods for this setting as well as a…

    Submitted 21 February, 2020; originally announced February 2020.

    Comments: ICASSP 2020

  18. arXiv:2002.06279  [pdf, other]

    eess.AS cs.SD

    A Comparison of Pooling Methods on LSTM Models for Rare Acoustic Event Classification

    Authors: Chieh-Chi Kao, Ming Sun, Weiran Wang, Chao Wang

    Abstract: Acoustic event classification (AEC) and acoustic event detection (AED) refer to the task of detecting whether specific target events occur in audios. As long short-term memory (LSTM) leads to state-of-the-art results in various speech related tasks, it is employed as a popular solution for AEC as well. This paper focuses on investigating the dynamics of LSTM model on AEC tasks. It includes a detai…

    Submitted 14 February, 2020; originally announced February 2020.

    Comments: Accepted to ICASSP 2020

  19. arXiv:2002.04500  [pdf]

    eess.IV cs.CV q-bio.QM

    Artificial Intelligence Assistance Significantly Improves Gleason Grading of Prostate Biopsies by Pathologists

    Authors: Wouter Bulten, Maschenka Balkenhol, Jean-Joël Awoumou Belinga, Américo Brilhante, Aslı Çakır, Xavier Farré, Katerina Geronatsiou, Vincent Molinié, Guilherme Pereira, Paromita Roy, Günter Saile, Paulo Salles, Ewout Schaafsma, Joëlle Tschui, Anne-Marie Vos, Hester van Boven, Robert Vink, Jeroen van der Laak, Christina Hulsbergen-van de Kaa, Geert Litjens

    Abstract: While the Gleason score is the most important prognostic marker for prostate cancer patients, it suffers from significant observer variability. Artificial Intelligence (AI) systems, based on deep learning, have proven to achieve pathologist-level performance at Gleason grading. However, the performance of such systems can degrade in the presence of artifacts, foreign tissue, or other anomalies. Pa…

    Submitted 11 February, 2020; originally announced February 2020.

    Comments: 21 pages, 5 figures

    Journal ref: Modern Pathology, Available online 5 August 2020

  20. arXiv:1912.10323  [pdf, ps, other]

    math.OC eess.SY

    Integral quadratic constraints for asynchronous sample-and-hold links

    Authors: Michael Cantoni, Chung-Yao Kao, Mark A. Fabbro

    Abstract: A model is proposed for a class of asynchronous sample-and-hold operators that is relevant in the analysis of embedded and networked systems. The model is parametrized by characteristics of the corresponding time-varying input-output delay. Uncertainty in the relationship between the timing of zero-order-hold update events at the output and the possibly aperiodic sampling events at the input means…

    Submitted 1 January, 2020; v1 submitted 21 December, 2019; originally announced December 2019.

  21. Automated Gleason Grading of Prostate Biopsies using Deep Learning

    Authors: Wouter Bulten, Hans Pinckaers, Hester van Boven, Robert Vink, Thomas de Bel, Bram van Ginneken, Jeroen van der Laak, Christina Hulsbergen-van de Kaa, Geert Litjens

    Abstract: The Gleason score is the most important prognostic marker for prostate cancer patients but suffers from significant inter-observer variability. We developed a fully automated deep learning system to grade prostate biopsies. The system was developed using 5834 biopsies from 1243 patients. A semi-automatic labeling technique was used to circumvent the need for full manual annotation by pathologists.…

    Submitted 18 July, 2019; originally announced July 2019.

    Comments: 13 pages, 6 figures

    Journal ref: The Lancet Oncology, Available online 8 January 2020

  22. arXiv:1907.01448  [pdf, other]

    eess.AS cs.SD

    Sub-band Convolutional Neural Networks for Small-footprint Spoken Term Classification

    Authors: Chieh-Chi Kao, Ming Sun, Yixin Gao, Shiv Vitaladevuni, Chao Wang

    Abstract: This paper proposes a Sub-band Convolutional Neural Network for spoken term classification. Convolutional neural networks (CNNs) have proven to be very effective in acoustic applications such as spoken term classification, keyword spotting, speaker identification, acoustic event detection, etc. Unlike applications in computer vision, the spatial invariance property of 2D convolutional kernels does…

    Submitted 2 July, 2019; originally announced July 2019.

    Comments: Accepted by Interspeech 2019

  23. arXiv:1907.00873  [pdf, ps, other]

    eess.AS cs.LG cs.SD

    Compression of Acoustic Event Detection Models With Quantized Distillation

    Authors: Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang

    Abstract: Acoustic Event Detection (AED), aiming at detecting categories of events based on audio signals, has found application in many intelligent systems. Recently deep neural network significantly advances this field and reduces detection errors to a large scale. However how to efficiently execute deep models in AED has received much less attention. Meanwhile state-of-the-art AED models are based on lar…

    Submitted 1 July, 2019; originally announced July 2019.

    Comments: Interspeech 2019

  24. arXiv:1905.00855  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    Compression of Acoustic Event Detection Models with Low-rank Matrix Factorization and Quantization Training

    Authors: Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang

    Abstract: In this paper, we present a compression approach based on the combination of low-rank matrix factorization and quantization training, to reduce complexity for neural network based acoustic event detection (AED) models. Our experimental results show this combined compression approach is very effective. For a three-layer long short-term memory (LSTM) based AED model, the original model size can be r…

    Submitted 2 May, 2019; originally announced May 2019.

    Comments: NeurIPS 2018 CDNNRIA workshop

  25. arXiv:1904.12926  [pdf, other]

    eess.AS cs.LG cs.SD

    Semi-supervised Acoustic Event Detection based on tri-training

    Authors: Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang

    Abstract: This paper presents our work of training acoustic event detection (AED) models using unlabeled dataset. Recent acoustic event detectors are based on large-scale neural networks, which are typically trained with huge amounts of labeled data. Labels for acoustic events are expensive to obtain, and relevant acoustic event audios can be limited, especially for rare events. In this paper we leverage an…

    Submitted 29 April, 2019; originally announced April 2019.

    Comments: 5 pages

  26. arXiv:1808.06676  [pdf, other]

    cs.SD eess.AS

    A simple model for detection of rare sound events

    Authors: Weiran Wang, Chieh-chi Kao, Chao Wang

    Abstract: We propose a simple recurrent model for detecting rare sound events, when the time boundaries of events are available for training. Our model optimizes the combination of an utterance-level loss, which classifies whether an event occurs in an utterance, and a frame-level loss, which classifies whether each frame corresponds to the event when it does occur. The two losses make use of a shared vecto…

    Submitted 20 August, 2018; originally announced August 2018.

    Comments: Accepted by Interspeech 2018

  27. arXiv:1808.06627  [pdf, other]

    cs.SD eess.AS

    R-CRNN: Region-based Convolutional Recurrent Neural Network for Audio Event Detection

    Authors: Chieh-Chi Kao, Weiran Wang, Ming Sun, Chao Wang

    Abstract: This paper proposes a Region-based Convolutional Recurrent Neural Network (R-CRNN) for audio event detection (AED). The proposed network is inspired by Faster-RCNN, a well known region-based convolutional network framework for visual object detection. Different from the original Faster-RCNN, a recurrent layer is added on top of the convolutional network to capture the long-term temporal context fr…

    Submitted 20 August, 2018; originally announced August 2018.

    Comments: Accepted by Interspeech 2018