Showing 1–10 of 10 results for author: Chou, J

Searching in archive eess.
  1. arXiv:2310.08715  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Toward Joint Language Modeling for Speech Units and Text

    Authors: Ju-Chieh Chou, Chung-Ming Chien, Wei-Ning Hsu, Karen Livescu, Arun Babu, Alexis Conneau, Alexei Baevski, Michael Auli

    Abstract: Speech and text are two major forms of human language. The research community has been focusing on mapping speech to text or vice versa for many years. However, in the field of language modeling, very little effort has been made to model them jointly. In light of this, we explore joint language modeling for speech units and text. Specifically, we compare different speech tokenizers to transform co…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: EMNLP findings 2023

  2. arXiv:2310.05919  [pdf, other]

    cs.CL eess.AS

    Few-Shot Spoken Language Understanding via Joint Speech-Text Models

    Authors: Chung-Ming Chien, Mingjiamei Zhang, Ju-Chieh Chou, Karen Livescu

    Abstract: Recent work on speech representation models jointly pre-trained with text has demonstrated the potential of improving speech representations by encoding speech and text in a shared space. In this paper, we leverage such shared representations to address the persistent challenge of limited data availability in spoken language understanding tasks. By employing a pre-trained speech-text model, we fin…

    Submitted 9 October, 2023; originally announced October 2023.

  3. arXiv:2309.08030  [pdf, other]

    eess.AS cs.CL cs.SD

    AV2Wav: Diffusion-Based Re-synthesis from Continuous Self-supervised Features for Audio-Visual Speech Enhancement

    Authors: Ju-Chieh Chou, Chung-Ming Chien, Karen Livescu

    Abstract: Speech enhancement systems are typically trained using pairs of clean and noisy speech. In audio-visual speech enhancement (AVSE), there is not as much ground-truth clean data available; most audio-visual datasets are collected in real-world environments with background noise and reverberation, hampering the development of AVSE. In this work, we introduce AV2Wav, a resynthesis-based audio-visual s…

    Submitted 8 April, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: extended version for the accepted paper at ICASSP 2024

  4. arXiv:2210.01986  [pdf, other]

    cs.LG eess.SP q-bio.NC

    MAtt: A Manifold Attention Network for EEG Decoding

    Authors: Yue-Ting Pan, Jing-Lun Chou, Chun-Shu Wei

    Abstract: Recognition of electroencephalographic (EEG) signals highly affect the efficiency of non-invasive brain-computer interfaces (BCIs). While recent advances of deep-learning (DL)-based EEG decoders offer improved performances, the development of geometric learning (GL) has attracted much attention for offering exceptional robustness in decoding noisy EEG data. However, there is a lack of studies on t…

    Submitted 4 October, 2022; originally announced October 2022.

  5. arXiv:2208.11878  [pdf, other]

    eess.SY

    Vulnerability Analysis of Time Synchronization in Automotive Ethernet

    Authors: Rishikesh Kakade, Joey Chou, Shannon Torcato

    Abstract: The operation of many network communication protocols require accurate time synchronization between nodes. In the automotive space, IEEE 802.3bw (commonly referred to as automotive ethernet) is quickly becoming the most popular in-vehicle communication protocol between electronic control units (ECUs). The rapid advance of autonomous vehicles is predicated on a high throughput of multiple HD video…

    Submitted 25 August, 2022; originally announced August 2022.

    Comments: 6 pages, 9 figures

  6. arXiv:2107.04734  [pdf, other]

    cs.CL cs.LG eess.AS

    Layer-wise Analysis of a Self-supervised Speech Representation Model

    Authors: Ankita Pasad, Ju-Chieh Chou, Karen Livescu

    Abstract: Recently proposed self-supervised learning approaches have been successful for pre-training speech representation models. The utility of these learned representations has been observed empirically, but not much has been studied about the type or extent of information encoded in the pre-trained representations themselves. Developing such insights can help understand the capabilities and limits of t…

    Submitted 3 December, 2022; v1 submitted 9 July, 2021; originally announced July 2021.

    Comments: Accepted to ASRU 2021. Code: https://github.com/ankitapasad/layerwise-analysis

  7. arXiv:2106.04624  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD

    SpeechBrain: A General-Purpose Speech Toolkit

    Authors: Mirco Ravanelli, Titouan Parcollet, Peter Plantinga, Aku Rouhe, Samuele Cornell, Loren Lugosch, Cem Subakan, Nauman Dawalatabad, Abdelwahab Heba, Jianyuan Zhong, Ju-Chieh Chou, Sung-Lin Yeh, Szu-Wei Fu, Chien-Feng Liao, Elena Rastorgueva, François Grondin, William Aris, Hwidong Na, Yan Gao, Renato De Mori, Yoshua Bengio

    Abstract: SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the research and development of neural speech processing technologies by being simple, flexible, user-friendly, and well-documented. This paper describes the core architecture designed to support several tasks of common interest, allowing users to naturally conceive, compare and share novel speech processing…

    Submitted 8 June, 2021; originally announced June 2021.

    Comments: Preprint

  8. arXiv:1904.05742  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization

    Authors: Ju-chieh Chou, Cheng-chieh Yeh, Hung-yi Lee

    Abstract: Recently, voice conversion (VC) without parallel data has been successfully adapted to multi-target scenario in which a single model is trained to convert the input voice to many different speakers. However, such model suffers from the limitation that it can only convert the voice to the speakers in the training data, which narrows down the applicable scenario of VC. In this paper, we proposed a n…

    Submitted 22 August, 2019; v1 submitted 10 April, 2019; originally announced April 2019.

    Comments: Interspeech 2019

  9. arXiv:1808.03113  [pdf, other]

    cs.SD eess.AS

    Rhythm-Flexible Voice Conversion without Parallel Data Using Cycle-GAN over Phoneme Posteriorgram Sequences

    Authors: Cheng-chieh Yeh, Po-chun Hsu, Ju-chieh Chou, Hung-yi Lee, Lin-shan Lee

    Abstract: Speaking rate refers to the average number of phonemes within some unit time, while the rhythmic patterns refer to duration distributions for realizations of different phonemes within different phonetic structures. Both are key components of prosody in speech, which is different for different speakers. Models like cycle-consistent adversarial network (Cycle-GAN) and variational auto-encoder (VAE)…

    Submitted 9 August, 2018; originally announced August 2018.

    Comments: 8 pages, 6 figures, Submitted to SLT 2018

  10. arXiv:1804.02812  [pdf, other]

    eess.AS cs.CL cs.SD

    Multi-target Voice Conversion without Parallel Data by Adversarially Learning Disentangled Audio Representations

    Authors: Ju-chieh Chou, Cheng-chieh Yeh, Hung-yi Lee, Lin-shan Lee

    Abstract: Recently, cycle-consistent adversarial network (Cycle-GAN) has been successfully applied to voice conversion to a different speaker without parallel data, although in those approaches an individual model is needed for each target speaker. In this paper, we propose an adversarial learning framework for voice conversion, with which a single model can be trained to convert the voice to many different…

    Submitted 24 June, 2018; v1 submitted 9 April, 2018; originally announced April 2018.

    Comments: Accepted to Interspeech 2018