Showing 1–13 of 13 results for author: Liu, A T

Searching in archive eess.
  1. arXiv:2406.04997  [pdf, ps, other]

    eess.AS cs.LG

    On the social bias of speech self-supervised models

    Authors: Yi-Cheng Lin, Tzu-Quan Lin, Hsi-Che Lin, Andy T. Liu, Hung-yi Lee

    Abstract: Self-supervised learning (SSL) speech models have achieved remarkable performance in various tasks, yet the biased outcomes, especially affecting marginalized groups, raise significant concerns. Social bias refers to the phenomenon where algorithms potentially amplify disparate properties between social groups present in the data used for training. Bias in SSL models can perpetuate injustice by au…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  2. arXiv:2404.09385  [pdf, other]

    eess.AS cs.CL eess.SP

    A Large-Scale Evaluation of Speech Foundation Models

    Authors: Shu-wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-yi Lee

    Abstract: The foundation model paradigm leverages a shared foundation model to achieve state-of-the-art (SOTA) performance for various tasks, requiring minimal downstream-specific modeling and data annotation. This approach has proven crucial in the field of Natural Language Processing (NLP). However, the speech processing community lacks a similar setup to explore the paradigm systematically. In this work,…

    Submitted 29 May, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: The extended journal version for SUPERB and SUPERB-SG. Published in IEEE/ACM TASLP. The arXiv version is preferred.

  3. Parallel Synthesis for Autoregressive Speech Generation

    Authors: Po-chun Hsu, Da-rong Liu, Andy T. Liu, Hung-yi Lee

    Abstract: Autoregressive neural vocoders have achieved outstanding performance in speech synthesis tasks such as text-to-speech and voice conversion. An autoregressive vocoder predicts a sample at some time step conditioned on those at previous time steps. Though it synthesizes natural human speech, the iterative generation inevitably makes the synthesis time proportional to the utterance length, leading to…

    Submitted 5 June, 2024; v1 submitted 25 April, 2022; originally announced April 2022.

    Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing

  4. arXiv:2203.06849  [pdf, other]

    cs.CL cs.SD eess.AS

    SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities

    Authors: Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Jeff Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee

    Abstract: Transfer learning has proven to be crucial in advancing the state of speech and natural language processing research in recent years. In speech, a model pre-trained by self-supervised learning transfers remarkably well on multiple tasks. However, the lack of a consistent evaluation methodology is limiting towards a holistic understanding of the efficacy of such models. SUPERB was a step towards in…

    Submitted 14 March, 2022; originally announced March 2022.

    Comments: ACL 2022 main conference

  5. arXiv:2110.07957  [pdf, other]

    eess.AS cs.CL cs.SD

    Don't speak too fast: The impact of data bias on self-supervised speech models

    Authors: Yen Meng, Yi-Hui Chou, Andy T. Liu, Hung-yi Lee

    Abstract: Self-supervised Speech Models (S3Ms) have been proven successful in many speech downstream tasks, like ASR. However, how pre-training data affects S3Ms' downstream behavior remains an unexplored issue. In this paper, we study how pre-training data affects S3Ms by pre-training models on biased datasets targeting different factors of speech, including gender, content, and prosody, and evaluate these…

    Submitted 26 April, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: Accepted by ICASSP 2022

  6. arXiv:2106.00273  [pdf, other]

    cs.SD cs.LG eess.AS

    Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning

    Authors: Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu, Helen Meng, Hung-yi Lee

    Abstract: Previous works have shown that automatic speaker verification (ASV) is seriously vulnerable to malicious spoofing attacks, such as replay, synthetic speech, and recently emerged adversarial attacks. Great efforts have been dedicated to defending ASV against replay and synthetic speech; however, only a few approaches have been explored to deal with adversarial attacks. All the existing approaches t…

    Submitted 4 June, 2024; v1 submitted 1 June, 2021; originally announced June 2021.

    Comments: Accepted by TASLP

  7. arXiv:2105.01051  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    SUPERB: Speech processing Universal PERformance Benchmark

    Authors: Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Jeff Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee

    Abstract: Self-supervised learning (SSL) has proven vital for advancing research in natural language processing (NLP) and computer vision (CV). The paradigm pretrains a shared model on large volumes of unlabeled data and achieves state-of-the-art (SOTA) for various tasks with minimal adaptation. However, the speech processing community lacks a similar setup to systematically explore the paradigm. To bridge…

    Submitted 15 October, 2021; v1 submitted 3 May, 2021; originally announced May 2021.

    Comments: To appear in Interspeech 2021

  8. arXiv:2102.07047  [pdf, other]

    eess.AS cs.AI

    Adversarial defense for automatic speaker verification by cascaded self-supervised learning models

    Authors: Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu, Helen Meng, Hung-yi Lee

    Abstract: Automatic speaker verification (ASV) is one of the core technologies in biometric identification. With the ubiquitous usage of ASV systems in safety-critical applications, more and more malicious attackers attempt to launch adversarial attacks at ASV systems. In the midst of the arms race between attack and defense in ASV, how to effectively improve the robustness of ASV against adversarial attack…

    Submitted 13 February, 2021; originally announced February 2021.

    Comments: Accepted to ICASSP 2021

  9. arXiv:2007.06028  [pdf, other]

    eess.AS cs.CL cs.LG

    TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech

    Authors: Andy T. Liu, Shang-Wen Li, Hung-yi Lee

    Abstract: We introduce a self-supervised speech pre-training method called TERA, which stands for Transformer Encoder Representations from Alteration. Recent approaches often learn by using a single auxiliary task like contrastive prediction, autoregressive prediction, or masked reconstruction. Unlike previous methods, we use alteration along three orthogonal axes to pre-train Transformer Encoders on a larg…

    Submitted 4 August, 2021; v1 submitted 12 July, 2020; originally announced July 2020.

    Comments: Published in IEEE/ACM TASLP, final published article available at https://ieeexplore.ieee.org/document/9478264

    Journal ref: IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, Vol. 29, 2021

  10. arXiv:2006.03214  [pdf, other]

    eess.AS cs.LG

    Defense for Black-box Attacks on Anti-spoofing Models by Self-Supervised Learning

    Authors: Haibin Wu, Andy T. Liu, Hung-yi Lee

    Abstract: High-performance anti-spoofing models for automatic speaker verification (ASV), have been widely used to protect ASV by identifying and filtering spoofing audio that is deliberately generated by text-to-speech, voice conversion, audio replay, etc. However, it has been shown that high-performance anti-spoofing models are vulnerable to adversarial attacks. Adversarial attacks, that are indistinguish…

    Submitted 7 December, 2020; v1 submitted 4 June, 2020; originally announced June 2020.

  11. arXiv:1912.02461  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    Towards Robust Neural Vocoding for Speech Generation: A Survey

    Authors: Po-chun Hsu, Chun-hsuan Wang, Andy T. Liu, Hung-yi Lee

    Abstract: Recently, neural vocoders have been widely used in speech synthesis tasks, including text-to-speech and voice conversion. However, when encountering data distribution mismatch between training and inference, neural vocoders trained on real data often degrade in voice quality for unseen scenarios. In this paper, we train four common neural vocoders, including WaveNet, WaveRNN, FFTNet, Parallel Wave…

    Submitted 20 August, 2020; v1 submitted 5 December, 2019; originally announced December 2019.

    Comments: Submitted to INTERSPEECH 2020

  12. arXiv:1910.12638  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders

    Authors: Andy T. Liu, Shu-wen Yang, Po-Han Chi, Po-chun Hsu, Hung-yi Lee

    Abstract: We present Mockingjay as a new speech representation learning approach, where bidirectional Transformer encoders are pre-trained on a large amount of unlabeled speech. Previous speech representation methods learn through conditioning on past frames and predicting information about future frames. Whereas Mockingjay is designed to predict the current frame through jointly conditioning on both past a…

    Submitted 2 February, 2020; v1 submitted 24 October, 2019; originally announced October 2019.

    Comments: Accepted by ICASSP 2020, Lecture Session

    Journal ref: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  13. arXiv:1905.11563  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Unsupervised End-to-End Learning of Discrete Linguistic Units for Voice Conversion

    Authors: Andy T. Liu, Po-chun Hsu, Hung-yi Lee

    Abstract: We present an unsupervised end-to-end training scheme where we discover discrete subword units from speech without using any labels. The discrete subword units are learned under an ASR-TTS autoencoder reconstruction setting, where an ASR-Encoder is trained to discover a set of common linguistic units given a variety of speakers, and a TTS-Decoder trained to project the discovered units back to the…

    Submitted 20 June, 2019; v1 submitted 27 May, 2019; originally announced May 2019.

    Comments: Accepted by Interspeech 2019, Graz, Austria

    Journal ref: Interspeech 2019