
Showing 1–16 of 16 results for author: Ao, J

  1. arXiv:2406.13340  [pdf, other]

    cs.CL cs.SD eess.AS

    SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words

    Authors: Junyi Ao, Yuancheng Wang, Xiaohai Tian, Dekun Chen, Jun Zhang, Lu Lu, Yuxuan Wang, Haizhou Li, Zhizheng Wu

    Abstract: Speech encompasses a wealth of information, including but not limited to content, paralinguistic, and environmental information. This comprehensive nature of speech significantly impacts communication and is crucial for human-computer interaction. Chat-Oriented Large Language Models (LLMs), known for their general-purpose assistance capabilities, have evolved to handle multi-modal inputs, includin…

    Submitted 19 June, 2024; originally announced June 2024.

  2. arXiv:2402.15725  [pdf, other]

    eess.AS

    Text-guided HuBERT: Self-Supervised Speech Pre-training via Generative Adversarial Networks

    Authors: Duo Ma, Xianghu Yue, Junyi Ao, Xiaoxue Gao, Haizhou Li

    Abstract: Human language can be expressed in either written or spoken form, i.e., text or speech. Humans can acquire knowledge from text to improve speaking and listening. However, the quest for speech pre-trained models to leverage unpaired text has just started. In this paper, we investigate a new way to pre-train such a joint speech-text model to learn enhanced speech representations and benefit various s…

    Submitted 28 February, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

    Comments: 5 pages, 1 figure, 5 tables, submitted to IEEE Signal Processing Letters (SPL)

  3. arXiv:2312.16002  [pdf, other]

    eess.AS cs.AI

    The NUS-HLT System for ICASSP2024 ICMC-ASR Grand Challenge

    Authors: Meng Ge, Yizhou Peng, Yidi Jiang, Jingru Lin, Junyi Ao, Mehmet Sinan Yildirim, Shuai Wang, Haizhou Li, Mengling Feng

    Abstract: This paper summarizes our team's efforts in both tracks of the ICMC-ASR Challenge for in-car multi-channel automatic speech recognition. Our submitted systems for the ICMC-ASR Challenge include multi-channel front-end enhancement and diarization, training data augmentation, and speech recognition modeling with multi-channel branches. Tested on the official Eval1 and Eval2 set, our best system achieves…

    Submitted 26 December, 2023; originally announced December 2023.

    Comments: Technical Report. 2 pages. For ICMC-ASR-2023 Challenge

  4. arXiv:2309.11642  [pdf]

    q-bio.TO eess.IV

    High-content stimulated Raman histology of human breast cancer

    Authors: Hongli Ni, Chinmayee Prabhu Dessai, Haonan Lin, Wei Wang, Shaoxiong Chen, Yuhao Yuan, Xiaowei Ge, Jianpeng Ao, Nolan Vild, Ji-Xin Cheng

    Abstract: Histological examination is crucial for cancer diagnosis, including hematoxylin and eosin (H&E) staining for mapping morphology and immunohistochemistry (IHC) staining for revealing chemical information. Recently developed two-color stimulated Raman histology could bypass the complex tissue processing to mimic H&E-like morphology. Yet, the underlying chemical features are not revealed, compromisin…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: 6 figures

  5. arXiv:2309.10674  [pdf, other]

    cs.SD eess.AS

    USED: Universal Speaker Extraction and Diarization

    Authors: Junyi Ao, Mehmet Sinan Yıldırım, Ruijie Tao, Meng Ge, Shuai Wang, Yanmin Qian, Haizhou Li

    Abstract: Speaker extraction and diarization are two enabling techniques for real-world speech applications. Speaker extraction aims to extract a target speaker's voice from a speech mixture, while speaker diarization demarcates speech segments by speaker, annotating 'who spoke when'. Previous studies have typically treated the two tasks independently. In practical applications, it is more meaningful to hav…

    Submitted 9 May, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

  6. arXiv:2307.09871  [pdf, other]

    eess.AS

    Self-Supervised Acoustic Word Embedding Learning via Correspondence Transformer Encoder

    Authors: Jingru Lin, Xianghu Yue, Junyi Ao, Haizhou Li

    Abstract: Acoustic word embeddings (AWEs) aim to map a variable-length speech segment into a fixed-dimensional representation. High-quality AWEs should be invariant to variations, such as duration, pitch and speaker. In this paper, we introduce a novel self-supervised method to learn robust AWEs from a large-scale unlabelled speech corpus. Our model, named Correspondence Transformer Encoder (CTE), employs…

    Submitted 19 July, 2023; originally announced July 2023.

  7. arXiv:2210.16755  [pdf, other]

    cs.CL cs.SD eess.AS

    token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired Speech and Text

    Authors: Xianghu Yue, Junyi Ao, Xiaoxue Gao, Haizhou Li

    Abstract: Self-supervised pre-training has been successful in both text and speech processing. Speech and text offer different but complementary information. The question is whether we are able to perform a speech-text joint pre-training on unpaired speech and text. In this paper, we take the idea of self-supervised pre-training one step further and propose token2vec, a novel joint pre-training framework fo…

    Submitted 30 October, 2022; originally announced October 2022.

    Comments: Submitted to ICASSP 2023

  8. arXiv:2210.04062  [pdf, other]

    cs.SD eess.AS

    CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning

    Authors: Chutong Meng, Junyi Ao, Tom Ko, Mingxuan Wang, Haizhou Li

    Abstract: Speech is the surface form of a finite set of phonetic units, which can be represented by discrete codes. We propose the Code BERT (CoBERT) approach for self-supervised speech representation learning. The idea is to convert an utterance to a sequence of discrete codes, and perform code representation learning, where we predict the code representations based on a masked view of the original speech…

    Submitted 5 July, 2023; v1 submitted 8 October, 2022; originally announced October 2022.

    Comments: Accepted by Interspeech 2023

  9. arXiv:2210.03730  [pdf, other]

    cs.CL eess.AS

    SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training

    Authors: Ziqiang Zhang, Long Zhou, Junyi Ao, Shujie Liu, Lirong Dai, Jinyu Li, Furu Wei

    Abstract: The rapid development of single-modal pre-training has prompted researchers to pay more attention to cross-modal pre-training methods. In this paper, we propose a unified-modal speech-unit-text pre-training model, SpeechUT, to connect the representations of a speech encoder and a text decoder with a shared unit encoder. Leveraging hidden-unit as an interface to align speech and text, we can decomp…

    Submitted 7 October, 2022; originally announced October 2022.

    Comments: 14 pages, accepted by EMNLP 2022

  10. arXiv:2206.05777  [pdf, other]

    cs.CL eess.AS

    The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task

    Authors: Ziqiang Zhang, Junyi Ao, Long Zhou, Shujie Liu, Furu Wei, Jinyu Li

    Abstract: This paper describes the submission of our end-to-end YiTrans speech translation system for the IWSLT 2022 offline task, which translates from English audio to German, Chinese, and Japanese. The YiTrans system is built on large-scale pre-trained encoder-decoder models. More specifically, we first design a multi-stage pre-training strategy to build a multi-modality model with a large amount of labe…

    Submitted 13 June, 2022; v1 submitted 12 June, 2022; originally announced June 2022.

    Comments: 11 pages

  11. arXiv:2203.17113  [pdf, other]

    cs.SD cs.LG eess.AS

    Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data

    Authors: Junyi Ao, Ziqiang Zhang, Long Zhou, Shujie Liu, Haizhou Li, Tom Ko, Lirong Dai, Jinyu Li, Yao Qian, Furu Wei

    Abstract: This paper studies a novel pre-training technique with unpaired speech data, Speech2C, for encoder-decoder based automatic speech recognition (ASR). Within a multi-task learning framework, we introduce two pre-training tasks for the encoder-decoder network using acoustic units, i.e., pseudo codes, derived from an offline clustering model. One is to predict the pseudo codes via masked language mode…

    Submitted 20 June, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: Accepted by Interspeech 2022

  12. arXiv:2203.15610  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT

    Authors: Rui Wang, Qibing Bai, Junyi Ao, Long Zhou, Zhixiang Xiong, Zhihua Wei, Yu Zhang, Tom Ko, Haizhou Li

    Abstract: Self-supervised speech representation learning has shown promising results in various speech processing tasks. However, the pre-trained models, e.g., HuBERT, are storage-intensive Transformers, limiting their scope of applications under low-resource settings. To this end, we propose LightHuBERT, a once-for-all Transformer compression framework, to find the desired architectures automatically by pr…

    Submitted 18 June, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: 5 pages, 2 figures, accepted to Interspeech 2022

  13. arXiv:2110.07205  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing

    Authors: Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei

    Abstract: Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural language processing models, we propose a unified-modal SpeechT5 framework that explores the encoder-decoder pre-training for self-supervised speech/text representation learning. The SpeechT5 framework consists of a shared encoder-decoder network and six modal-specific (speech/text) pre/post-nets. After prepro…

    Submitted 24 May, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: Accepted by ACL 2022 main conference

  14. arXiv:2110.05036  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD eess.SP

    Multi-View Self-Attention Based Transformer for Speaker Recognition

    Authors: Rui Wang, Junyi Ao, Long Zhou, Shujie Liu, Zhihua Wei, Tom Ko, Qing Li, Yu Zhang

    Abstract: Initially developed for natural language processing (NLP), the Transformer model is now widely used for speech processing tasks such as speaker recognition, due to its powerful sequence modeling capabilities. However, conventional self-attention mechanisms were originally designed for modeling textual sequences without considering the characteristics of speech and speaker modeling. Besides, different Tr…

    Submitted 27 January, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

    Comments: Paper to appear at ICASSP 2022

  15. Lightning Impulse Current Tests on some Electroconductive Fabrics

    Authors: Jorge A. Cristancho C., Carlos A. Rivera G., Jorge E. Rodriguez M., John J. Pantoja A., Liz K. Herrera Q., Francisco Roman

    Abstract: In the search for lightweight lightning protection materials that can be used as part of lightning protection systems, we investigate some types of electroconductive fabrics by applying several lightning impulse currents in the laboratory. Samples of four commercially available electroconductive textiles were analyzed, two rip-stop, a plain-weave, a nonwoven, and additionally a carbon-impregnated polym…

    Submitted 2 May, 2023; v1 submitted 12 November, 2019; originally announced November 2019.

    Comments: Published in Journal of Applied Research and Technology 21 (2023) 241-255; 15 pages, 6 figures, 3 tables

    Journal ref: Journal of Applied Research and Technology 2023, 21(2), 241-255

  16. arXiv:1810.05502  [pdf]

    eess.SP cs.DC

    Asynchronous Wi-Fi Control Interface (AWCI) Using Socket IO Technology

    Authors: Devipriya T K, Jovita Franci A, Deepa R, Godwin Sam Josh

    Abstract: The Internet of Things (IoT) is a system of interrelated computing devices connected to the Internet that are provided with unique identifiers and have the ability to transfer data over a network without requiring human-to-human or human-to-computer interaction. The Raspberry Pi 3, a popular, cheap, small and powerful computer with built-in Wi-Fi, can be used to make any devices smart by connecting to that par…

    Submitted 6 October, 2018; originally announced October 2018.

    Comments: 5 pages, 5 figures, published with Global Research and Development Journal for Engineering

    Journal ref: Global Research and Development Journal for Engineering, 1(3), pp.66-70, 2017