Showing 1–11 of 11 results for author: Dong, N

Searching in archive eess.
  1. arXiv:2312.05187  [pdf, other]

    cs.CL cs.SD eess.AS

    Seamless: Multilingual Expressive and Streaming Speech Translation

    Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Mark Duppenthaler, Paul-Ambroise Duquenne, Brian Ellis, Hady Elsahar, Justin Haaheim, John Hoffman, Min-Jae Hwang, Hirofumi Inaguma, Christopher Klaiber, Ilia Kulikov, Pengwei Li, Daniel Licht, Jean Maillard, Ruslan Mavlyutov, Alice Rakotoarison, Kaushik Ram Sadagopan, Abinesh Ramakrishnan, Tuan Tran, Guillaume Wenzek, et al. (40 additional authors not shown)

    Abstract: Large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4…

    Submitted 8 December, 2023; originally announced December 2023.

  2. arXiv:2309.10795  [pdf, other]

    eess.AS

    Exploring Speech Enhancement for Low-resource Speech Synthesis

    Authors: Zhaoheng Ni, Sravya Popuri, Ning Dong, Kohei Saijo, Xiaohui Zhang, Gael Le Lan, Yangyang Shi, Vikas Chandra, Changhan Wang

    Abstract: High-quality and intelligible speech is essential to text-to-speech (TTS) model training, however, obtaining high-quality data for low-resource languages is challenging and expensive. Applying speech enhancement on Automatic Speech Recognition (ASR) corpus mitigates the issue by augmenting the training data, while how the nonlinear speech distortion brought by speech enhancement models affects TTS…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024

  3. arXiv:2309.07707  [pdf, other]

    cs.CL cs.SD eess.AS

    CoLLD: Contrastive Layer-to-layer Distillation for Compressing Multilingual Pre-trained Speech Encoders

    Authors: Heng-Jui Chang, Ning Dong, Ruslan Mavlyutov, Sravya Popuri, Yu-An Chung

    Abstract: Large-scale self-supervised pre-trained speech encoders outperform conventional approaches in speech recognition and translation tasks. Due to the high cost of developing these large models, building new encoders for new tasks and deploying them to on-device applications are infeasible. Prior studies propose model compression methods to address this issue, but those works focus on smaller models a…

    Submitted 27 December, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: Accepted to ICASSP 2024

  4. arXiv:2307.08655  [pdf, other]

    cs.CL cs.SD eess.AS

    Multilingual Speech-to-Speech Translation into Multiple Target Languages

    Authors: Hongyu Gong, Ning Dong, Sravya Popuri, Vedanuj Goswami, Ann Lee, Juan Pino

    Abstract: Speech-to-speech translation (S2ST) enables spoken communication between people talking in different languages. Despite a few studies on multilingual S2ST, their focus is the multilinguality on the source side, i.e., the translation from multiple source languages to one target language. We present the first work on multilingual S2ST supporting multiple target languages. Leveraging recent advance i…

    Submitted 17 July, 2023; originally announced July 2023.

  5. arXiv:2305.03101  [pdf, other]

    cs.CL cs.SD eess.AS

    Hybrid Transducer and Attention based Encoder-Decoder Modeling for Speech-to-Text Tasks

    Authors: Yun Tang, Anna Y. Sun, Hirofumi Inaguma, Xinyue Chen, Ning Dong, Xutai Ma, Paden D. Tomasello, Juan Pino

    Abstract: Transducer and Attention based Encoder-Decoder (AED) are two widely used frameworks for speech-to-text tasks. They are designed for different purposes and each has its own benefits and drawbacks for speech-to-text tasks. In order to leverage strengths of both modeling methods, we propose a solution by combining Transducer and Attention based Encoder-Decoder (TAED) for speech-to-text tasks. The new…

    Submitted 4 May, 2023; originally announced May 2023.

    Comments: ACL 2023 main conference

  6. arXiv:2211.04508  [pdf, other]

    cs.CL cs.SD eess.AS

    SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations

    Authors: Paul-Ambroise Duquenne, Hongyu Gong, Ning Dong, Jingfei Du, Ann Lee, Vedanuj Goswani, Changhan Wang, Juan Pino, Benoît Sagot, Holger Schwenk

    Abstract: We present SpeechMatrix, a large-scale multilingual corpus of speech-to-speech translations mined from real speech of European Parliament recordings. It contains speech alignments in 136 language pairs with a total of 418 thousand hours of speech. To evaluate the quality of this parallel speech, we train bilingual speech-to-speech translation models on mined data only and establish extensive basel…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: 18 pages

  7. arXiv:2109.07504  [pdf, other]

    cs.LG cs.CV eess.IV

    Federated Contrastive Learning for Decentralized Unlabeled Medical Images

    Authors: Nanqing Dong, Irina Voiculescu

    Abstract: A label-efficient paradigm in computer vision is based on self-supervised contrastive pre-training on unlabeled data followed by fine-tuning with a small number of labels. Making practical use of a federated computing environment in the clinical domain and learning on medical images poses specific challenges. In this work, we propose FedMoCo, a robust federated contrastive learning (FCL) framework…

    Submitted 15 September, 2021; originally announced September 2021.

    Comments: Accepted by MICCAI 2021

  8. arXiv:2011.14164  [pdf, other]

    cs.CV cs.LG eess.IV

    Towards Robust Partially Supervised Multi-Structure Medical Image Segmentation on Small-Scale Data

    Authors: Nanqing Dong, Michael Kampffmeyer, Xiaodan Liang, Min Xu, Irina Voiculescu, Eric P. Xing

    Abstract: The data-driven nature of deep learning (DL) models for semantic segmentation requires a large number of pixel-level annotations. However, large-scale and fully labeled medical datasets are often unavailable for practical tasks. Recently, partially supervised methods have been proposed to utilize images with incomplete labels in the medical domain. To bridge the methodological gaps in partially su…

    Submitted 26 October, 2021; v1 submitted 28 November, 2020; originally announced November 2020.

    Comments: Accepted by Applied Soft Computing

  9. arXiv:2008.03435  [pdf, other]

    eess.IV cs.CV cs.LG

    Auto-weighting for Breast Cancer Classification in Multimodal Ultrasound

    Authors: Wang Jian, Miao Juzheng, Yang Xin, Li Rui, Zhou Guangquan, Huang Yuhao, Lin Zehui, Xue Wufeng, Jia Xiaohong, Zhou Jianqiao, Huang Ruobing, Ni Dong

    Abstract: Breast cancer is the most common invasive cancer in women. Besides the primary B-mode ultrasound screening, sonographers have explored the inclusion of Doppler, strain and shear-wave elasticity imaging to advance the diagnosis. However, recognizing useful patterns in all types of images and weighing up the significance of each modality can elude less-experienced clinicians. In this paper, we explo…

    Submitted 7 August, 2020; originally announced August 2020.

    Comments: Early Accepted by MICCAI 2020

  10. arXiv:1905.13044  [pdf]

    eess.SY cs.HC

    Shared control schematic for brain controlled vehicle based on fuzzy logic

    Authors: Na Dong, Wen-qi Zhang, Zhong-ke Gao

    Abstract: Brain controlled vehicle refers to the vehicle that obtains control commands by analyzing the driver's EEG through Brain-Computer Interface (BCI). The research of brain controlled vehicles can not only promote the integration of brain machines, but also expand the range of activities and living ability of the disabled or some people with limited physical activity, so the research of brain controll…

    Submitted 29 May, 2019; originally announced May 2019.

  11. arXiv:1905.12240  [pdf]

    eess.SY cs.HC cs.RO

    Research on fuzzy PID Shared control method of small brain-controlled uav

    Authors: Na Dong, Wen-qi Zhang, Zhong-ke Gao

    Abstract: Brain-controlled unmanned aerial vehicle (uav) is a uav that can analyze human brain electrical signals through BCI to obtain flight commands. The research of brain-controlled uav can promote the integration of brain-computer and has a broad application prospect. At present, BCI still has some problems, such as limited recognition accuracy, limited recognition time and small number of recognition…

    Submitted 29 May, 2019; originally announced May 2019.