Showing 1–13 of 13 results for author: Yan, N

Searching in archive eess.

  1. arXiv:2406.09873  [pdf, other]

    eess.AS cs.AI cs.SD

    Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition

    Authors: Yicong Jiang, Tianzi Wang, Xurong Xie, Juan Liu, Wei Sun, Nan Yan, Hui Chen, Lan Wang, Xunying Liu, Feng Tian

    Abstract: Disordered speech recognition has profound implications for improving the quality of life for individuals afflicted with, for example, dysarthria. Dysarthric speech recognition encounters challenges including limited data, substantial dissimilarities between dysarthric and non-dysarthric speakers, and significant speaker variations stemming from the disorder. This paper introduces Perceiver-Prompt, a…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  2. arXiv:2405.03254  [pdf]

    eess.AS

    Automatic Assessment of Dysarthria Using Audio-visual Vowel Graph Attention Network

    Authors: Xiaokang Liu, Xiaoxia Du, Juan Liu, Rongfeng Su, Manwa Lawrence Ng, Yumei Zhang, Yudong Yang, Shaofeng Zhao, Lan Wang, Nan Yan

    Abstract: Automatic assessment of dysarthria remains a highly challenging task due to high variability in acoustic signals and the limited data. Currently, research on the automatic assessment of dysarthria primarily focuses on two approaches: one that utilizes expert features combined with machine learning, and the other that employs data-driven deep learning methods to extract representations. Research ha…

    Submitted 6 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

    Comments: 10 pages, 7 figures, 7 tables

  3. arXiv:2403.05820  [pdf, other]

    cs.SD cs.CL eess.AS

    An Audio-textual Diffusion Model For Converting Speech Signals Into Ultrasound Tongue Imaging Data

    Authors: Yudong Yang, Rongfeng Su, Xiaokang Liu, Nan Yan, Lan Wang

    Abstract: Acoustic-to-articulatory inversion (AAI) converts audio into articulator movements, such as ultrasound tongue imaging (UTI) data. A limitation of existing AAI methods is that they use only personalized acoustic information to derive the general patterns of tongue motions, and thus the quality of generated UTI data is limited. To address this issue, this paper proposes an audio-textual diffusion model…

    Submitted 12 March, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

    Comments: Accepted by ICASSP 2024

  4. arXiv:2401.01997  [pdf]

    cs.SD eess.AS

    Generating Rhythm Game Music with Jukebox

    Authors: Nicholas Yan

    Abstract: Music has always been thought of as a "human" endeavor -- when praising a piece of music, we emphasize the composer's creativity and the emotions the music invokes. Because music also heavily relies on patterns and repetition in the form of recurring melodic themes and chord progressions, artificial intelligence has increasingly been able to replicate music in a human-like fashion. This research i…

    Submitted 28 December, 2023; originally announced January 2024.

  5. arXiv:2210.17181  [pdf, other]

    eess.SP

    Device Scheduling for Over-the-Air Federated Learning with Differential Privacy

    Authors: Na Yan, Kezhi Wang, Cunhua Pan, Kok Keong Chai

    Abstract: In this paper, we propose a device scheduling scheme for differentially private over-the-air federated learning (DP-OTA-FL) systems, referred to as S-DPOTAFL, where the privacy of the participants is guaranteed by channel noise. In S-DPOTAFL, the gradients are aligned by the alignment coefficient and aggregated via over-the-air computation (AirComp). The scheme schedules the devices with better ch…

    Submitted 13 November, 2022; v1 submitted 31 October, 2022; originally announced October 2022.

    Comments: arXiv admin note: text overlap with arXiv:2210.07669

  6. arXiv:2210.07669  [pdf, other]

    eess.SP

    Toward Secure and Private Over-the-Air Federated Learning

    Authors: Na Yan, Kezhi Wang, Kangda Zhi, Cunhua Pan, Kok Keong Chai, H. Vincent Poor

    Abstract: In this paper, a novel secure and private over-the-air federated learning (SP-OTA-FL) framework is studied where noise is employed to protect data privacy and system security. Specifically, the privacy leakage of user data and the security level of the system are measured by differential privacy (DP) and mean square error security (MSE-security), respectively. To mitigate the impact of noise on le…

    Submitted 14 October, 2022; originally announced October 2022.

  7. arXiv:2110.03392  [pdf, other]

    eess.AS

    Enhanced Memory Network: The novel network structure for Symbolic Music Generation

    Authors: ** Li, Haibin Liu, Nan Yan, Lan Wang

    Abstract: Symbolic melody generation is one of the essential tasks for automatic music generation. Recently, models based on neural networks have had a significant influence on generating symbolic melodies. However, the musical context structure is complicated to capture through deep neural networks. Although long short-term memory (LSTM) attempts to solve this problem through learning order dependenc…

    Submitted 7 October, 2021; originally announced October 2021.

  8. arXiv:2108.08663  [pdf, other]

    eess.AS

    Unsupervised Cross-Lingual Speech Emotion Recognition Using Pseudo Multilabel

    Authors: ** Li, Nan Yan, Lan Wang

    Abstract: Speech Emotion Recognition (SER) in a single language has achieved remarkable results through deep learning approaches in the last decade. However, cross-lingual SER remains a challenge in real-world applications due to a great difference between the source and target domain distributions. To address this issue, we propose an unsupervised cross-lingual Neural Network with Pseudo Multilabel (NNPM)…

    Submitted 7 October, 2021; v1 submitted 19 August, 2021; originally announced August 2021.

  9. arXiv:2108.07980  [pdf, other]

    eess.AS

    A Multi-level Acoustic Feature Extraction Framework for Transformer Based End-to-End Speech Recognition

    Authors: ** Li, Rongfeng Su, Xurong Xie, Nan Yan, Lan Wang

    Abstract: Transformer based end-to-end modelling approaches with multiple stream inputs have achieved great success in various automatic speech recognition (ASR) tasks. An important issue associated with such approaches is that the intermediate features derived from each stream might have similar representations and thus lack feature diversity, such as the descriptions related to speaker ch…

    Submitted 7 July, 2022; v1 submitted 18 August, 2021; originally announced August 2021.

    Comments: Accepted by Interspeech 2022

  10. arXiv:2108.07974  [pdf, other]

    eess.AS

    FDN: Finite Difference Network with Hierarchical Convolutional Features for Text-independent Speaker Verification

    Authors: ** Li, Nan Yan, Lan Wang

    Abstract: In recent years, using raw waveforms as input for deep networks has been widely explored for the speaker verification system. For example, RawNet and RawNet2 extracted speaker's feature embeddings from waveforms automatically for recognizing their voice, which can vastly reduce the front-end computation and obtain state-of-the-art performance. However, these models do not consider the speaker's hi…

    Submitted 7 October, 2021; v1 submitted 18 August, 2021; originally announced August 2021.

  11. arXiv:2008.04542  [pdf]

    eess.SY cs.LG

    An Intelligent Control Strategy for buck DC-DC Converter via Deep Reinforcement Learning

    Authors: Chenggang Cui, Nan Yan, Chuanlin Zhang

    Abstract: As a typical switching power supply, the DC-DC converter has been widely applied in DC microgrids. Due to the variation of renewable energy generation, research and design of DC-DC converter control algorithms with outstanding dynamic characteristics has significant theoretical and practical application value. To mitigate the bus voltage stability issue in DC microgrids, an innovative intelligent con…

    Submitted 11 August, 2020; originally announced August 2020.

  12. arXiv:2003.02314  [pdf, other]

    cs.CV cs.LG eess.IV

    The Impact of Hole Geometry on Relative Robustness of In-Painting Networks: An Empirical Study

    Authors: Masood S. Mortazavi, Ning Yan

    Abstract: In-painting networks use existing pixels to generate appropriate pixels to fill "holes" placed on parts of an image. A 2-D in-painting network's input usually consists of (1) a three-channel 2-D image, and (2) an additional channel for the "holes" to be in-painted in that image. In this paper, we study the robustness of a given in-painting neural network against variations in hole geometry distrib…

    Submitted 4 March, 2020; originally announced March 2020.

  13. arXiv:1906.09884  [pdf, ps, other]

    cs.CV cs.MM eess.IV

    Channel-by-Channel Demosaicking Networks with Embedded Spectral Correlation

    Authors: Niu Yan, Jihong Ouyang

    Abstract: Demosaicking is standardly the first step in today's Image Signal Processing (ISP) pipeline of digital cameras. It reconstructs image RGB values from the spatially and spectrally sparse Color Filter Array (CFA) samples, which are the original raw data digitized from electrical signals. High quality and low cost demosaicking is not only necessary for photography, but also fundamental for many machi…

    Submitted 22 April, 2020; v1 submitted 24 June, 2019; originally announced June 2019.