Showing 1–50 of 57 results for author: Deng, J

Searching in archive eess.
  1. arXiv:2406.10160  [pdf, other]

    cs.SD cs.AI eess.AS

    One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model

    Authors: Zhaoqing Li, Haoning Xu, Tianzi Wang, Shoukang Hu, Zengrui Jin, Shujie Hu, Jiajun Deng, Mingyu Cui, Mengzhe Geng, Xunying Liu

    Abstract: We propose a novel one-pass multiple ASR systems joint compression and quantization approach using an all-in-one neural model. A single compression cycle allows multiple nested systems with varying Encoder depths, widths, and quantization precision settings to be simultaneously constructed without the need to train and store individual target systems separately. Experiments consistently demonstrat…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  2. arXiv:2406.10152  [pdf, other]

    cs.SD eess.AS

    Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition

    Authors: Guinan Li, Jiajun Deng, Youjun Chen, Mengzhe Geng, Shujie Hu, Zhe Li, Zengrui Jin, Tianzi Wang, Xurong Xie, Helen Meng, Xunying Liu

    Abstract: This paper proposes joint speaker feature learning methods for zero-shot adaptation of audio-visual multichannel speech separation and recognition systems. xVector and ECAPA-TDNN speaker encoders are connected using purpose-built fusion blocks and tightly integrated with the complete system training. Experiments conducted on LRS3-TED data simulated multichannel overlapped speech suggest that joint…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  3. arXiv:2406.10034  [pdf, other]

    cs.SD cs.AI eess.AS

    Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask

    Authors: Tianzi Wang, Xurong Xie, Zhaoqing Li, Shoukang Hu, Zengrui Jing, Jiajun Deng, Mingyu Cui, Shujie Hu, Mengzhe Geng, Guinan Li, Helen Meng, Xunying Liu

    Abstract: This paper proposes a novel non-autoregressive (NAR) block-based Attention Mask Decoder (AMD) that flexibly balances performance-efficiency trade-offs for Conformer ASR systems. AMD performs parallel NAR inference within contiguous blocks of output labels that are concealed using attention masks, while conducting left-to-right AR prediction and history context amalgamation between blocks. A beam s…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures, 2 tables, Interspeech24 conference

  4. arXiv:2406.08920  [pdf, other]

    cs.SD cs.AI eess.AS

    AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis

    Authors: Swapnil Bhosale, Haosen Yang, Diptesh Kanojia, Jiankang Deng, Xiatian Zhu

    Abstract: Novel view acoustic synthesis (NVAS) aims to render binaural audio at any target viewpoint, given a mono audio emitted by a sound source in a 3D scene. Existing methods have proposed NeRF-based implicit models to exploit visual cues as a condition for synthesizing binaural audio. However, in addition to low efficiency originating from heavy NeRF rendering, these methods all have a limited ability…

    Submitted 14 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  5. arXiv:2404.17903  [pdf, other]

    eess.SP

    3D Extended Object Tracking by Fusing Roadside Sparse Radar Point Clouds and Pixel Keypoints

    Authors: Jiayin Deng, Zhiqun Hu, Yuxuan Xia, Zhaoming Lu, Xiangming Wen

    Abstract: Roadside perception is a key component in intelligent transportation systems. In this paper, we present a novel three-dimensional (3D) extended object tracking (EOT) method, which simultaneously estimates the object kinematics and extent state, in roadside perception using both the radar and camera data. Because of the influence of sensor viewing angle and limited angle resolution, radar measureme…

    Submitted 27 April, 2024; originally announced April 2024.

  6. arXiv:2404.14693  [pdf, other]

    cs.CR cs.CV eess.IV

    Double Privacy Guard: Robust Traceable Adversarial Watermarking against Face Recognition

    Authors: Yunming Zhang, Dengpan Ye, Sipeng Shen, Caiyun Xie, Ziyi Liu, Jiacheng Deng, Long Tang

    Abstract: The wide deployment of Face Recognition (FR) systems poses risks of privacy leakage. One countermeasure to address this issue is adversarial attacks, which deceive malicious FR searches but simultaneously interfere with the normal identity verification of trusted authorizers. In this paper, we propose the first Double Privacy Guard (DPG) scheme based on traceable adversarial watermarking. DPG employs a…

    Submitted 22 April, 2024; originally announced April 2024.

  7. arXiv:2404.02663  [pdf]

    eess.SP cs.IT

    Ground-to-UAV sub-Terahertz channel measurement and modeling

    Authors: Da Li, Peian Li, Jiabiao Zhao, Jianjian Liang, Jiacheng Liu, Guohao Liu, Yuanshuai Lei, Wenbo Liu, Jianqin Deng, Fuyong Liu, Jianjun Ma

    Abstract: Unmanned Aerial Vehicle (UAV) assisted terahertz (THz) wireless communications have been expected to play a vital role in the next generation of wireless networks. UAVs can serve as either repeaters or data collectors within the communication link, thereby potentially augmenting the efficacy of communication systems. Despite their promise, the channel analysis and modeling specific to THz wireless…

    Submitted 28 June, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: Submitted to Optics Express

  8. arXiv:2404.00549  [pdf]

    eess.IV cs.CV

    Pneumonia App: a mobile application for efficient pediatric pneumonia diagnosis using explainable convolutional neural networks (CNN)

    Authors: Jiaming Deng, Zhenglin Chen, Minjiang Chen, Lulu Xu, Jiaqi Yang, Zhendong Luo, Peiwu Qin

    Abstract: Mycoplasma pneumoniae pneumonia (MPP) poses significant diagnostic challenges in pediatric healthcare, especially in regions like China where it's prevalent. We introduce PneumoniaAPP, a mobile application leveraging deep learning techniques for rapid MPP detection. Our approach capitalizes on convolutional neural networks (CNNs) trained on a comprehensive dataset comprising 3345 chest X-ray (CXR)…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 27 pages, 7 figures

    MSC Class: 68 ACM Class: J.3

  9. arXiv:2403.16643  [pdf, other]

    eess.IV cs.CV

    Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution

    Authors: Qingping Zheng, Ling Zheng, Yuanfan Guo, Ying Li, Songcen Xu, Jiankang Deng, Hang Xu

    Abstract: Artifact-free super-resolution (SR) aims to translate low-resolution images into their high-resolution counterparts with a strict integrity of the original content, eliminating any distortions or synthetic details. While traditional diffusion-based SR techniques have demonstrated remarkable abilities to enhance image detail, they are prone to artifact introduction during iterative procedures. Such…

    Submitted 25 March, 2024; originally announced March 2024.

  10. arXiv:2403.02307  [pdf, other]

    eess.IV cs.CV

    Harnessing Intra-group Variations Via a Population-Level Context for Pathology Detection

    Authors: P. Bilha Githinji, Xi Yuan, Zhenglin Chen, Ijaz Gul, Dingqi Shang, Wen Liang, Jianming Deng, Dan Zeng, Dongmei Yu, Chenggang Yan, Peiwu Qin

    Abstract: Realizing sufficient separability between the distributions of healthy and pathological samples is a critical obstacle for pathology detection convolutional models. Moreover, these models exhibit a bias for contrast-based images, with diminished performance on texture-based medical images. This study introduces the notion of a population-level context for pathology detection and employs a graph th…

    Submitted 4 March, 2024; originally announced March 2024.

  11. arXiv:2312.08641  [pdf, other]

    eess.AS cs.SD

    Towards Automatic Data Augmentation for Disordered Speech Recognition

    Authors: Zengrui Jin, Xurong Xie, Tianzi Wang, Mengzhe Geng, Jiajun Deng, Guinan Li, Shujie Hu, Xunying Liu

    Abstract: Automatic recognition of disordered speech remains a highly challenging task to date due to data scarcity. This paper presents a reinforcement learning (RL) based on-the-fly data augmentation approach for training state-of-the-art PyChain TDNN and end-to-end Conformer ASR systems on such data. The handcrafted temporal and spectral mask operations in the standard SpecAugment method that are task an…

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: To appear at IEEE ICASSP 2024

  12. arXiv:2308.07293  [pdf, other]

    cs.SD cs.LG eess.AS

    DiffSED: Sound Event Detection with Denoising Diffusion

    Authors: Swapnil Bhosale, Sauradip Nag, Diptesh Kanojia, Jiankang Deng, Xiatian Zhu

    Abstract: Sound Event Detection (SED) aims to predict the temporal boundaries of all the events of interest and their class labels, given an unconstrained audio sample. Taking either the split-and-classify (i.e., frame-level) strategy or the more principled event-level modeling approach, all existing methods consider the SED problem from the discriminative learning perspective. In this work, we reformulate t…

    Submitted 16 August, 2023; v1 submitted 14 August, 2023; originally announced August 2023.

  13. arXiv:2307.02909  [pdf, other]

    eess.AS cs.AI cs.SD

    Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation and Recognition

    Authors: Guinan Li, Jiajun Deng, Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Mingyu Cui, Helen Meng, Xunying Liu

    Abstract: Accurate recognition of cocktail party speech containing overlapping speakers, noise and reverberation remains a highly challenging task to date. Motivated by the invariance of visual modality to acoustic signal corruption, an audio-visual multi-channel speech separation, dereverberation and recognition approach featuring a full incorporation of visual information into all system components is pro…

    Submitted 6 July, 2023; originally announced July 2023.

    Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing

  14. arXiv:2306.15265  [pdf, other]

    eess.AS cs.LG

    Hyper-parameter Adaptation of Conformer ASR Systems for Elderly and Dysarthric Speech Recognition

    Authors: Tianzi Wang, Shoukang Hu, Jiajun Deng, Zengrui Jin, Mengzhe Geng, Yi Wang, Helen Meng, Xunying Liu

    Abstract: Automatic recognition of disordered and elderly speech remains a highly challenging task to date due to data scarcity. Parameter fine-tuning is often used to exploit the large quantities of non-aged and healthy speech pre-trained models, while neural architecture hyper-parameters are set using expert knowledge and remain unchanged. This paper investigates hyper-parameter adaptation for Conformer AS…

    Submitted 27 June, 2023; originally announced June 2023.

    Comments: 5 pages, 3 figures, 3 tables, accepted by Interspeech2023

  15. arXiv:2306.14608  [pdf, other]

    eess.AS cs.CL

    Factorised Speaker-environment Adaptive Training of Conformer Speech Recognition Systems

    Authors: Jiajun Deng, Guinan Li, Xurong Xie, Zengrui Jin, Mingyu Cui, Tianzi Wang, Shujie Hu, Mengzhe Geng, Xunying Liu

    Abstract: Rich sources of variability in natural speech present significant challenges to current data intensive speech recognition technologies. To model both speaker and environment level diversity, this paper proposes a novel Bayesian factorised speaker-environment adaptive training and test time adaptation approach for Conformer ASR models. Speaker and environment level characteristics are separately mo…

    Submitted 26 June, 2023; originally announced June 2023.

    Comments: Accepted by INTERSPEECH 2023

  16. arXiv:2306.13307  [pdf, other]

    eess.AS cs.CL

    Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems

    Authors: Mingyu Cui, Jiawen Kang, Jiajun Deng, Xi Yin, Yutao Xie, Xie Chen, Xunying Liu

    Abstract: Current ASR systems are mainly trained and evaluated at the utterance level. Long range cross utterance context can be incorporated. A key task is to derive a suitable compact representation of the most relevant history contexts. In contrast to previous research based on either LSTM-RNN encoded histories that attenuate the information from longer range contexts, or frame level concatenation of t…

    Submitted 25 June, 2023; v1 submitted 23 June, 2023; originally announced June 2023.

    Comments: Accepted by INTERSPEECH 2023

  17. arXiv:2305.10659  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD

    Use of Speech Impairment Severity for Dysarthric Speech Recognition

    Authors: Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Jiajun Deng, Mingyu Cui, Guinan Li, Jianwei Yu, Xurong Xie, Xunying Liu

    Abstract: A key challenge in dysarthric speech recognition is the speaker-level diversity attributed to both speaker-identity associated factors such as gender, and speech impairment severity. Most prior research on addressing this issue focused on using speaker-identity only. To this end, this paper proposes a novel set of techniques to use both severity and speaker-identity in dysarthric speech recognit…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: Accepted to INTERSPEECH2023

  18. arXiv:2302.14564  [pdf, other]

    cs.SD cs.AI eess.AS

    Exploring Self-supervised Pre-trained ASR Models For Dysarthric and Elderly Speech Recognition

    Authors: Shujie Hu, Xurong Xie, Zengrui Jin, Mengzhe Geng, Yi Wang, Mingyu Cui, Jiajun Deng, Xunying Liu, Helen Meng

    Abstract: Automatic recognition of disordered and elderly speech remains a highly challenging task to date due to the difficulty in collecting such data in large quantities. This paper explores a series of approaches to integrate domain adapted SSL pre-trained models into TDNN and Conformer ASR systems for dysarthric and elderly speech recognition: a) input feature fusion between standard acoustic frontends…

    Submitted 22 June, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

    Comments: accepted by ICASSP 2023

  19. arXiv:2302.12434  [pdf, other]

    cs.SD cs.AI eess.AS

    Catch You and I Can: Revealing Source Voiceprint Against Voice Conversion

    Authors: Jiangyi Deng, Yanjiao Chen, Yinan Zhong, Qianhao Miao, Xueluan Gong, Wenyuan Xu

    Abstract: Voice conversion (VC) techniques can be abused by malicious parties to transform their audios to sound like a target speaker, making it hard for a human being or a speaker verification/identification system to trace the source speaker. In this paper, we make the first attempt to restore the source voiceprint from audios synthesized by voice conversion methods with high credit. However, unveiling t…

    Submitted 23 February, 2023; originally announced February 2023.

    Comments: Accepted by USENIX Security Symposium 2023. Please cite this paper as "Jiangyi Deng, Yanjiao Chen, Yinan Zhong, Qianhao Miao, Xueluan Gong, Wenyuan Xu. Catch You and I Can: Revealing Source Voiceprint Against Voice Conversion. In 32nd USENIX Security Symposium (USENIX Security 23)."

  20. arXiv:2302.07521  [pdf, other]

    eess.AS cs.SD

    Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems

    Authors: Jiajun Deng, Xurong Xie, Tianzi Wang, Mingyu Cui, Boyang Xue, Zengrui Jin, Guinan Li, Shujie Hu, Xunying Liu

    Abstract: Speaker adaptation techniques provide a powerful solution to customise automatic speech recognition (ASR) systems for individual users. Practical application of unsupervised model-based speaker adaptation techniques to data intensive end-to-end ASR systems is hindered by the scarcity of speaker-level data and performance sensitivity to transcription errors. To address these issues, a set of compac…

    Submitted 15 February, 2023; originally announced February 2023.

    Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing

  21. arXiv:2212.14354  [pdf]

    eess.SP cs.CE

    A Fault Location Method Based on Electromagnetic Transient Convolution Considering Frequency-Dependent Parameters and Lossy Ground

    Authors: Guanbo Wang, Chijie Zhuang, Jun Deng, Zhicheng Xie

    Abstract: As the capacity of power systems grows, the need for quick and precise short-circuit fault location becomes increasingly vital for ensuring the safe and continuous supply of power. In this paper, we propose a fault location method that utilizes electromagnetic transient convolution (EMTC). We assess the performance of a naive EMTC implementation in multi-phase power lines by using frequency-depend…

    Submitted 31 December, 2023; v1 submitted 29 December, 2022; originally announced December 2022.

  22. arXiv:2212.00014  [pdf, other]

    eess.IV cs.LG physics.optics

    Attentional Ptycho-Tomography (APT) for three-dimensional nanoscale X-ray imaging with minimal data acquisition and computation time

    Authors: Iksung Kang, Ziling Wu, Yi Jiang, Yudong Yao, Junjing Deng, Jeffrey Klug, Stefan Vogt, George Barbastathis

    Abstract: Noninvasive X-ray imaging of nanoscale three-dimensional objects, e.g. integrated circuits (ICs), generally requires two types of scanning: ptychographic, which is translational and returns estimates of complex electromagnetic field through ICs; and tomographic scanning, which collects complex field projections from multiple angles. Here, we present Attentional Ptycho-Tomography (APT), an approach…

    Submitted 29 November, 2022; originally announced December 2022.

    Comments: 27 pages, 7 figures

  23. Super-resolution Reconstruction of Single Image for Latent features

    Authors: Xin Wang, Jing-Ke Yan, Jing-Ye Cai, Jian-Hua Deng, Qin Qin, Yao Cheng

    Abstract: Single-image super-resolution (SISR) typically focuses on restoring various degraded low-resolution (LR) images to a single high-resolution (HR) image. However, during SISR tasks, it is often challenging for models to simultaneously maintain high quality and rapid sampling while preserving diversity in details and texture features. This challenge can lead to issues such as model collapse, lack of…

    Submitted 9 November, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

    Journal ref: Computational Visual Media, 2023

  24. arXiv:2211.01646  [pdf, other]

    eess.AS cs.SD

    Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition

    Authors: Zengrui Jin, Xurong Xie, Mengzhe Geng, Tianzi Wang, Shujie Hu, Jiajun Deng, Guinan Li, Xunying Liu

    Abstract: Automatic recognition of disordered speech remains a highly challenging task to date. The underlying neuro-motor conditions, often compounded with co-occurring physical disabilities, lead to the difficulty in collecting large quantities of impaired speech required for ASR system development. This paper presents novel variational auto-encoder generative adversarial network (VAE-GAN) based personali…

    Submitted 19 March, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2023

  25. arXiv:2210.16539  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Exploiting prompt learning with pre-trained language models for Alzheimer's Disease detection

    Authors: Yi Wang, Jiajun Deng, Tianzi Wang, Bo Zheng, Shoukang Hu, Xunying Liu, Helen Meng

    Abstract: Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating preventive care and delaying further progression. Speech based automatic AD screening systems provide a non-intrusive and more scalable alternative to other clinical screening techniques. Textual embedding features produced by pre-trained language models (PLMs) such as BERT are widely used in such systems. However, PLM domain f…

    Submitted 31 March, 2023; v1 submitted 29 October, 2022; originally announced October 2022.

    Comments: Accepted by ICASSP 2023 (will update with IEEE version later)

  26. arXiv:2210.15140  [pdf, other]

    cs.SD cs.AI cs.CR eess.AS

    V-Cloak: Intelligibility-, Naturalness- & Timbre-Preserving Real-Time Voice Anonymization

    Authors: Jiangyi Deng, Fei Teng, Yanjiao Chen, Xiaofu Chen, Zhaohui Wang, Wenyuan Xu

    Abstract: Voice data generated on instant messaging or social media applications contains unique user voiceprints that may be abused by malicious adversaries for identity inference or identity theft. Existing voice anonymization techniques, e.g., signal processing and voice conversion/synthesis, suffer from degradation of perceptual quality. In this paper, we develop a voice anonymization system, named V-Cl…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: Accepted by USENIX Security Symposium 2023

  27. arXiv:2210.11422  [pdf, other]

    eess.SP

    A Hybrid Millimeter-wave Channel Simulator for Joint Communication and Localization

    Authors: Junquan Deng

    Abstract: Joint communication and localization (JCL) is envisioned to be a key feature in future millimeter-wave (mmWave) wireless networks for context-aware applications. A map-based channel model considering both site-specific radio environment and statistical channel characteristics is essential to facilitate JCL research and to evaluate the performance of various JCL systems. To this end, this paper pre…

    Submitted 4 October, 2022; originally announced October 2022.

    Comments: 6 pages, 8 figures, submitted to ICCC 2022

  28. arXiv:2208.14007  [pdf, other]

    cs.LG eess.SP q-bio.NC

    Finding neural signatures for obesity through feature selection on source-localized EEG

    Authors: Yuan Yue, Dirk De Ridder, Patrick Manning, Samantha Ross, Jeremiah D. Deng

    Abstract: Obesity is a serious issue in modern society and is often associated with significantly reduced quality of life. Current research conducted to explore obesity-related neurological evidence using electroencephalography (EEG) data is limited to traditional approaches. In this study, we developed a novel machine learning model to identify brain networks of obese females using alpha band functiona…

    Submitted 21 June, 2023; v1 submitted 30 August, 2022; originally announced August 2022.

    Comments: 4 pages, 3 figures, conference submission

  29. arXiv:2206.14940  [pdf, other]

    eess.IV

    Physics-Inspired Unsupervised Classification for Region of Interest in X-Ray Ptychography

    Authors: Dergan Lin, Yi Jiang, Junjing Deng, Zichao Wendy Di

    Abstract: X-ray ptychography allows for large fields to be imaged at high resolution at the cost of additional computational expense due to the large volume of data. Given limited information regarding the object, the acquired data often has an excessive amount of information that is outside the region of interest (RoI). In this work we propose a physics-inspired unsupervised learning algorithm to identify…

    Submitted 29 June, 2022; originally announced June 2022.

  30. arXiv:2206.13232  [pdf, other]

    eess.AS cs.LG cs.SD

    Conformer Based Elderly Speech Recognition System for Alzheimer's Disease Detection

    Authors: Tianzi Wang, Jiajun Deng, Mengzhe Geng, Zi Ye, Shoukang Hu, Yi Wang, Mingyu Cui, Zengrui Jin, Xunying Liu, Helen Meng

    Abstract: Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating preventive care to delay further progression. This paper presents the development of a state-of-the-art Conformer based speech recognition system built on the DementiaBank Pitt corpus for automatic AD detection. The baseline Conformer system trained with speed perturbation and SpecAugment based data augmentation is significantl…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: 5 pages, 1 figure, accepted by INTERSPEECH 2022

  31. arXiv:2206.12045  [pdf, other]

    eess.AS cs.SD

    Confidence Score Based Conformer Speaker Adaptation for Speech Recognition

    Authors: Jiajun Deng, Xurong Xie, Tianzi Wang, Mingyu Cui, Boyang Xue, Zengrui Jin, Mengzhe Geng, Guinan Li, Xunying Liu, Helen Meng

    Abstract: A key challenge for automatic speech recognition (ASR) systems is to model the speaker level variability. In this paper, compact speaker dependent learning hidden unit contributions (LHUC) are used to facilitate both speaker adaptive training (SAT) and test time unsupervised speaker adaptation for state-of-the-art Conformer based end-to-end ASR systems. The sensitivity during adaptation to supervi…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: It's accepted to INTERSPEECH 2022. arXiv admin note: text overlap with arXiv:2206.11596

  32. Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems

    Authors: Mingyu Cui, Jiajun Deng, Shoukang Hu, Xurong Xie, Tianzi Wang, Shujie Hu, Mengzhe Geng, Boyang Xue, Xunying Liu, Helen Meng

    Abstract: Fundamental modelling differences between hybrid and end-to-end (E2E) automatic speech recognition (ASR) systems create large diversity and complementarity among them. This paper investigates multi-pass rescoring and cross adaptation based system combination approaches for hybrid TDNN and Conformer E2E ASR systems. In multi-pass rescoring, state-of-the-art hybrid LF-MMI trained CNN-TDNN system fea…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: It's accepted to ISCA 2022

  33. arXiv:2206.07327  [pdf, other]

    eess.AS cs.AI

    Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition

    Authors: Shujie Hu, Xurong Xie, Mengzhe Geng, Mingyu Cui, Jiajun Deng, Guinan Li, Tianzi Wang, Xunying Liu, Helen Meng

    Abstract: Articulatory features are inherently invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition (ASR) systems designed for normal speech. Their practical application to atypical task domains such as elderly and disordered speech across languages is often limited by the difficulty in collecting such specialist data from target speakers. This pa…

    Submitted 22 June, 2023; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: accepted by INTERSPEECH 2023

  34. A BCS-GDE Multi-objective Optimization Algorithm for Combined Cooling, Heating and Power Model with Decision Strategies

    Authors: Jiaze Sun, Jiahui Deng, Yang Li, Nan Han

    Abstract: District energy systems can not only reduce energy consumption but also set energy supply dispatching schemes according to demand. In addition to economic cost, energy consumption and pollutant are more worthy of attention when evaluating combined cooling, heating and power (CCHP) models. In this paper, the CCHP model is established with the objective of economic cost, primary energy consumption,…

    Submitted 19 May, 2022; originally announced May 2022.

    Comments: Accepted by Applied Thermal Engineering. arXiv admin note: substantial text overlap with arXiv:2108.07394

    Journal ref: Applied Thermal Engineering 213 (2022) 118685

  35. arXiv:2205.07711  [pdf, other]

    cs.SD cs.CR eess.AS

    Transferability of Adversarial Attacks on Synthetic Speech Detection

    Authors: Jiacheng Deng, Shunyi Chen, Li Dong, Diqun Yan, Rangding Wang

    Abstract: Synthetic speech detection is one of the most important research problems in audio security. Meanwhile, deep neural networks are vulnerable to adversarial attacks. Therefore, we establish a comprehensive benchmark to evaluate the transferability of adversarial attacks on the synthetic speech detection task. Specifically, we attempt to investigate: 1) The transferability of adversarial attacks betw…

    Submitted 16 May, 2022; originally announced May 2022.

    Comments: 5 pages, submitted to Interspeech 2022

  36. Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition

    Authors: Zengrui Jin, Mengzhe Geng, Jiajun Deng, Tianzi Wang, Shujie Hu, Guinan Li, Xunying Liu

    Abstract: Despite the rapid progress of automatic speech recognition (ASR) technologies targeting normal speech, accurate recognition of dysarthric and elderly speech remains a highly challenging task to date. It is difficult to collect large quantities of such data for ASR system development due to the mobility issues often found among these users. To this end, data augmentation techniques play a vital role…

    Submitted 23 June, 2022; v1 submitted 13 May, 2022; originally announced May 2022.

    Comments: arXiv admin note: text overlap with arXiv:2202.10290

  37. arXiv:2204.03471  [pdf, other]

    cs.AI cs.LG eess.SY

    DynLight: Realize dynamic phase duration with multi-level traffic signal control

    Authors: Liang Zhang, Shubin Xie, Jianming Deng

    Abstract: We would like to withdraw this article for the following reasons: 1 this article is not satisfactory for limited language and theoretical description; 2 we have enriched and revised this article with the help of other authors; 3 we must update the author contribution information.

    Submitted 13 March, 2023; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: We would like to withdraw this article for the following reasons: 1 this article is not satisfactory for limited language and theoretical description; 2 we have enriched and revised this article with the help of other authors; 3 we must update the author contribution information. Please see: arXiv:2211.01025

  38. arXiv:2204.01977  [pdf, ps, other]

    cs.SD cs.CV cs.MM eess.AS

    Audio-visual multi-channel speech separation, dereverberation and recognition

    Authors: Guinan Li, Jianwei Yu, Jiajun Deng, Xunying Liu, Helen Meng

    Abstract: Despite the rapid advance of automatic speech recognition (ASR) technologies, accurate recognition of cocktail party speech characterised by the interference from overlapping speakers, background noise and room reverberation remains a highly challenging task to date. Motivated by the invariance of visual modality to acoustic signal corruption, audio-visual speech enhancement techniques have been d…

    Submitted 8 April, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

    Comments: Accepted by ICASSP 2022

  39. arXiv:2202.03144  [pdf, other]

    physics.ins-det eess.IV

    Ptychopy: GPU framework for ptychographic data analysis

    Authors: Ke Yue, Junjing Deng, Yi Jiang, Youssef Nashed, David Vine, Stefan Vogt

    Abstract: X-ray ptychography imaging at synchrotron facilities like the Advanced Photon Source (APS) involves controlling instrument hardware to collect a set of diffraction patterns from overlapping coherent illumination spots on extended samples, managing data storage, reconstructing ptychographic images from acquired diffraction patterns, and providing the visualization of results and feedback. In addit…

    Submitted 24 January, 2022; originally announced February 2022.

    Comments: X-Ray Nanoimaging: Instruments and Methods V

  40. arXiv:2201.03943  [pdf, other]

    eess.AS cs.SD

    Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks

    Authors: Shoukang Hu, Xurong Xie, Mingyu Cui, Jiajun Deng, Shansong Liu, Jianwei Yu, Mengzhe Geng, Xunying Liu, Helen Meng

    Abstract: State-of-the-art automatic speech recognition (ASR) system development is data and computation intensive. The optimal design of deep neural networks (DNNs) for these systems often requires expert knowledge and empirical evaluation. In this paper, a range of neural architecture search (NAS) techniques are used to automatically learn two types of hyper-parameters of factored time delay neural network…

    Submitted 28 March, 2022; v1 submitted 8 January, 2022; originally announced January 2022.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP). arXiv admin note: text overlap with arXiv:2007.08818

  41. arXiv:2201.00006  [pdf, ps, other

    cs.LG cs.AI eess.SY

    Leveraging Queue Length and Attention Mechanisms for Enhanced Traffic Signal Control Optimization

    Authors: Liang Zhang, Shubin Xie, Jianming Deng

    Abstract: Reinforcement learning (RL) techniques for traffic signal control (TSC) have gained increasing popularity in recent years. However, most existing RL-based TSC methods tend to focus primarily on the RL model structure while neglecting the significance of proper traffic state representation. Furthermore, some RL-based methods heavily rely on expert-designed traffic signal phase competition. In this…

    Submitted 25 September, 2023; v1 submitted 30 December, 2021; originally announced January 2022.

    Comments: 16 pages, 5 figures

    Journal ref: "Leveraging Queue Length and Attention Mechanisms for Enhanced Traffic Signal Control Optimization." Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Cham: Springer Nature Switzerland, 2023

  42. arXiv:2111.13573  [pdf, other

    cs.LG cs.NI eess.SP

    Semi-supervised t-SNE for Millimeter-wave Wireless Localization

    Authors: Junquan Deng, Wei Shi, Jian Hu, Xianlong Jiao

    Abstract: We consider the mobile localization problem in future millimeter-wave wireless networks with distributed Base Stations (BSs) based on multi-antenna channel state information (CSI). For this problem, we propose a Semi-supervised t-distributed Stochastic Neighbor Embedding (St-SNE) algorithm to directly embed the high-dimensional CSI samples into the 2D geographical map. We evaluate the performance o…

    Submitted 26 November, 2021; originally announced November 2021.

    Comments: 5 pages, 6 figures, accepted to 7th International Conference on Computer and Communications

  43. arXiv:2110.11998  [pdf, other

    eess.IV cs.CV

    Semi-Supervised Semantic Segmentation of Vessel Images using Leaking Perturbations

    Authors: Jinyong Hou, Xuejie Ding, Jeremiah D. Deng

    Abstract: Semantic segmentation based on deep learning methods can attain appealing accuracy provided large amounts of annotated samples are available. However, it remains a challenging task when only limited labelled data are available, which is especially common in medical imaging. In this paper, we propose to use Leaking GAN, a GAN-based semi-supervised architecture for retina vessel semantic segmentation. Our key ide…

    Submitted 22 October, 2021; originally announced October 2021.

    Comments: To appear in WACV'22

  44. arXiv:2110.09966  [pdf, other

    eess.SP cs.LG

    SleepPriorCL: Contrastive Representation Learning with Prior Knowledge-based Positive Mining and Adaptive Temperature for Sleep Staging

    Authors: Hongjun Zhang, Jing Wang, Qinfeng Xiao, Jiaoxue Deng, Youfang Lin

    Abstract: The objective of this paper is to learn semantic representations for sleep stage classification from raw physiological time series. Although supervised methods have gained remarkable performance, they are limited in clinical situations due to the requirement of fully labeled data. Self-supervised learning (SSL) based on contrasting semantically similar (positive) and dissimilar (negative) pairs of…

    Submitted 15 October, 2021; originally announced October 2021.

  45. A BCS-GDE Algorithm for Multi-objective Optimization of Combined Cooling, Heating and Power Model

    Authors: Jiaze Sun, Jiahui Deng, Yang Li, Shuaiyin Ma, Nan Han

    Abstract: District energy systems can not only reduce energy consumption but also set energy supply dispatching schemes according to demand. In this paper, the combined cooling heating and power economic emission dispatch (CCHPEED) model is established with the objectives of economic cost, primary energy consumption, and pollutant emissions, and three decision-making strategies are proposed to meet t…

    Submitted 16 August, 2021; originally announced August 2021.

    Comments: Accepted by 2021 IEEE/IAS Industrial and Commercial Power System Asia (IEEE I&CPS Asia 2021)

    Journal ref: 2021 IEEE/IAS Industrial and Commercial Power System Asia (I&CPS Asia)

  46. arXiv:2106.08505  [pdf, other

    cs.CV cs.LG eess.IV

    Dynamically Grown Generative Adversarial Networks

    Authors: Lanlan Liu, Yuting Zhang, Jia Deng, Stefano Soatto

    Abstract: Recent work introduced progressive network growing as a promising way to ease the training for large GANs, but the model design and architecture-growing strategy still remain under-explored and need manual design for different image data. In this paper, we propose a method to dynamically grow a GAN during training, optimizing the network architecture and its parameters together with automation. T…

    Submitted 15 June, 2021; originally announced June 2021.

    Comments: Accepted to AAAI 2021

  47. arXiv:2012.12686  [pdf, other

    eess.IV math.NA physics.comp-ph

    Adorym: A multi-platform generic x-ray image reconstruction framework based on automatic differentiation

    Authors: Ming Du, Saugat Kandel, Junjing Deng, Xiaojing Huang, Arnaud Demortiere, Tuan Tu Nguyen, Remi Tucoulou, Vincent De Andrade, Qiaoling Jin, Chris Jacobsen

    Abstract: We describe and demonstrate an optimization-based x-ray image reconstruction framework called Adorym. Our framework provides a generic forward model, allowing one code framework to be used for a wide range of imaging methods ranging from near-field holography to fly-scan ptychographic tomography. By using automatic differentiation for optimization, Adorym has the flexibility to refine experime…

    Submitted 22 December, 2020; originally announced December 2020.

    MSC Class: 78-04

  48. arXiv:2012.11105  [pdf, other

    eess.SP cs.LG

    Resting-state EEG sex classification using selected brain connectivity representation

    Authors: Jean Li, Jeremiah D. Deng, Divya Adhia, Dirk de Ridder

    Abstract: Effective analysis of EEG signals for potential clinical applications remains a challenging task. So far, the analysis and conditioning of EEG have largely remained sex-neutral. This paper employs a machine learning approach to explore the evidence of sex effects on EEG signals, and confirms the generality of these effects by achieving successful sex prediction of resting-state EEG signals. We hav…

    Submitted 20 December, 2020; originally announced December 2020.

    Comments: 11 pages, 6 figures, book chapter to be published by Springer

  49. arXiv:2007.06341  [pdf, other

    eess.IV cs.CV

    DeU-Net: Deformable U-Net for 3D Cardiac MRI Video Segmentation

    Authors: Shunjie Dong, Jinlong Zhao, Maojun Zhang, Zhengxue Shi, Jianing Deng, Yiyu Shi, Mei Tian, Cheng Zhuo

    Abstract: Automatic segmentation of cardiac magnetic resonance imaging (MRI) facilitates efficient and accurate volume measurement in clinical applications. However, due to anisotropic resolution and ambiguous border (e.g., right ventricular endocardium), existing methods suffer from the degradation of accuracy and robustness in 3D cardiac MRI video segmentation. In this paper, we propose a novel Deformable…

    Submitted 13 July, 2020; originally announced July 2020.

  50. arXiv:2004.04975  [pdf, other

    eess.SP

    Supervised Learning Based Online Tracking Filters: An XGBoost Implementation

    Authors: Jie Deng, Wei Yi

    Abstract: The target state filter is an important module in the traditional target tracking framework. In order to get satisfactory tracking results, traditional Bayesian methods usually need accurate motion models, which require complicated prior information and parameter estimation. Therefore, the modeling process has a key impact on traditional Bayesian filters for target tracking. However, when enco…

    Submitted 4 May, 2020; v1 submitted 10 April, 2020; originally announced April 2020.