Showing 1–50 of 100 results for author: Du, J

Searching in archive eess.
  1. arXiv:2406.15160 [pdf, other]

    eess.AS eess.SP

    Exploring Audio-Visual Information Fusion for Sound Event Localization and Detection In Low-Resource Realistic Scenarios

    Authors: Ya Jiang, Qing Wang, Jun Du, Maocheng Hu, Pengfei Hu, Zeyan Liu, Shi Cheng, Zhaoxu Nian, Yuxuan Dong, Mingqi Cai, Xin Fang, Chin-Hui Lee

    Abstract: This study presents an audio-visual information fusion approach to sound event localization and detection (SELD) in low-resource scenarios. We aim at utilizing audio and video modality information through cross-modal learning and multi-modal fusion. First, we propose a cross-modal teacher-student learning (TSL) framework to transfer information from an audio-only teacher model, trained on a rich c…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted by ICME 2024

  2. arXiv:2406.07256 [pdf, ps, other]

    cs.SD cs.AI eess.AS

    AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection

    Authors: Rong Gong, Hongfei Xue, Lezhi Wang, Xin Xu, Qisheng Li, Lei Xie, Hui Bu, Shaomei Wu, Jiaming Zhou, Yong Qin, Binbin Zhang, Jun Du, Jia Bin, Ming Li

    Abstract: The rapid advancements in speech technologies over the past two decades have led to human-level performance in tasks like automatic speech recognition (ASR) for fluent speech. However, the efficacy of these models diminishes when applied to atypical speech, such as stuttering. This paper introduces AS-70, the first publicly available Mandarin stuttered speech dataset, which stands out as the large…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  3. arXiv:2406.04582 [pdf, other]

    eess.AS cs.SD

    Neural Codec-based Adversarial Sample Detection for Speaker Verification

    Authors: Xuanjun Chen, Jiawei Du, Haibin Wu, Jyh-Shing Roger Jang, Hung-yi Lee

    Abstract: Automatic Speaker Verification (ASV), increasingly used in security-critical applications, faces vulnerabilities from rising adversarial attacks, with few effective defenses available. In this paper, we propose a neural codec-based adversarial sample detection method for ASV. The approach leverages the codec's ability to discard redundant perturbations and retain essential information. Specificall…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  4. arXiv:2406.02262 [pdf, other]

    eess.SP

    A DAFT Based Unified Waveform Design Framework for High-Mobility Communications

    Authors: Xingyao Zhang, Haoran Yin, Yanqun Tang, Yu Zhou, Yuqing Liu, **ming Du, Yipeng Ding

    Abstract: With the increasing demand for multi-carrier communication in high-mobility scenarios, it is urgent to design new multi-carrier communication waveforms that can resist large delay-Doppler spreads. Various multi-carrier waveforms in the transform domain were proposed for the fast time-varying channels, including orthogonal time frequency space (OTFS), orthogonal chirp division multiplexing (OCDM),…

    Submitted 4 June, 2024; originally announced June 2024.

  5. arXiv:2405.16952 [pdf, other]

    eess.AS

    A Variance-Preserving Interpolation Approach for Diffusion Models with Applications to Single Channel Speech Enhancement and Recognition

    Authors: Zilu Guo, Qing Wang, Jun Du, Jia Pan, Qing-Feng Liu, Chin-Hui

    Abstract: In this paper, we propose a variance-preserving interpolation framework to improve diffusion models for single-channel speech enhancement (SE) and automatic speech recognition (ASR). This new variance-preserving interpolation diffusion model (VPIDM) approach requires only 25 iterative steps and obviates the need for a corrector, an essential element in the existing variance-exploding interpolation…

    Submitted 27 May, 2024; originally announced May 2024.

  6. arXiv:2405.15863 [pdf, other]

    cs.SD cs.AI eess.AS

    Quality-aware Masked Diffusion Transformer for Enhanced Music Generation

    Authors: Chang Li, Ruoyu Wang, Lijuan Liu, Jun Du, Yixuan Sun, Zilu Guo, Zhenrong Zhang, Yuan Jiang

    Abstract: In recent years, diffusion-based text-to-music (TTM) generation has gained prominence, offering a novel approach to synthesizing musical content from textual descriptions. Achieving high accuracy and diversity in this generation process requires extensive, high-quality data, which often constitutes only a fraction of available datasets. Within open-source datasets, the prevalence of issues like mi…

    Submitted 24 May, 2024; originally announced May 2024.

  7. arXiv:2405.09353 [pdf, other]

    eess.IV cs.CV

    Large coordinate kernel attention network for lightweight image super-resolution

    Authors: Fangwei Hao, Jiesheng Wu, Haotian Lu, Ji Du, **g Xu

    Abstract: The multi-scale receptive field and large kernel attention (LKA) module have been shown to significantly improve performance in the lightweight image super-resolution task. However, existing lightweight super-resolution (SR) methods seldom pay attention to designing efficient building block with multi-scale receptive field for local modeling, and their LKA modules face a quadratic increase in comp…

    Submitted 15 May, 2024; originally announced May 2024.

  8. arXiv:2404.11313 [pdf, other]

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Ya**g Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  9. arXiv:2404.03329 [pdf]

    cs.LG eess.SP stat.ML

    DeepFunction: Deep Metric Learning-based Imbalanced Classification for Diagnosing Threaded Pipe Connection Defects using Functional Data

    Authors: Yukun Xie, Juan Du, Chen Zhang

    Abstract: In modern manufacturing, most of the product lines are conforming. Few products are nonconforming but with different defect types. The identification of defect types can help further root cause diagnosis of production lines. With the sensing development, signals of process variables can be collected in high resolution, which can be regarded as multichannel functional data. They have abundant infor…

    Submitted 24 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: Revised version for submission to IISE Transactions

  10. arXiv:2403.11445 [pdf, other]

    cs.CR cs.DS eess.SP

    Budget Recycling Differential Privacy

    Authors: Bo Jiang, Jian Du, Sagar Shamar, Qiang Yan

    Abstract: Differential Privacy (DP) mechanisms usually force a reduction in data utility by producing "out-of-bound" noisy results for a tight privacy budget. We introduce the Budget Recycling Differential Privacy (BR-DP) framework, designed to provide soft-bounded noisy outputs for a broad range of existing DP mechanisms. By "soft-bounded," we refer to the mechanism's ability to release most outputs within…

    Submitted 16 April, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

  11. arXiv:2403.11091 [pdf, other]

    cs.SD cs.CV eess.AS

    Multitask frame-level learning for few-shot sound event detection

    Authors: Liang Zou, Genwei Yan, Ruoyu Wang, Jun Du, Meng Lei, Tian Gao, Xin Fang

    Abstract: This paper focuses on few-shot Sound Event Detection (SED), which aims to automatically recognize and classify sound events with limited samples. However, prevailing methods in few-shot SED predominantly rely on segment-level predictions, which often fail to provide detailed, fine-grained predictions, particularly for events of brief duration. Although frame-level prediction strategies have been…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: 6 pages, 4 figures, conference

  12. arXiv:2403.08196 [pdf, other]

    cs.CL eess.AS

    SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation

    Authors: Jiayu Du, **peng Li, Guoguo Chen, Wei-Qiang Zhang

    Abstract: In the wake of the surging tide of deep learning over the past decade, Automatic Speech Recognition (ASR) has garnered substantial attention, leading to the emergence of numerous publicly accessible ASR systems that are actively being integrated into our daily lives. Nonetheless, the impartial and replicable evaluation of these ASR systems encounters challenges due to various crucial subtleties. I…

    Submitted 12 March, 2024; originally announced March 2024.

  13. arXiv:2403.04245 [pdf, other]

    cs.SD cs.CV cs.LG cs.MM eess.AS

    A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition

    Authors: Yusheng Dai, Hang Chen, Jun Du, Ruoyu Wang, Shihao Chen, Jiefeng Ma, Haotian Wang, Chin-Hui Lee

    Abstract: Advanced Audio-Visual Speech Recognition (AVSR) systems have been observed to be sensitive to missing video frames, performing even worse than single-modality models. While applying the dropout technique to the video modality enhances robustness to missing frames, it simultaneously results in a performance loss when dealing with complete data input. In this paper, we investigate this contrasting p…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: the paper is accepted by CVPR2024

  14. arXiv:2402.13018 [pdf, other]

    eess.AS cs.SD

    EMO-SUPERB: An In-depth Look at Speech Emotion Recognition

    Authors: Haibin Wu, Huang-Cheng Chou, Kai-Wei Chang, Lucas Goncalves, Jiawei Du, Jyh-Shing Roger Jang, Chi-Chun Lee, Hung-Yi Lee

    Abstract: Speech emotion recognition (SER) is a pivotal technology for human-computer interaction systems. However, 80.77% of SER papers yield results that cannot be reproduced. We develop EMO-SUPERB, short for EMOtion Speech Universal PERformance Benchmark, which aims to enhance open-source initiatives for SER. EMO-SUPERB includes a user-friendly codebase to leverage 15 state-of-the-art speech self-supervi…

    Submitted 12 March, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: webpage: https://emosuperb.github.io/

  15. arXiv:2401.17573 [pdf]

    stat.ML cs.LG eess.IV eess.SY

    Tensor-based process control and monitoring for semiconductor manufacturing with unstable disturbances

    Authors: Yanrong Li, Juan Du, Fugee Tsung, Wei Jiang

    Abstract: With the development and popularity of sensors installed in manufacturing systems, complex data are collected during manufacturing processes, which brings challenges for traditional process control methods. This paper proposes a novel process control and monitoring method for the complex structure of high-dimensional image-based overlay errors (modeled in tensor form), which are collected in semic…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: 30 pages, 5 figures

  16. arXiv:2401.15206 [pdf]

    eess.SY

    Backscatter Measurements and Models for RF Sensing Applications in Cluttered Environments

    Authors: Dmitry Chizhik, **feng Du, Jakub Sapis, Reinaldo A. Valenzuela, Abhishek Adhikari, Gil Zussman, Manuel A. Almendra, Mauricio Rodriguez, Rodolfo Feick

    Abstract: A statistical backscatter channel model for indoor clutter is developed for indoor RF sensing applications based on measurements. A narrowband 28 GHz sounder used a quasi-monostatic radar arrangement with an omnidirectional transmit antenna illuminating an indoor scene and a spinning horn receive antenna less than 1 m away collecting backscattered power as a function of azimuth. Median average bac…

    Submitted 26 January, 2024; originally announced January 2024.

  17. arXiv:2401.14612 [pdf, ps, other]

    math.OC eess.SY

    On Inhomogeneous Infinite Products of Stochastic Matrices and Applications

    Authors: Zhaoyue Xia, Jun Du, Chunxiao Jiang, H. Vincent Poor, Zhu Han, Yong Ren

    Abstract: With the growth of magnitude of multi-agent networks, distributed optimization holds considerable significance within complex systems. Convergence, a pivotal goal in this domain, is contingent upon the analysis of infinite products of stochastic matrices (IPSMs). In this work, convergence properties of inhomogeneous IPSMs are investigated. The convergence rate of inhomogeneous IPSMs towards an abs…

    Submitted 25 January, 2024; originally announced January 2024.

  18. arXiv:2401.03357 [pdf]

    cs.NI eess.SP

    Measured and Modeled Outdoor Indoor Coverage at 28 GHz into High Thermal Efficiency Buildings

    Authors: Dmitry Chizhik, **feng Du, Reinaldo Valenzuela, Andrea Bedin, Martti Moisio, Rodolfo Feick

    Abstract: 28 GHz outdoor-indoor coverage into modern office buildings with high thermal efficiency windows is found to be severely limited due to 46 dB median penetration loss at normal incidence and additional 15 dB median oblique incidence loss. The study is based on measurements of path gain over 280 outdoor-indoor links, at ranges up to 100 m. A simple theoretical path gain model is extended to include…

    Submitted 4 September, 2023; originally announced January 2024.

    Comments: 2 pages, 3 figures. Presented at IEEE International Symposium on Antennas and Propagation and USNC-URSI Radio Science Meeting

  19. arXiv:2312.11125 [pdf, other]

    eess.SP

    A Low-Complexity Range Estimation with Adjusted Affine Frequency Division Multiplexing Waveform

    Authors: Jiajun Zhu, Yanqun Tang, Xizhang Wei, Haoran Yin, **ming Du, Zhengpeng Wang, Yuqinng Liu

    Abstract: Affine frequency division multiplexing (AFDM) is a recently proposed communication waveform for time-varying channel scenarios. As a chirp-based multicarrier modulation technique it can not only satisfy the needs of multiple scenarios in future mobile communication networks but also achieve good performance in radar sensing by adjusting the built-in parameters, making it a promising air interface…

    Submitted 29 December, 2023; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: The paper has been submitted to IEEE WCNC 2024 WS-13: Mobile Sensing-Communication-Computation Synergy for 6G Internet of Things

  20. arXiv:2310.15930 [pdf, other]

    cs.SD eess.AS

    CDSD: Chinese Dysarthria Speech Database

    Authors: Mengyi Sun, Ming Gao, Xinchen Kang, Shiru Wang, Jun Du, Dengfeng Yao, Su-**g Wang

    Abstract: We present the Chinese Dysarthria Speech Database (CDSD) as a valuable resource for dysarthria research. This database comprises speech data from 24 participants with dysarthria. Among these participants, one recorded an additional 10 hours of speech data, while each recorded one hour, resulting in 34 hours of speech material. To accommodate participants with varying cognitive levels, our text poo…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: 9 pages, 3 figures

  21. arXiv:2310.09723 [pdf, other]

    cs.IT eess.SP

    A generalization of the achievable rate of a MISO system using Bode-Fano wideband matching theory

    Authors: Nitish Deshpande, Miguel R. Castellanos, Saeed R. Khosravirad, **feng Du, Harish Viswanathan, Robert W. Heath Jr

    Abstract: Impedance-matching networks affect power transfer from the radio frequency (RF) chains to the antennas. Their design impacts the signal to noise ratio (SNR) and the achievable rate. In this paper, we maximize the information-theoretic achievable rate of a multiple-input-single-output (MISO) system with wideband matching constraints. Using a multiport circuit theory approach with frequency-selectiv…

    Submitted 14 October, 2023; originally announced October 2023.

  22. arXiv:2309.09270

    eess.AS cs.AI cs.SD

    Continuous Modeling of the Denoising Process for Speech Enhancement Based on Deep Learning

    Authors: Zilu Guo, Jun Du, Chin-Hui Lee

    Abstract: In this paper, we explore a continuous modeling approach for deep-learning-based speech enhancement, focusing on the denoising process. We use a state variable to indicate the denoising process. The starting state is noisy speech and the ending state is clean speech. The noise component in the state variable decreases with the change of the state index until the noise component is 0. During traini…

    Submitted 7 January, 2024; v1 submitted 17 September, 2023; originally announced September 2023.

    Comments: We found that the results were obtained under incorrect experimental settings; new experiments are needed.

  23. arXiv:2309.09205 [pdf]

    cs.LG eess.SY stat.ML

    MFRL-BI: Design of a Model-free Reinforcement Learning Process Control Scheme by Using Bayesian Inference

    Authors: Yanrong Li, Juan Du, Wei Jiang

    Abstract: Design of process control scheme is critical for quality assurance to reduce variations in manufacturing systems. Taking semiconductor manufacturing as an example, extensive literature focuses on control optimization based on certain process models (usually linear models), which are obtained by experiments before a manufacturing process starts. However, in real applications, pre-defined models may…

    Submitted 17 September, 2023; originally announced September 2023.

    Comments: 31 pages, 7 figures, and 3 tables

  24. arXiv:2309.09180 [pdf, other]

    eess.AS cs.AI cs.SD

    Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding with Sequence-to-Sequence Architecture

    Authors: Gaobin Yang, Maokui He, Shutong Niu, Ruoyu Wang, Yanyan Yue, Shuangqing Qian, Shilong Wu, Jun Du, Chin-Hui Lee

    Abstract: We propose a novel neural speaker diarization system using memory-aware multi-speaker embedding with sequence-to-sequence architecture (NSD-MS2S), which integrates the strengths of memory-aware multi-speaker embedding (MA-MSE) and sequence-to-sequence (Seq2Seq) architecture, leading to improvement in both efficiency and performance. Next, we further decrease the memory occupation of decoding by in…

    Submitted 26 December, 2023; v1 submitted 17 September, 2023; originally announced September 2023.

    Comments: Accepted by ICASSP 2024

  25. arXiv:2309.08348 [pdf, other]

    eess.AS cs.SD

    The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction

    Authors: Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-Hui Lee, **gdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao

    Abstract: Previous Multimodal Information based Speech Processing (MISP) challenges mainly focused on audio-visual speech recognition (AVSR) with commendable success. However, the most advanced back-end recognition systems often hit performance limits due to the complex acoustic environments. This has prompted a shift in focus towards the Audio-Visual Target Speaker Extraction (AVTSE) task for the MISP 2023…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: 5 pages, 4 figures

  26. arXiv:2309.07925 [pdf, other]

    eess.AS cs.AI cs.MM cs.SD

    Hierarchical Audio-Visual Information Fusion with Multi-label Joint Decoding for MER 2023

    Authors: Haotian Wang, Yuxuan Xi, Hang Chen, Jun Du, Yan Song, Qing Wang, Hengshun Zhou, Chenxi Wang, Jiefeng Ma, Pengfei Hu, Ya Jiang, Shi Cheng, Jie Zhang, Yuzhe Weng

    Abstract: In this paper, we propose a novel framework for recognizing both discrete and dimensional emotions. In our framework, deep features extracted from foundation models are used as robust acoustic and visual representations of raw video. Three different structures based on attention-guided feature gathering (AFG) are designed for deep feature fusion. Then, we introduce a joint decoding structure for e…

    Submitted 10 September, 2023; originally announced September 2023.

    Comments: 5 pages, 4 figures

    Journal ref: The 31st ACM International Conference on Multimedia (MM'23), 2023

  27. arXiv:2309.04975 [pdf, ps, other]

    cs.IT eess.SP

    Trade-Off Between Beamforming and Macro-Diversity Gains in Distributed mMIMO

    Authors: Eduardo Noboro Tominaga, Hsuan-Jung Su, **feng Du, Sivarama Venkatesan, Richard Demo Souza, Hirley Alves

    Abstract: Industry and academia have been working towards the evolution from Centralized massive Multiple-Input Multiple-Output (CmMIMO) to Distributed mMIMO (DmMIMO) architectures. Instead of splitting a coverage area into many cells, each served by a single Base Station equipped with several antennas, the whole coverage area is jointly covered by several Access Points (AP) equipped with few or single ante…

    Submitted 10 September, 2023; originally announced September 2023.

    Comments: 6 pages, 3 figures. Manuscript submitted to the IEEE Wireless Communications and Networking Conference (WCNC) 2024, Dubai, United Arab Emirates

  28. arXiv:2308.14638 [pdf, other]

    eess.AS cs.SD

    The USTC-NERCSLIP Systems for the CHiME-7 DASR Challenge

    Authors: Ruoyu Wang, Maokui He, Jun Du, Hengshun Zhou, Shutong Niu, Hang Chen, Yanyan Yue, Gaobin Yang, Shilong Wu, Lei Sun, Yanhui Tu, Haitao Tang, Shuangqing Qian, Tian Gao, Mengzhi Wang, Genshun Wan, Jia Pan, Jianqing Gao, Chin-Hui Lee

    Abstract: This technical report details our submission system to the CHiME-7 DASR Challenge, which focuses on speaker diarization and speech recognition under complex multi-speaker scenarios. Additionally, it also evaluates the efficiency of systems in handling diverse array devices. To address these issues, we implemented an end-to-end speaker diarization system and introduced a rectification strategy base…

    Submitted 10 October, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

    Comments: Accepted by 2023 CHiME Workshop, Oral

  29. arXiv:2308.14393 [pdf]

    eess.SY cs.RO

    Research on the Influence of Underwater Environment on the Dynamic Performance of the Mechanical Leg of a Deep-sea Crawling and Swimming Robot

    Authors: Lihui Liao, Baoren Li, Dijia Zhang, Lu** Gao, Mboulé Ngwa, **gmin Du

    Abstract: The performance of underwater crawling and adjustment of the body posture for underwater manipulating of the deep-sea crawling and swimming robot (DCSR) is directly influenced by the dynamic performance of the underwater mechanical legs (UWML), as it serves as the executive mechanism of the DCSR. Compared with the mechanical legs of legged robots working on land, the UWML of the DCSR not only poss…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: Conference paper for the 2023 IEEE 9th International Conference on Fluid Power and Mechatronics (FPM2023)

    MSC Class: 93C40; ACM Class: C.5

  30. arXiv:2308.08488 [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder

    Authors: Yusheng Dai, Hang Chen, Jun Du, Xiaofei Ding, Ning Ding, Feijun Jiang, Chin-Hui Lee

    Abstract: In recent research, slight performance improvement is observed from automatic speech recognition systems to audio-visual speech recognition systems in the end-to-end framework with low-quality videos. Unmatching convergence rates and specialized input representations between audio and visual modalities are considered to cause the problem. In this paper, we propose two novel techniques to improve a…

    Submitted 8 March, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

    Comments: 6 pages, 2 figures, published in ICME2023

  31. arXiv:2308.04805 [pdf, other]

    cs.IR cs.SD eess.AS

    DiVa: An Iterative Framework to Harvest More Diverse and Valid Labels from User Comments for Music

    Authors: Hongru Liang, **gyao Liu, Yuanxin Xiang, Jiachen Du, Lanjun Zhou, Shushen Pan, Wenqiang Lei

    Abstract: Towards sufficient music searching, it is vital to form a complete set of labels for each song. However, current solutions fail to resolve it as they cannot produce diverse enough mappings to make up for the information missed by the gold labels. Based on the observation that such missing information may already be presented in user comments, we propose to study the automated music labeling in an…

    Submitted 9 August, 2023; originally announced August 2023.

    Comments: 11 pages, 5 figures, published at ACM MM 2023

  32. arXiv:2307.08688 [pdf, other]

    eess.AS

    Semi-supervised multi-channel speaker diarization with cross-channel attention

    Authors: Shilong Wu, Jun Du, Maokui He, Shutong Niu, Hang Chen, Haitao Tang, Chin-Hui Lee

    Abstract: Most neural speaker diarization systems rely on sufficient manual training data labels, which are hard to collect under real-world scenarios. This paper proposes a semi-supervised speaker diarization system to utilize large-scale multi-channel training data by generating pseudo-labels for unlabeled data. Furthermore, we introduce cross-channel attention into the Neural Speaker Diarization Using Me…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: 8 pages, 3 figures

  33. arXiv:2306.08527 [pdf, other]

    eess.AS cs.AI cs.SD

    Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement

    Authors: Zilu Guo, Jun Du, Chin-Hui Lee, Yu Gao, Wenbin Zhang

    Abstract: The goal of this study is to implement diffusion models for speech enhancement (SE). The first step is to emphasize the theoretical foundation of variance-preserving (VP)-based interpolation diffusion under continuous conditions. Subsequently, we present a more concise framework that encapsulates both the VP- and variance-exploding (VE)-based interpolation diffusion methods. We demonstrate that th…

    Submitted 17 September, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

  34. arXiv:2305.08465 [pdf, other]

    eess.SP

    An Overview of Resource Allocation in Integrated Sensing and Communication

    Authors: **ming Du, Yanqun Tang, Xizhang Wei, Jiaojiao Xiong, Jiajun Zhu, Haoran Yin, Chi Zhang, Haibo Chen

    Abstract: Integrated sensing and communication (ISAC) is considered as a promising solution for improving spectrum efficiency and relieving wireless spectrum congestion. This paper systematically introduces the evolutionary path of ISAC technologies, then sorts out and summarizes the current research status of ISAC resource allocation. From the perspective of different integrated levels of ISAC, we introduc…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: 6 pages, 4 figures, conference

  35. arXiv:2303.16772 [pdf, other]

    eess.SY

    Maximin Headway Control of Automated Vehicles for System Optimal Dynamic Traffic Assignment in General Networks

    Authors: **xiao Du, Wei Ma

    Abstract: This study develops the headway control framework in a fully automated road network, as we believe headway of Automated Vehicles (AVs) is another influencing factor to traffic dynamics in addition to conventional vehicle behaviors (e.g. route and departure time choices). Specifically, we aim to search for the optimal time headway between AVs on each link that achieves the network-wide system optim…

    Submitted 11 June, 2024; v1 submitted 29 March, 2023; originally announced March 2023.

  36. arXiv:2302.14224 [pdf, other]

    eess.SP cs.NI

    Overview and Performance Analysis of Various Waveforms in High Mobility Scenarios

    Authors: Yu Zhou, Haoran Yin, Jiaojiao Xiong, Shiyu Song, Jiajun Zhu, **ming Du, Haibo Chen, Yanqun Tang

    Abstract: In the high-mobility scenarios of next-generation wireless communication systems (beyond 5G/6G), the performance of orthogonal frequency division multiplexing (OFDM) deteriorates drastically due to the loss of orthogonality between the subcarriers caused by large Doppler frequency shifts. Various emerging waveforms have been proposed for fast time-varying channels with excellent results. In this p…

    Submitted 27 February, 2023; originally announced February 2023.

  37. arXiv:2301.01119 [pdf]

    eess.SP cs.IT eess.SY

    Energy Efficient Extreme MIMO: Design Goals and Directions

    Authors: Stefan Wesemann, **feng Du, Harish Viswanathan

    Abstract: Ever since the invention of Bell Laboratories Layer Space-Time (BLAST) in mid 1990s, the focus of MIMO research and development has been largely on pushing the limit of spectral efficiency. While massive MIMO technologies laid the foundation of high spectrum efficiency in 5G and beyond, the challenge remains in improving energy efficiency given the increasing complexity of the associated radio sys…

    Submitted 22 June, 2023; v1 submitted 3 January, 2023; originally announced January 2023.

    Comments: This work has been accepted for publication by IEEE Communications Magazine. Copyright may be transferred without notice

    Journal ref: IEEE Communications Magazine, 2023

  38. arXiv:2212.00491 [pdf, ps, other]

    eess.SY

    Gradient and Channel Aware Dynamic Scheduling for Over-the-Air Computation in Federated Edge Learning Systems

    Authors: Jun Du, Bingqing Jiang, Chunxiao Jiang, Yuanming Shi, Zhu Han

    Abstract: To satisfy the expected plethora of computation-heavy applications, federated edge learning (FEEL) is a new paradigm featuring distributed learning to carry the capacities of low-latency and privacy-preserving. To further improve the efficiency of wireless data aggregation and model learning, over-the-air computation (AirComp) is emerging as a promising solution by using the superposition characte…

    Submitted 1 December, 2022; originally announced December 2022.

  39. arXiv:2211.06474 [pdf, other]

    cs.CL cs.SD eess.AS

    Speech-to-Speech Translation For A Real-world Unwritten Language

    Authors: Peng-Jen Chen, Kevin Tran, Yilin Yang, **gfei Du, Justine Kao, Yu-An Chung, Paden Tomasello, Paul-Ambroise Duquenne, Holger Schwenk, Hongyu Gong, Hirofumi Inaguma, Sravya Popuri, Changhan Wang, Juan Pino, Wei-Ning Hsu, Ann Lee

    Abstract: We study speech-to-speech translation (S2ST) that translates speech from one language into another language and focuses on building systems to support languages without standard text writing systems. We use English-Taiwanese Hokkien as a case study, and present an end-to-end solution from training data collection, modeling choices to benchmark dataset release. First, we present efforts on creating…

    Submitted 11 November, 2022; originally announced November 2022.

  40. arXiv:2211.04508 [pdf, other]

    cs.CL cs.SD eess.AS

    SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations

    Authors: Paul-Ambroise Duquenne, Hongyu Gong, Ning Dong, **gfei Du, Ann Lee, Vedanuj Goswani, Changhan Wang, Juan Pino, Benoît Sagot, Holger Schwenk

    Abstract: We present SpeechMatrix, a large-scale multilingual corpus of speech-to-speech translations mined from real speech of European Parliament recordings. It contains speech alignments in 136 language pairs with a total of 418 thousand hours of speech. To evaluate the quality of this parallel speech, we train bilingual speech-to-speech translation models on mined data only and establish extensive basel…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: 18 pages

  41. arXiv:2210.14581  [pdf, other]

    eess.AS cs.SD eess.IV

    Deep Learning Based Audio-Visual Multi-Speaker DOA Estimation Using Permutation-Free Loss Function

    Authors: Qing Wang, Hang Chen, Ya Jiang, Zhe Wang, Yuyang Wang, Jun Du, Chin-Hui Lee

    Abstract: In this paper, we propose a deep learning based multi-speaker direction of arrival (DOA) estimation with audio and visual signals by using permutation-free loss function. We first collect a data set for multi-modal sound source localization (SSL) where both audio and visual signals are recorded in real-life home TV scenarios. Then we propose a novel spatial annotation method to produce the ground…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: 5 pages, 3 figures, accepted by ISCSLP 2022

  42. arXiv:2210.12361  [pdf]

    eess.IV cs.CV

    MS-DCANet: A Novel Segmentation Network For Multi-Modality COVID-19 Medical Images

    Authors: Xiaoyu Pan, Huazheng Zhu, Jinglong Du, Guangtao Hu, Baoru Han, Yuanyuan Jia

    Abstract: The Coronavirus Disease 2019 (COVID-19) pandemic has increased the public health burden and brought profound disaster to humans. For the particularity of the COVID-19 medical images with blurred boundaries, low contrast and different infection sites, some researchers have improved the accuracy by adding more complexity. Also, they overlook the complexity of lesions, which hinder their ability to c…

    Submitted 19 July, 2023; v1 submitted 22 October, 2022; originally announced October 2022.

    Comments: 21 pages, 13 figures, 9 tables

    Journal ref: J Multidiscip Healthc. 2023;16:2023-2043

  43. arXiv:2207.10969  [pdf, ps, other]

    eess.SP

    Convergence Theory of Generalized Distributed Subgradient Method with Random Quantization

    Authors: Zhaoyue Xia, Jun Du, Yong Ren

    Abstract: The distributed subgradient method (DSG) is a widely discussed algorithm to cope with large-scale distributed optimization problems in the arising machine learning applications. Most existing works on DSG focus on ideal communication between the cooperative agents such that the shared information between agents is exact and perfect. This assumption, however, could lead to potential privacy concer…

    Submitted 23 August, 2022; v1 submitted 22 July, 2022; originally announced July 2022.

  44. arXiv:2206.14323  [pdf, ps, other]

    eess.SP cs.IT

    A wideband generalization of the near-field region for extremely large phased-arrays

    Authors: Nitish Deshpande, Miguel R. Castellanos, Saeed R. Khosravirad, Jinfeng Du, Harish Viswanathan, Robert W. Heath Jr

    Abstract: The narrowband and far-field assumption in conventional wireless system design leads to a mismatch with the optimal beamforming required for wideband and near-field systems. This discrepancy is exacerbated for larger apertures and bandwidths. To characterize the behavior of near-field and wideband systems, we derive the beamforming gain expression achieved by a frequency-flat phased array designed…

    Submitted 29 June, 2022; v1 submitted 28 June, 2022; originally announced June 2022.

  45. arXiv:2205.09436  [pdf, other]

    eess.SP cs.NI

    Outdoor-to-Indoor 28 GHz Wireless Measurements in Manhattan: Path Loss, Environmental Effects, and 90% Coverage

    Authors: Manav Kohli, Abhishek Adhikari, Gulnur Avci, Sienna Brent, Aditya Dash, Jared Moser, Sabbir Hossain, Igor Kadota, Carson Garland, Shivan Mukherjee, Rodolfo Feick, Dmitry Chizhik, Jinfeng Du, Reinaldo A. Valenzuela, Gil Zussman

    Abstract: Outdoor-to-indoor (OtI) signal propagation further challenges the already tight link budgets at millimeter-wave (mmWave). To gain insight into OtI mmWave scenarios at 28 GHz, we conducted an extensive measurement campaign consisting of over 2,200 link measurements. In total, 43 OtI scenarios were measured in West Harlem, New York City, covering seven highly diverse buildings. The measured OtI path…

    Submitted 19 May, 2022; originally announced May 2022.

    Comments: 13 pages, 13 figures

  46. arXiv:2205.00698  [pdf]

    eess.IV cs.CV cs.LG

    Unsupervised Denoising of Optical Coherence Tomography Images with Dual_Merged CycleWGAN

    Authors: Jie Du, Xujian Yang, Kecheng Jin, Xuanzheng Qi, Hu Chen

    Abstract: Noise is an important cause of low quality Optical coherence tomography (OCT) image. The neural network model based on convolutional neural networks (CNNs) has demonstrated its excellent performance in image denoising. However, OCT image denoising still faces great challenges because many previous neural network algorithms required a large number of labeled data, which might cost much time or is ex…

    Submitted 2 May, 2022; originally announced May 2022.

    Comments: Mr. Hu Chen is our corresponding author

  47. arXiv:2203.04114  [pdf, other]

    cs.MM cs.CV cs.SD eess.AS

    A study on joint modeling and data augmentation of multi-modalities for audio-visual scene classification

    Authors: Qing Wang, Jun Du, Siyuan Zheng, Yunqing Li, Yajian Wang, Yuzhong Wu, Hu Hu, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Yannan Wang, Chin-Hui Lee

    Abstract: In this paper, we propose two techniques, namely joint modeling and data augmentation, to improve system performances for audio-visual scene classification (AVSC). We employ pre-trained networks trained only on image data sets to extract video embedding; whereas for audio embedding models, we decide to train them from scratch. We explore different neural network architectures for joint modeling to…

    Submitted 31 August, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

    Comments: 5 pages, 1 figure

  48. arXiv:2203.03813  [pdf, other]

    eess.SP

    Dense Urban Outdoor-Indoor Coverage from 3.5 to 28 GHz

    Authors: Dipankar Shakya, Dmitry Chizhik, Jinfeng Du, Reinaldo A. Valenzuela, Theodore S. Rappaport

    Abstract: In the US, people spend 87% of their time indoors and have an average of four connected devices per person (in 2020). As such, providing indoor coverage has always been a challenge but becomes even more difficult as carrier frequencies increase to mmWave and beyond. This paper investigates the outdoor and outdoor-indoor coverage of an urban network comparing globally standardized building penetrat…

    Submitted 7 March, 2022; originally announced March 2022.

    Comments: 6 pages, 8 figures, ICC 2022 conference

  49. arXiv:2202.08710  [pdf, ps, other]

    eess.SP

    Sampling and Reconstructing Angular Domains with Uniform Arrays

    Authors: Silvio Mandelli, Marcus Henninger, Jinfeng Du

    Abstract: The surge of massive antenna arrays in wireless networks calls for the adoption of analog/hybrid array solutions, where multiple antenna elements are driven by a common radio front end to form a beam along a specific angle in order to maximize the beamforming gain. Many heuristics have been proposed to sample the angular domain by trading off between sampling step size and overhead, where arbitrar…

    Submitted 3 July, 2023; v1 submitted 17 February, 2022; originally announced February 2022.

    Comments: 15 pages, 10 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. This version corrects a typo in Section IV-D of the published paper in IEEE Transactions on Wireless Communications

  50. arXiv:2202.08509  [pdf, other]

    cs.SD cs.AI cs.CV cs.LG eess.AS

    A Study of Designing Compact Audio-Visual Wake Word Spotting System Based on Iterative Fine-Tuning in Neural Network Pruning

    Authors: Hengshun Zhou, Jun Du, Chao-Han Huck Yang, Shifu Xiong, Chin-Hui Lee

    Abstract: Audio-only-based wake word spotting (WWS) is challenging under noisy conditions due to environmental interference in signal transmission. In this paper, we investigate designing a compact audio-visual WWS system by utilizing visual information to alleviate the degradation. Specifically, in order to use visual information, we first encode the detected lips to fixed-size vectors with MobileNet an…

    Submitted 17 February, 2022; originally announced February 2022.

    Comments: Accepted to ICASSP 2022. H. Zhou et al