
Showing 1–50 of 233 results for author: Wang, M

Searching in archive eess.
  1. arXiv:2407.00481  [pdf]

    eess.SP

    Machine-Type Communication Waveforms: An Exploration of New Dimensions

    Authors: Michael Wang, Lei Wang, Xiaohu You

    Abstract: This paper derives a generalized class of waveforms with an application to machine-type communication (MTC) while studying its underlying structural characteristics in relation to conventional modulation waveforms. First, a canonical waveform of frequency-error tolerance is identified for a unified preamble and traffic signal design, ideal for MTC use as a composite waveform, commonly known as a t…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 17 pages, 9 figures

  2. arXiv:2406.19608  [pdf, other]

    eess.SY

    Multi-service collaboration and composition of cloud manufacturing customized production based on problem decomposition

    Authors: Hao Yue, Yingtao Wu, Min Wang, Hesuan Hu, Weimin Wu, Jihui Zhang

    Abstract: Cloud manufacturing system is a service-oriented and knowledge-based one, which can provide solutions for the large-scale customized production. The service resource allocation is the primary factor that restricts the production time and cost in the cloud manufacturing customized production (CMCP). In order to improve the efficiency and reduce the cost in CMCP, we propose a new framework which con…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 12 pages, 8 figures

    ACM Class: J.0

  3. arXiv:2406.19328  [pdf, other]

    cs.SD cs.LG eess.AS

    Subtractive Training for Music Stem Insertion using Latent Diffusion Models

    Authors: Ivan Villa-Renteria, Mason L. Wang, Zachary Shah, Zhe Li, Soohyun Kim, Neelesh Ramachandran, Mert Pilanci

    Abstract: We present Subtractive Training, a simple and novel method for synthesizing individual musical instrument stems given other instruments as context. This method pairs a dataset of complete music mixes with 1) a variant of the dataset lacking a specific stem, and 2) LLM-generated instructions describing how the missing stem should be reintroduced. We then fine-tune a pretrained text-to-audio diffusi…

    Submitted 27 June, 2024; originally announced June 2024.

  4. arXiv:2406.16942  [pdf, other]

    eess.IV cs.AI cs.CV

    Enhancing Diagnostic Reliability of Foundation Model with Uncertainty Estimation in OCT Images

    Authors: Yuanyuan Peng, Aidi Lin, Meng Wang, Tian Lin, Ke Zou, Yinglin Cheng, Tingkun Shi, Xulong Liao, Lixia Feng, Zhen Liang, Xinjian Chen, Huazhu Fu, Haoyu Chen

    Abstract: Inability to express the confidence level and detect unseen classes has limited the clinical implementation of artificial intelligence in the real world. We developed a foundation model with uncertainty estimation (FMUE) to detect 11 retinal conditions on optical coherence tomography (OCT). In the internal test set, FMUE achieved a higher F1 score of 96.76% than two state-of-the-art algorithms, RE…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: All codes are available at https://github.com/yuanyuanpeng0129/FMUE

  5. arXiv:2406.09317  [pdf, other]

    eess.IV cs.CV

    Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, Jinming Guo, Xiaolin Chen, Jingcheng Wang, Yih Chung Tham, et al. (24 additional authors not shown)

    Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. To RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources…

    Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  6. arXiv:2406.08248  [pdf, other]

    eess.SY

    Traffic Signal Cycle Control with Centralized Critic and Decentralized Actors under Varying Intervention Frequencies

    Authors: Maonan Wang, Yirong Chen, Yuheng Kan, Chengcheng Xu, Michael Lepech, Man-On Pun, Xi Xiong

    Abstract: Traffic congestion in urban areas is a significant problem, leading to prolonged travel times, reduced efficiency, and increased environmental concerns. Effective traffic signal control (TSC) is a key strategy for reducing congestion. Unlike most TSC systems that rely on high-frequency control, this study introduces an innovative joint phase traffic signal cycle control method that operates effect…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 26 pages, 17 figures

  7. arXiv:2406.07662  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG q-bio.NC

    Progress Towards Decoding Visual Imagery via fNIRS

    Authors: Michel Adamic, Wellington Avelino, Anna Brandenberger, Bryan Chiang, Hunter Davis, Stephen Fay, Andrew Gregory, Aayush Gupta, Raphael Hotter, Grace Jiang, Fiona Leng, Stephen Polcyn, Thomas Ribeiro, Paul Scotti, Michelle Wang, Marley Xiong, Jonathan Xu

    Abstract: We demonstrate the possibility of reconstructing images from fNIRS brain activity and start building a prototype to match the required specs. By training an image reconstruction model on downsampled fMRI data, we discovered that cm-scale spatial resolution is sufficient for image generation. We obtained 71% retrieval accuracy with 1-cm resolution, compared to 93% on the full-resolution fMRI, and 2…

    Submitted 22 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  8. arXiv:2406.07532  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS

    Hearing Anything Anywhere

    Authors: Mason Wang, Ryosuke Sawata, Samuel Clarke, Ruohan Gao, Shangzhe Wu, Jiajun Wu

    Abstract: Recent years have seen immense progress in 3D computer vision and computer graphics, with emerging tools that can virtualize real-world 3D environments for numerous Mixed Reality (XR) applications. However, alongside immersive visual experiences, immersive auditory experiences are equally vital to our holistic perception of an environment. In this paper, we aim to reconstruct the spatial acoustic…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: CVPR 2024. The first two authors contributed equally. Project page: https://masonlwang.com/hearinganythinganywhere/

    ACM Class: I.2.10; I.4.8

  9. arXiv:2406.05799  [pdf, ps, other]

    eess.SP

    Double-RIS-Assisted Orbital Angular Momentum Near-Field Secure Communications

    Authors: Liping Liang, Minmin Wang, Wenchi Cheng, Wei Zhang

    Abstract: To satisfy the various demands of growing devices and services, emerging high-frequency-based technologies promote near-field wireless communications. Therefore, near-field physical layer security has attracted much attention to facilitate the wireless information security against illegitimate eavesdropping. However, highly correlated channels between legitimate transceivers and eavesdroppers of e…

    Submitted 9 June, 2024; originally announced June 2024.

  10. Fractal OAM Generation and Detection Schemes

    Authors: Runyu Lyu, Wenchi Cheng, Muyao Wang, Wei Zhang

    Abstract: Orbital angular momentum (OAM) carried electromagnetic waves have the potential to improve spectrum efficiency in optical and radio-frequency communications due to the orthogonal wavefronts of different OAM modes. However, OAM beams are vortically hollow and divergent, which significantly decreases the capacity of OAM transmissions. In addition, unaligned transceivers in OAM transmissions can resu…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 15 pages, 20 figures

    Journal ref: IEEE Journal on Selected Areas in Communications, vol. 42, no. 6, pp. 1598-1612, June 2024

  11. arXiv:2406.02554  [pdf, other]

    eess.AS cs.AI cs.CL cs.CV cs.LG cs.MM

    Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition

    Authors: Shijian Deng, Erin E. Kosloski, Siddhi Patel, Zeke A. Barnett, Yiyang Nan, Alexander Kaplan, Sisira Aarukapalli, William T. Doan, Matthew Wang, Harsh Singh, Pamela R. Rollins, Yapeng Tian

    Abstract: In this article, we introduce a novel problem of audio-visual autism behavior recognition, which includes social behavior recognition, an essential aspect previously omitted in AI-assisted autism screening research. We define the task at hand as one that is audio-visual autism behavior recognition, which uses audio and visual cues, including any speech present in the audio, to recognize autism-rel…

    Submitted 22 March, 2024; originally announced June 2024.

  12. arXiv:2406.00758  [pdf, other]

    eess.IV cs.CV cs.MM

    Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaption

    Authors: Anqi Li, Yuxi Liu, Huihui Bai, Feng Li, Runmin Cong, Meng Wang, Yao Zhao

    Abstract: Although recent generative image compression methods have demonstrated impressive potential in optimizing the rate-distortion-perception trade-off, they still face the critical challenge of flexible rate adaption to diverse compression necessities and scenarios. To overcome this challenge, this paper proposes a Controllable Generative Image Compression framework, Control-GIC, the first capable of…

    Submitted 5 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

  13. arXiv:2405.18167  [pdf, other]

    eess.IV cs.CV

    Confidence-aware multi-modality learning for eye disease screening

    Authors: Ke Zou, Tian Lin, Zongbo Han, Meng Wang, Xuedong Yuan, Haoyu Chen, Changqing Zhang, Xiaojing Shen, Huazhu Fu

    Abstract: Multi-modal ophthalmic image classification plays a key role in diagnosing eye diseases, as it integrates information from different sources to complement their respective performances. However, recent improvements have mainly focused on accuracy, often neglecting the importance of confidence and robustness in predictions for diverse modalities. In this study, we propose a novel multi-modality evi…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 27 pages, 7 figures, 9 tables

  14. arXiv:2405.16965  [pdf, ps, other]

    cs.IT eess.SP

    Timeliness of Status Update System: The Effect of Parallel Transmission Using Heterogeneous Updating Devices

    Authors: Zhengchuan Chen, Kang Lang, Nikolaos Pappas, Howard H. Yang, Min Wang, Zhong Tian, Tony Q. S. Quek

    Abstract: Timely status updating is the premise of emerging interaction-based applications in the Internet of Things (IoT). Using redundant devices to update the status of interest is a promising method to improve the timeliness of information. However, parallel status updating leads to out-of-order arrivals at the monitor, significantly challenging timeliness analysis. This work studies the Age of Informat…

    Submitted 27 May, 2024; originally announced May 2024.

  15. arXiv:2404.16852  [pdf, other]

    cs.LG cs.AI cs.CL eess.IV

    A Disease Labeler for Chinese Chest X-Ray Report Generation

    Authors: Mengwei Wang, Ruixin Yan, Zeyi Hou, Ning Lang, Xiuzhuang Zhou

    Abstract: In the field of medical image analysis, the scarcity of Chinese chest X-ray report datasets has hindered the development of technology for generating Chinese chest X-ray reports. On one hand, the construction of a Chinese chest X-ray report dataset is limited by the time-consuming and costly process of accurate expert disease annotation. On the other hand, a single natural language generation metr…

    Submitted 18 March, 2024; originally announced April 2024.

  16. arXiv:2404.13941  [pdf, other]

    eess.SY cs.AI cs.LG

    Autoencoder-assisted Feature Ensemble Net for Incipient Faults

    Authors: Mingxuan Gao, Min Wang, Maoyin Chen

    Abstract: Deep learning has shown the great power in the field of fault detection. However, for incipient faults with tiny amplitude, the detection performance of the current deep learning networks (DLNs) is not satisfactory. Even if prior information about the faults is utilized, DLNs can't successfully detect faults 3, 9 and 15 in Tennessee Eastman process (TEP). These faults are notoriously difficult to…

    Submitted 22 April, 2024; originally announced April 2024.

  17. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  18. arXiv:2404.10026  [pdf]

    eess.IV cs.CR cs.LG

    Distributed Federated Learning-Based Deep Learning Model for Privacy MRI Brain Tumor Detection

    Authors: Lisang Zhou, Meng Wang, Ning Zhou

    Abstract: Distributed training can facilitate the processing of large medical image datasets, and improve the accuracy and efficiency of disease diagnosis while protecting patient privacy, which is crucial for achieving efficient medical image analysis and accelerating medical research progress. This paper presents an innovative approach to medical image classification, leveraging Federated Learning (FL) to…

    Submitted 15 April, 2024; originally announced April 2024.

    Journal ref: Journal of Information, Technology and Policy (2023): 1-12

  19. arXiv:2404.09192  [pdf, other]

    cs.SD cs.AI eess.AS

    Prior-agnostic Multi-scale Contrastive Text-Audio Pre-training for Parallelized TTS Frontend Modeling

    Authors: Quanxiu Wang, Hui Huang, Mingjie Wang, Yong Dai, Jinzuomu Zhong, Benlai Tang

    Abstract: Over the past decade, a series of unflagging efforts have been dedicated to developing highly expressive and controllable text-to-speech (TTS) systems. In general, the holistic TTS comprises two interconnected components: the frontend module and the backend module. The frontend excels in capturing linguistic representations from the raw text input, while the backend module converts linguistic cues…

    Submitted 14 April, 2024; originally announced April 2024.

  20. arXiv:2404.07959  [pdf]

    eess.SP eess.SY

    Damage identification of offshore jacket platforms in a digital twin framework considering optimal sensor placement

    Authors: Mengmeng Wang, Atilla Incecik, Shizhe Feng, M. K. Gupta, Grzegorz Krlolczyk, Z Li

    Abstract: A new digital twin (DT) framework with optimal sensor placement (OSP) is proposed to accurately calculate the modal responses and identify the damage ratios of the offshore jacket platforms. The proposed damage identification framework consists of two models (namely one OSP model and one damage identification model). The OSP model adopts the multi-objective Lichtenberg algorithm (MOLA) to perform…

    Submitted 26 March, 2024; originally announced April 2024.

  21. arXiv:2403.20130  [pdf, other]

    cs.SD cs.LG eess.AS

    Sound event localization and classification using WASN in Outdoor Environment

    Authors: Dongzhe Zhang, Jianfeng Chen, Jisheng Bai, Mou Wang

    Abstract: Deep learning-based sound event localization and classification is an emerging research area within wireless acoustic sensor networks. However, current methods for sound event localization and classification typically rely on a single microphone array, making them susceptible to signal attenuation and environmental noise, which limits their monitoring range. Moreover, methods using multiple microp…

    Submitted 29 March, 2024; originally announced March 2024.

  22. IDF-CR: Iterative Diffusion Process for Divide-and-Conquer Cloud Removal in Remote-sensing Images

    Authors: Meilin Wang, Yexing Song, Pengxu Wei, Xiaoyu Xian, Yukai Shi, Liang Lin

    Abstract: Deep learning technologies have demonstrated their effectiveness in removing cloud cover from optical remote-sensing images. Convolutional Neural Networks (CNNs) exert dominance in the cloud removal tasks. However, constrained by the inherent limitations of convolutional operations, CNNs can address only a modest fraction of cloud occlusion. In recent years, diffusion models have achieved state-of…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted by IEEE TGRS, we first present an iterative diffusion process for cloud removal, the code is available at: https://github.com/SongYxing/IDF-CR

  23. arXiv:2403.08337  [pdf, other]

    eess.SY cs.AI cs.LG

    LLM-Assisted Light: Leveraging Large Language Model Capabilities for Human-Mimetic Traffic Signal Control in Complex Urban Environments

    Authors: Maonan Wang, Aoyu Pang, Yuheng Kan, Man-On Pun, Chung Shue Chen, Bo Huang

    Abstract: Traffic congestion in metropolitan areas presents a formidable challenge with far-reaching economic, environmental, and societal ramifications. Therefore, effective congestion management is imperative, with traffic signal control (TSC) systems being pivotal in this endeavor. Conventional TSC systems, designed upon rule-based algorithms or reinforcement learning (RL), frequently exhibit deficiencie…

    Submitted 12 June, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: 20 pages, 11 figures

  24. arXiv:2403.00671  [pdf, other]

    eess.IV

    Asymmetric Feature Fusion for Image Retrieval

    Authors: Hui Wu, Min Wang, Wengang Zhou, Zhenbo Lu, Houqiang Li

    Abstract: In asymmetric retrieval systems, models with different capacities are deployed on platforms with different computational and storage resources. Despite the great progress, existing approaches still suffer from a dilemma between retrieval efficiency and asymmetric accuracy due to the limited capacity of the lightweight query model. In this work, we propose an Asymmetric Feature Fusion (AFF) paradig…

    Submitted 1 March, 2024; originally announced March 2024.

  25. arXiv:2403.00648  [pdf, other]

    eess.IV

    Structure Similarity Preservation Learning for Asymmetric Image Retrieval

    Authors: Hui Wu, Min Wang, Wengang Zhou, Houqiang Li

    Abstract: Asymmetric image retrieval is a task that seeks to balance retrieval accuracy and efficiency by leveraging lightweight and large models for the query and gallery sides, respectively. The key to asymmetric image retrieval is realizing feature compatibility between different models. Despite the great progress, most existing approaches either rely on classifiers inherited from gallery models or simpl…

    Submitted 1 March, 2024; originally announced March 2024.

  26. arXiv:2402.12660  [pdf, other]

    cs.SD cs.HC eess.AS

    SingVisio: Visual Analytics of Diffusion Model for Singing Voice Conversion

    Authors: Liumeng Xue, Chaoren Wang, Mingxuan Wang, Xueyao Zhang, Jun Han, Zhizheng Wu

    Abstract: In this study, we present SingVisio, an interactive visual analysis system that aims to explain the diffusion model used in singing voice conversion. SingVisio provides a visual display of the generation process in diffusion models, showcasing the step-by-step denoising of the noisy spectrum and its transformation into a clean spectrum that captures the desired singer's timbre. The system also fac…

    Submitted 19 February, 2024; originally announced February 2024.

  27. arXiv:2402.11211  [pdf, other]

    eess.IV cs.CV

    Training-free image style alignment for self-adapting domain shift on handheld ultrasound devices

    Authors: Hongye Zeng, Ke Zou, Zhihao Chen, Yuchong Gao, Hongbo Chen, Haibin Zhang, Kang Zhou, Meng Wang, Rick Siow Mong Goh, Yong Liu, Chang Jiang, Rui Zheng, Huazhu Fu

    Abstract: Handheld ultrasound devices face usage limitations due to user inexperience and cannot benefit from supervised deep learning without extensive expert annotations. Moreover, the models trained on standard ultrasound device data are constrained by training data distribution and perform poorly when directly applied to handheld device data. In this study, we propose the Training-free Image Style Align…

    Submitted 17 February, 2024; originally announced February 2024.

  28. arXiv:2402.02694  [pdf, other]

    eess.AS cs.LG cs.SD

    Description on IEEE ICME 2024 Grand Challenge: Semi-supervised Acoustic Scene Classification under Domain Shift

    Authors: Jisheng Bai, Mou Wang, Haohe Liu, Han Yin, Yafei Jia, Siwei Huang, Yutong Du, Dongzhe Zhang, Dongyuan Shi, Woon-Seng Gan, Mark D. Plumbley, Susanto Rahardja, Bin Xiang, Jianfeng Chen

    Abstract: Acoustic scene classification (ASC) is a crucial research problem in computational auditory scene analysis, and it aims to recognize the unique acoustic characteristics of an environment. One of the challenges of the ASC task is the domain shift between training and testing data. Since 2018, ASC challenges have focused on the generalization of ASC models across different recording devices. Althoug…

    Submitted 28 February, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  29. arXiv:2402.01828  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Retrieval Augmented End-to-End Spoken Dialog Models

    Authors: Mingqiu Wang, Izhak Shafran, Hagen Soltau, Wei Han, Yuan Cao, Dian Yu, Laurent El Shafey

    Abstract: We recently developed SLM, a joint speech and language model, which fuses a pretrained foundational speech model and a large language model (LLM), while preserving the in-context learning capability intrinsic to the pretrained LLM. In this paper, we apply SLM to speech dialog applications where the dialog states are inferred directly from the audio signal. Task-oriented dialogs often contain dom…

    Submitted 2 February, 2024; originally announced February 2024.

    Journal ref: Proc. ICASSP 2024

  30. arXiv:2402.01194  [pdf, other]

    eess.SP

    A Robust Super-resolution Gridless Imaging Framework for UAV-borne SAR Tomography

    Authors: Silin Gao, Wenlong Wang, Muhan Wang, Zhe Zhang, Zai Yang, Xiaolan Qiu, Bingchen Zhang, Yirong Wu

    Abstract: Synthetic aperture radar (SAR) tomography (TomoSAR) retrieves three-dimensional (3-D) information from multiple SAR images, effectively addresses the layover problem, and has become pivotal in urban mapping. Unmanned aerial vehicle (UAV) has gained popularity as a TomoSAR platform, offering distinct advantages such as the ability to achieve 3-D imaging in a single flight, cost-effectiveness, rapid…

    Submitted 2 February, 2024; originally announced February 2024.

  31. arXiv:2401.15613  [pdf, other]

    eess.IV cs.CV

    Towards Arbitrary-Scale Histopathology Image Super-resolution: An Efficient Dual-branch Framework via Implicit Self-texture Enhancement

    Authors: Minghong Duan, Linhao Qu, Zhiwei Yang, Manning Wang, Chenxi Zhang, Zhijian Song

    Abstract: High-quality whole-slide scanners are expensive, complex, and time-consuming, thus limiting the acquisition and utilization of high-resolution pathology whole-slide images in daily clinical work. Deep learning-based single-image super-resolution techniques are an effective way to solve this problem by synthesizing high-resolution images from low-resolution ones. However, the existing super-resolut…

    Submitted 26 June, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

  32. arXiv:2401.08678  [pdf, other]

    eess.AS cs.SD

    Sub-band and Full-band Interactive U-Net with DPRNN for Demixing Cross-talk Stereo Music

    Authors: Han Yin, Mou Wang, Jisheng Bai, Dongyuan Shi, Woon-Seng Gan, Jianfeng Chen

    Abstract: This paper presents a detailed description of our proposed methods for the ICASSP 2024 Cadenza Challenge. Experimental results show that the proposed system can achieve better performance than official baselines.

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: Submitted to ICASSP 2024

  33. UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction

    Authors: Jiaxin Guo, Minghan Wang, Xiaosong Qiao, Daimeng Wei, Hengchao Shang, Zongyao Li, Zhengzhe Yu, Yinglu Li, Chang Su, Min Zhang, Shimin Tao, Hao Yang

    Abstract: Error correction techniques have been used to refine the output sentences from automatic speech recognition (ASR) models and achieve a lower word error rate (WER). Previous works usually adopt end-to-end models and have a strong dependency on Pseudo Paired Data and Original Paired Data. But when only pre-training on Pseudo Paired Data, previous models have a negative effect on correction. While fine-tu…

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: Accepted in ICASSP 2023

  34. arXiv:2312.17538  [pdf, other]

    cs.CV cs.LG eess.IV

    Distance Guided Generative Adversarial Network for Explainable Binary Classifications

    Authors: Xiangyu Xiong, Yue Sun, Xiaohong Liu, Wei Ke, Chan-Tong Lam, Jiangang Chen, Mingfeng Jiang, Mingwei Wang, Hui Xie, Tong Tong, Qinquan Gao, Hao Chen, Tao Tan

    Abstract: Despite the potential benefits of data augmentation for mitigating the data insufficiency, traditional augmentation methods primarily rely on the prior intra-domain knowledge. On the other hand, advanced generative adversarial networks (GANs) generate inter-domain samples with limited variety. These previous methods make limited contributions to describing the decision boundaries for binary classi…

    Submitted 29 December, 2023; originally announced December 2023.

    Comments: 12 pages, 8 figures. This work has been submitted to the IEEE TNNLS for possible publication. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media

  35. arXiv:2312.13585  [pdf, other]

    cs.CL cs.SD eess.AS

    Speech Translation with Large Language Models: An Industrial Practice

    Authors: Zhichao Huang, Rong Ye, Tom Ko, Qianqian Dong, Shanbo Cheng, Mingxuan Wang, Hang Li

    Abstract: Given the great success of large language models (LLMs) across various tasks, in this paper, we introduce LLM-ST, a novel and effective speech translation model constructed upon a pre-trained LLM. By integrating the large language model (LLM) with a speech encoder and employing multi-task instruction tuning, LLM-ST can produce accurate timestamped transcriptions and translations, even from long au…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: Technical report. 13 pages. Demo: https://speechtranslation.github.io/llm-st/

  36. arXiv:2312.09911  [pdf, other]

    cs.SD eess.AS

    Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

    Authors: Xueyao Zhang, Liumeng Xue, Yicheng Gu, Yuancheng Wang, Haorui He, Chaoren Wang, Xi Chen, Zihao Fang, Haopeng Chen, Junan Zhang, Tze Ying Tang, Lexiao Zou, Mingxuan Wang, Jun Han, Kai Chen, Haizhou Li, Zhizheng Wu

    Abstract: Amphion is an open-source toolkit for Audio, Music, and Speech Generation, targeting to ease the way for junior researchers and engineers into these fields. It presents a unified framework that is inclusive of diverse generation tasks and models, with the added bonus of being easily extendable for new incorporation. The toolkit is designed with beginner-friendly workflows and pre-trained models, a…

    Submitted 22 February, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: Amphion Website: https://github.com/open-mmlab/Amphion

  37. UniTSA: A Universal Reinforcement Learning Framework for V2X Traffic Signal Control

    Authors: Maonan Wang, Xi Xiong, Yuheng Kan, Chengcheng Xu, Man-On Pun

    Abstract: Traffic congestion is a persistent problem in urban areas, which calls for the development of effective traffic signal control (TSC) systems. While existing Reinforcement Learning (RL)-based methods have shown promising performance in optimizing TSC, it is challenging to generalize these methods across intersections of different structures. In this work, a universal RL-based TSC framework is propo…

    Submitted 8 December, 2023; originally announced December 2023.

    Comments: 18 pages, 9 figures

    Journal ref: IEEE Transactions on Vehicular Technology, 2024

  38. arXiv:2311.14924  [pdf, other]

    eess.SY

    Sequencing-enabled Hierarchical Cooperative CAV On-ramp Merging Control with Enhanced Stability and Feasibility

    Authors: Sixu Li, Yang Zhou, Xinyue Ye, Jiwan Jiang, Meng Wang

    Abstract: This paper develops a sequencing-enabled hierarchical connected automated vehicle (CAV) cooperative on-ramp merging control framework. The proposed framework consists of a two-layer design: the upper level control sequences the vehicles to harmonize the traffic density across mainline and on-ramp segments while enhancing lower-level control efficiency through a mixed-integer linear programming for…

    Submitted 25 May, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

  39. arXiv:2311.14068  [pdf, other]

    eess.AS

    Interactive Dual-Conformer with Scene-Inspired Mask for Soft Sound Event Detection

    Authors: Han Yin, Jisheng Bai, Mou Wang, Dongyuan Shi, Woon-Seng Gan, Jianfeng Chen

    Abstract: Traditional binary hard labels for sound event detection (SED) lack details about the complexity and variability of sound event distributions. Recently, a novel annotation workflow is proposed to generate fine-grained non-binary soft labels, resulting in a new real-life dataset named MAESTRO Real for SED. In this paper, we first propose an interactive dual-conformer (IDC) module, in which a cross-…

    Submitted 7 December, 2023; v1 submitted 23 November, 2023; originally announced November 2023.

    Comments: to be improved (unfinished)

  40. arXiv:2311.12371  [pdf, other]

    eess.AS

    AudioLog: LLMs-Powered Long Audio Logging with Hybrid Token-Semantic Contrastive Learning

    Authors: Jisheng Bai, Han Yin, Mou Wang, Dongyuan Shi, Woon-Seng Gan, Jianfeng Chen, Susanto Rahardja

    Abstract: Previous studies in automated audio captioning have faced difficulties in accurately capturing the complete temporal details of acoustic scenes and events within long audio sequences. This paper presents AudioLog, a large language models (LLMs)-powered audio logging system with hybrid token-semantic contrastive learning. Specifically, we propose to fine-tune the pre-trained hierarchical token-sema…

    Submitted 4 January, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

  41. arXiv:2311.03974  [pdf, ps, other]

    cs.IT eess.SP

    NOMA Enabled Multi-Access Edge Computing: A Joint MU-MIMO Precoding and Computation Offloading Design

    Authors: Deyou Zhang, Meng Wang, Shuo Shi, Ming Xiao

    Abstract: This letter investigates computation offloading and transmit precoding co-design for multi-access edge computing (MEC), where multiple MEC users (MUs) equipped with multiple antennas access the MEC server in a non-orthogonal multiple access manner. We aim to minimize the total energy consumption of all MUs while satisfying the latency constraints by jointly optimizing the computational frequency,…

    Submitted 7 November, 2023; originally announced November 2023.

  42. arXiv:2311.03517  [pdf, other]

    cs.SD cs.CV eess.AS

    SoundCam: A Dataset for Finding Humans Using Room Acoustics

    Authors: Mason Wang, Samuel Clarke, Jui-Hsien Wang, Ruohan Gao, Jiajun Wu

    Abstract: A room's acoustic properties are a product of the room's geometry, the objects within the room, and their specific positions. A room's acoustic properties can be characterized by its impulse response (RIR) between a source and listener location, or roughly inferred from recordings of natural signals present in the room. Variations in the positions of objects in a room can effect measurable changes…

    Submitted 15 January, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

    Comments: In NeurIPS 2023 Datasets and Benchmarks Track. Project page: https://masonlwang.com/soundcam/. Wang and Clarke contributed equally to this work

  43. arXiv:2310.12249  [pdf, other]

    eess.SY

    A Link Transmission Model with Variable Speed Limits and Turn-Level Queue Transmission at Signalized Intersections

    Authors: Lei Wei, S. Travis Waller, Yu Mei, Yunpeng Wang, Meng Wang

    Abstract: The link transmission model (LTM) is an efficient and widely used macro-level approach for simulating traffic flow. However, state-of-the-art LTMs usually focus on segment-level modelling, in which the traffic operation differences among multiple turning directions are neglected. Such models are incapable of differentiating the turn-level queue transmission, resulting in underrepresented que…

    Submitted 18 October, 2023; originally announced October 2023.

  44. arXiv:2310.11957  [pdf, other]

    cs.DC eess.SP

    Supporting UAVs with Edge Computing: A Review of Opportunities and Challenges

    Authors: Malte Janßen, Tobias Pfandzelter, Minghe Wang, David Bermbach

    Abstract: In recent years, Unmanned Aerial Vehicles (UAVs) have seen significant advancements in sensor capabilities and computational abilities, allowing for efficient autonomous navigation and visual tracking applications. However, the demand for computationally complex tasks has increased faster than advances in battery technology. This opens up possibilities for improvements using edge computing. In…

    Submitted 18 October, 2023; originally announced October 2023.

    Report number: MCC.2023.3

  45. arXiv:2310.10992  [pdf, other]

    cs.SD eess.AS

    A High Fidelity and Low Complexity Neural Audio Coding

    Authors: Wenzhe Liu, Wei Xiao, Meng Wang, Shan Yang, Yupeng Shi, Yuyong Kang, Dan Su, Shidong Shang, Dong Yu

    Abstract: Audio coding is an essential module in real-time communication systems. Neural audio codecs can compress audio samples at a low bitrate due to the strong modeling and generative capabilities of deep neural networks. To address poor high-frequency expression, high computational cost, and high storage consumption, we propose an integrated framework that utilizes a neural network to model wide…

    Submitted 17 October, 2023; originally announced October 2023.

  46. arXiv:2310.08804  [pdf, other]

    eess.SP

    Spiking Semantic Communication for Feature Transmission with HARQ

    Authors: Mengyang Wang, Jiahui Li, Mengyao Ma, Xiaopeng Fan

    Abstract: In Collaborative Intelligence (CI), the Artificial Intelligence (AI) model is divided between the edge and the cloud, with intermediate features being sent from the edge to the cloud for inference. Several deep learning-based Semantic Communication (SC) models have been proposed to reduce feature transmission overhead and mitigate channel noise interference. Previous research has demonstrated that…

    Submitted 12 October, 2023; originally announced October 2023.

  47. arXiv:2310.00230  [pdf, other]

    cs.CL cs.SD eess.AS

    SLM: Bridge the thin gap between speech and text foundation models

    Authors: Mingqiu Wang, Wei Han, Izhak Shafran, Zelin Wu, Chung-Cheng Chiu, Yuan Cao, Yongqiang Wang, Nanxin Chen, Yu Zhang, Hagen Soltau, Paul Rubenstein, Lukas Zilka, Dian Yu, Zhong Meng, Golan Pundak, Nikhil Siddhartha, Johan Schalkwyk, Yonghui Wu

    Abstract: We present a joint Speech and Language Model (SLM), a multitask, multilingual, and dual-modal model that takes advantage of pretrained foundational speech and language models. SLM freezes the pretrained foundation models to maximally preserve their capabilities, and only trains a simple adapter with just 1% (156M) of the foundation models' parameters. This adaptation not only leads SLM to achiev…

    Submitted 29 September, 2023; originally announced October 2023.

  48. arXiv:2309.15529  [pdf]

    eess.IV cs.CV cs.LG

    Missing-modality Enabled Multi-modal Fusion Architecture for Medical Data

    Authors: Muyu Wang, Shiyu Fan, Yichen Li, Hui Chen

    Abstract: Fusing multi-modal data can improve the performance of deep learning models. However, missing modalities are common for medical data due to patients' specificity, which is detrimental to the performance of multi-modal models in applications. Therefore, it is critical to adapt the models to missing modalities. This study aimed to develop an efficient multi-modal fusion architecture for medical data…

    Submitted 27 September, 2023; originally announced September 2023.

  49. arXiv:2309.05298  [pdf, other]

    cs.RO eess.SY

    Real-Time Parallel Trajectory Optimization with Spatiotemporal Safety Constraints for Autonomous Driving in Congested Traffic

    Authors: Lei Zheng, Rui Yang, Zengqi Peng, Haichao Liu, Michael Yu Wang, Jun Ma

    Abstract: Multi-modal behaviors exhibited by surrounding vehicles (SVs) can typically lead to traffic congestion and reduce the travel efficiency of autonomous vehicles (AVs) in dense traffic. This paper proposes a real-time parallel trajectory optimization method for the AV to achieve high travel efficiency in dynamic and congested environments. A spatiotemporal safety module is developed to facilitate the…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: 8 pages, 7 figures, accepted for publication in the 26th IEEE International Conference on Intelligent Transportation Systems (ITSC 2023)

  50. arXiv:2309.03440  [pdf, other]

    eess.IV cs.CV cs.LG

    Punctate White Matter Lesion Segmentation in Preterm Infants Powered by Counterfactually Generative Learning

    Authors: Zehua Ren, Yongheng Sun, Miaomiao Wang, Yuying Feng, Xianjun Li, Chao **, Jian Yang, Chunfeng Lian, Fan Wang

    Abstract: Accurate segmentation of punctate white matter lesions (PWMLs) is fundamental for the timely diagnosis and treatment of related developmental disorders. Automated PWMLs segmentation from infant brain MR images is challenging, considering that the lesions are typically small and low-contrast, and the number of lesions may dramatically change across subjects. Existing learning-based methods directl…

    Submitted 6 September, 2023; originally announced September 2023.

    Comments: 10 pages, 3 figures, Medical Image Computing and Computer Assisted Intervention (MICCAI)