Showing 1–50 of 305 results for author: Hu, Y

  1. arXiv:2406.18548  [pdf]

    eess.IV cs.CV

    Exploration of Multi-Scale Image Fusion Systems in Intelligent Medical Image Analysis

    Authors: Yuxiang Hu, Haowei Yang, Ting Xu, Shuyao He, Jiajie Yuan, Haozhang Deng

    Abstract: The diagnosis of brain cancer relies heavily on medical imaging techniques, with MRI being the most commonly used. It is necessary to perform automatic segmentation of brain tumors on MRI images. This project intends to build an MRI algorithm based on U-Net. The residual network and the module used to enhance the context information are combined, and the void space convolution pooling pyramid is a…

    Submitted 23 May, 2024; originally announced June 2024.

  2. arXiv:2406.15222  [pdf]

    eess.IV cs.AI cs.CV

    Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study

    Authors: Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, Jingyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, Zhenpeng Yuan, et al. (15 additional authors not shown)

    Abstract: Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed…

    Submitted 24 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: under peer review

  3. arXiv:2406.05982  [pdf]

    eess.IV cs.LG physics.med-ph

    Artificial Intelligence for Neuro MRI Acquisition: A Review

    Authors: Hongjia Yang, Guanhua Wang, Ziyu Li, Haoxiang Li, Jialan Zheng, Yuxin Hu, Xiaozhi Cao, Congyu Liao, Huihui Ye, Qiyuan Tian

    Abstract: Magnetic resonance imaging (MRI) has significantly benefited from the resurgence of artificial intelligence (AI). By leveraging AI's capabilities in large-scale optimization and pattern recognition, innovative methods are transforming the MRI acquisition workflow, including planning, sequence design, and correction of acquisition artifacts. These emerging algorithms demonstrate substantial potenti…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Submitted to MAGMA for review

  4. arXiv:2406.05915  [pdf, other]

    cs.CV eess.IV

    Bits-to-Photon: End-to-End Learned Scalable Point Cloud Compression for Direct Rendering

    Authors: Yueyu Hu, Ran Gong, Yao Wang

    Abstract: Point cloud is a promising 3D representation for volumetric streaming in emerging AR/VR applications. Despite recent advances in point cloud compression, decoding and rendering high-quality images from lossy compressed point clouds is still challenging in terms of quality and complexity, making it a major roadblock to achieve real-time 6-Degree-of-Freedom video streaming. In this paper, we address…

    Submitted 9 June, 2024; originally announced June 2024.

  5. arXiv:2406.02479  [pdf]

    cs.LG eess.SP eess.SY

    Applying Fine-Tuned LLMs for Reducing Data Needs in Load Profile Analysis

    Authors: Yi Hu, Hyeonjin Kim, Kai Ye, Ning Lu

    Abstract: This paper presents a novel method for utilizing fine-tuned Large Language Models (LLMs) to minimize data requirements in load profile analysis, demonstrated through the restoration of missing data in power system load profiles. A two-stage fine-tuning strategy is proposed to adapt a pre-trained LLMs, i.e., GPT-3.5, for missing data restoration tasks. Through empirical evaluation, we demonstrate t…

    Submitted 2 June, 2024; originally announced June 2024.

  6. arXiv:2406.00654  [pdf, other]

    cs.CL cs.SD eess.AS

    Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback

    Authors: Chen Chen, Yuchen Hu, Wen Wu, Helin Wang, Eng Siong Chng, Chao Zhang

    Abstract: In recent years, text-to-speech (TTS) technology has witnessed impressive advancements, particularly with large-scale training datasets, showcasing human-level speech quality and impressive zero-shot capabilities on unseen speakers. However, despite human subjective evaluations, such as the mean opinion score (MOS), remaining the gold standard for assessing the quality of synthetic speech, even st…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: 19 pages, Preprint

  7. Fair Evaluation of Federated Learning Algorithms for Automated Breast Density Classification: The Results of the 2022 ACR-NCI-NVIDIA Federated Learning Challenge

    Authors: Kendall Schmidt, Benjamin Bearce, Ken Chang, Laura Coombs, Keyvan Farahani, Marawan Elbatele, Kaouther Mouhebe, Robert Marti, Ruipeng Zhang, Yao Zhang, Yanfeng Wang, Yaojun Hu, Haochao Ying, Yuyang Xu, Conrad Testagrose, Mutlu Demirer, Vikash Gupta, Ünal Akünal, Markus Bujotzek, Klaus H. Maier-Hein, Yi Qin, Xiaomeng Li, Jayashree Kalpathy-Cramer, Holger R. Roth

    Abstract: The correct interpretation of breast density is important in the assessment of breast cancer risk. AI has been shown capable of accurately predicting breast density, however, due to the differences in imaging characteristics across mammography systems, models built using data from one system do not generalize well to other systems. Though federated learning (FL) has emerged as a way to improve the…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 16 pages, 9 figures

    Journal ref: Medical Image Analysis Volume 95, July 2024, 103206

  8. arXiv:2405.14161  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models

    Authors: Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Chengwei Qin, Pin-Yu Chen, Eng Siong Chng, Chao Zhang

    Abstract: We propose an unsupervised adaptation framework, Self-TAught Recognizer (STAR), which leverages unlabeled data to enhance the robustness of automatic speech recognition (ASR) systems in diverse target domains, such as noise and accents. STAR is developed for prevalent speech foundation models based on Transformer-related architecture with auto-regressive decoding (e.g., Whisper, Canary). Specifica…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 23 pages, Preprint

  9. arXiv:2405.13339  [pdf, other]

    eess.SP

    Floor-Plan-aided Indoor Localization: Zero-Shot Learning Framework, Data Sets, and Prototype

    Authors: Haiyao Yu, Changyang She, Yunkai Hu, Geng Wang, Rui Wang, Branka Vucetic, Yonghui Li

    Abstract: Machine learning has been considered a promising approach for indoor localization. Nevertheless, the sample efficiency, scalability, and generalization ability remain open issues of implementing learning-based algorithms in practical systems. In this paper, we establish a zero-shot learning framework that does not need real-world measurements in a new communication environment. Specifically, a gra…

    Submitted 22 May, 2024; originally announced May 2024.

  10. arXiv:2405.10025  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models

    Authors: Yuchen Hu, Chen Chen, Chengwei Qin, Qiushi Zhu, Eng Siong Chng, Ruizhe Li

    Abstract: Recent advances in large language models (LLMs) have promoted generative error correction (GER) for automatic speech recognition (ASR), which aims to predict the ground-truth transcription from the decoded N-best hypotheses. Thanks to the strong language generation ability of LLMs and rich information in the N-best list, GER shows great effectiveness in enhancing ASR results. However, it still suf…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: 14 pages, Accepted by ACL 2024

  11. arXiv:2404.19182  [pdf, other]

    eess.SP

    Robust Proximity Detection using On-Device Gait Monitoring

    Authors: Yuqian Hu, Guozhen Zhu, Beibei Wang, K. J. Ray Liu

    Abstract: Proximity detection in indoor environments based on WiFi signals has gained significant attention in recent years. Existing works rely on the dynamic signal reflections and their extracted features are dependent on motion strength. To address this issue, we design a robust WiFi-based proximity detector by considering gait monitoring. Specifically, we propose a gait score that accurately evaluates…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: This work has been accepted in IEEE 9th World Forum on Internet of Things (WFIoT)

  12. arXiv:2404.17069  [pdf, other]

    cs.IT cs.LG eess.SP

    Channel Modeling for FR3 Upper Mid-band via Generative Adversarial Networks

    Authors: Yaqi Hu, Mingsheng Yin, Marco Mezzavilla, Hao Guo, Sundeep Rangan

    Abstract: The upper mid-band (FR3) has been recently attracting interest for new generation of mobile networks, as it provides a promising balance between spectrum availability and coverage, which are inherent limitations of the sub 6GHz and millimeter wave bands, respectively. In order to efficiently design and optimize the network, channel modeling plays a key role since FR3 systems are expected to operat…

    Submitted 25 April, 2024; originally announced April 2024.

  13. An unsupervised learning-based shear wave tracking method for ultrasound elastography

    Authors: Remi Delaunay, Yipeng Hu, Tom Vercauteren

    Abstract: Shear wave elastography involves applying a non-invasive acoustic radiation force to the tissue and imaging the induced deformation to infer its mechanical properties. This work investigates the use of convolutional neural networks to improve displacement estimation accuracy in shear wave imaging. Our training approach is completely unsupervised, which allows to learn the estimation of the induced…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Accepted to SPIE Medical Imaging 2022

  14. arXiv:2404.11313  [pdf, other]

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  15. arXiv:2404.11171  [pdf, other]

    cs.LG cs.AI eess.SP

    Personalized Heart Disease Detection via ECG Digital Twin Generation

    Authors: Yaojun Hu, Jintai Chen, Lianting Hu, Dantong Li, Jiahuan Yan, Haochao Ying, Huiying Liang, Jian Wu

    Abstract: Heart diseases rank among the leading causes of global mortality, demonstrating a crucial need for early diagnosis and intervention. Most traditional electrocardiogram (ECG) based automated diagnosis methods are trained at population level, neglecting the customization of personalized ECGs to enhance individual healthcare management. A potential solution to address this limitation is to employ dig…

    Submitted 11 May, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  16. arXiv:2404.10240  [pdf, other]

    eess.SY

    Disturbance Rejection-Guarded Learning for Vibration Suppression of Two-Inertia Systems

    Authors: Fan Zhang, Jinfeng Chen, Yu Hu, Zhiqiang Gao, Ge Lv, Qin Lin

    Abstract: Model uncertainty presents significant challenges in vibration suppression of multi-inertia systems, as these systems often rely on inaccurate nominal mathematical models due to system identification errors or unmodeled dynamics. An observer, such as an extended state observer (ESO), can estimate the discrepancy between the inaccurate nominal model and the true model, thus improving control perfor…

    Submitted 15 April, 2024; originally announced April 2024.

  17. arXiv:2404.09979  [pdf, other]

    cs.CV eess.IV

    One-Click Upgrade from 2D to 3D: Sandwiched RGB-D Video Compression for Stereoscopic Teleconferencing

    Authors: Yueyu Hu, Onur G. Guleryuz, Philip A. Chou, Danhang Tang, Jonathan Taylor, Rus Maxham, Yao Wang

    Abstract: Stereoscopic video conferencing is still challenging due to the need to compress stereo RGB-D video in real-time. Though hardware implementations of standard video codecs such as H.264 / AVC and HEVC are widely available, they are not designed for stereoscopic videos and suffer from reduced quality and performance. Specific multiview or 3D extensions of these codecs are complex and lack efficient…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024 Workshop (AIS: Vision, Graphics and AI for Streaming https://ai4streaming-workshop.github.io )

  18. arXiv:2404.08175  [pdf, ps, other]

    eess.SY

    A Novel Vision Transformer based Load Profile Analysis using Load Images as Inputs

    Authors: Hyeonjin Kim, Yi Hu, Kai Ye, Ning Lu

    Abstract: This paper introduces ViT4LPA, an innovative Vision Transformer (ViT) based approach for Load Profile Analysis (LPA). We transform time-series load profiles into load images. This allows us to leverage the ViT architecture, originally designed for image processing, as a pre-trained image encoder to uncover latent patterns within load data. ViT is pre-trained using an extensive load image dataset,…

    Submitted 11 April, 2024; originally announced April 2024.

  19. arXiv:2404.02461  [pdf, other]

    cs.LG eess.SP

    On the Efficiency and Robustness of Vibration-based Foundation Models for IoT Sensing: A Case Study

    Authors: Tomoyoshi Kimura, Jinyang Li, Tianshi Wang, Denizhan Kara, Yizhuo Chen, Yigong Hu, Ruijie Wang, Maggie Wigness, Shengzhong Liu, Mani Srivastava, Suhas Diggavi, Tarek Abdelzaher

    Abstract: This paper demonstrates the potential of vibration-based Foundation Models (FMs), pre-trained with unlabeled sensing data, to improve the robustness of run-time inference in (a class of) IoT applications. A case study is presented featuring a vehicle classification application using acoustic and seismic sensing. The work is motivated by the success of foundation models in the areas of natural lang…

    Submitted 3 April, 2024; originally announced April 2024.

  20. arXiv:2404.02159  [pdf, other]

    cs.IT eess.SP

    Fairness-aware Age-of-Information Minimization in WPT-Assisted Short-Packet THz Communications for mURLLC

    Authors: Yao Zhu, Xiaopeng Yuan, Yulin Hu, Bo Ai, Ruikang Wang, Bin Han, Anke Schmeink

    Abstract: The technological landscape is swiftly advancing towards large-scale systems, creating significant opportunities, particularly in the domain of Terahertz (THz) communications. Networks designed for massive connectivity, comprising numerous Internet of Things (IoT) devices, are at the forefront of this advancement. In this paper, we consider Wireless Power Transfer (WPT)-enabled networks that suppo…

    Submitted 15 February, 2024; originally announced April 2024.

  21. arXiv:2403.11102  [pdf, other]

    cs.NI eess.SP

    Jointly Optimizing Terahertz based Sensing and Communications in Vehicular Networks: A Dynamic Graph Neural Network Approach

    Authors: Xuefei Li, Mingzhe Chen, Ye Hu, Zhilong Zhang, Danpu Liu, Shiwen Mao

    Abstract: In this paper, the problem of vehicle service mode selection (sensing, communication, or both) and vehicle connections within terahertz (THz) enabled joint sensing and communications over vehicular networks is studied. The considered network consists of several service provider vehicles (SPVs) that can provide: 1) only sensing service, 2) only communication service, and 3) both services, sensing s…

    Submitted 17 March, 2024; originally announced March 2024.

  22. arXiv:2403.08168  [pdf, other]

    eess.SP

    Collaborative Automotive Radar Sensing via Mixed-Precision Distributed Array Completion

    Authors: Arian Eamaz, Farhang Yeganegi, Yunqiao Hu, Mojtaba Soltanalian, Shunqiao Sun

    Abstract: This paper investigates the effects of coarse quantization with mixed precision on measurements obtained from sparse linear arrays, synthesized by a collaborative automotive radar sensing strategy. The mixed quantization precision significantly reduces the data amount that needs to be shared from radar nodes to the fusion center for coherent processing. We utilize the low-rank properties inherent…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: text overlap with arXiv:2312.05423

  23. arXiv:2403.01789  [pdf, other]

    cs.CR eess.SY

    DECOR: Enhancing Logic Locking Against Machine Learning-Based Attacks

    Authors: Yinghua Hu, Kaixin Yang, Subhajit Dutta Chowdhury, Pierluigi Nuzzo

    Abstract: Logic locking (LL) has gained attention as a promising intellectual property protection measure for integrated circuits. However, recent attacks, facilitated by machine learning (ML), have shown the potential to predict the correct key in multiple LL schemes by exploiting the correlation of the correct key value with the circuit structure. This paper presents a generic LL enhancement method based…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 8 pages. Accepted at the International Symposium on Quality Electronic Design (ISQED), 2024

  24. arXiv:2403.00972  [pdf, other]

    cs.GT eess.SY

    Understanding Police Force Resource Allocation using Adversarial Optimal Transport with Incomplete Information

    Authors: Yinan Hu, Juntao Chen, Quanyan Zhu

    Abstract: Adversarial optimal transport has been proven useful as a mathematical formulation to model resource allocation problems to maximize the efficiency of transportation with an adversary, who modifies the data. It is often the case, however, that only the adversary knows which nodes are malicious and which are not. In this paper we formulate the problem of seeking adversarial optimal transport into B…

    Submitted 1 March, 2024; originally announced March 2024.

  25. arXiv:2403.00434  [pdf, other]

    cs.IT eess.SP

    Probabilistic Semantic Communication over Wireless Networks with Rate Splitting

    Authors: Zhouxiang Zhao, Zhaohui Yang, Ye Hu, Qianqian Yang, Wei Xu, Zhaoyang Zhang

    Abstract: In this paper, the problem of joint transmission and computation resource allocation for probabilistic semantic communication (PSC) system with rate splitting multiple access (RSMA) is investigated. In the considered model, the base station (BS) needs to transmit a large amount of data to multiple users with RSMA. Due to limited communication resources, the BS is required to utilize semantic commu…

    Submitted 1 March, 2024; originally announced March 2024.

  26. arXiv:2402.12820  [pdf, other]

    eess.SY

    ASCEND: Accurate yet Efficient End-to-End Stochastic Computing Acceleration of Vision Transformer

    Authors: Tong Xie, Yixuan Hu, Renjie Wei, Meng Li, Yuan Wang, Runsheng Wang, Ru Huang

    Abstract: Stochastic computing (SC) has emerged as a promising computing paradigm for neural acceleration. However, how to accelerate the state-of-the-art Vision Transformer (ViT) with SC remains unclear. Unlike convolutional neural networks, ViTs introduce notable compatibility and efficiency challenges because of their nonlinear functions, e.g., softmax and Gaussian Error Linear Units (GELU). In this pape…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted in DATE 2024

  27. arXiv:2402.10728  [pdf, other]

    eess.IV cs.CV

    Semi-weakly-supervised neural network training for medical image registration

    Authors: Yiwen Li, Yunguan Fu, Iani J. M. B. Gayo, Qianye Yang, Zhe Min, Shaheer U. Saeed, Wen Yan, Yipei Wang, J. Alison Noble, Mark Emberton, Matthew J. Clarkson, Dean C. Barratt, Victor A. Prisacariu, Yipeng Hu

    Abstract: For training registration networks, weak supervision from segmented corresponding regions-of-interest (ROIs) have been proven effective for (a) supplementing unsupervised methods, and (b) being used independently in registration tasks in which unsupervised losses are unavailable or ineffective. This correspondence-informing supervision entails cost in annotation that requires significant specialis…

    Submitted 16 February, 2024; originally announced February 2024.

  28. arXiv:2402.09181  [pdf, other]

    eess.IV cs.CV

    OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM

    Authors: Yutao Hu, Tianbin Li, Quanfeng Lu, Wenqi Shao, Junjun He, Yu Qiao, Ping Luo

    Abstract: Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in various multimodal tasks. However, their potential in the medical domain remains largely unexplored. A significant challenge arises from the scarcity of diverse medical images spanning various modalities and anatomical regions, which is essential in real-world medical applications. To solve this problem, in this pape…

    Submitted 21 April, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

  29. arXiv:2402.06894  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators

    Authors: Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Ruizhe Li, Dong Zhang, Zhehuai Chen, Eng Siong Chng

    Abstract: Recent advances in large language models (LLMs) have stepped forward the development of multilingual speech and machine translation by its reduced representation errors and incorporated external knowledge. However, both translation tasks typically utilize beam search decoding and top-1 hypothesis selection for inference. These techniques struggle to fully exploit the rich information in the divers…

    Submitted 16 May, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

    Comments: 18 pages, Accepted by ACL 2024. This work is open sourced at: https://github.com/YUCHEN005/GenTranslate

  30. arXiv:2402.05457  [pdf, other]

    cs.CL cs.AI cs.MM cs.SD eess.AS

    It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition

    Authors: Chen Chen, Ruizhe Li, Yuchen Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Ensiong Chng, Chao-Han Huck Yang

    Abstract: Recent studies have successfully shown that large language models (LLMs) can be successfully used for generative error correction (GER) on top of the automatic speech recognition (ASR) output. Specifically, an LLM is utilized to carry out a direct mapping from the N-best hypotheses list generated by an ASR system to the predicted output transcription. However, despite its effectiveness, GER introd…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: Accepted to ICLR 2024, 17 pages. This work will be open sourced under MIT license

  31. arXiv:2402.00996  [pdf, other]

    cs.CV eess.SP

    mmID: High-Resolution mmWave Imaging for Human Identification

    Authors: Sakila S. Jayaweera, Sai Deepika Regani, Yuqian Hu, Beibei Wang, K. J. Ray Liu

    Abstract: Achieving accurate human identification through RF imaging has been a persistent challenge, primarily attributed to the limited aperture size and its consequent impact on imaging resolution. The existing imaging solution enables tasks such as pose estimation, activity recognition, and human tracking based on deep neural networks by estimating skeleton joints. In contrast to estimating joints, this…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: This paper was published in the IEEE 9th World Forum on Internet of Things

  32. arXiv:2401.11377  [pdf, other]

    eess.SP

    Joint User Scheduling and Computing Resource Allocation Optimization in Asynchronous Mobile Edge Computing Networks

    Authors: Yihan Cang, Ming Chen, Yijin Pan, Zhaohui Yang, Ye Hu, Haijian Sun, Mingzhe Chen

    Abstract: In this paper, the problem of joint user scheduling and computing resource allocation in asynchronous mobile edge computing (MEC) networks is studied. In such networks, edge devices will offload their computational tasks to an MEC server, using the energy they harvest from this server. To get their tasks processed on time using the harvested energy, edge devices will strategically schedule their t…

    Submitted 20 January, 2024; originally announced January 2024.

  33. arXiv:2401.10446  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Large Language Models are Efficient Learners of Noise-Robust Speech Recognition

    Authors: Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Ruizhe Li, Chao Zhang, Pin-Yu Chen, EnSiong Chng

    Abstract: Recent advances in large language models (LLMs) have promoted generative error correction (GER) for automatic speech recognition (ASR), which leverages the rich linguistic knowledge and powerful reasoning ability of LLMs to improve recognition results. The latest work proposes a GER benchmark with HyPoradise dataset to learn the mapping from ASR N-best hypotheses to ground-truth transcription by e…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted to ICLR 2024, Spotlight top 5%, 24 pages. This work will be open sourced at: https://github.com/YUCHEN005/RobustGER under MIT license

  34. arXiv:2401.09119  [pdf, other]

    eess.SP

    Anchor-points Assisted Uplink Sensing in Perceptive Mobile Networks

    Authors: Yanmo Hu, J. Andrew Zhang, Weibo Deng, Y. Jay Guo

    Abstract: Uplink sensing in integrated sensing and communications (ISAC) systems, such as Perceptive Mobile Networks, is challenging due to the clock asynchronism between transmitter and receiver. Existing solutions typically require the presence of a dominating line-of-sight path and the knowledge of transmitter location at the receiver. In this paper, relaxing these requirements, we propose a novel and ef…

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: 14 pages, 12 figures, journal paper

  35. arXiv:2401.09064  [pdf, other]

    cs.IT eess.SP

    Performance Bounds and Optimization for CSI-Ratio based Bi-static Doppler Sensing in ISAC Systems

    Authors: Yanmo Hu, Kai Wu, J. Andrew Zhang, Weibo Deng, Y. Jay Guo

    Abstract: Bi-static sensing is crucial for exploring the potential of networked sensing capabilities in integrated sensing and communications (ISAC). However, it suffers from the challenging clock asynchronism issue. CSI ratio-based sensing is an effective means to address the issue. Its performance bounds, particular for Doppler sensing, have not been fully understood yet. This work endeavors to fill the r…

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: 14 pages, 15 figures, journal paper

  36. arXiv:2401.03468  [pdf, other]

    eess.AS cs.SD

    Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation

    Authors: Qiushi Zhu, Jie Zhang, Yu Gu, Yuchen Hu, Lirong Dai

    Abstract: Self-supervised speech pre-training methods have developed rapidly in recent years, which show to be very effective for many near-field single-channel speech tasks. However, far-field multichannel speech processing is suffering from the scarcity of labeled multichannel data and complex ambient noises. The efficacy of self-supervised learning for far-field multichannel and multi-modal speech proces…

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: Accepted by AAAI 2024

  37. arXiv:2401.02099  [pdf]

    cs.CV cs.SD eess.AS

    Oceanship: A Large-Scale Dataset for Underwater Audio Target Recognition

    Authors: Zeyu Li, Suncheng Xiang, Tong Yu, Jingsheng Gao, Jiacheng Ruan, Yanping Hu, Ting Liu, Yuzhuo Fu

    Abstract: The recognition of underwater audio plays a significant role in identifying a vessel while it is in motion. Underwater target recognition tasks have a wide range of applications in areas such as marine environmental protection, detection of ship radiated noise, underwater noise control, and coastal vessel dispatch. The traditional UATR task involves training a network to extract features from audi…

    Submitted 10 June, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

    Comments: Accepted by ICIC 2024

  38. arXiv:2312.16014  [pdf, other]

    cs.CV eess.IV

    Passive Non-Line-of-Sight Imaging with Light Transport Modulation

    Authors: Jiarui Zhang, Ruixu Geng, Xiaolong Du, Yan Chen, Houqiang Li, Yang Hu

    Abstract: Passive non-line-of-sight (NLOS) imaging has witnessed rapid development in recent years, due to its ability to image objects that are out of sight. The light transport condition plays an important role in this task since changing the conditions will lead to different imaging models. Existing learning-based NLOS methods usually train independent models for different light transport conditions, whi…

    Submitted 26 March, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

  39. arXiv:2312.11947  [pdf, other]

    cs.CL cs.SD eess.AS

    Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling

    Authors: Rui Liu, Yifan Hu, Yi Ren, Xiang Yin, Haizhou Li

    Abstract: Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting. While recognising the significance of CSS task, the prior studies have not thoroughly investigated the emotional expressiveness problems due to the scarcity of emotional conversational datasets and the difficulty of stateful emotion mo…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: 9 pages, 4 figures, Accepted by AAAI'2024, Code and audio samples: https://github.com/walker-hyf/ECSS

  40. arXiv:2312.05423  [pdf, other]

    eess.SP

    Automotive Radar Sensing with Sparse Linear Arrays Using One-Bit Hankel Matrix Completion

    Authors: Arian Eamaz, Farhang Yeganegi, Yunqiao Hu, Shunqiao Sun, Mojtaba Soltanalian

    Abstract: The design of sparse linear arrays has proven instrumental in the implementation of cost-effective and efficient automotive radar systems for high-resolution imaging. This paper investigates the impact of coarse quantization on measurements obtained from such arrays. To recover azimuth angles from quantized measurements, we leverage the low-rank properties of the constructed Hankel matrix. In part…

    Submitted 5 March, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

  41. arXiv:2312.02487  [pdf]

    eess.SP

    Metasurface Sensing Approach to DOA Estimation of Coherent Signals

    Authors: Yishuo Zhao, Yan Hu, Yougen Xu

    Abstract: A DOA estimation method for coherent signals based on a periodical coding metasurface is proposed. After periodical coding, the DOA information of incident signals in the time domain is represented as the amplitude and phase information at different frequency points in the frequency domain. Finite time Fourier transform (FTFT) is performed on the received signal and appropriate frequency points are…

    Submitted 4 December, 2023; originally announced December 2023.

  42. arXiv:2312.01544  [pdf, other]

    cs.LG cs.AI eess.SY

    KEEC: Embed to Control on An Equivariant Geometry

    Authors: Xiaoyuan Cheng, Yiming Yang, Wei Jiang, Yukun Hu

    Abstract: This paper investigates how representation learning can enable optimal control in unknown and complex dynamics, such as chaotic and non-linear systems, without relying on prior domain knowledge of the dynamics. The core idea is to establish an equivariant geometry that is diffeomorphic to the manifold defined by a dynamical system and to perform optimal control within this corresponding geometry,…

    Submitted 10 December, 2023; v1 submitted 3 December, 2023; originally announced December 2023.

  43. arXiv:2312.01338  [pdf, other]

    eess.IV cs.CV cs.LG

    Enhancing and Adapting in the Clinic: Source-free Unsupervised Domain Adaptation for Medical Image Enhancement

    Authors: Heng Li, Ziqin Lin, Zhongxi Qiu, Zinan Li, Huazhu Fu, Yan Hu, Jiang Liu

    Abstract: Medical imaging provides many valuable clues involving anatomical structure and pathological characteristics. However, image degradation is a common issue in clinical practice, which can adversely impact the observation and diagnosis by physicians and algorithms. Although extensive enhancement models have been developed, these models require thorough pre-training before deployment, while failing to…

    Submitted 3 December, 2023; originally announced December 2023.

    Comments: 14 pages, 9 figures, in IEEE Transactions on Medical Imaging

  44. arXiv:2312.00727  [pdf, other]

    cs.LG cs.AI eess.SY

    Safe Reinforcement Learning in Tensor Reproducing Kernel Hilbert Space

    Authors: Xiaoyuan Cheng, Boli Chen, Liz Varga, Yukun Hu

    Abstract: This paper delves into the problem of safe reinforcement learning (RL) in a partially observable environment with the aim of achieving safe-reachability objectives. In traditional partially observable Markov decision processes (POMDP), ensuring safety typically involves estimating the belief in latent states. However, accurately estimating an optimal Bayesian filter in POMDP to infer latent states…

    Submitted 1 December, 2023; originally announced December 2023.

  45. arXiv:2311.18073  [pdf, other]

    eess.IV

    DiffGEPCI: 3D MRI Synthesis from mGRE Signals using 2.5D Diffusion Model

    Authors: Yuyang Hu, Satya V. V. N. Kothapalli, Weijie Gan, Alexander L. Sukstanskii, Gregory F. Wu, Manu Goyal, Dmitriy A. Yablonskiy, Ulugbek S. Kamilov

    Abstract: We introduce a new framework called DiffGEPCI for cross-modality generation in magnetic resonance imaging (MRI) using a 2.5D conditional diffusion model. DiffGEPCI can synthesize high-quality Fluid Attenuated Inversion Recovery (FLAIR) and Magnetization Prepared-Rapid Gradient Echo (MPRAGE) images, without acquiring corresponding measurements, by leveraging multi-Gradient-Recalled Echo (mGRE) MRI…

    Submitted 18 April, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

  46. arXiv:2311.02003  [pdf, other]

    eess.IV cs.CV

    A Structured Pruning Algorithm for Model-based Deep Learning

    Authors: Chicago Park, Weijie Gan, Zihao Zou, Yuyang Hu, Zhixin Sun, Ulugbek S. Kamilov

    Abstract: There is a growing interest in model-based deep learning (MBDL) for solving imaging inverse problems. MBDL networks can be seen as iterative algorithms that estimate the desired image using a physical measurement model and a learned image prior specified using a convolutional neural network (CNN). The iterative nature of MBDL networks increases the test-time computational complexity, which limits the…

    Submitted 3 November, 2023; originally announced November 2023.

  47. arXiv:2310.17742  [pdf]

    eess.AS cs.LG eess.SP

    BERT-PIN: A BERT-based Framework for Recovering Missing Data Segments in Time-series Load Profiles

    Authors: Yi Hu, Kai Ye, Hyeonjin Kim, Ning Lu

    Abstract: Inspired by the success of the Transformer model in natural language processing and computer vision, this paper introduces BERT-PIN, a Bidirectional Encoder Representations from Transformers (BERT) powered Profile Inpainting Network. BERT-PIN recovers multiple missing data segments (MDSs) using load and temperature time-series profiles as inputs. To adopt a standard Transformer model structure for…

    Submitted 26 October, 2023; originally announced October 2023.

  48. arXiv:2310.13013  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Generative error correction for code-switching speech recognition using large language models

    Authors: Chen Chen, Yuchen Hu, Chao-Han Huck Yang, Hexin Liu, Sabato Marco Siniscalchi, Eng Siong Chng

    Abstract: Code-switching (CS) speech refers to the phenomenon of mixing two or more languages within the same sentence. Despite the recent advances in automatic speech recognition (ASR), CS-ASR is still a challenging task owing to the complex grammatical structure of the phenomenon and the scarcity of specific training corpora. In this work, we propose to leverage large language models (LLMs) and lis…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: Submitted to ICASSP2024

  49. Long-term Dependency for 3D Reconstruction of Freehand Ultrasound Without External Tracker

    Authors: Qi Li, Ziyi Shen, Qian Li, Dean C. Barratt, Thomas Dowrick, Matthew J. Clarkson, Tom Vercauteren, Yipeng Hu

    Abstract: Objective: Reconstructing freehand ultrasound in 3D without any external tracker has been a long-standing challenge in ultrasound-assisted procedures. We aim to define new ways of parameterising long-term dependencies, and evaluate the performance. Methods: First, long-term dependency is encoded by transformation positions within a frame sequence. This is achieved by combining a sequence model wit…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: Accepted to IEEE Transactions on Biomedical Engineering (TBME, 2023)

  50. arXiv:2310.05647  [pdf, other]

    eess.IV cs.CV

    Exploiting Manifold Structured Data Priors for Improved MR Fingerprinting Reconstruction

    Authors: Peng Li, Yuping Ji, Yue Hu

    Abstract: Estimating tissue parameter maps with high accuracy and precision from highly undersampled measurements presents one of the major challenges in MR fingerprinting (MRF). Many existing works project the recovered voxel fingerprints onto the Bloch manifold to improve reconstruction performance. However, little research focuses on exploiting the latent manifold structure priors among fingerprints. To…

    Submitted 16 October, 2023; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: 10 pages, 10 figures, will submit to IEEE Transactions on Medical Imaging

    ACM Class: I.4.5; I.2.6