Skip to main content

Showing 1–50 of 196 results for author: Huang, Z

Searching in archive eess. Search in all archives.
.
  1. arXiv:2406.19072  [pdf, other

    eess.SP

    Scatterer Recognition from LiDAR Point Clouds for Environment-Embedded Vehicular Channel Modeling via Synesthesia of Machines

    Authors: Ziwei Huang, Lu Bai, Zengrui Han, Xiang Cheng

    Abstract: In this paper, a novel environment-embedded vehicular channel model is proposed by scatterer recognition from light detection and ranging (LiDAR) point clouds via Synesthesia of Machines (SoM). To provide a robust data foundation, a new intelligent sensing-communication integration dataset in vehicular urban scenarios is constructed. Based on the constructed dataset, the complex SoM mechanism, i.e… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2406.15222  [pdf

    eess.IV cs.AI cs.CV

    Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study

    Authors: Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, **gyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, Zhenpeng Yuan , et al. (15 additional authors not shown)

    Abstract: Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed… ▽ More

    Submitted 24 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: under peer review

  3. arXiv:2406.14931  [pdf, other

    eess.SP

    Multi-beam Training for Near-field Communications in High-frequency Bands

    Authors: Cong Zhou, Changsheng You, Zixuan Huang, Shuo Shi, Yi Gong, Chan-Byoung Chae, Kaibin Huang

    Abstract: In this paper, we study efficient multi-beam training design for near-field communications to reduce the beam training overhead of conventional single-beam training methods. In particular, the array-division based multi-beam training method, which is widely used in far-field communications, cannot be directly applied to the near-field scenario, since different sub-arrays may observe different user… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: In this paper, a novel near-field multi-beam training scheme is proposed by sparsely activating a portion of antennas to form a sparse linear array

  4. arXiv:2406.14878  [pdf, other

    cs.CV cs.LG eess.IV

    MOS: Model Synergy for Test-Time Adaptation on LiDAR-Based 3D Object Detection

    Authors: Zhuoxiao Chen, Junjie Meng, Mahsa Baktashmotlagh, Zi Huang, Yadan Luo

    Abstract: LiDAR-based 3D object detection is pivotal across many applications, yet the performance of such detection systems often degrades after deployment, especially when faced with unseen test point clouds originating from diverse locations or subjected to corruption. In this work, we introduce a new online adaptation framework for detectors named Model Synergy (MOS). Specifically, MOS dynamically assem… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  5. arXiv:2406.08806  [pdf, ps, other

    eess.SY

    Adaptive Cooperative Streaming of Holographic Video Over Wireless Networks: A Proximal Policy Optimization Solution

    Authors: Wanli Wen, Ji** Yan, Yulu Zhang, Zhen Huang, Liang Liang, Yunjian Jia

    Abstract: Adapting holographic video streaming to fluctuating wireless channels is essential to maintain consistent and satisfactory Quality of Experience (QoE) for users, which, however, is a challenging task due to the dynamic and uncertain characteristics of wireless networks. To address this issue, we propose a holographic video cooperative streaming framework designed for a generic wireless network in… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted for publication in IEEE Wireless Communications Letters

  6. arXiv:2406.03872  [pdf, other

    cs.CL cs.SD eess.AS

    BLSP-Emo: Towards Empathetic Large Speech-Language Models

    Authors: Chen Wang, Minpeng Liao, Zhongqiang Huang, Junhong Wu, Chengqing Zong, Jiajun Zhang

    Abstract: The recent release of GPT-4o showcased the potential of end-to-end multimodal models, not just in terms of low latency but also in their ability to understand and generate expressive speech with rich emotions. While the details are unknown to the open research community, it likely involves significant amounts of curated data and compute, neither of which is readily accessible. In this paper, we pr… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  7. arXiv:2406.03274  [pdf, other

    eess.AS cs.AI cs.SD

    Enhancing CTC-based speech recognition with diverse modeling units

    Authors: Shiyi Han, Zhihong Lei, Mingbin Xu, Xingyu Na, Zhen Huang

    Abstract: In recent years, the evolution of end-to-end (E2E) automatic speech recognition (ASR) models has been remarkable, largely due to advances in deep learning architectures like transformer. On top of E2E systems, researchers have achieved substantial accuracy improvement by rescoring E2E model's N-best hypotheses with a phoneme-based model. This raises an interesting question about where the improvem… ▽ More

    Submitted 11 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  8. arXiv:2406.02430  [pdf, other

    eess.AS cs.SD

    Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

    Authors: Philip Anastassiou, Jiawei Chen, Jitong Chen, Yuanzhe Chen, Zhuo Chen, Ziyi Chen, Jian Cong, Lelai Deng, Chuang Ding, Lu Gao, Mingqing Gong, Peisong Huang, Qingqing Huang, Zhiying Huang, Yuanyuan Huo, Dongya Jia, Chumin Li, Feiya Li, Hui Li, Jiaxin Li, Xiaoyang Li, Xingxing Li, Lin Liu, Shouda Liu, Sichao Liu , et al. (21 additional authors not shown)

    Abstract: We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and sub… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  9. arXiv:2405.19041  [pdf, other

    cs.CL cs.SD eess.AS

    BLSP-KD: Bootstrap** Language-Speech Pre-training via Knowledge Distillation

    Authors: Chen Wang, Minpeng Liao, Zhongqiang Huang, Jiajun Zhang

    Abstract: Recent end-to-end approaches have shown promise in extending large language models (LLMs) to speech inputs, but face limitations in directly assessing and optimizing alignment quality and fail to achieve fine-grained alignment due to speech-text length mismatch. We introduce BLSP-KD, a novel approach for Bootstrap** Language-Speech Pretraining via Knowledge Distillation, which addresses these li… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  10. arXiv:2405.16765  [pdf, ps, other

    cs.LG eess.SP

    Study of Robust Direction Finding Based on Joint Sparse Representation

    Authors: Y. Li, W. Xiao, L. Zhao, Z. Huang, Q. Li, L. Li, R. C. de Lamare

    Abstract: Standard Direction of Arrival (DOA) estimation methods are typically derived based on the Gaussian noise assumption, making them highly sensitive to outliers. Therefore, in the presence of impulsive noise, the performance of these methods may significantly deteriorate. In this paper, we model impulsive noise as Gaussian noise mixed with sparse outliers. By exploiting their statistical differences,… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 6 pages, 4 figures

  11. arXiv:2405.09073  [pdf, other

    eess.SP

    Interpretable attributed scattering center extracted via deep unfolding

    Authors: Haodong Yang, Zhe Zhang, Zhongling Huang

    Abstract: Most existing sparse representation-based approaches for attributed scattering center (ASC) extraction adopt traditional iterative optimization algorithms, which suffer from lengthy computation times and limited precision. This paper presents a solution by introducing an interpretable network that can effectively and rapidly extract ASC via deep unfolding. Initially, we create a dictionary contain… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: This paper has been accepted by IGARSS2024

  12. arXiv:2405.08295  [pdf, other

    cs.CL cs.SD eess.AS

    SpeechVerse: A Large-scale Generalizable Audio Language Model

    Authors: Nilaksh Das, Saket Dingliwal, Srikanth Ronanki, Rohit Paturi, Zhaocheng Huang, Prashant Mathur, Jie Yuan, Dhanush Bekal, Xing Niu, Sai Muralidhar Jayanthi, Xilai Li, Karel Mundnich, Monica Sunkara, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

    Abstract: Large language models (LLMs) have shown incredible proficiency in performing tasks that require semantic understanding of natural language instructions. Recently, many works have further expanded this capability to perceive multimodal audio and text inputs, but their capabilities are often limited to specific fine-tuned tasks such as automatic speech recognition and translation. We therefore devel… ▽ More

    Submitted 31 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: Single Column, 13 page

  13. arXiv:2405.07777  [pdf, other

    cs.CV eess.IV

    GMSR:Gradient-Guided Mamba for Spectral Reconstruction from RGB Images

    Authors: Xinying Wang, Zhixiong Huang, Sifan Zhang, Jiawen Zhu, Lin Feng

    Abstract: Mainstream approaches to spectral reconstruction (SR) primarily focus on designing Convolution- and Transformer-based architectures. However, CNN methods often face challenges in handling long-range dependencies, whereas Transformers are constrained by computational efficiency limitations. Recent breakthroughs in state-space model (e.g., Mamba) has attracted significant attention due to its near-l… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.

  14. arXiv:2405.00077  [pdf, other

    cs.LG eess.SP

    BrainODE: Dynamic Brain Signal Analysis via Graph-Aided Neural Ordinary Differential Equations

    Authors: Kaiqiao Han, Yi Yang, Zijie Huang, Xuan Kan, Yang Yang, Ying Guo, Lifang He, Liang Zhan, Yizhou Sun, Wei Wang, Carl Yang

    Abstract: Brain network analysis is vital for understanding the neural interactions regarding brain structures and functions, and identifying potential biomarkers for clinical phenotypes. However, widely used brain signals such as Blood Oxygen Level Dependent (BOLD) time series generated from functional Magnetic Resonance Imaging (fMRI) often manifest three challenges: (1) missing values, (2) irregular samp… ▽ More

    Submitted 30 April, 2024; originally announced May 2024.

  15. arXiv:2404.17138  [pdf, other

    eess.SP

    Sub-6GHz Assisted mmWave Hybrid Beamforming with Heterogeneous Graph Neural Network

    Authors: Zhaohui Huang, Zhaocheng Wang, Sheng Chen

    Abstract: In next-generation communications, sub-6GHz and millimeter-wave (mmWave) links typically coexist, with the sub-6GHz link always active and the mmWave link active when high-rate transmission is required. Due to the spatial similarities between sub-6GHz and mmWave channels, sub-6GHz channel information can be utilized to support hybrid beamforming in mmWave communications to reduce overhead costs. W… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: This paper has been submitted to IEEE Transactions on Communications (IEEE TCOM)

  16. arXiv:2404.16484  [pdf, other

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, **shan Pan, Jiangxin Dong, **hui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi **, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu , et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  17. arXiv:2404.16223  [pdf, other

    cs.CV eess.IV

    Deep RAW Image Super-Resolution. A NTIRE 2024 Challenge Survey

    Authors: Marcos V. Conde, Florin-Alexandru Vasluianu, Radu Timofte, Jianxing Zhang, Jia Li, Fan Wang, Xiaopeng Li, Zikun Liu, Hyunhee Park, Sejun Song, Changho Kim, Zhijuan Huang, Hongyuan Yu, Cheng Wan, Wending Xiang, Jiamin Lin, Hang Zhong, Qiaosong Zhang, Yue Sun, Xuanwu Yin, Kunlong Zuo, Senyan Xu, Siyuan Jiang, Zhi**g Sun, Jiaying Zhu , et al. (10 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines, however, this problem is not as explored as in the RGB domain. Th goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as nois… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 - NTIRE Workshop

  18. arXiv:2404.09385  [pdf, other

    eess.AS cs.CL eess.SP

    A Large-Scale Evaluation of Speech Foundation Models

    Authors: Shu-wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-yi Lee

    Abstract: The foundation model paradigm leverages a shared foundation model to achieve state-of-the-art (SOTA) performance for various tasks, requiring minimal downstream-specific modeling and data annotation. This approach has proven crucial in the field of Natural Language Processing (NLP). However, the speech processing community lacks a similar setup to explore the paradigm systematically. In this work,… ▽ More

    Submitted 29 May, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: The extended journal version for SUPERB and SUPERB-SG. Published in IEEE/ACM TASLP. The Arxiv version is preferred

  19. arXiv:2403.14185  [pdf, other

    eess.SP

    A LiDAR-Aided Channel Model for Vehicular Intelligent Sensing-Communication Integration

    Authors: Ziwei Huang, Lu Bai, Mingran Sun, Xiang Cheng

    Abstract: In this paper, a novel channel modeling approach, named light detection and ranging (LiDAR)-aided geometry-based stochastic modeling (LA-GBSM), is developed. Based on the developed LA-GBSM approach, a new millimeter wave (mmWave) channel model for sixth-generation (6G) vehicular intelligent sensing-communication integration is proposed, which can support the design of intelligent transportation sy… ▽ More

    Submitted 21 March, 2024; originally announced March 2024.

  20. arXiv:2403.07247  [pdf, other

    eess.IV cs.CV cs.LG

    GuideGen: A Text-guided Framework for Joint CT Volume and Anatomical structure Generation

    Authors: Linrui Dai, Rongzhao Zhang, Zhongzhen Huang, Xiaofan Zhang

    Abstract: The annotation burden and extensive labor for gathering a large medical dataset with images and corresponding labels are rarely cost-effective and highly intimidating. This results in a lack of abundant training data that undermines downstream tasks and partially contributes to the challenge image analysis faces in the medical field. As a workaround, given the recent success of generative neural m… ▽ More

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: submitted to MICCAI2024

  21. arXiv:2402.19073  [pdf, other

    eess.SP

    Automatic Radar Signal Detection and FFT Estimation using Deep Learning

    Authors: Akila Pemasiri, Zi Huang, Fraser Williams, Ethan Goan, Simon Denman, Terrence Martin, Clinton Fookes

    Abstract: This paper addresses a critical preliminary step in radar signal processing: detecting the presence of a radar signal and robustly estimating its bandwidth. Existing methods which are largely statistical feature-based approaches face challenges in electronic warfare (EW) settings where prior information about signals is lacking. While alternate deep learning based methods focus on more challenging… ▽ More

    Submitted 29 February, 2024; originally announced February 2024.

  22. arXiv:2402.17450  [pdf

    eess.SP

    Conformal Shield: A Novel Adversarial Attack Detection Framework for Automatic Modulation Classification

    Authors: Tailai Wen, Da Ke, Xiang Wang, Zhitao Huang

    Abstract: Deep learning algorithms have become an essential component in the field of cognitive radio, especially playing a pivotal role in automatic modulation classification. However, Deep learning also present risks and vulnerabilities. Despite their outstanding classification performance, they exhibit fragility when confronted with meticulously crafted adversarial examples, posing potential risks to the… ▽ More

    Submitted 27 February, 2024; originally announced February 2024.

  23. arXiv:2402.09463  [pdf

    eess.IV

    Multi-Center Fetal Brain Tissue Annotation (FeTA) Challenge 2022 Results

    Authors: Kelly Payette, Céline Steger, Roxane Licandro, Priscille de Dumast, Hongwei Bran Li, Matthew Barkovich, Liu Li, Maik Dannecker, Chen Chen, Cheng Ouyang, Niccolò McConnell, Alina Miron, Yongmin Li, Alena Uus, Irina Grigorescu, Paula Ramirez Gilliland, Md Mahfuzur Rahman Siddiquee, Daguang Xu, Andriy Myronenko, Haoyu Wang, Ziyan Huang, ** Ye, Mireia Alenyà, Valentin Comte, Oscar Camara , et al. (42 additional authors not shown)

    Abstract: Segmentation is a critical step in analyzing the develo** human fetal brain. There have been vast improvements in automatic segmentation methods in the past several years, and the Fetal Brain Tissue Annotation (FeTA) Challenge 2021 helped to establish an excellent standard of fetal brain segmentation. However, FeTA 2021 was a single center study, and the generalizability of algorithms across dif… ▽ More

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: Results from FeTA Challenge 2022, held at MICCAI; Manuscript submitted. Supplementary Info (including submission methods descriptions) available here: https://zenodo.org/records/10628648

  24. arXiv:2401.13998  [pdf, other

    eess.IV cs.CV

    WAL-Net: Weakly supervised auxiliary task learning network for carotid plaques classification

    Authors: Haitao Gan, Lingchao Fu, Ran Zhou, Weiyan Gan, Furong Wang, Xiaoyan Wu, Zhi Yang, Zhongwei Huang

    Abstract: The classification of carotid artery ultrasound images is a crucial means for diagnosing carotid plaques, holding significant clinical relevance for predicting the risk of stroke. Recent research suggests that utilizing plaque segmentation as an auxiliary task for classification can enhance performance by leveraging the correlation between segmentation and classification tasks. However, this appro… ▽ More

    Submitted 27 January, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

  25. arXiv:2401.09032  [pdf, other

    cs.RO cs.MA eess.SY

    Improved Consensus ADMM for Cooperative Motion Planning of Large-Scale Connected Autonomous Vehicles with Limited Communication

    Authors: Haichao Liu, Zhenmin Huang, Zicheng Zhu, Yulin Li, Shaojie Shen, Jun Ma

    Abstract: This paper investigates a cooperative motion planning problem for large-scale connected autonomous vehicles (CAVs) under limited communications, which addresses the challenges of high communication and computing resource requirements. Our proposed methodology incorporates a parallel optimization algorithm with improved consensus ADMM considering a more realistic locally connected topology network,… ▽ More

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: 15 pages, 10 figures

  26. arXiv:2401.08476  [pdf, other

    cs.CR eess.SY

    Incentivizing Secure Software Development: The Role of Liability (Waiver) and Audit

    Authors: Ziyuan Huang, Gergely Biczók, Mingyan Liu

    Abstract: Misaligned incentives in secure software development have long been the focus of research in the economics of security. Product liability, a powerful legal framework in other industries, has been largely ineffective for software products until recent times. However, the rapid regulatory responses to recent global cyberattacks by both the United States and the European Union, together with the (rel… ▽ More

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: 21 pages, 6 figures, submitted to the 23rd Workshop on the Economics of Information Security

    ACM Class: K.6.5; I.2.8

  27. arXiv:2401.04968  [pdf, other

    cs.RO eess.SY

    A Universal Cooperative Decision-Making Framework for Connected Autonomous Vehicles with Generic Road Topologies

    Authors: Zhenmin Huang, Shaojie Shen, Jun Ma

    Abstract: Cooperative decision-making of Connected Autonomous Vehicles (CAVs) presents a longstanding challenge due to its inherent nonlinearity, non-convexity, and discrete characteristics, compounded by the diverse road topologies encountered in real-world traffic scenarios. The majority of current methodologies are only applicable to a single and specific scenario, predicated on scenario-specific assumpt… ▽ More

    Submitted 10 January, 2024; originally announced January 2024.

  28. arXiv:2312.13585  [pdf, other

    cs.CL cs.SD eess.AS

    Speech Translation with Large Language Models: An Industrial Practice

    Authors: Zhichao Huang, Rong Ye, Tom Ko, Qianqian Dong, Shanbo Cheng, Mingxuan Wang, Hang Li

    Abstract: Given the great success of large language models (LLMs) across various tasks, in this paper, we introduce LLM-ST, a novel and effective speech translation model constructed upon a pre-trained LLM. By integrating the large language model (LLM) with a speech encoder and employing multi-task instruction tuning, LLM-ST can produce accurate timestamped transcriptions and translations, even from long au… ▽ More

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: Technical report. 13 pages. Demo: https://speechtranslation.github.io/llm-st/

  29. arXiv:2312.12358  [pdf, other

    cs.IT eess.SP

    Localization and Discrete Beamforming with a Large Reconfigurable Intelligent Surface

    Authors: Baojia Luo, Yili Deng, Miaomiao Dong, Zhongyi Huang, Xiang Chen, Wei Han, Bo Bai

    Abstract: In millimeter-wave (mmWave) cellular systems, reconfigurable intelligent surfaces (RISs) are foreseeably deployed with a large number of reflecting elements to achieve high beamforming gains. The large-sized RIS will make radio links fall in the near-field localization regime with spatial non-stationarity issues. Moreover, the discrete phase restriction on the RIS reflection coefficient incurs exp… ▽ More

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: 13 pages

  30. arXiv:2312.09489  [pdf, other

    cs.LG eess.SP

    Multi-stage Learning for Radar Pulse Activity Segmentation

    Authors: Zi Huang, Akila Pemasiri, Simon Denman, Clinton Fookes, Terrence Martin

    Abstract: Radio signal recognition is a crucial function in electronic warfare. Precise identification and localisation of radar pulse activities are required by electronic warfare systems to produce effective countermeasures. Despite the importance of these tasks, deep learning-based radar pulse activity recognition methods have remained largely underexplored. While deep learning for radar modulation recog… ▽ More

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: 5 pages, 8 figures

  31. arXiv:2311.12071  [pdf, other

    eess.IV cs.CV cs.LG

    Enhancing Low-dose CT Image Reconstruction by Integrating Supervised and Unsupervised Learning

    Authors: Ling Chen, Zhishen Huang, Yong Long, Saiprasad Ravishankar

    Abstract: Traditional model-based image reconstruction (MBIR) methods combine forward and noise models with simple object priors. Recent application of deep learning methods for image reconstruction provides a successful data-driven approach to addressing the challenges when reconstructing images with undersampled measurements or various types of noise. In this work, we propose a hybrid supervised-unsupervi… ▽ More

    Submitted 19 November, 2023; originally announced November 2023.

    Comments: submitted to IEEE Transactions on Medical Imaging

  32. arXiv:2311.11969  [pdf, other

    eess.IV cs.CV

    SA-Med2D-20M Dataset: Segment Anything in 2D Medical Imaging with 20 Million masks

    Authors: ** Ye, Junlong Cheng, Jianpin Chen, Zhongying Deng, Tianbin Li, Haoyu Wang, Yanzhou Su, Ziyan Huang, Jilong Chen, Lei Jiang, Hui Sun, Min Zhu, Shaoting Zhang, Junjun He, Yu Qiao

    Abstract: Segment Anything Model (SAM) has achieved impressive results for natural image segmentation with input prompts such as points and bounding boxes. Its success largely owes to massive labeled training data. However, directly applying SAM to medical image segmentation cannot perform well because SAM lacks medical knowledge -- it does not use medical images for training. To incorporate medical knowled… ▽ More

    Submitted 20 November, 2023; originally announced November 2023.

  33. arXiv:2311.00697  [pdf, other

    cs.CL eess.AS

    End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation

    Authors: Juan Zuluaga-Gomez, Zhaocheng Huang, Xing Niu, Rohit Paturi, Sundararajan Srinivasan, Prashant Mathur, Brian Thompson, Marcello Federico

    Abstract: Conventional speech-to-text translation (ST) systems are trained on single-speaker utterances, and they may not generalize to real-life scenarios where the audio contains conversations by multiple speakers. In this paper, we tackle single-channel multi-speaker conversational ST with an end-to-end and multi-task training model, named Speaker-Turn Aware Conversational Speech Translation, that combin… ▽ More

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: Accepted at EMNLP 2023. Code: https://github.com/amazon-science/stac-speech-translation

  34. arXiv:2310.16367  [pdf, other

    eess.AS

    UniX-Encoder: A Universal $X$-Channel Speech Encoder for Ad-Hoc Microphone Array Speech Processing

    Authors: Zili Huang, Yiwen Shao, Shi-Xiong Zhang, Dong Yu

    Abstract: The speech field is evolving to solve more challenging scenarios, such as multi-channel recordings with multiple simultaneous talkers. Given the many types of microphone setups out there, we present the UniX-Encoder. It's a universal encoder designed for multiple tasks, and worked with any microphone array, in both solo and multi-talker environments. Our research enhances previous multi-channel sp… ▽ More

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: Submitted to ICASSP 2024

  35. arXiv:2310.14796  [pdf, other

    cs.SD cs.AI eess.AS

    A Novel Transfer Learning Method Utilizing Acoustic and Vibration Signals for Rotating Machinery Fault Diagnosis

    Authors: Zhongliang Chen, Zhuofei Huang, Wenxiong Kang

    Abstract: Fault diagnosis of rotating machinery plays a important role for the safety and stability of modern industrial systems. However, there is a distribution discrepancy between training data and data of real-world operation scenarios, which causing the decrease of performance of existing systems. This paper proposed a transfer learning based method utilizing acoustic and vibration signal to address th… ▽ More

    Submitted 20 October, 2023; originally announced October 2023.

  36. arXiv:2310.13906  [pdf, other

    cs.CV eess.IV

    Exploring Driving Behavior for Autonomous Vehicles Based on Gramian Angular Field Vision Transformer

    Authors: Junwei You, Ying Chen, Zhuoyu Jiang, Zhangchi Liu, Zilin Huang, Yifeng Ding, Bin Ran

    Abstract: Effective classification of autonomous vehicle (AV) driving behavior emerges as a critical area for diagnosing AV operation faults, enhancing autonomous driving algorithms, and reducing accident rates. This paper presents the Gramian Angular Field Vision Transformer (GAF-ViT) model, designed to analyze AV driving behavior. The proposed GAF-ViT model consists of three key components: GAF Transforme… ▽ More

    Submitted 21 October, 2023; originally announced October 2023.

  37. arXiv:2310.10095  [pdf, other

    eess.IV cs.CV cs.LG

    A Multi-Scale Spatial Transformer U-Net for Simultaneously Automatic Reorientation and Segmentation of 3D Nuclear Cardiac Images

    Authors: Yangfan Ni, Duo Zhang, Gege Ma, Lijun Lu, Zhongke Huang, Wentao Zhu

    Abstract: Accurate reorientation and segmentation of the left ventricular (LV) is essential for the quantitative analysis of myocardial perfusion imaging (MPI), in which one critical step is to reorient the reconstructed transaxial nuclear cardiac images into standard short-axis slices for subsequent image processing. Small-scale LV myocardium (LV-MY) region detection and the diverse cardiac structures of i… ▽ More

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: 17 pages, 7 figures

  38. arXiv:2310.07062  [pdf, other

    cs.SD cs.LG eess.AS

    Acoustic Model Fusion for End-to-end Speech Recognition

    Authors: Zhihong Lei, Mingbin Xu, Shiyi Han, Leo Liu, Zhen Huang, Tim Ng, Yuanyuan Zhang, Ernest Pusateri, Mirko Hannemann, Yaqiao Deng, Man-Hung Siu

    Abstract: Recent advances in deep learning and automatic speech recognition (ASR) have enabled the end-to-end (E2E) ASR system and boosted the accuracy to a new level. The E2E systems implicitly model all conventional ASR components, such as the acoustic model (AM) and the language model (LM), in a single network trained on audio-text pairs. Despite this simpler system architecture, fusing a separate LM, tr… ▽ More

    Submitted 10 October, 2023; originally announced October 2023.

  39. arXiv:2309.15697  [pdf, other

    cs.CV eess.IV

    Physics Inspired Hybrid Attention for SAR Target Recognition

    Authors: Zhongling Huang, Chong Wu, Xiwen Yao, Zhicheng Zhao, Xiankai Huang, Junwei Han

    Abstract: There has been a recent emphasis on integrating physical models and deep neural networks (DNNs) for SAR target recognition, to improve performance and achieve a higher level of physical interpretability. The attributed scattering center (ASC) parameters garnered the most interest, being considered as additional input data or features for fusion in most methods. However, the performance greatly dep… ▽ More

    Submitted 27 September, 2023; originally announced September 2023.

  40. arXiv:2309.10738  [pdf, other

    cs.SD cs.AI cs.CL cs.IR cs.MM eess.AS

    MelodyGLM: Multi-task Pre-training for Symbolic Melody Generation

    Authors: Xinda Wu, Zhijie Huang, Kejun Zhang, Jiaxing Yu, Xu Tan, Tieyao Zhang, Zihao Wang, Lingyun Sun

    Abstract: Pre-trained language models have achieved impressive results in various music understanding and generation tasks. However, existing pre-training methods for symbolic melody generation struggle to capture multi-scale, multi-dimensional structural information in note sequences, due to the domain knowledge discrepancy between text and music. Moreover, the lack of available large-scale symbolic melody… ▽ More

    Submitted 20 September, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

  41. arXiv:2309.07464  [pdf

    cs.RO eess.SY

    A Delay Compensation Framework Based on Eye-Movement for Teleoperated Ground Vehicles

    Authors: Qiang Zhang, Lingfang Yang, Zhi Huang, Xiaolin Song

    Abstract: An eye-movement-based predicted trajectory guidance control (ePTGC) is proposed to mitigate the maneuverability degradation of a teleoperated ground vehicle caused by communication delays. Human sensitivity to delays is the main reason for the performance degradation of a ground vehicle teleoperation system. The proposed framework extracts human intention from eye-movement. Then, it combines it wi… ▽ More

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: 9 pages, 11 figures

  42. arXiv:2309.03906  [pdf, other

    eess.IV cs.CV

    A-Eval: A Benchmark for Cross-Dataset Evaluation of Abdominal Multi-Organ Segmentation

    Authors: Ziyan Huang, Zhongying Deng, ** Ye, Haoyu Wang, Yanzhou Su, Tianbin Li, Hui Sun, Junlong Cheng, Jianpin Chen, Junjun He, Yun Gu, Shaoting Zhang, Lixu Gu, Yu Qiao

    Abstract: Although deep learning have revolutionized abdominal multi-organ segmentation, models often struggle with generalization due to training on small, specific datasets. With the recent emergence of large-scale datasets, some important questions arise: \textbf{Can models trained on these datasets generalize well on different ones? If yes/no, how to further improve their generalizability?} To address t… ▽ More

    Submitted 7 September, 2023; originally announced September 2023.

  43. arXiv:2309.00916  [pdf, other

    cs.CL cs.SD eess.AS

    BLSP: Bootstrap** Language-Speech Pre-training via Behavior Alignment of Continuation Writing

    Authors: Chen Wang, Minpeng Liao, Zhongqiang Huang, **liang Lu, Junhong Wu, Yuchen Liu, Chengqing Zong, Jiajun Zhang

    Abstract: The emergence of large language models (LLMs) has sparked significant interest in extending their remarkable language capabilities to speech. However, modality alignment between speech and text still remains an open problem. Current solutions can be categorized into two strategies. One is a cascaded approach where outputs (tokens or states) of a separately trained speech recognition system are use… ▽ More

    Submitted 28 May, 2024; v1 submitted 2 September, 2023; originally announced September 2023.

  44. arXiv:2309.00169  [pdf, other

    eess.AS cs.LG cs.SD

    RepCodec: A Speech Representation Codec for Speech Tokenization

    Authors: Zhichao Huang, Chutong Meng, Tom Ko

    Abstract: With recent rapid growth of large language models (LLMs), discrete speech tokenization has played an important role for injecting speech into LLMs. However, this discretization gives rise to a loss of information, consequently impairing overall performance. To improve the performance of these discrete speech tokens, we present RepCodec, a novel speech representation codec for semantic speech token… ▽ More

    Submitted 6 June, 2024; v1 submitted 31 August, 2023; originally announced September 2023.

  45. Structure-Aware Parametric Representations for Time-Resolved Light Transport

    Authors: Diego Royo, Zesheng Huang, Yun Liang, Boyan Song, Adolfo Muñoz, Diego Gutierrez, Julio Marco

    Abstract: Time-resolved illumination provides rich spatio-temporal information for applications such as accurate depth sensing or hidden geometry reconstruction, becoming a useful asset for prototy** and as input for data-driven approaches. However, time-resolved illumination measurements are high-dimensional and have a low signal-to-noise ratio, hampering their applicability in real scenarios. We propose… ▽ More

    Submitted 30 August, 2023; originally announced August 2023.

  46. arXiv:2308.14946  [pdf, other

    cs.LG eess.IV

    Reinforcement Learning for Sampling on Temporal Medical Imaging Sequences

    Authors: Zhishen Huang

    Abstract: Accelerated magnetic resonance imaging resorts to either Fourier-domain subsampling or better reconstruction algorithms to deal with fewer measurements while still generating medical images of high quality. Determining the optimal sampling strategy given a fixed reconstruction protocol often has combinatorial complexity. In this work, we apply double deep Q-learning and REINFORCE algorithms to lea… ▽ More

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: ICML 2023 Workshop SODS

  47. arXiv:2308.05862  [pdf, other

    eess.IV cs.AI cs.CV

    Unleashing the Strengths of Unlabeled Data in Pan-cancer Abdominal Organ Quantification: the FLARE22 Challenge

    Authors: Jun Ma, Yao Zhang, Song Gu, Cheng Ge, Shihao Ma, Adamo Young, Cheng Zhu, Kangkang Meng, Xin Yang, Ziyan Huang, Fan Zhang, Wentao Liu, YuanKe Pan, Shou** Huang, Jiacheng Wang, Mingze Sun, Weixin Xu, Dengqiang Jia, Jae Won Choi, Natália Alves, Bram de Wilde, Gregor Koehler, Yajun Wu, Manuel Wiesenfarth, Qiongjie Zhu , et al. (4 additional authors not shown)

    Abstract: Quantitative organ assessment is an essential step in automated abdominal disease diagnosis and treatment planning. Artificial intelligence (AI) has shown great potential to automatize this process. However, most existing AI algorithms rely on many expert annotations and lack a comprehensive evaluation of accuracy and efficiency in real-world multinational settings. To overcome these limitations,… ▽ More

    Submitted 10 August, 2023; originally announced August 2023.

    Comments: MICCAI FLARE22: https://flare22.grand-challenge.org/

  48. arXiv:2308.05757  [pdf, other

    eess.SP cs.DC cs.IT cs.LG

    OrcoDCS: An IoT-Edge Orchestrated Online Deep Compressed Sensing Framework

    Authors: Cheng-Wei Ching, Chirag Gupta, Zi Huang, Liting Hu

    Abstract: Compressed data aggregation (CDA) over wireless sensor networks (WSNs) is task-specific and subject to environmental changes. However, the existing compressed data aggregation (CDA) frameworks (e.g., compressed sensing-based data aggregation, deep learning(DL)-based data aggregation) do not possess the flexibility and adaptivity required to handle distinct sensing tasks and environmental changes.… ▽ More

    Submitted 5 August, 2023; originally announced August 2023.

    Comments: 6 pages, 8 figures, to appear in 2023 IEEE International Conference on Distributed Computing Systems Workshop on ECAI

  49. arXiv:2308.03056  [pdf, other

    eess.SY

    Optimal battery thermal management for electric vehicles with battery degradation minimization

    Authors: Yue Wu, Zhiwu Huang, Dongjun Li, Heng Li, Jun Peng, Daniel Stroe, Ziyou Song

    Abstract: The control of a battery thermal management system (BTMS) is essential for the thermal safety, energy efficiency, and durability of electric vehicles (EVs) in hot weather. To address the battery cooling optimization problem, this paper utilizes dynamic programming (DP) to develop an online rule-based control strategy. Firstly, an electrical-thermal-aging model of the $\rm LiFePO_4$ battery pack is… ▽ More

    Submitted 6 August, 2023; originally announced August 2023.

  50. arXiv:2308.02190  [pdf, other

    cs.SD cs.CL eess.AS

    Emo-DNA: Emotion Decoupling and Alignment Learning for Cross-Corpus Speech Emotion Recognition

    Authors: Jiaxin Ye, Yujie Wei, Xin-Cheng Wen, Chenglong Ma, Zhizhong Huang, Kunhong Liu, Hongming Shan

    Abstract: Cross-corpus speech emotion recognition (SER) seeks to generalize the ability of inferring speech emotion from a well-labeled corpus to an unlabeled one, which is a rather challenging task due to the significant discrepancy between two corpora. Existing methods, typically based on unsupervised domain adaptation (UDA), struggle to learn corpus-invariant features by global distribution alignment, bu… ▽ More

    Submitted 4 August, 2023; originally announced August 2023.

    Comments: Accepted by ACM MM 2023