Skip to main content

Showing 1–50 of 466 results for author: Liu, H

Searching in archive eess. Search in all archives.
.
  1. arXiv:2406.19305  [pdf, other

    eess.SY

    A Max Pressure Algorithm for Traffic Signals Considering Pedestrian Queues

    Authors: Hao Liu, Vikash V. Gayah, Michael Levin

    Abstract: This paper proposes a novel max-pressure (MP) algorithm that incorporates pedestrian traffic into the MP control architecture. Pedestrians are modeled as being included in one of two groups: those walking on sidewalks and those queued at intersections waiting to cross. Traffic dynamics models for both groups are developed. Under the proposed control policy, the signal timings are determined based… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2406.19269  [pdf, other

    eess.SY

    OCC-MP: A Max-Pressure framework to prioritize transit and high occupancy vehicles

    Authors: Tanveer Ahmed, Hao Liu, Vikash V. Gayah

    Abstract: Max-pressure (MP) is a decentralized adaptive traffic signal control approach that has been shown to maximize throughput for private vehicles. However, MP-based signal control algorithms do not differentiate the movement of transit vehicles from private vehicles or between high and single-occupancy private vehicles. Prioritizing the movement of transit or other high occupancy vehicles (HOVs) is vi… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  3. arXiv:2406.17800  [pdf, other

    q-bio.QM cs.SD eess.AS

    Fish Tracking, Counting, and Behaviour Analysis in Digital Aquaculture: A Comprehensive Review

    Authors: Meng Cui, Xubo Liu, Haohe Liu, **zheng Zhao, Daoliang Li, Wenwu Wang

    Abstract: Digital aquaculture leverages advanced technologies and data-driven methods, providing substantial benefits over traditional aquaculture practices. Fish tracking, counting, and behaviour analysis are crucial components of digital aquaculture, which are essential for optimizing production efficiency, enhancing fish welfare, and improving resource management. Previous reviews have focused on single… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  4. arXiv:2406.16058  [pdf, other

    eess.AS

    Text-Queried Target Sound Event Localization

    Authors: **zheng Zhao, Xinyuan Qian, Yong Xu, Haohe Liu, Yin Cao, Davide Berghi, Wenwu Wang

    Abstract: Sound event localization and detection (SELD) aims to determine the appearance of sound classes, together with their Direction of Arrival (DOA). However, current SELD systems can only predict the activities of specific classes, for example, 13 classes in DCASE challenges. In this paper, we propose text-queried target sound event localization (SEL), a new paradigm that allows the user to input the… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Accepted by EUSIPCO 2024

  5. arXiv:2406.15716  [pdf, other

    eess.IV cs.CV

    Predicting fluorescent labels in label-free microscopy images with pix2pix and adaptive loss in Light My Cells challenge

    Authors: Han Liu, Hao Li, Jiacheng Wang, Yubo Fan, Zhoubing Xu, Ipek Oguz

    Abstract: Fluorescence labeling is the standard approach to reveal cellular structures and other subcellular constituents for microscopy images. However, this invasive procedure may perturb or even kill the cells and the procedure itself is highly time-consuming and complex. Recently, in silico labeling has emerged as a promising alternative, aiming to use machine learning models to directly predict the flu… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  6. arXiv:2406.14977  [pdf, other

    cs.AI eess.IV

    Trustworthy Enhanced Multi-view Multi-modal Alzheimer's Disease Prediction with Brain-wide Imaging Transcriptomics Data

    Authors: Shan Cong, Zhoujie Fan, Hongwei Liu, Yinghan Zhang, Xin Wang, Haoran Luo, Xiaohui Yao

    Abstract: Brain transcriptomics provides insights into the molecular mechanisms by which the brain coordinates its functions and processes. However, existing multimodal methods for predicting Alzheimer's disease (AD) primarily rely on imaging and sometimes genetic data, often neglecting the transcriptomic basis of brain. Furthermore, while striving to integrate complementary information between modalities,… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  7. arXiv:2406.13705  [pdf, other

    eess.IV cs.AI cs.CV

    EndoUIC: Promptable Diffusion Transformer for Unified Illumination Correction in Capsule Endoscopy

    Authors: Long Bai, Qiaozhi Tan, Tong Chen, Wan Jun Nah, Yanheng Li, Zhicheng He, Sishen Yuan, Zhen Chen, **lin Wu, Mobarakol Islam, Zhen Li, Hongbin Liu, Hongliang Ren

    Abstract: Wireless Capsule Endoscopy (WCE) is highly valued for its non-invasive and painless approach, though its effectiveness is compromised by uneven illumination from hardware constraints and complex internal dynamics, leading to overexposed or underexposed images. While researchers have discussed the challenges of low-light enhancement in WCE, the issue of correcting for different exposure levels rema… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: To appear in MICCAI 2024. Code and dataset availability: https://github.com/longbai1006/EndoUIC

  8. arXiv:2406.12596  [pdf, ps, other

    eess.SP

    Beyond Near-Field: Far-Field Location Division Multiple Access in Downlink MIMO Systems

    Authors: Haoyan Liu, Caijian Jie, Min Yang, Chengguang Li

    Abstract: Exploring channel dimensions has been the driving force behind breakthroughs in successive generations of mobile communication systems. In 5G, space division multiple access (SDMA) leveraging massive MIMO has been crucial in enhancing system capacity through spatial differentiation of users. However, SDMA can only finely distinguish users at adjacent angles in ultra-dense networks by extremely lar… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  9. arXiv:2406.12254  [pdf, other

    eess.IV cs.CV

    Enhancing Single-Slice Segmentation with 3D-to-2D Unpaired Scan Distillation

    Authors: Xin Yu, Qi Yang, Han Liu, Ho Hin Lee, Yucheng Tang, Lucas W. Remedios, Michael Kim, Shunxing Bao, Ann Xenobia Moore, Luigi Ferrucci, Bennett A. Landman

    Abstract: 2D single-slice abdominal computed tomography (CT) enables the assessment of body habitus and organ health with low radiation exposure. However, single-slice data necessitates the use of 2D networks for segmentation, but these networks often struggle to capture contextual information effectively. Consequently, even when trained on identical datasets, 3D networks typically achieve superior segmenta… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  10. arXiv:2406.11568  [pdf, other

    cs.CL cs.SD eess.AS q-bio.NC

    Towards an End-to-End Framework for Invasive Brain Signal Decoding with Large Language Models

    Authors: Sheng Feng, Heyang Liu, Yu Wang, Yanfeng Wang

    Abstract: In this paper, we introduce a groundbreaking end-to-end (E2E) framework for decoding invasive brain signals, marking a significant advancement in the field of speech neuroprosthesis. Our methodology leverages the comprehensive reasoning abilities of large language models (LLMs) to facilitate direct decoding. By fully integrating LLMs, we achieve results comparable to the state-of-the-art cascade m… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  11. arXiv:2406.11336  [pdf, other

    eess.SY

    LFPLM: A General and Flexible Load Forecasting Framework based on Pre-trained Language Model

    Authors: Mingyang Gao, Suyang Zhou, Wei Gu, Zhi Wu, Zijian Hu, Hong Zhu, Haiquan Liu

    Abstract: Accurate load forecasting is essential for maintaining the power balance between generators and consumers, especially with the increasing integration of renewable energy sources, which introduce significant intermittent volatility. With the development of data-driven methods, machine learning and deep learning-based models have become the predominant approach for load forecasting tasks. In recent… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 7 pages, 5 figures and 5 tables

  12. arXiv:2406.08837  [pdf

    eess.IV cs.CV cs.LG

    Research on Deep Learning Model of Feature Extraction Based on Convolutional Neural Network

    Authors: Houze Liu, Iris Li, Yaxin Liang, Dan Sun, Yining Yang, Haowei Yang

    Abstract: Neural networks with relatively shallow layers and simple structures may have limited ability in accurately identifying pneumonia. In addition, deep neural networks also have a large demand for computing resources, which may cause convolutional neural networks to be unable to be implemented on terminals. Therefore, this paper will carry out the optimal classification of convolutional neural networ… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  13. arXiv:2406.06979  [pdf, other

    cs.LG cs.CR cs.SD eess.AS

    AudioMarkBench: Benchmarking Robustness of Audio Watermarking

    Authors: Hongbin Liu, Moyang Guo, Zhengyuan Jiang, Lun Wang, Neil Zhenqiang Gong

    Abstract: The increasing realism of synthetic speech, driven by advancements in text-to-speech models, raises ethical concerns regarding impersonation and disinformation. Audio watermarking offers a promising solution via embedding human-imperceptible watermarks into AI-generated audios. However, the robustness of audio watermarking against common/adversarial perturbations remains understudied. We present A… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  14. Multi-Objective Sizing Optimization Method of Microgrid Considering Cost and Carbon Emissions

    Authors: Xiang Zhu, Guangchun Ruan, Hua Geng, Honghai Liu, Mingfei Bai, Chao Peng

    Abstract: Microgrid serves as a promising solution to integrate and manage distributed renewable energy resources. In this paper, we establish a stochastic multi-objective sizing optimization (SMOSO) model for microgrid planning, which fully captures the battery degradation characteristics and the total carbon emissions. The microgrid operator aims to simultaneously maximize the economic benefits and minimi… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Transactions on Industry Applications

  15. arXiv:2406.06295  [pdf, other

    cs.SD eess.AS

    Zero-Shot Audio Captioning Using Soft and Hard Prompts

    Authors: Yiming Zhang, Xuenan Xu, Ruoyi Du, Haohe Liu, Yuan Dong, Zheng-Hua Tan, Wenwu Wang, Zhanyu Ma

    Abstract: In traditional audio captioning methods, a model is usually trained in a fully supervised manner using a human-annotated dataset containing audio-text pairs and then evaluated on the test sets from the same dataset. Such methods have two limitations. First, these methods are often data-hungry and require time-consuming and expensive human annotations to obtain audio-text pairs. Second, these model… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing

  16. arXiv:2406.02918  [pdf, other

    eess.IV cs.CV

    U-KAN Makes Strong Backbone for Medical Image Segmentation and Generation

    Authors: Chenxin Li, Xinyu Liu, Wuyang Li, Cheng Wang, Hengyu Liu, Yixuan Yuan

    Abstract: U-Net has become a cornerstone in various visual applications such as image segmentation and diffusion probability models. While numerous innovative designs and improvements have been introduced by incorporating transformers or MLPs, the networks are still limited to linearly modeling patterns as well as the deficient interpretability. To address these challenges, our intuition is inspired by the… ▽ More

    Submitted 6 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  17. arXiv:2406.00356  [pdf, other

    eess.AS cs.SD

    AudioLCM: Text-to-Audio Generation with Latent Consistency Models

    Authors: Huadai Liu, Rongjie Huang, Yang Liu, Hengyuan Cao, Jialei Wang, Xize Cheng, Siqi Zheng, Zhou Zhao

    Abstract: Recent advancements in Latent Diffusion Models (LDMs) have propelled them to the forefront of various generative tasks. However, their iterative sampling process poses a significant computational burden, resulting in slow generation speeds and limiting their application in text-to-audio generation deployment. In this work, we introduce AudioLCM, a novel consistency-based model tailored for efficie… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  18. arXiv:2405.16258  [pdf, other

    cs.LG cs.AI eess.SY

    USD: Unsupervised Soft Contrastive Learning for Fault Detection in Multivariate Time Series

    Authors: Hong Liu, Xiuxiu Qiu, Yiming Shi, Zelin Zang

    Abstract: Unsupervised fault detection in multivariate time series is critical for maintaining the integrity and efficiency of complex systems, with current methodologies largely focusing on statistical and machine learning techniques. However, these approaches often rest on the assumption that data distributions conform to Gaussian models, overlooking the diversity of patterns that can manifest in both nor… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 19 pages, 7 figures, under review

  19. arXiv:2405.12996  [pdf, other

    eess.IV

    Dose-aware Diffusion Model for 3D Low-dose PET: Multi-institutional Validation with Reader Study and Real Low-dose Data

    Authors: Huidong Xie, Weijie Gan, Bo Zhou, Ming-Kai Chen, Michal Kulon, Annemarie Boustani, Benjamin A. Spencer, Reimund Bayerlein, Xiongchao Chen, Qiong Liu, Xueqi Guo, Menghua Xia, Yinchi Zhou, Hui Liu, Liang Guo, Hongyu An, Ulugbek S. Kamilov, Hanzhong Wang, Biao Li, Axel Rominger, Kuangyu Shi, Ge Wang, Ramsey D. Badawi, Chi Liu

    Abstract: As PET imaging is accompanied by radiation exposure and potentially increased cancer risk, reducing radiation dose in PET scans without compromising the image quality is an important topic. Deep learning (DL) techniques have been investigated for low-dose PET imaging. However, existing models have often resulted in compromised image quality when achieving low-dose PET and have limited generalizabi… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 16 Pages, 15 Figures, 4 Tables. Paper under review. arXiv admin note: substantial text overlap with arXiv:2311.04248

  20. arXiv:2405.12609  [pdf, other

    eess.AS cs.SD

    Mamba in Speech: Towards an Alternative to Self-Attention

    Authors: Xiangyu Zhang, Qiquan Zhang, Hexin Liu, Tianyi Xiao, Xinyuan Qian, Beena Ahmed, Eliathamby Ambikairajah, Haizhou Li, Julien Epps

    Abstract: Transformer and its derivatives have achieved success in diverse tasks across computer vision, natural language processing, and speech processing. To reduce the complexity of computations within the multi-head self-attention mechanism in Transformer, Selective State Space Models (i.e., Mamba) were proposed as an alternative. Mamba exhibited its effectiveness in natural language processing and comp… ▽ More

    Submitted 23 May, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  21. arXiv:2405.12464  [pdf, other

    eess.SY

    Evaluation of Connected Vehicle Identification-Aware Mixed Traffic Freeway Cooperative Merging

    Authors: Haoji Liu, Fatemeh Jahedinia, Zeyu Mu, B. Brian Park

    Abstract: Cooperative on-ramp merging control for connected automated vehicles (CAVs) has been extensively investigated. However, they did neglect the connected vehicle identification process, which is a must for CAV cooperations. In this paper, we introduced a connected vehicle identification system (VIS) into the on-ramp merging control process for the first time and proposed an evaluation framework to as… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: @2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  22. arXiv:2405.12357  [pdf

    eess.IV cs.CV

    Paired Conditional Generative Adversarial Network for Highly Accelerated Liver 4D MRI

    Authors: Di Xu, Xin Miao, Hengjie Liu, Jessica E. Scholey, Wensha Yang, Mary Feng, Michael Ohliger, Hui Lin, Yi Lao, Yang Yang, Ke Sheng

    Abstract: Purpose: 4D MRI with high spatiotemporal resolution is desired for image-guided liver radiotherapy. Acquiring densely sampling k-space data is time-consuming. Accelerated acquisition with sparse samples is desirable but often causes degraded image quality or long reconstruction time. We propose the Reconstruct Paired Conditional Generative Adversarial Network (Re-Con-GAN) to shorten the 4D MRI rec… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

  23. arXiv:2405.10948  [pdf, other

    cs.CV cs.AI cs.RO eess.IV

    Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery

    Authors: Guankun Wang, Long Bai, Wan Jun Nah, Jie Wang, Zhaoxi Zhang, Zhen Chen, **lin Wu, Mobarakol Islam, Hongbin Liu, Hongliang Ren

    Abstract: Recent advancements in Surgical Visual Question Answering (Surgical-VQA) and related region grounding have shown great promise for robotic and medical applications, addressing the critical need for automated methods in personalized surgical mentorship. However, existing models primarily provide simple structured answers and struggle with complex scenarios due to their limited capability in recogni… ▽ More

    Submitted 22 March, 2024; originally announced May 2024.

  24. arXiv:2405.10606  [pdf, other

    eess.SP

    Carrier Aggregation Enabled MIMO-OFDM Integrated Sensing and Communication

    Authors: Haotian Liu, Zhiqing Wei, **ghui Piao, Huici Wu, Xingwang Li, Zhiyong Feng

    Abstract: In the evolution towards the forthcoming era of sixth-generation (6G) mobile communication systems characterized by ubiquitous intelligence, integrated sensing and communication (ISAC) is in a phase of burgeoning development. However, the capabilities of communication and sensing within single frequency band fall short of meeting the escalating demands. To this end, this paper introduces a carrier… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 13page, 9figures, Submitted to IEEE Transactions on Wireless Communications

  25. arXiv:2405.09179  [pdf, other

    eess.SP

    Integrated Sensing and Communication Enabled Cooperative Passive Sensing Using Mobile Communication System

    Authors: Zhiqing Wei, Haotian Liu, Hujun Li, Wangjun Jiang, Zhiyong Feng, Huici Wu, ** Zhang

    Abstract: Integrated sensing and communication (ISAC) is a potential technology of the sixth-generation (6G) mobile communication system, which enables communication base station (BS) with sensing capability. However, the performance of single-BS sensing is limited, which can be overcome by multi-BS cooperative sensing. There are three types of multi-BS cooperative sensing, including cooperative active sens… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 16 pages, 11 figures, Submitted to IEEE Transactions on Mobile Computing

  26. arXiv:2405.02873  [pdf, other

    eess.SP

    Target Localization with Macro and Micro Base Stations Cooperative Sensing

    Authors: Haotian Liu, Zhiqing Wei, Furong Yang, Huici Wu, Kaifeng Han, Zhiyong Feng

    Abstract: Addressing the communication and sensing demands of sixth-generation (6G) mobile communication system, integrated sensing and communication (ISAC) has garnered traction in academia and industry. With the sensing limitation of single base station (BS), multi-BS cooperative sensing is regarded as a promising solution. The coexistence and overlapped coverage of macro BS (MBS) and micro BS (MiBS) are… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 7 pages 6 figures, submitted to 2024 IEEE GLOBECOM

  27. arXiv:2405.02788  [pdf, other

    eess.SP

    Antenna Failure Resilience: Deep Learning-Enabled Robust DOA Estimation with Single Snapshot Sparse Arrays

    Authors: Ruxin Zheng, Shunqiao Sun, Hongshan Liu, Honglei Chen, Mojtaba Soltanalian, Jian Li

    Abstract: Recent advancements in Deep Learning (DL) for Direction of Arrival (DOA) estimation have highlighted its superiority over traditional methods, offering faster inference, enhanced super-resolution, and robust performance in low Signal-to-Noise Ratio (SNR) environments. Despite these advancements, existing research predominantly focuses on multi-snapshot scenarios, a limitation in the context of aut… ▽ More

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: Invited paper for IEEE Asilomar conference 2024

  28. arXiv:2405.01104  [pdf, other

    cs.IT eess.SP

    Multi-user ISAC through Stacked Intelligent Metasurfaces: New Algorithms and Experiments

    Authors: Ziqing Wang, Hongzheng Liu, Jianan Zhang, Ru**g Xiong, Kai Wan, Xuewen Qian, Marco Di Renzo, Robert Caiming Qiu

    Abstract: This paper investigates a Stacked Intelligent Metasurfaces (SIM)-assisted Integrated Sensing and Communications (ISAC) system. An extended target model is considered, where the BS aims to estimate the complete target response matrix relative to the SIM. Under the constraints of minimum Signal-to-Interference-plus-Noise Ratio (SINR) for the communication users (CUs) and maximum transmit power, we j… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

  29. arXiv:2405.00316  [pdf, other

    cs.RO eess.SY

    Enhance Planning with Physics-informed Safety Controller for End-to-end Autonomous Driving

    Authors: Hang Zhou, Haichao Liu, Hongliang Lu, Dan Xu, Jun Ma, Yiding Ji

    Abstract: Recent years have seen a growing research interest in applications of Deep Neural Networks (DNN) on autonomous vehicle technology. The trend started with perception and prediction a few years ago and it is gradually being applied to motion planning tasks. Despite the performance of networks improve over time, DNN planners inherit the natural drawbacks of Deep Learning. Learning-based planners have… ▽ More

    Submitted 5 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

  30. arXiv:2405.00233  [pdf, other

    cs.SD cs.AI cs.MM eess.AS eess.SP

    SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound

    Authors: Haohe Liu, Xuenan Xu, Yi Yuan, Mengyue Wu, Wenwu Wang, Mark D. Plumbley

    Abstract: Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling the application of language modelling techniques to audio data. However, traditional codecs often operate at high bitrates or within narrow domains such as speech and lack the semantic clues required for efficient language modelling. Addressing these chal… ▽ More

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: Demo and code: https://haoheliu.github.io/SemantiCodec/

  31. arXiv:2404.17806  [pdf, other

    cs.SD cs.CL cs.LG eess.AS

    T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining

    Authors: Yi Yuan, Zhuo Chen, Xubo Liu, Haohe Liu, Xuenan Xu, Dongya Jia, Yuanzhe Chen, Mark D. Plumbley, Wenwu Wang

    Abstract: Contrastive language-audio pretraining~(CLAP) has been developed to align the representations of audio and language, achieving remarkable performance in retrieval and classification tasks. However, current CLAP struggles to capture temporal information within audio and text features, presenting substantial limitations for tasks such as audio retrieval and generation. To address this gap, we introd… ▽ More

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Preprint submitted to IEEE MLSP 2024

  32. arXiv:2404.15284  [pdf, other

    eess.SP cs.AI

    Global 4D Ionospheric STEC Prediction based on DeepONet for GNSS Rays

    Authors: Dijia Cai, Zenghui Shi, Haiyang Fu, Huan Liu, Hongyi Qian, Yun Sui, Feng Xu, Ya-Qiu **

    Abstract: The ionosphere is a vitally dynamic charged particle region in the Earth's upper atmosphere, playing a crucial role in applications such as radio communication and satellite navigation. The Slant Total Electron Contents (STEC) is an important parameter for characterizing wave propagation, representing the integrated electron density along the ray of radio signals passing through the ionosphere. Th… ▽ More

    Submitted 12 March, 2024; originally announced April 2024.

  33. arXiv:2404.14700  [pdf, other

    eess.AS cs.AI cs.CL cs.LG cs.SD

    FlashSpeech: Efficient Zero-Shot Speech Synthesis

    Authors: Zhen Ye, Zeqian Ju, Haohe Liu, Xu Tan, Jianyi Chen, Yiwen Lu, Peiwen Sun, Jiahao Pan, Weizhen Bian, Shulin He, Qifeng Liu, Yike Guo, Wei Xue

    Abstract: Recent progress in large-scale zero-shot speech synthesis has been significantly advanced by language models and diffusion models. However, the generation process of both methods is slow and computationally intensive. Efficient speech synthesis using a lower computing budget to achieve quality on par with previous work remains a significant challenge. In this paper, we present FlashSpeech, a large… ▽ More

    Submitted 24 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Efficient zero-shot speech synthesis

  34. arXiv:2404.13677  [pdf, other

    cs.CV eess.IV

    A Dataset and Model for Realistic License Plate Deblurring

    Authors: Haoyan Gong, Yuzheng Feng, Zhenrong Zhang, Xianxu Hou, **gxin Liu, Siqi Huang, Hongbin Liu

    Abstract: Vehicle license plate recognition is a crucial task in intelligent traffic management systems. However, the challenge of achieving accurate recognition persists due to motion blur from fast-moving vehicles. Despite the widespread use of image synthesis approaches in existing deblurring and recognition algorithms, their effectiveness in real-world scenarios remains unproven. To address this, we int… ▽ More

    Submitted 22 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  35. arXiv:2404.08667  [pdf, other

    eess.SY stat.AP

    Traffic State Estimation and Uncertainty Quantification at Signalized Intersections with Low Penetration Rate Vehicle Trajectory Data

    Authors: Xingmin Wang, Zihao Wang, Zachary Jerome, Henry X. Liu

    Abstract: This paper studies the traffic state estimation problem at signalized intersections with low penetration rate vehicle trajectory data. While many existing studies have proposed different methods to estimate unknown traffic states and parameters (e.g., penetration rate, queue length) with this data, most of them only provide a point estimation without knowing the uncertainty of these estimated valu… ▽ More

    Submitted 1 April, 2024; originally announced April 2024.

  36. arXiv:2404.07556  [pdf, other

    eess.IV cs.CV

    Attention-Aware Laparoscopic Image Desmoking Network with Lightness Embedding and Hybrid Guided Embedding

    Authors: Ziteng Liu, Jiahua Zhu, Bainan Liu, Hao Liu, Wenpeng Gao, Yili Fu

    Abstract: This paper presents a novel method of smoke removal from the laparoscopic images. Due to the heterogeneous nature of surgical smoke, a two-stage network is proposed to estimate the smoke distribution and reconstruct a clear, smoke-free surgical scene. The utilization of the lightness channel plays a pivotal role in providing vital information pertaining to smoke density. The reconstruction of smok… ▽ More

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: ISBI2024

  37. arXiv:2404.01110  [pdf, other

    cs.RO eess.SY

    A Center-of-Mass Shifting Aerial Manipulation Platform for Heavy-Tool Handling on Non-Horizontal Surfaces

    Authors: Tong Hui, Stefan Rucareanu, Haotian Liu, Matteo Fumagalli

    Abstract: Aerial vehicles equipped with manipulators can serve contact-based industrial applications, where fundamental tasks like drilling and grinding often necessitate aerial platforms to handle heavy tools. Industrial environments often involve non-horizontal surfaces. Existing aerial manipulation platforms based on multirotors typically feature a fixed CoM (Center of Mass) within the rotor-defined area… ▽ More

    Submitted 1 April, 2024; originally announced April 2024.

    Journal ref: This paper is under submission to IEEE Robotics and Auotmation Letter, 2024

  38. arXiv:2403.19983  [pdf, other

    eess.IV cs.CV

    A multi-stage semi-supervised learning for ankle fracture classification on CT images

    Authors: Hongzhi Liu, Guicheng Li, Jiacheng Nie, Hui Tang, Chunfeng Yang, Qian** Feng, Hailin Xu, Yang Chen

    Abstract: Because of the complicated mechanism of ankle injury, it is very difficult to diagnose ankle fracture in clinic. In order to simplify the process of fracture diagnosis, an automatic diagnosis model of ankle fracture was proposed. Firstly, a tibia-fibula segmentation network is proposed for the joint tibiofibular region of the ankle joint, and the corresponding segmentation dataset is established o… ▽ More

    Submitted 29 March, 2024; originally announced March 2024.

  39. arXiv:2403.17337  [pdf, other

    eess.SY eess.SP

    Destination-Constrained Linear Dynamical System Modeling in Set-Valued Frameworks

    Authors: Xiaowei Yang, Haiqi Liu, Fanqin Meng, Xiao**g Shen

    Abstract: Directional motion towards a specified destination is a common occurrence in physical processes and human societal activities. Utilizing this prior information can significantly improve the control and predictive performance of system models. This paper primarily focuses on reconstructing linear dynamic system models based on destination constraints in the set-valued framework. We treat destinatio… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: 15 pages, 11 figures

  40. arXiv:2403.15379  [pdf

    physics.med-ph eess.IV

    Time-efficient, high-resolution 3T whole-brain relaxometry using Cartesian 3D MR-STAT with CSF suppression

    Authors: Hongyan Liu, Edwin Versteeg, Miha Fuderer, Oscar van der Heide, Martin B. Schilder, Cornelis A. T. van den Berg, Alessandro Sbrizzi

    Abstract: Purpose: Current 3D Magnetic Resonance Spin TomogrAphy in Time-domain (MR-STAT) protocols use transient-state, gradient-spoiled gradient-echo sequences that are prone to cerebrospinal fluid (CSF) pulsation artifacts when applied to the brain. This study aims at develo** a 3D MR-STAT protocol for whole-brain relaxometry that overcomes the challenges posed by CSF-induced ghosting artifacts. Method… ▽ More

    Submitted 22 March, 2024; originally announced March 2024.

  41. arXiv:2403.14943  [pdf, ps, other

    cs.IT eess.SP

    Primary Rate Maximization in Movable Antennas Empowered Symbiotic Radio Communications

    Authors: Bin Lyu, Hao Liu, Wenqing Hong, Shimin Gong, Feng Tian

    Abstract: In this paper, we propose a movable antenna (MA) empowered scheme for symbiotic radio (SR) communication systems. Specifically, multiple antennas at the primary transmitter (PT) can be flexibly moved to favorable locations to boost the channel conditions of the primary and secondary transmissions. The primary transmission is achieved by the active transmission from the PT to the primary user (PU),… ▽ More

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: To appear in IEEE VTC-Spring 2024. 6 Pages,5 figures

  42. arXiv:2403.09527  [pdf, other

    eess.AS

    WavCraft: Audio Editing and Generation with Large Language Models

    Authors: **hua Liang, Huan Zhang, Haohe Liu, Yin Cao, Qiuqiang Kong, Xubo Liu, Wenwu Wang, Mark D. Plumbley, Huy Phan, Emmanouil Benetos

    Abstract: We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing. Specifically, WavCraft describes the content of raw audio materials in natural language and prompts the LLM conditioned on audio descriptions and user requests. WavCraft leverages the in-context learning ability of the LLM to decompo… ▽ More

    Submitted 10 May, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  43. arXiv:2403.07364  [pdf, other

    eess.IV

    Hybrid Kinetics Embedding Framework for Dynamic PET Reconstruction

    Authors: Yubo Ye, Huafeng Liu, Linwei Wang

    Abstract: In dynamic positron emission tomography (PET) reconstruction, the importance of leveraging the temporal dependence of the data has been well appreciated. Current deep-learning solutions can be categorized in two groups in the way the temporal dynamics is modeled: data-driven approaches use spatiotemporal neural networks to learn the temporal dynamics of tracer kinetics from data, which relies heav… ▽ More

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: Under Review

  44. arXiv:2403.05887  [pdf, other

    eess.AS

    Aligning Speech to Languages to Enhance Code-switching Speech Recognition

    Authors: Hexin Liu, Xiangyu Zhang, Leibny Paola Garcia, Andy W. H. Khong, Eng Siong Chng, Shinji Watanabe

    Abstract: Code-switching (CS) refers to the switching of languages within a speech signal and results in language confusion for automatic speech recognition (ASR). To address language confusion, we propose the language alignment loss that performs frame-level language identification using pseudo language labels learned from the ASR decoder. This eliminates the need for frame-level language annotations. To f… ▽ More

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: Manuscript submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing

  45. arXiv:2403.02565  [pdf, other

    eess.SP

    Deep Cooperation in ISAC System: Resource, Node and Infrastructure Perspectives

    Authors: Zhiqing Wei, Haotian Liu, Zhiyong Feng, Huici Wu, Fan Liu, Qixun Zhang, Yucong Du

    Abstract: With the emerging Integrated Sensing and Communication (ISAC) technique, exploiting the mobile communication system with multi-domain resources, multiple network elements, and large-scale infrastructures to realize cooperative sensing is a crucial approach satisfying the requirements of high-accuracy and large-scale sensing in IoE. In this article, the deep cooperation in ISAC system including thr… ▽ More

    Submitted 29 May, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: 8 pages and 6 figures, Accepted by IEEE Internet of Things Magazine

  46. arXiv:2403.01265  [pdf, other

    cs.RO eess.SY

    Smooth Computation without Input Delay: Robust Tube-Based Model Predictive Control for Robot Manipulator Planning

    Authors: Yu Luo, Qie Sima, Tianying Ji, Fuchun Sun, Hua** Liu, Jianwei Zhang

    Abstract: Model Predictive Control (MPC) has exhibited remarkable capabilities in optimizing objectives and meeting constraints. However, the substantial computational burden associated with solving the Optimal Control Problem (OCP) at each triggering instant introduces significant delays between state sampling and control application. These delays limit the practicality of MPC in resource-constrained syste… ▽ More

    Submitted 7 May, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: text overlap with arXiv:2103.09693

  47. arXiv:2403.00370  [pdf, other

    cs.CL cs.SD eess.AS

    Post-decoder Biasing for End-to-End Speech Recognition of Multi-turn Medical Interview

    Authors: Heyang Liu, Yu Wang, Yanfeng Wang

    Abstract: End-to-end (E2E) approach is gradually replacing hybrid models for automatic speech recognition (ASR) tasks. However, the optimization of E2E models lacks an intuitive method for handling decoding shifts, especially in scenarios with a large number of domain-specific rare words that hold specific important meanings. Furthermore, the absence of knowledge-intensive speech datasets in academia has be… ▽ More

    Submitted 1 March, 2024; originally announced March 2024.

  48. arXiv:2402.17200  [pdf, other

    cs.CV eess.IV

    Enhancing Quality of Compressed Images by Mitigating Enhancement Bias Towards Compression Domain

    Authors: Qunliang Xing, Mai Xu, Shengxi Li, Xin Deng, Meisong Zheng, Huaida Liu, Ying Chen

    Abstract: Existing quality enhancement methods for compressed images focus on aligning the enhancement domain with the raw domain to yield realistic images. However, these methods exhibit a pervasive enhancement bias towards the compression domain, inadvertently regarding it as more realistic than the raw domain. This bias makes enhanced images closely resemble their compressed counterparts, thus degrading… ▽ More

    Submitted 19 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Accepted to CVPR 2024

  49. arXiv:2402.15516  [pdf, other

    cs.SD cs.LG eess.AS eess.SP

    GLA-Grad: A Griffin-Lim Extended Waveform Generation Diffusion Model

    Authors: Haocheng Liu, Teysir Baoueb, Mathieu Fontaine, Jonathan Le Roux, Gael Richard

    Abstract: Diffusion models are receiving a growing interest for a variety of signal generation tasks such as speech or music synthesis. WaveGrad, for example, is a successful diffusion model that conditionally uses the mel spectrogram to guide a diffusion process for the generation of high-fidelity audio. However, such models face important challenges concerning the noise diffusion process for training and… ▽ More

    Submitted 9 February, 2024; originally announced February 2024.

    Comments: Accepted at ICASSP 2024

    Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing, Apr 2024, Seoul (Korea), South Korea

  50. arXiv:2402.13798  [pdf, other

    eess.SY

    AFPR-CIM: An Analog-Domain Floating-Point RRAM-based Compute-In-Memory Architecture with Dynamic Range Adaptive FP-ADC

    Authors: Haobo Liu, Zhengyang Qian, Wei Wu, Hongwei Ren, Zhiwei Liu, Leibin Ni

    Abstract: Power consumption has become the major concern in neural network accelerators for edge devices. The novel non-volatile-memory (NVM) based computing-in-memory (CIM) architecture has shown great potential for better energy efficiency. However, most of the recent NVM-CIM solutions mainly focus on fixed-point calculation and are not applicable to floating-point (FP) processing. In this paper, we propo… ▽ More

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: Accepted by DATE 2024