Showing 1–50 of 761 results for author: Liu, J

Searching in archive eess.
  1. arXiv:2407.00743  [pdf, other]

    cs.MM cs.AI cs.CL eess.AS

    AIMDiT: Modality Augmentation and Interaction via Multimodal Dimension Transformation for Emotion Recognition in Conversations

    Authors: Sheng Wu, Jiaxing Liu, Longbiao Wang, Dongxiao He, Xiaobao Wang, Jianwu Dang

    Abstract: Emotion Recognition in Conversations (ERC) is a popular task in natural language processing, which aims to recognize the emotional state of the speaker in conversations. While current research primarily emphasizes contextual modeling, there exists a dearth of investigation into effective multimodal fusion methods. We propose a novel framework called AIMDiT to solve the problem of multimodal fusion… ▽ More

    Submitted 12 April, 2024; originally announced July 2024.

  2. arXiv:2407.00297  [pdf]

    eess.IV cs.CV

    UADSN: Uncertainty-Aware Dual-Stream Network for Facial Nerve Segmentation

    Authors: Guanghao Zhu, Lin Liu, Jing Zhang, Xiaohui Du, Ruqian Hao, Juanxiu Liu

    Abstract: Facial nerve segmentation is crucial for preoperative path planning in cochlear implantation surgery. Recently, researchers have proposed some segmentation methods, such as atlas-based and deep learning-based methods. However, since the facial nerve is a tubular organ with a diameter of only 1.0-1.5mm, it is challenging to locate and segment the facial nerve in CT scans. In this work, we propose a… ▽ More

    Submitted 28 June, 2024; originally announced July 2024.

  3. arXiv:2407.00042  [pdf]

    q-bio.NC cs.SI eess.SY

    Module control of network analysis in psychopathology

    Authors: Chunyu Pan, Quan Zhang, Yue Zhu, Shengzhou Kong, Juan Liu, Changsheng Zhang, Fei Wang, Xizhe Zhang

    Abstract: The network approach to characterizing psychopathology departs from traditional latent categorical and dimensional approaches. Causal interplay among symptoms contributed to dynamic psychopathology system. Therefore, analyzing the symptom clusters is critical for understanding mental disorders. Furthermore, despite extensive research studying the topological features of symptom networks, the contr… ▽ More

    Submitted 30 May, 2024; originally announced July 2024.

  4. arXiv:2406.19649  [pdf]

    eess.IV cs.CV

    AstMatch: Adversarial Self-training Consistency Framework for Semi-Supervised Medical Image Segmentation

    Authors: Guanghao Zhu, Jing Zhang, Juanxiu Liu, Xiaohui Du, Ruqian Hao, Yong Liu, Lin Liu

    Abstract: Semi-supervised learning (SSL) has shown considerable potential in medical image segmentation, primarily leveraging consistency regularization and pseudo-labeling. However, many SSL approaches only pay attention to low-level consistency and overlook the significance of pseudo-label reliability. Therefore, in this work, we propose an adversarial self-training consistency framework (AstMatch). First… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  5. arXiv:2406.18310  [pdf, other]

    cs.CV cs.LG eess.IV

    Spatial-temporal Hierarchical Reinforcement Learning for Interpretable Pathology Image Super-Resolution

    Authors: Wenting Chen, Jie Liu, Tommy W. S. Chow, Yixuan Yuan

    Abstract: Pathology images are essential for accurately interpreting lesion cells in cytopathology screening, but acquiring high-resolution digital slides requires specialized equipment and long scanning times. Though super-resolution (SR) techniques can alleviate this problem, existing deep learning models recover pathology images in a black-box manner, which can lead to untruthful biological details and mis… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted to IEEE TRANSACTIONS ON MEDICAL IMAGING (TMI)

  6. arXiv:2406.18069  [pdf, other]

    eess.SP cs.AI cs.CL

    Large Language Models for Cuffless Blood Pressure Measurement From Wearable Biosignals

    Authors: Zengding Liu, Chen Chen, Jiannong Cao, Minglei Pan, Jikui Liu, Nan Li, Fen Miao, Ye Li

    Abstract: Large language models (LLMs) have captured significant interest from both academia and industry due to their impressive performance across various textual tasks. However, the potential of LLMs to analyze physiological time-series data remains an emerging research field. Particularly, there is a notable gap in the utilization of LLMs for analyzing wearable biosignals to achieve cuffless blood press… ▽ More

    Submitted 26 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  7. arXiv:2406.17932  [pdf, other]

    cs.RO cs.MM cs.SD eess.AS

    SonicSense: Object Perception from In-Hand Acoustic Vibration

    Authors: Jiaxun Liu, Boyuan Chen

    Abstract: We introduce SonicSense, a holistic design of hardware and software to enable rich robot object perception through in-hand acoustic vibration sensing. While previous studies have shown promising results with acoustic sensing for object perception, current solutions are constrained to a handful of objects with simple geometries and homogeneous materials, single-finger sensing, and mixing training a… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Our project website is at: http://generalroboticslab.com/SonicSense

  8. arXiv:2406.17286  [pdf]

    cs.RO eess.SY

    Prioritized experience replay-based DDQN for Unmanned Vehicle Path Planning

    Authors: Liu Lipeng, Letian Xu, Jiabei Liu, Haopeng Zhao, Tongzhou Jiang, Tianyao Zheng

    Abstract: Path planning module is a key module for autonomous vehicle navigation, which directly affects its operating efficiency and safety. In complex environments with many obstacles, traditional planning algorithms often cannot meet the needs of intelligence, which may lead to problems such as dead zones in unmanned vehicles. This paper proposes a path planning algorithm based on DDQN and combines it wi… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 4 pages, 6 figures, 2024 5th International Conference on Information Science, Parallel and Distributed Systems

  9. arXiv:2406.13275  [pdf, other]

    cs.SD cs.CL eess.AS

    Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding

    Authors: Jizhong Liu, Gang Li, Junbo Zhang, Heinrich Dinkel, Yongqing Wang, Zhiyong Yan, Yujun Wang, Bin Wang

    Abstract: Automated audio captioning (AAC) is an audio-to-text task to describe audio contents in natural language. Recently, the advancements in large language models (LLMs), with improvements in training approaches for audio encoders, have opened up possibilities for improving AAC. Thus, we explore enhancing AAC from three aspects: 1) a pre-trained audio encoder via consistent ensemble distillation (CED)… ▽ More

    Submitted 25 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  10. arXiv:2406.12628  [pdf, other]

    eess.SY

    Large Language Models based Multi-Agent Framework for Objective Oriented Control Design in Power Electronics

    Authors: Chenggang Cui, Jiaming Liu, Junkang Feng, Peifeng Hui, Amer M. Y. M. Ghias, Chuanlin Zhang

    Abstract: Power electronics, a critical component in modern power systems, face several challenges in control design, including model uncertainties, and lengthy and costly design cycles. This paper is aiming to propose a Large Language Models (LLMs) based multi-agent framework for objective-oriented control design in power electronics. The framework leverages the reasoning capabilities of LLMs and a multi-a… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 6 pages, 6 figures

  11. arXiv:2406.11364  [pdf, other]

    cs.SD eess.AS

    AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection

    Authors: Anbai Jiang, Bing Han, Zhiqiang Lv, Yufeng Deng, Wei-Qiang Zhang, Xie Chen, Yanmin Qian, Jia Liu, Pingyi Fan

    Abstract: Large pre-trained models have demonstrated dominant performances in multiple areas, where the consistency between pre-training and fine-tuning is the key to success. However, few works reported satisfactory results of pre-trained models for the machine anomalous sound detection (ASD) task. This may be caused by the inconsistency of the pre-trained model and the inductive bias of machine audio, res… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  12. arXiv:2406.10932  [pdf, other]

    cs.SD cs.AI eess.AS

    Imperceptible Rhythm Backdoor Attacks: Exploring Rhythm Transformation for Embedding Undetectable Vulnerabilities on Speech Recognition

    Authors: Wenhan Yao, Jiangkun Yang, Yongqiang He, Jia Liu, Weiping Wen

    Abstract: Speech recognition is an essential start ring of human-computer interaction, and recently, deep learning models have achieved excellent success in this task. However, when the model training and private data provider are always separated, some security threats that make deep neural networks (DNNs) abnormal deserve to be researched. In recent years, the typical backdoor attacks have been researched… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  13. arXiv:2406.10325  [pdf, other]

    cs.CL cs.LG eess.AS

    Enhancing Multilingual Voice Toxicity Detection with Speech-Text Alignment

    Authors: Joseph Liu, Mahesh Kumar Nandwana, Janne Pylkkönen, Hannes Heikinheimo, Morgan McGuire

    Abstract: Toxicity classification for voice heavily relies on the semantic content of speech. We propose a novel framework that utilizes cross-modal learning to integrate the semantic embedding of text into a multilabel speech toxicity classifier during training. This enables us to incorporate textual information during training while still requiring only audio during inference. We evaluate this classifier… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024

  14. arXiv:2406.10223  [pdf, other]

    cs.LG cs.SD eess.AS

    Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation

    Authors: Nameer Hirschkind, Xiao Yu, Mahesh Kumar Nandwana, Joseph Liu, Eloi DuBois, Dao Le, Nicolas Thiebaut, Colin Sinclair, Kyle Spence, Charles Shang, Zoe Abrams, Morgan McGuire

    Abstract: We introduce DiffuseST, a low-latency, direct speech-to-speech translation system capable of preserving the input speaker's voice zero-shot while translating from multiple source languages into English. We experiment with the synthesizer component of the architecture, comparing a Tacotron-based synthesizer to a novel diffusion-based synthesizer. We find the diffusion-based synthesizer to improve M… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Published in Interspeech 2024

  15. arXiv:2406.09873  [pdf, other]

    eess.AS cs.AI cs.SD

    Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition

    Authors: Yicong Jiang, Tianzi Wang, Xurong Xie, Juan Liu, Wei Sun, Nan Yan, Hui Chen, Lan Wang, Xunying Liu, Feng Tian

    Abstract: Disordered speech recognition has profound implications for improving the quality of life for individuals afflicted with, for example, dysarthria. Dysarthric speech recognition encounters challenges including limited data, substantial dissimilarities between dysarthric and non-dysarthric speakers, and significant speaker variations stemming from the disorder. This paper introduces Perceiver-Prompt, a… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  16. arXiv:2406.08928  [pdf, other]

    cs.CV eess.IV

    Multiple Prior Representation Learning for Self-Supervised Monocular Depth Estimation via Hybrid Transformer

    Authors: Guodong Sun, Junjie Liu, Mingxuan Liu, Moyun Liu, Yang Zhang

    Abstract: Self-supervised monocular depth estimation aims to infer depth information without relying on labeled data. However, the lack of labeled information poses a significant challenge to the model's representation, limiting its ability to capture the intricate details of the scene accurately. Prior information can potentially mitigate this issue, enhancing the model's understanding of scene structure a… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 28 pages, 12 figures

  17. arXiv:2406.08196  [pdf, other]

    cs.SD eess.AS

    FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter

    Authors: Yuanjun Lv, Hai Li, Ying Yan, Junhui Liu, Danming Xie, Lei Xie

    Abstract: Vocoders reconstruct speech waveforms from acoustic features and play a pivotal role in modern TTS systems. Frequency-domain GAN vocoders like Vocos and APNet2 have recently seen rapid advancements, outperforming time-domain models in inference speed while achieving comparable audio quality. However, these frequency-domain vocoders suffer from large parameter sizes, thus introducing extra memory bu… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by InterSpeech 2024; 5 pages, 5 figures

  18. arXiv:2406.07061  [pdf, other]

    eess.IV cs.CV

    Triage of 3D pathology data via 2.5D multiple-instance learning to guide pathologist assessments

    Authors: Gan Gao, Andrew H. Song, Fiona Wang, David Brenes, Rui Wang, Sarah S. L. Chow, Kevin W. Bishop, Lawrence D. True, Faisal Mahmood, Jonathan T. C. Liu

    Abstract: Accurate patient diagnoses based on human tissue biopsies are hindered by current clinical practice, where pathologists assess only a limited number of thin 2D tissue slices sectioned from 3D volumetric tissue. Recent advances in non-destructive 3D pathology, such as open-top light-sheet microscopy, enable comprehensive imaging of spatially heterogeneous tissue morphologies, offering the feasibili… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: CVPR CVMI 2024

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 6955-6965

  19. arXiv:2406.07012  [pdf, other]

    cs.SD cs.CL eess.AS

    Bridging Language Gaps in Audio-Text Retrieval

    Authors: Zhiyong Yan, Heinrich Dinkel, Yongqing Wang, Jizhong Liu, Junbo Zhang, Yujun Wang, Bin Wang

    Abstract: Audio-text retrieval is a challenging task, requiring the search for an audio clip or a text caption within a database. The predominant focus of existing research on English descriptions poses a limitation on the applicability of such models, given the abundance of non-English content in real-world data. To address these linguistic disparities, we propose a language enhancement (LE), using a multi… ▽ More

    Submitted 16 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024

  20. arXiv:2406.03157  [pdf, other]

    eess.SP cs.LG

    A Combination Model Based on Sequential General Variational Mode Decomposition Method for Time Series Prediction

    Authors: Wei Chen, Yuanyuan Yang, Jianyu Liu

    Abstract: Accurate prediction of financial time series is a key concern for market economy makers and investors. The article selects online store sales and Australian beer sales as representatives of non-stationary, trending, and seasonal financial time series, and constructs a new SGVMD-ARIMA combination model in a non-linear combination way to predict financial time series. The ARIMA model, LSTM model, an… ▽ More

    Submitted 7 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  21. arXiv:2406.03144  [pdf, other]

    eess.SP cs.LG

    A Combination Model for Time Series Prediction using LSTM via Extracting Dynamic Features Based on Spatial Smoothing and Sequential General Variational Mode Decomposition

    Authors: Jianyu Liu, Wei Chen, Yong Zhang, Zhenfeng Chen, Bin Wan, **wei Hu

    Abstract: In order to solve the problems such as difficult to extract effective features and low accuracy of sales volume prediction caused by complex relationships such as market sales volume in time series prediction, we proposed a time series prediction method of market sales volume based on Sequential General VMD and spatial smoothing Long short-term memory neural network (SS-LSTM) combination model. Fi… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

  22. arXiv:2406.02640  [pdf, other]

    eess.IV physics.med-ph physics.optics

    Ghost imaging-based Non-contact Heart Rate Detection

    Authors: Jianming Yu, Yuchen He, Bin Li, Hui Chen, Huaibin Zheng, Jianbin Liu, Zhuo Xu

    Abstract: Remote heart rate measurement is an increasingly concerned research field, usually using remote photoplethysmography (rPPG) to collect heart rate information through video data collection. However, in certain specific scenarios (such as low light conditions, intense lighting, and non-line-of-sight situations), traditional imaging methods fail to capture image information effectively, that may lead… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 4 pages, 6 figures

  23. arXiv:2406.00626  [pdf, other]

    cs.MM cs.SD eess.AS

    Intelligent Text-Conditioned Music Generation

    Authors: Zhouyao Xie, Nikhil Yadala, Xinyi Chen, Jing Xi Liu

    Abstract: CLIP (Contrastive Language-Image Pre-Training) is a multimodal neural network trained on (text, image) pairs to predict the most relevant text caption given an image. It has been used extensively in image generation by connecting its output with a generative model such as VQGAN, with the most notable example being OpenAI's DALLE-2. In this project, we apply a similar approach to bridge the gap bet… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

  24. arXiv:2405.20502  [pdf, ps, other]

    eess.SY math.DS math.OC

    Reach-Avoid Control Synthesis for a Quadrotor UAV with Formal Safety Guarantees

    Authors: Mohamed Serry, Haocheng Chang, Jun Liu

    Abstract: Reach-avoid specifications are one of the most common tasks in autonomous aerial vehicle (UAV) applications. Despite the intensive research and development associated with control of aerial vehicles, generating feasible trajectories through complex environments and tracking them with formal safety guarantees remain challenging. In this paper, we propose a control framework for a quadrotor UAV that… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  25. arXiv:2405.18356  [pdf, other]

    eess.IV cs.CV

    Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography

    Authors: Jie Liu, Yixiao Zhang, Kang Wang, Mehmet Can Yavuz, Xiaoxi Chen, Yixuan Yuan, Haoliang Li, Yang Yang, Alan Yuille, Yucheng Tang, Zongwei Zhou

    Abstract: The advancement of artificial intelligence (AI) for organ segmentation and tumor detection is propelled by the growing availability of computed tomography (CT) datasets with detailed, per-voxel annotations. However, these AI models often struggle with flexibility for partially annotated datasets and extensibility for new classes due to limitations in the one-hot encoding, architectural design, and… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted to Medical Image Analysis

  26. arXiv:2405.15517  [pdf, other]

    eess.IV cs.CV cs.LG

    Erase to Enhance: Data-Efficient Machine Unlearning in MRI Reconstruction

    Authors: Yuyang Xue, Jingshuai Liu, Steven McDonagh, Sotirios A. Tsaftaris

    Abstract: Machine unlearning is a promising paradigm for removing unwanted data samples from a trained model, towards ensuring compliance with privacy regulations and limiting harmful biases. Although unlearning has been shown in, e.g., classification and recommendation systems, its potential in medical image-to-image translation, specifically in image reconstruction, has not been thoroughly investigated.… ▽ More

    Submitted 18 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: The paper is accepted by MIDL 2024

  27. arXiv:2405.15163  [pdf, other]

    quant-ph eess.SY

    Provably Quantum-Secure Microgrids through Enhanced Quantum Distributed Control

    Authors: Pouya Babahajiani, Peng Zhang, Ji Liu, Tzu-Chieh Wei

    Abstract: Distributed control of multi-inverter microgrids has attracted considerable attention as it can achieve the combined goals of flexible plug-and-play architecture guaranteeing frequency and voltage regulation while preserving power sharing among nonidentical distributed energy resources (DERs). However, it turns out that cybersecurity has emerged as a serious concern in distributed control schemes.… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  28. arXiv:2405.13018  [pdf, other]

    cs.CL cs.AI eess.AS

    Continued Pretraining for Domain Adaptation of Wav2vec2.0 in Automatic Speech Recognition for Elementary Math Classroom Settings

    Authors: Ahmed Adel Attia, Dorottya Demszky, Tolulope Ogunremi, Jing Liu, Carol Espy-Wilson

    Abstract: Creating Automatic Speech Recognition (ASR) systems that are robust and resilient to classroom conditions is paramount to the development of AI tools to aid teachers and students. In this work, we study the efficacy of continued pretraining (CPT) in adapting Wav2vec2.0 to the classroom domain. We show that CPT is a powerful tool in that regard and reduces the Word Error Rate (WER) of Wav2vec2.0-ba… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

  29. arXiv:2405.12584  [pdf, other]

    eess.IV cs.CV cs.LG

    Is Dataset Quality Still a Concern in Diagnosis Using Large Foundation Model?

    Authors: Ziqin Lin, Heng Li, Zinan Li, Huazhu Fu, Jiang Liu

    Abstract: Recent advancements in pre-trained large foundation models (LFM) have yielded significant breakthroughs across various domains, including natural language processing and computer vision. These models have been particularly impactful in the domain of medical diagnostic tasks. With abundant unlabeled data, an LFM has been developed for fundus images using the Vision Transformer (VIT) and a self-supe… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 10 pages, 6 figures

  30. arXiv:2405.12408  [pdf, other]

    cs.RO eess.SY

    Flexible Active Safety Motion Control for Robotic Obstacle Avoidance: A CBF-Guided MPC Approach

    Authors: Jinhao Liu, Jun Yang, Jianliang Mao, Tianqi Zhu, Qihang Xie, Yimeng Li, Xiangyu Wang, Shihua Li

    Abstract: A flexible active safety motion (FASM) control approach is proposed for the avoidance of dynamic obstacles and the reference tracking in robot manipulators. The distinctive feature of the proposed method lies in its utilization of control barrier functions (CBF) to design flexible CBF-guided safety criteria (CBFSC) with dynamically optimized decay rates, thereby offering flexibility and active saf… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 11 pages, 11 figures

  31. arXiv:2405.11064  [pdf, other]

    eess.SP cs.CV

    TVCondNet: A Conditional Denoising Neural Network for NMR Spectroscopy

    Authors: Zihao Zou, Shirin Shoushtari, Jiaming Liu, Jialiang Zhang, Patrick Judge, Emilia Santana, Alison Lim, Marcus Foston, Ulugbek S. Kamilov

    Abstract: Nuclear Magnetic Resonance (NMR) spectroscopy is a widely-used technique in the fields of bio-medicine, chemistry, and biology for the analysis of chemicals and proteins. The signals from NMR spectroscopy often have low signal-to-noise ratio (SNR) due to acquisition noise, which poses significant challenges for subsequent analysis. Recent work has explored the potential of deep learning (DL) for N… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

  32. arXiv:2405.10825  [pdf, other]

    eess.SY cs.LG

    Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities

    Authors: Hao Zhou, Chengming Hu, Ye Yuan, Yufei Cui, Yili Jin, Can Chen, Haolun Wu, Dun Yuan, Li Jiang, Di Wu, Xue Liu, Charlie Zhang, Xianbin Wang, Jiangchuan Liu

    Abstract: Large language models (LLMs) have received considerable attention recently due to their outstanding comprehension and reasoning capabilities, leading to great progress in many fields. The advancement of LLM techniques also offers promising opportunities to automate many tasks in the telecommunication (telecom) field. After pre-training and fine-tuning, LLMs can perform diverse downstream tasks bas… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

  33. arXiv:2405.08423  [pdf, other]

    eess.IV cs.CV

    NAFRSSR: a Lightweight Recursive Network for Efficient Stereo Image Super-Resolution

    Authors: Yihong Chen, Zhen Fan, Shuai Dong, Zhiwei Chen, Wenjie Li, Minghui Qin, Min Zeng, Xubing Lu, Guofu Zhou, Xingsen Gao, Jun-Ming Liu

    Abstract: Stereo image super-resolution (SR) refers to the reconstruction of a high-resolution (HR) image from a pair of low-resolution (LR) images as typically captured by a dual-camera device. To enhance the quality of SR images, most previous studies focused on increasing the number and size of feature maps and introducing complex and computationally intensive structures, resulting in models with high co… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

  34. arXiv:2405.06516  [pdf, ps, other]

    cs.IT eess.SP

    An Efficient Algorithm for Sum-Rate Maximization in Fluid Antenna-Assisted ISAC System

    Authors: Qian Zhang, Mingjie Shao, Tong Zhang, Gaojie Chen, Ju Liu

    Abstract: In this letter, we investigate the fluid antenna (FA)-assisted integrated sensing and communication (ISAC) system, where communication and radar sensing employ the co-waveform design. Specifically, we focus on the beamformer design and antenna position configuration to realize a higher communication rate while guaranteeing the minimum radar probing power. Different from existing beamformer algorit… ▽ More

    Submitted 10 May, 2024; originally announced May 2024.

  35. arXiv:2405.03254  [pdf]

    eess.AS

    Automatic Assessment of Dysarthria Using Audio-visual Vowel Graph Attention Network

    Authors: Xiaokang Liu, Xiaoxia Du, Juan Liu, Rongfeng Su, Manwa Lawrence Ng, Yumei Zhang, Yudong Yang, Shaofeng Zhao, Lan Wang, Nan Yan

    Abstract: Automatic assessment of dysarthria remains a highly challenging task due to high variability in acoustic signals and the limited data. Currently, research on the automatic assessment of dysarthria primarily focuses on two approaches: one that utilizes expert features combined with machine learning, and the other that employs data-driven deep learning methods to extract representations. Research ha… ▽ More

    Submitted 6 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

    Comments: 10 pages, 7 figures, 7 tables

  36. arXiv:2404.19615  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    SemiPL: A Semi-supervised Method for Event Sound Source Localization

    Authors: Yue Li, Baiqiao Yin, Jinfu Liu, Jiajun Wen, Jiaying Lin, Mengyuan Liu

    Abstract: In recent years, Event Sound Source Localization has been widely applied in various fields. Recent works typically relying on the contrastive learning framework show impressive performance. However, all work is based on large relatively simple datasets. It's also crucial to understand and analyze human behaviors (actions and interactions of people), voices, and sounds in chaotic events in many app… ▽ More

    Submitted 30 April, 2024; originally announced April 2024.

  37. arXiv:2404.19182  [pdf, other]

    eess.SP

    Robust Proximity Detection using On-Device Gait Monitoring

    Authors: Yuqian Hu, Guozhen Zhu, Beibei Wang, K. J. Ray Liu

    Abstract: Proximity detection in indoor environments based on WiFi signals has gained significant attention in recent years. Existing works rely on the dynamic signal reflections and their extracted features are dependent on motion strength. To address this issue, we design a robust WiFi-based proximity detector by considering gait monitoring. Specifically, we propose a gait score that accurately evaluates… ▽ More

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: This work has been accepted in IEEE 9th World Forum on Internet of Things (WFIoT)

  38. arXiv:2404.13677  [pdf, other]

    cs.CV eess.IV

    A Dataset and Model for Realistic License Plate Deblurring

    Authors: Haoyan Gong, Yuzheng Feng, Zhenrong Zhang, Xianxu Hou, Jingxin Liu, Siqi Huang, Hongbin Liu

    Abstract: Vehicle license plate recognition is a crucial task in intelligent traffic management systems. However, the challenge of achieving accurate recognition persists due to motion blur from fast-moving vehicles. Despite the widespread use of image synthesis approaches in existing deblurring and recognition algorithms, their effectiveness in real-world scenarios remains unproven. To address this, we int… ▽ More

    Submitted 22 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  39. arXiv:2404.11070  [pdf]

    cs.CV eess.SP

    Sky-GVIO: an enhanced GNSS/INS/Vision navigation with FCN-based sky-segmentation in urban canyon

    Authors: Jingrong Wang, Bo Xu, Ronghe Jin, Shoujian Zhang, Kefu Gao, Jingnan Liu

    Abstract: Accurate, continuous, and reliable positioning is a critical component of achieving autonomous driving. However, in complex urban canyon environments, the vulnerability of a stand-alone sensor and non-line-of-sight (NLOS) caused by high buildings, trees, and elevated structures seriously affect positioning results. To address these challenges, a sky-view images segmentation algorithm based on Full… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

  40. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such… ▽ More

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  41. arXiv:2404.09729  [pdf]

    eess.SP cs.IT cs.LG stat.ME

    Amplitude-Phase Fusion for Enhanced Electrocardiogram Morphological Analysis

    Authors: Shuaicong Hu, Yanan Wang, Jian Liu, Jingyu Lin, Shengmei Qin, Zhenning Nie, Zhifeng Yao, Wenjie Cai, Cuiwei Yang

    Abstract: Considering the variability of amplitude and phase patterns in electrocardiogram (ECG) signals due to cardiac activity and individual differences, existing entropy-based studies have not fully utilized these two patterns and lack integration. To address this gap, this paper proposes a novel fusion entropy metric, morphological ECG entropy (MEE) for the first time, specifically designed for ECG mor… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 16 pages, 12 figures

    ACM Class: I.5.2

  42. arXiv:2404.08285  [pdf]

    cs.CV cs.AI eess.SY

    A Survey of Neural Network Robustness Assessment in Image Recognition

    Authors: Jie Wang, Jun Ai, Minyan Lu, Haoran Su, Dan Yu, Yutao Zhang, Junda Zhu, Jingyu Liu

    Abstract: In recent years, there has been significant attention given to the robustness assessment of neural networks. Robustness plays a critical role in ensuring reliable operation of artificial intelligence (AI) systems in complex and uncertain environments. Deep learning's robustness problem is particularly significant, highlighted by the discovery of adversarial attacks on image classification models.…

    Submitted 15 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

    Comments: Corrected typos and grammatical errors in Section 5

  43. arXiv:2404.07989  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.SD eess.AS

    Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding

    Authors: Yiwen Tang, Ray Zhang, Jiaming Liu, Zoey Guo, Dong Wang, Zhigang Wang, Bin Zhao, Shanghang Zhang, Peng Gao, Hongsheng Li, Xuelong Li

    Abstract: Large foundation models have recently emerged as a prominent focus of interest, attaining superior performance in widespread scenarios. Due to the scarcity of 3D data, many efforts have been made to adapt pre-trained transformers from vision to 3D domains. However, such 2D-to-3D approaches are still limited, due to the potential loss of spatial geometries and high computation cost. More importantl…

    Submitted 30 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: Code and models are released at https://github.com/Ivan-Tang-3D/Any2Point

  44. arXiv:2404.07121  [pdf, other]

    cs.IT eess.SP

    Digital Over-the-Air Computation: Achieving High Reliability via Bit-Slicing

    Authors: Jiawei Liu, Yi Gong, Kaibin Huang

    Abstract: 6G mobile networks aim to realize ubiquitous intelligence at the network edge via distributed learning, sensing, and data analytics. Their common operation is to aggregate high-dimensional data, which causes a communication bottleneck that cannot be resolved using traditional orthogonal multi-access schemes. A promising solution, called over-the-air computation (AirComp), exploits channels' wavefo…

    Submitted 10 April, 2024; originally announced April 2024.

  45. arXiv:2404.06393  [pdf, other]

    cs.SD cs.AI eess.AS

    MuPT: A Generative Symbolic Music Pretrained Transformer

    Authors: Xingwei Qu, Yuelin Bai, Yinghao Ma, Ziya Zhou, Ka Man Lo, Jiaheng Liu, Ruibin Yuan, Lejun Min, Xueling Liu, Tianyu Zhang, Xinrun Du, Shuyue Guo, Yiming Liang, Yizhi Li, Shangda Wu, Junting Zhou, Tianyu Zheng, Ziyang Ma, Fengze Han, Wei Xue, Gus Xia, Emmanouil Benetos, Xiang Yue, Chenghua Lin, Xu Tan, et al. (4 additional authors not shown)

    Abstract: In this paper, we explore the application of Large Language Models (LLMs) to the pre-training of music. While the prevalent use of MIDI in music modeling is well-established, our findings suggest that LLMs are inherently more compatible with ABC Notation, which aligns more closely with their design and strengths, thereby enhancing the model's performance in musical composition. To address the chal…

    Submitted 10 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  46. arXiv:2404.04916  [pdf, other]

    eess.IV cs.CV cs.LG

    Correcting Diffusion-Based Perceptual Image Compression with Privileged End-to-End Decoder

    Authors: Yiyang Ma, Wenhan Yang, Jiaying Liu

    Abstract: The images produced by diffusion models can attain excellent perceptual quality. However, it is challenging for diffusion models to guarantee distortion, hence the integration of diffusion models and image compression models still needs more comprehensive explorations. This paper presents a diffusion-based image compression method that employs a privileged end-to-end decoder model as correction, w…

    Submitted 2 May, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

    Comments: Accepted by ICML 2024

  47. arXiv:2404.02663  [pdf]

    eess.SP cs.IT

    Ground-to-UAV sub-Terahertz channel measurement and modeling

    Authors: Da Li, Peian Li, Jiabiao Zhao, Jianjian Liang, Jiacheng Liu, Guohao Liu, Yuanshuai Lei, Wenbo Liu, Jianqin Deng, Fuyong Liu, Jianjun Ma

    Abstract: Unmanned Aerial Vehicle (UAV) assisted terahertz (THz) wireless communications have been expected to play a vital role in the next generation of wireless networks. UAVs can serve as either repeaters or data collectors within the communication link, thereby potentially augmenting the efficacy of communication systems. Despite their promise, the channel analysis and modeling specific to THz wireless…

    Submitted 28 June, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: Submitted to Optics Express

  48. arXiv:2404.02661  [pdf]

    physics.app-ph eess.SP

    Terahertz channel modeling based on surface sensing characteristics

    Authors: Jiayuan Cui, Da Li, Jiabiao Zhao, Jiacheng Liu, Guohao Liu, Xiangkun He, Yue Su, Fei Song, Peian Li, Jianjun Ma

    Abstract: The dielectric properties of environmental surfaces, including walls, floors and the ground, etc., play a crucial role in shaping the accuracy of terahertz (THz) channel modeling, thereby directly impacting the effectiveness of communication systems. Traditionally, acquiring these properties has relied on methods such as terahertz time-domain spectroscopy (THz-TDS) or vector network analyzers (VNA…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Submitted to Nano Communication Networks

  49. arXiv:2404.02555  [pdf]

    eess.SY cs.LG

    An Interpretable Power System Transient Stability Assessment Method with Expert Guiding Neural-Regression-Tree

    Authors: Hanxuan Wang, Na Lu, Zixuan Wang, Jiacheng Liu, Jun Liu

    Abstract: Deep learning based transient stability assessment (TSA) has achieved great success, yet the lack of interpretability hinders its industrial application. Although a great number of studies have tried to explore the interpretability of network solutions, many problems still remain unsolved: (1) the difference between the widely accepted power system knowledge and the generated interpretive rules is…

    Submitted 3 April, 2024; originally announced April 2024.

  50. arXiv:2404.01468  [pdf, other]

    eess.SY math.DS stat.AP

    Performance triggered adaptive model reduction for soil moisture estimation in precision irrigation

    Authors: Sarupa Debnath, Bernard T. Agyeman, Soumya R. Sahoo, Xunyuan Yin, Jinfeng Liu

    Abstract: Accurate soil moisture information is crucial for developing precise irrigation control strategies to enhance water use efficiency. Soil moisture estimation based on limited soil moisture sensors is crucial for obtaining comprehensive soil moisture information when dealing with large-scale agricultural fields. The major challenge in soil moisture estimation lies in the high dimensionality of the s…

    Submitted 1 April, 2024; originally announced April 2024.