Showing 1–50 of 227 results for author: Liu, M

Searching in archive eess.

  1. arXiv:2406.17002  [pdf, other]

    eess.SP cs.LG cs.NE stat.AP

    Benchmarking mortality risk prediction from electrocardiograms

    Authors: Platon Lukyanenko, Joshua Mayourian, Mingxuan Liu, John K. Triedman, Sunil J. Ghelani, William G. La Cava

    Abstract: Several recent high-impact studies leverage large hospital-owned electrocardiographic (ECG) databases to model and predict patient mortality. MIMIC-IV, released September 2023, is the first comparable public dataset and includes 800,000 ECGs from a U.S. hospital system. Previously, the largest public ECG dataset was Code-15, containing 345,000 ECGs collected during routine care in Brazil. These da…

    Submitted 26 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: 9 pages plus appendix, 2 figures

  2. arXiv:2406.12646  [pdf, other]

    eess.IV cs.AI cs.CV

    An Empirical Study on the Fairness of Foundation Models for Multi-Organ Image Segmentation

    Authors: Qin Li, Yizhe Zhang, Yan Li, Jun Lyu, Meng Liu, Longyu Sun, Mengting Sun, Qirong Li, Wenyue Mao, Xinran Wu, Ya**g Zhang, Yinghua Chu, Shuo Wang, Chengyan Wang

    Abstract: The segmentation foundation model, e.g., Segment Anything Model (SAM), has attracted increasing interest in the medical image community. Early pioneering studies primarily concentrated on assessing and improving SAM's performance from the perspectives of overall accuracy and efficiency, yet little attention was given to the fairness considerations. This oversight raises questions about the potenti…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted to MICCAI-2024

  3. arXiv:2406.11914  [pdf, other]

    cs.LG cs.ET eess.SP

    Initial Investigation of Kolmogorov-Arnold Networks (KANs) as Feature Extractors for IMU Based Human Activity Recognition

    Authors: Mengxi Liu, Daniel Geißler, Dominique Nshimyimana, Sizhen Bian, Bo Zhou, Paul Lukowicz

    Abstract: In this work, we explore the use of a novel neural network architecture, the Kolmogorov-Arnold Networks (KANs) as feature extractors for sensor-based (specifically IMU) Human Activity Recognition (HAR). Where conventional networks perform a parameterized weighted sum of the inputs at each node and then feed the result into a statically defined nonlinearity, KANs perform non-linear computations rep…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: This paper is under review

  4. arXiv:2406.08928  [pdf, other]

    cs.CV eess.IV

    Multiple Prior Representation Learning for Self-Supervised Monocular Depth Estimation via Hybrid Transformer

    Authors: Guodong Sun, Junjie Liu, Mingxuan Liu, Moyun Liu, Yang Zhang

    Abstract: Self-supervised monocular depth estimation aims to infer depth information without relying on labeled data. However, the lack of labeled information poses a significant challenge to the model's representation, limiting its ability to capture the intricate details of the scene accurately. Prior information can potentially mitigate this issue, enhancing the model's understanding of scene structure a…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 28 pages, 12 figures

  5. arXiv:2406.07498  [pdf, other]

    cs.SD eess.AS

    RaD-Net 2: A causal two-stage repairing and denoising speech enhancement network with knowledge distillation and complex axial self-attention

    Authors: Mingshuai Liu, Zhuangqi Chen, Xiaopeng Yan, Yuanjun Lv, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie

    Abstract: In real-time speech communication systems, speech signals are often degraded by multiple distortions. Recently, a two-stage Repair-and-Denoising network (RaD-Net) was proposed with superior speech quality improvement in the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. However, failure to use future information and constraint receptive field of convolution layers limit the system's perfor…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  6. arXiv:2406.02557  [pdf, other]

    eess.IV cs.AI cs.CV cs.MM

    EVAN: Evolutional Video Streaming Adaptation via Neural Representation

    Authors: Mufan Liu, Le Yang, Yiling Xu, Ye-kui Wang, Jenq-Neng Hwang

    Abstract: Adaptive bitrate (ABR) using conventional codecs cannot further modify the bitrate once a decision has been made, exhibiting limited adaptation capability. This may result in either overly conservative or overly aggressive bitrate selection, which could cause either inefficient utilization of the network bandwidth or frequent re-buffering, respectively. Neural representation for video (NeRV), whic…

    Submitted 15 April, 2024; originally announced June 2024.

    Comments: Accepted by ICME (conference)

  7. arXiv:2406.01646  [pdf, other]

    cs.LG cs.AI eess.SP

    iKAN: Global Incremental Learning with KAN for Human Activity Recognition Across Heterogeneous Datasets

    Authors: Mengxi Liu, Sizhen Bian, Bo Zhou, Paul Lukowicz

    Abstract: This work proposes an incremental learning (IL) framework for wearable sensor human activity recognition (HAR) that tackles two challenges simultaneously: catastrophic forgetting and non-uniform inputs. The scalable framework, iKAN, pioneers IL with Kolmogorov-Arnold Networks (KAN) to replace multi-layer perceptrons as the classifier that leverages the local plasticity and global stability of spli…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: This work is submitted to Ubicomp/ISWC24 and is under review

  8. arXiv:2406.00085  [pdf, other]

    eess.IV cs.LG q-bio.NC

    Augmentation-based Unsupervised Cross-Domain Functional MRI Adaptation for Major Depressive Disorder Identification

    Authors: Yunling Ma, Chaojun Zhang, Xiaochuan Wang, Qianqian Wang, Liang Cao, Limei Zhang, Mingxia Liu

    Abstract: Major depressive disorder (MDD) is a common mental disorder that typically affects a person's mood, cognition, behavior, and physical health. Resting-state functional magnetic resonance imaging (rs-fMRI) data are widely used for computer-aided diagnosis of MDD. While multi-site fMRI data can provide more data for training reliable diagnostic models, significant cross-site data heterogeneity would…

    Submitted 6 June, 2024; v1 submitted 31 May, 2024; originally announced June 2024.

  9. arXiv:2405.17004  [pdf, other]

    cs.CV eess.IV

    Efficient Visual Fault Detection for Freight Train via Neural Architecture Search with Data Volume Robustness

    Authors: Yang Zhang, Mingying Li, Huilin Pan, Moyun Liu, Yang Zhou

    Abstract: Deep learning-based fault detection methods have achieved significant success. In visual fault detection of freight trains, there exists a large characteristic difference between inter-class components (scale variance) but intra-class on the contrary, which entails scale-awareness for detectors. Moreover, the design of task-specific networks heavily relies on human expertise. As a consequence, neu…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 11 pages, 8 figures

  10. arXiv:2405.16980  [pdf, other]

    cs.CV eess.IV

    DSU-Net: Dynamic Snake U-Net for 2-D Seismic First Break Picking

    Authors: Hongtao Wang, Rongyu Feng, Liangyi Wu, Mutian Liu, Yinuo Cui, Chunxia Zhang, Zhenbo Guo

    Abstract: In seismic exploration, identifying the first break (FB) is a critical component in establishing subsurface velocity models. Various automatic picking techniques based on deep neural networks have been developed to expedite this procedure. The most popular class is using semantic segmentation networks to pick on a shot gather called 2-dimensional (2-D) picking. Generally, 2-D segmentation-based pi…

    Submitted 27 May, 2024; originally announced May 2024.

  11. arXiv:2405.14336  [pdf, other]

    eess.IV

    I$^2$VC: A Unified Framework for Intra- & Inter-frame Video Compression

    Authors: Meiqin Liu, Chenming Xu, Yukai Gu, Chao Yao, Yao Zhao

    Abstract: Video compression aims to reconstruct seamless frames by encoding the motion and residual information from existing frames. Previous neural video compression methods necessitate distinct codecs for three types of frames (I-frame, P-frame and B-frame), which hinders a unified approach and generalization across different video contexts. Intra-codec techniques lack the advanced Motion Estimation and…

    Submitted 1 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: 19 pages, 10 figures

  12. arXiv:2405.09586  [pdf, other]

    eess.IV cs.AI cs.CV

    Factual Serialization Enhancement: A Key Innovation for Chest X-ray Report Generation

    Authors: Kang Liu, Zhuoqi Ma, Mengmeng Liu, Zhicheng Jiao, Xiaolu Kang, Qiguang Miao, Kun Xie

    Abstract: The automation of writing imaging reports is a valuable tool for alleviating the workload of radiologists. Crucial steps in this process involve the cross-modal alignment between medical images and reports, as well as the retrieval of similar historical cases. However, the presence of presentation-style vocabulary (e.g., sentence structure and grammar) in reports poses challenges for cross-modal a…

    Submitted 15 May, 2024; originally announced May 2024.

  13. arXiv:2405.06178  [pdf, other]

    eess.IV cs.LG q-bio.NC

    ACTION: Augmentation and Computation Toolbox for Brain Network Analysis with Functional MRI

    Authors: Yuqi Fang, Junhao Zhang, Linmin Wang, Qianqian Wang, Mingxia Liu

    Abstract: Functional magnetic resonance imaging (fMRI) has been increasingly employed to investigate functional brain activity. Many fMRI-related software/toolboxes have been developed, providing specialized algorithms for fMRI analysis. However, existing toolboxes seldom consider fMRI data augmentation, which is quite useful, especially in studies with limited or imbalanced data. Moreover, current studies…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 14 pages, 5 figures, 5 tables

  14. arXiv:2405.02504  [pdf, other]

    eess.IV cs.CV

    Functional Imaging Constrained Diffusion for Brain PET Synthesis from Structural MRI

    Authors: Minhui Yu, Mengqi Wu, Ling Yue, Andrea Bozoki, Mingxia Liu

    Abstract: Magnetic resonance imaging (MRI) and positron emission tomography (PET) are increasingly used in multimodal analysis of neurodegenerative disorders. While MRI is broadly utilized in clinical settings, PET is less accessible. Many studies have attempted to use deep generative models to synthesize PET from MRI scans. However, they often suffer from unstable training and inadequately preserve brain f…

    Submitted 8 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  15. arXiv:2405.01750  [pdf, other]

    eess.IV cs.CV

    PointCompress3D -- A Point Cloud Compression Framework for Roadside LiDARs in Intelligent Transportation Systems

    Authors: Walter Zimmer, Ramandika Pranamulia, Xingcheng Zhou, Mingyu Liu, Alois C. Knoll

    Abstract: In the context of Intelligent Transportation Systems (ITS), efficient data compression is crucial for managing large-scale point cloud data acquired by roadside LiDAR sensors. The demand for efficient storage, streaming, and real-time object detection capabilities for point cloud data is substantial. This work introduces PointCompress3D, a novel point cloud compression framework tailored specifica…

    Submitted 2 May, 2024; originally announced May 2024.

  16. arXiv:2404.19615  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    SemiPL: A Semi-supervised Method for Event Sound Source Localization

    Authors: Yue Li, Baiqiao Yin, **fu Liu, Jiajun Wen, Jiaying Lin, Mengyuan Liu

    Abstract: In recent years, Event Sound Source Localization has been widely applied in various fields. Recent works typically relying on the contrastive learning framework show impressive performance. However, all work is based on large relatively simple datasets. It's also crucial to understand and analyze human behaviors (actions and interactions of people), voices, and sounds in chaotic events in many app…

    Submitted 30 April, 2024; originally announced April 2024.

  17. arXiv:2404.16324  [pdf, other]

    math.NA cs.LG eess.SP

    Improved impedance inversion by deep learning and iterated graph Laplacian

    Authors: Davide Bianchi, Florian Bossmann, Wenlong Wang, Mingming Liu

    Abstract: Deep learning techniques have shown significant potential in many applications through recent years. The achieved results often outperform traditional techniques. However, the quality of a neural network highly depends on the used training data. Noisy, insufficient, or biased training data leads to suboptimal results. We present a hybrid method that combines deep learning with iterated graph Lap…

    Submitted 25 April, 2024; originally announced April 2024.

    Report number: submitted to SEG Geophysics (June 2024)

  18. arXiv:2404.12062  [pdf, other]

    cs.SD cs.CV cs.GR eess.AS

    MIDGET: Music Conditioned 3D Dance Generation

    Authors: **wu Wang, Wei Mao, Miaomiao Liu

    Abstract: In this paper, we introduce a MusIc conditioned 3D Dance GEneraTion model, named MIDGET based on Dance motion Vector Quantised Variational AutoEncoder (VQ-VAE) model and Motion Generative Pre-Training (GPT) model to generate vibrant and high-quality dances that match the music rhythm. To tackle challenges in the field, we introduce three new components: 1) a pre-trained memory codebook based on the…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 12 pages, 6 figures. Published in AI 2023: Advances in Artificial Intelligence

    Journal ref: In Australasian Joint Conference on Artificial Intelligence (pp. 277-288). Singapore: Springer Nature Singapore 2023

  19. Recovery from Adversarial Attacks in Cyber-physical Systems: Shallow, Deep and Exploratory Works

    Authors: Pengyuan Lu, Lin Zhang, Mengyu Liu, Kaustubh Sridhar, Fanxin Kong, Oleg Sokolsky, Insup Lee

    Abstract: Cyber-physical systems (CPS) have experienced rapid growth in recent decades. However, like any other computer-based systems, malicious attacks evolve mutually, driving CPS to undesirable physical states and potentially causing catastrophes. Although the current state-of-the-art is well aware of this issue, the majority of researchers have not focused on CPS recovery, the procedure we defined as r…

    Submitted 5 April, 2024; originally announced April 2024.

    Journal ref: ACM Computing Surveys 1 (2024) 1-31

  20. arXiv:2404.01082  [pdf, other]

    eess.IV

    The state-of-the-art in Cardiac MRI Reconstruction: Results of the CMRxRecon Challenge in MICCAI 2023

    Authors: Jun Lyu, Chen Qin, Shuo Wang, Fanwen Wang, Yan Li, Zi Wang, Kunyuan Guo, Cheng Ouyang, Michael Tänzer, Meng Liu, Longyu Sun, Mengting Sun, Qin Li, Zhang Shi, Sha Hua, Hao Li, Zhensen Chen, Zhenlin Zhang, Bingyu Xin, Dimitris N. Metaxas, George Yiasemis, Jonas Teuwen, Li** Zhang, Weitian Chen, Yidong Zhao, et al. (25 additional authors not shown)

    Abstract: Cardiac MRI, crucial for evaluating heart structure and function, faces limitations like slow imaging and motion artifacts. Undersampling reconstruction, especially data-driven algorithms, has emerged as a promising solution to accelerate scans and enhance imaging performance using highly under-sampled data. Nevertheless, the scarcity of publicly available cardiac k-space datasets and evaluation p…

    Submitted 16 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: 25 pages, 17 figures

  21. arXiv:2403.11061  [pdf, other]

    eess.SP

    Beamforming Design for Double-Active-RIS-aided Communication Systems with Inter-Excitation

    Authors: Boshi Wang, Cunhua Pan, Hong Ren, Zhiyuan Yu, Yang Zhang, Mengyu Liu, Gui Zhou

    Abstract: In this paper, we investigate a double-active-reconfigurable intelligent surface (RIS)-aided downlink wireless communication system, where a multi-antenna base station (BS) serves multiple single-antenna users with both double reflection and single reflection links. Due to the signal amplification capability of active RISs, the mutual influence between active RISs, which is termed as the "inter-ex…

    Submitted 16 March, 2024; originally announced March 2024.

  22. arXiv:2403.08162  [pdf, other]

    eess.IV cs.CV cs.LG

    Iterative Learning for Joint Image Denoising and Motion Artifact Correction of 3D Brain MRI

    Authors: Lintao Zhang, Mengqi Wu, Lihong Wang, David C. Steffens, Guy G. Potter, Mingxia Liu

    Abstract: Image noise and motion artifacts greatly affect the quality of brain MRI and negatively influence downstream medical image analysis. Previous studies often focus on 2D methods that process each volumetric MR image slice-by-slice, thus losing important 3D anatomical information. Additionally, these studies generally treat image denoising and artifact correction as two standalone tasks, without cons…

    Submitted 12 March, 2024; originally announced March 2024.

  23. arXiv:2402.17909  [pdf, other]

    eess.IV physics.med-ph

    Simulation of Muon Tomography Projections to Image the Pyramids of Giza

    Authors: Mira Liu

    Abstract: Purpose: A geometric simulation of a possible two-plane detector was developed to test the abilities of the detector to generate high-resolution images of the Great Pyramid using muon tomography. Methods and Materials: Trajectory range, angular resolution, and acceptance of the detector were calculated with a simulation. Trajectories and the corresponding sinogram space covered were simulated firs…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 7 pages, 13 figures. Written as a rotation report for the fulfillment of MPHY 41800 Research in Advanced Tomographic Imaging Autumn 2019. Not submitted for peer-reviewed publication elsewhere

  24. arXiv:2402.09445  [pdf, other]

    eess.SP cs.AI cs.LG cs.RO

    iMove: Exploring Bio-impedance Sensing for Fitness Activity Recognition

    Authors: Mengxi Liu, Vitor Fortes Rey, Yu Zhang, Lala Shakti Swarup Ray, Bo Zhou, Paul Lukowicz

    Abstract: Automatic and precise fitness activity recognition can be beneficial in aspects from promoting a healthy lifestyle to personalized preventative healthcare. While IMUs are currently the prominent fitness tracking modality, through iMove, we show bio-impedance can help improve IMU-based fitness tracking through sensor fusion and contrastive learning. To evaluate our methods, we conducted an experimen…

    Submitted 3 June, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

    Comments: Accepted by PerCom 2024

  25. arXiv:2402.09434  [pdf, other]

    eess.SP cs.LG

    Disentangling Imperfect: A Wavelet-Infused Multilevel Heterogeneous Network for Human Activity Recognition in Flawed Wearable Sensor Data

    Authors: Mengna Liu, Dong Xiang, Xu Cheng, Xiufeng Liu, Dalin Zhang, Shengyong Chen, Christian S. Jensen

    Abstract: The popularity and diffusion of wearable devices provides new opportunities for sensor-based human activity recognition that leverages deep learning-based algorithms. Although impressive advances have been made, two major challenges remain. First, sensor data is often incomplete or noisy due to sensor placement and other issues as well as data transmission failure, calling for imputation of missin…

    Submitted 26 January, 2024; originally announced February 2024.

    Comments: 14 pages, 7 figures

  26. arXiv:2402.06875  [pdf, other]

    eess.IV cs.CV

    Disentangled Latent Energy-Based Style Translation: An Image-Level Structural MRI Harmonization Framework

    Authors: Mengqi Wu, Lintao Zhang, Pew-Thian Yap, Hongtu Zhu, Mingxia Liu

    Abstract: Brain magnetic resonance imaging (MRI) has been extensively employed across clinical and research fields, but often exhibits sensitivity to site effects arising from non-biological variations such as differences in field strength and scanner vendors. Numerous retrospective MRI harmonization techniques have demonstrated encouraging outcomes in reducing the site effects at the image level. However,…

    Submitted 29 May, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

  27. arXiv:2402.05373  [pdf, other]

    eess.IV cs.CV

    Unleashing the Infinity Power of Geometry: A Novel Geometry-Aware Transformer (GOAT) for Whole Slide Histopathology Image Analysis

    Authors: Mingxin Liu, Yunzan Liu, Pengbo Xu, Jiquan Ma

    Abstract: The histopathology analysis is of great significance for the diagnosis and prognosis of cancers, however, it has great challenges due to the enormous heterogeneity of gigapixel whole slide images (WSIs) and the intricate representation of pathological features. However, recent methods have not adequately exploited geometrical representation in WSIs which is significant in disease diagnosis. Theref…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: 5 pages, 3 figures. Accepted by 21st IEEE International Symposium on Biomedical Imaging (ISBI 2024)

  28. arXiv:2402.04532  [pdf, other]

    eess.SP

    Joint Beamforming Design for Double Active RIS-assisted Radar-Communication Coexistence Systems

    Authors: Mengyu Liu, Hong Ren, Cunhua Pan, Boshi Wang, Zhiyuan Yu, Ruisong Weng, Kangda Zhi, Yongchao He

    Abstract: Integrated sensing and communication (ISAC) technology has been considered as one of the key candidate technologies in the next-generation wireless communication systems. However, when radar and communication equipment coexist in the same system, i.e. radar-communication coexistence (RCC), the interference from communication systems to radar can be large and cannot be ignored. Recently, reconfigur…

    Submitted 6 February, 2024; originally announced February 2024.

  29. arXiv:2402.02634  [pdf, other]

    cs.CV cs.LG eess.IV

    Key-Graph Transformer for Image Restoration

    Authors: Bin Ren, Yawei Li, **gyun Liang, Rakesh Ranjan, Mengyuan Liu, Rita Cucchiara, Luc Van Gool, Nicu Sebe

    Abstract: While it is crucial to capture global information for effective image restoration (IR), integrating such cues into transformer-based methods becomes computationally expensive, especially with high input resolution. Furthermore, the self-attention mechanism in transformers is prone to considering unnecessary global cues from unrelated objects or regions, introducing computational inefficiencies. In…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: 9 pages, 6 figures

  30. arXiv:2401.08476  [pdf, other]

    cs.CR eess.SY

    Incentivizing Secure Software Development: The Role of Liability (Waiver) and Audit

    Authors: Ziyuan Huang, Gergely Biczók, Mingyan Liu

    Abstract: Misaligned incentives in secure software development have long been the focus of research in the economics of security. Product liability, a powerful legal framework in other industries, has been largely ineffective for software products until recent times. However, the rapid regulatory responses to recent global cyberattacks by both the United States and the European Union, together with the (rel…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: 21 pages, 6 figures, submitted to the 23rd Workshop on the Economics of Information Security

    ACM Class: K.6.5; I.2.8

  31. arXiv:2401.06000  [pdf, other]

    eess.SP cs.CV

    Body-Area Capacitive or Electric Field Sensing for Human Activity Recognition and Human-Computer Interaction: A Comprehensive Survey

    Authors: Sizhen Bian, Mengxi Liu, Bo Zhou, Paul Lukowicz, Michele Magno

    Abstract: Due to the fact that roughly sixty percent of the human body is essentially composed of water, the human body is inherently a conductive object, being able to, firstly, form an inherent electric field from the body to the surroundings and secondly, deform the distribution of an existing electric field near the body. Body-area capacitive sensing, also called body-area electric field sensing, is bec…

    Submitted 11 January, 2024; originally announced January 2024.

  32. arXiv:2401.05426  [pdf, other]

    eess.SP cs.AI cs.LG

    CoSS: Co-optimizing Sensor and Sampling Rate for Data-Efficient AI in Human Activity Recognition

    Authors: Mengxi Liu, Zimin Zhao, Daniel Geißler, Bo Zhou, Sungho Suh, Paul Lukowicz

    Abstract: Recent advancements in Artificial Neural Networks have significantly improved human activity recognition using multiple time-series sensors. While employing numerous sensors with high-frequency sampling rates usually improves the results, it often leads to data inefficiency and unnecessary expansion of the ANN, posing a challenge for their practical deployment on edge devices. Addressing these iss…

    Submitted 3 January, 2024; originally announced January 2024.

    Comments: Accepted by the 2nd Workshop on Sustainable AI (AAAI24)

  33. arXiv:2401.04389  [pdf, other]

    cs.SD eess.AS

    RaD-Net: A Repairing and Denoising Network for Speech Signal Improvement

    Authors: Mingshuai Liu, Zhuangqi Chen, Xiaopeng Yan, Yuanjun Lv, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie

    Abstract: This paper introduces our repairing and denoising network (RaD-Net) for the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. We extend our previous framework based on a two-stage network and propose an upgraded model. Specifically, we replace the repairing network with COM-Net from TEA-PSE. In addition, multi-resolution discriminators and multi-band discriminators are adopted in the training…

    Submitted 9 January, 2024; originally announced January 2024.

    Comments: submitted to ICASSP 2024

  34. arXiv:2401.02673  [pdf, other]

    eess.AS cs.AI cs.SD

    A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model

    Authors: Dongdi Zhao, Jianbo Ma, Lu Lu, **ke Li, Xuan Ji, Lei Zhu, Fuming Fang, Ming Liu, Feijun Jiang

    Abstract: Far-field speech recognition is a challenging task that conventionally uses signal processing beamforming to attack noise and interference problem. But the performance has been found usually limited due to heavy reliance on environmental assumption. In this paper, we propose a unified multichannel far-field speech recognition system that combines the neural beamforming and transformer-based Listen…

    Submitted 5 January, 2024; originally announced January 2024.

  35. arXiv:2312.09760  [pdf, other]

    eess.AS cs.SD

    U2-KWS: Unified Two-pass Open-vocabulary Keyword Spotting with Keyword Bias

    Authors: Ao Zhang, Pan Zhou, Kaixun Huang, Yong Zou, Ming Liu, Lei Xie

    Abstract: Open-vocabulary keyword spotting (KWS), which allows users to customize keywords, has attracted increasingly more interest. However, existing methods based on acoustic models and post-processing train the acoustic model with ASR training criteria to model all phonemes, making the acoustic model under-optimized for the KWS task. To solve this problem, we propose a novel unified two-pass open-vocabu…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted by ASRU2023

  36. arXiv:2311.18074  [pdf, other]

    eess.SY

    Game Projection and Robustness for Game-Theoretic Autonomous Driving

    Authors: Mushuang Liu, H. Eric Tseng, Dimitar Filev, Anouck Girard, Ilya Kolmanovsky

    Abstract: Game-theoretic approaches are envisioned to bring human-like reasoning skills and decision-making processes for autonomous vehicles (AVs). However, challenges including game complexity and incomplete information still remain to be addressed before they can be sufficiently practical for real-world use. Game complexity refers to the difficulties of solving a multi-player game, which include solution…

    Submitted 29 November, 2023; originally announced November 2023.

  37. arXiv:2311.15607  [pdf, other]

    eess.IV cs.AI cs.CV

    Spatially Covariant Image Registration with Text Prompts

    Authors: Xiang Chen, Min Liu, Rongguang Wang, Renjiu Hu, Dongdong Liu, Gaolei Li, Hang Zhang

    Abstract: Medical images are often characterized by their structured anatomical representations and spatially inhomogeneous contrasts. Leveraging anatomical priors in neural networks can greatly enhance their utility in resource-constrained clinical settings. Prior research has harnessed such information for image segmentation, yet progress in deformable image registration has been modest. Our work introduc…

    Submitted 5 February, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: 13 pages, 8 figures, 6 tables

  38. arXiv:2311.12842  [pdf, other]

    eess.IV cs.CV

    Multimodal Identification of Alzheimer's Disease: A Review

    Authors: Guian Fang, Mengsha Liu, Yi Zhong, Zhuolin Zhang, Jiehui Huang, Zhenchao Tang, Calvin Yu-Chian Chen

    Abstract: Alzheimer's disease is a progressive neurological disorder characterized by cognitive impairment and memory loss. With the increasing aging population, the incidence of AD is continuously rising, making early diagnosis and intervention an urgent need. In recent years, a considerable number of teams have applied computer-aided diagnostic techniques to early classification research of AD. Most studi…

    Submitted 6 October, 2023; originally announced November 2023.

  39. arXiv:2311.06579  [pdf, other]

    cs.RO eess.SY

    Five-Tiered Route Planner for Multi-AUV Accessing Fixed Nodes in Uncertain Ocean Environments

    Authors: Jiaxin Zhang, Meiqin Liu, Senlin Zhang, Ronghao Zheng, Shanling Dong

    Abstract: This article introduces a five-tiered route planner for accessing multiple nodes with multiple autonomous underwater vehicles (AUVs) that enables efficient task completion in stochastic ocean environments. First, the pre-planning tier solves the single-AUV routing problem to find the optimal giant route (GR), estimates the number of required AUVs based on GR segmentation, and allocates nodes for e…

    Submitted 11 November, 2023; originally announced November 2023.

  40. arXiv:2311.04049  [pdf, other]

    eess.IV cs.CV

    3D EAGAN: 3D edge-aware attention generative adversarial network for prostate segmentation in transrectal ultrasound images

    Authors: Mengqing Liu, Xiao Shao, Li** Jiang, Kaizhi Wu

    Abstract: Automatic prostate segmentation in TRUS images has always been a challenging problem, since prostates in TRUS images have ambiguous boundaries and inhomogeneous intensity distribution. Although many prostate segmentation methods have been proposed, they still need to be improved due to the lack of sensibility to edge information. Consequently, the objective of this study is to devise a highly effe…

    Submitted 7 November, 2023; originally announced November 2023.

  41. A Review on AI Algorithms for Energy Management in E-Mobility Services

    Authors: Sen Yan, Maqsood Hussain Shah, Ji Li, Noel O'Connor, Mingming Liu

    Abstract: E-mobility, or electric mobility, has emerged as a pivotal solution to address pressing environmental and sustainability concerns in the transportation sector. The depletion of fossil fuels, escalating greenhouse gas emissions, and the imperative to combat climate change underscore the significance of transitioning to electric vehicles (EVs). This paper seeks to explore the potential of artificial…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: 8 pages, 4 tables, 1 figure

  42. arXiv:2309.13905  [pdf, other]

    eess.AS cs.SD

    AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data

    Authors: Jianwei Yu, Hangting Chen, Yanyao Bian, Xiang Li, Yi Luo, Jinchuan Tian, Mengyang Liu, Jiayi Jiang, Shuai Wang

    Abstract: Recently, the utilization of extensive open-sourced text data has significantly advanced the performance of text-based large language models (LLMs). However, the use of in-the-wild large-scale speech data in the speech technology community remains constrained. One reason for this limitation is that a considerable amount of the publicly available speech data is compromised by background noise, spee…

    Submitted 25 September, 2023; originally announced September 2023.

  43. IBVC: Interpolation-driven B-frame Video Compression

    Authors: Chenming Xu, Meiqin Liu, Chao Yao, Weisi Lin, Yao Zhao

    Abstract: Learned B-frame video compression aims to adopt bi-directional motion estimation and motion compensation (MEMC) coding for middle frame reconstruction. However, previous learned approaches often directly extend neural P-frame codecs to B-frame relying on bi-directional optical-flow estimation or video frame interpolation. They suffer from inaccurate quantized motions and inefficient motion compens…

    Submitted 14 March, 2024; v1 submitted 24 September, 2023; originally announced September 2023.

    Comments: Submitted to Pattern Recognition

  44. arXiv:2309.11845  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    TMac: Temporal Multi-Modal Graph Learning for Acoustic Event Classification

    Authors: Meng Liu, Ke Liang, Dayu Hu, Hao Yu, Yue Liu, Lingyuan Meng, Wenxuan Tu, Sihang Zhou, Xinwang Liu

    Abstract: Audiovisual data is everywhere in this digital age, which raises higher requirements for the deep learning models developed on them. To well handle the information of the multi-modal data is the key to a better audiovisual modal. We observe that these audiovisual data naturally have temporal attributes, such as the time information for each frame in the video. More concretely, such data is inheren…

    Submitted 26 September, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

    Comments: This work has been accepted by ACM MM 2023 for publication

  45. arXiv:2309.11803  [pdf, other]

    eess.SP

    A Comprehensive Study of PAPR Reduction Techniques for Deep Joint Source Channel Coding in OFDM Systems

    Authors: Maolin Liu, Wei Chen, Jialong Xu, Bo Ai

    Abstract: Recently, deep joint source channel coding (DJSCC) techniques have been extensively studied and have shown significant performance with limited bandwidth and low signal to noise ratio. Most DJSCC work considers discrete-time analog transmission, while combining it with orthogonal frequency division multiplexing (OFDM) creates serious high peak-to-average power ratio (PAPR) problem. This paper cond…

    Submitted 21 September, 2023; originally announced September 2023.

  46. arXiv:2309.10787  [pdf, other]

    eess.AS cs.CV cs.MM cs.SD

    AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models

    Authors: Yuan Tseng, Layne Berry, Yi-Ting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Po-Yao Huang, Chun-Mao Lai, Shang-Wen Li, David Harwath, Yu Tsao, Shinji Watanabe, Abdelrahman Mohamed, Chi-Luen Feng, Hung-yi Lee

    Abstract: Audio-visual representation learning aims to develop systems with human-like perception by utilizing correlation between auditory and visual information. However, current models often focus on a limited set of tasks, and generalization abilities of learned representations are unclear. To this end, we propose the AV-SUPERB benchmark that enables general-purpose evaluation of unimodal audio/visual a…

    Submitted 19 March, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: Accepted to ICASSP 2024; Evaluation Code: https://github.com/roger-tseng/av-superb Submission Platform: https://av.superbbenchmark.org

  47. arXiv:2309.10311  [pdf, other]

    cs.RO eess.SY

    Resource-Efficient Cooperative Online Scalar Field Mapping via Distributed Sparse Gaussian Process Regression

    Authors: Tianyi Ding, Ronghao Zheng, Senlin Zhang, Meiqin Liu

    Abstract: Cooperative online scalar field mapping is an important task for multi-robot systems. Gaussian process regression is widely used to construct a map that represents spatial information with confidence intervals. However, it is difficult to handle cooperative online mapping tasks because of its high computation and communication costs. This letter proposes a resource-efficient cooperative online fie…

    Submitted 22 January, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

  48. arXiv:2309.03380  [pdf, other]

    eess.SY

    Cyber Recovery from Dynamic Load Altering Attacks: Linking Electricity, Transportation, and Cyber Networks

    Authors: Mengxiang Liu, Zhongda Chu, Fei Teng

    Abstract: To address the increasing vulnerability of power grids, significant attention has been focused on the attack detection and impact mitigation. However, it is still unclear how to effectively and quickly recover the cyber and physical networks from a cyberattack. In this context, this paper presents the first investigation of the Cyber Recovery from Dynamic load altering Attack (CRDA). Considering t…

    Submitted 6 September, 2023; originally announced September 2023.

  49. arXiv:2309.02563  [pdf, other]

    eess.IV cs.CV

    Evaluation Kidney Layer Segmentation on Whole Slide Imaging using Convolutional Neural Networks and Transformers

    Authors: Muhao Liu, Chenyang Qi, Shunxing Bao, Quan Liu, Ruining Deng, Yu Wang, Shilin Zhao, Haichun Yang, Yuankai Huo

    Abstract: The segmentation of kidney layer structures, including cortex, outer stripe, inner stripe, and inner medulla within human kidney whole slide images (WSI) plays an essential role in automated image analysis in renal pathology. However, the current manual segmentation process proves labor-intensive and infeasible for handling the extensive digital pathology images encountered at a large scale. In re…

    Submitted 5 September, 2023; originally announced September 2023.

  50. arXiv:2309.01944  [pdf, other]

    eess.IV

    Duration-adaptive Video Highlight Pre-caching for Vehicular Communication Network

    Authors: Liang Xu, Deshi Li, Kaitao Meng, Mingliu Liu, Shuya Zhu

    Abstract: Video traffic in vehicular communication networks (VCNs) faces exponential growth. However, different segments of most videos reveal various attractiveness for viewers, and the pre-caching decision is greatly affected by the dynamic service duration that edge nodes can provide services for mobile vehicles driving along a road. In this paper, we propose an efficient video highlight pre-caching sche…

    Submitted 5 September, 2023; originally announced September 2023.