Showing 1–50 of 92 results for author: Ma, H

Searching in archive eess.
  1. arXiv:2406.16896  [pdf, other]

    eess.SP cs.LG

    f-GAN: A frequency-domain-constrained generative adversarial network for PPG to ECG synthesis

    Authors: Nathan C. L. Kong, Dae Lee, Huyen Do, Dae Hoon Park, Cong Xu, Hongda Mao, Jonathan Chung

    Abstract: Electrocardiograms (ECGs) and photoplethysmograms (PPGs) are generally used to monitor an individual's cardiovascular health. In clinical settings, ECGs and fingertip PPGs are the main signals used for assessing cardiovascular health, but the equipment necessary for their collection precludes their use in daily monitoring. Although PPGs obtained from wrist-worn devices are susceptible to noise due…

    Submitted 15 May, 2024; originally announced June 2024.

  2. arXiv:2406.04737  [pdf, other]

    eess.SP eess.SY

    Fast-Fading Channel and Power Optimization of the Magnetic Inductive Cellular Network

    Authors: Honglei Ma, Erwu Liu, Zhijun Fang, Rui Wang, Yongbin Gao, Wenjun Yu, Dongming Zhang

    Abstract: The cellular network of magnetic Induction (MI) communication holds promise in long-distance underground environments. In the traditional MI communication, there is no fast-fading channel since the MI channel is treated as a quasi-static channel. However, for the vehicle (mobile) MI (VMI) communication, the unpredictable antenna vibration brings the remarkable fast-fading. As such fast-fading cann…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE TWC for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  3. arXiv:2405.18731  [pdf, other]

    eess.SP cs.AI physics.comp-ph

    VBIM-Net: Variational Born Iterative Network for Inverse Scattering Problems

    Authors: Ziqing Xing, Zhaoyang Zhang, Zirui Chen, Yusong Wang, Haoran Ma, Zhun Wei, Gang Bao

    Abstract: Recently, studies have shown the potential of integrating field-type iterative methods with deep learning (DL) techniques in solving inverse scattering problems (ISPs). In this article, we propose a novel Variational Born Iterative Network, namely, VBIM-Net, to solve the full-wave ISPs with significantly improved flexibility and inversion quality. The proposed VBIM-Net emulates the alternating upd…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 14 pages, 21 figures

  4. arXiv:2405.17702  [pdf]

    eess.SY

    A Two-sided Model for EV Market Dynamics and Policy Implications

    Authors: Haoxuan Ma, Brian Yueshuai He, Tomas Kaljevic, Jiaqi Ma

    Abstract: The diffusion of Electric Vehicles (EVs) plays a pivotal role in mitigating greenhouse gas emissions, particularly in the U.S., where ambitious zero-emission and carbon neutrality objectives have been set. In pursuit of these goals, many states have implemented a range of incentive policies aimed at stimulating EV adoption and charging infrastructure development, especially public EV charging stat…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Conference preprint, 8 pages, 3 figures

  5. arXiv:2405.05787  [pdf, other]

    cs.RO cs.CV eess.SY

    Autonomous Robotic Ultrasound System for Liver Follow-up Diagnosis: Pilot Phantom Study

    Authors: Tianpeng Zhang, Sekeun Kim, Jerome Charton, Haitong Ma, Kyungsang Kim, Na Li, Quanzheng Li

    Abstract: The paper introduces a novel autonomous robot ultrasound (US) system targeting liver follow-up scans for outpatients in local communities. Given a computed tomography (CT) image with specific target regions of interest, the proposed system carries out the autonomous follow-up scan in three steps: (i) initial robot contact to surface, (ii) coordinate mapping between CT image and robot, and (iii) ta…

    Submitted 9 May, 2024; originally announced May 2024.

  6. arXiv:2404.11313  [pdf, other]

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  7. arXiv:2403.14059  [pdf]

    eess.SY

    PE-GPT: A Physics-Informed Interactive Large Language Model for Power Converter Modulation Design

    Authors: Fanfan Lin, Junhua Liu, Xinze Li, Shuai Zhao, Bohui Zhao, Hao Ma, Xin Zhang

    Abstract: This paper proposes PE-GPT, a custom-tailored large language model uniquely adapted for power converter modulation design. By harnessing in-context learning and specialized tiered physics-informed neural networks, PE-GPT guides users through text-based dialogues, recommending actionable modulation parameters. The effectiveness of PE-GPT is validated through a practical design case involving dual a…

    Submitted 20 March, 2024; originally announced March 2024.

  8. arXiv:2403.05937  [pdf, other]

    cs.CV eess.IV

    Wavelet-Like Transform-Based Technology in Response to the Call for Proposals on Neural Network-Based Image Coding

    Authors: Cunhui Dong, Haichuan Ma, Haotian Zhang, Changsheng Gao, Li Li, Dong Liu

    Abstract: Neural network-based image coding has been developing rapidly since its birth. Until 2022, its performance has surpassed that of the best-performing traditional image coding framework -- H.266/VVC. Witnessing such success, the IEEE 1857.11 working subgroup initializes a neural network-based image coding standard project and issues a corresponding call for proposals (CfP). In response to the CfP, t…

    Submitted 9 March, 2024; originally announced March 2024.

  9. arXiv:2402.17455  [pdf, ps, other]

    eess.AS

    CLAPSep: Leveraging Contrastive Pre-trained Model for Multi-Modal Query-Conditioned Target Sound Extraction

    Authors: Hao Ma, Zhiyuan Peng, Xu Li, Mingjie Shao, Xixin Wu, Ju Liu

    Abstract: Universal sound separation (USS) aims to extract arbitrary types of sounds from real-world recordings. This can be achieved by language-queried target sound extraction (TSE), which typically consists of two components: a query network that converts user queries into conditional embeddings, and a separation network that extracts the target sound accordingly. Existing methods commonly train models f…

    Submitted 8 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  10. arXiv:2401.13220  [pdf, other]

    eess.IV cs.CV

    Segment Any Cell: A SAM-based Auto-prompting Fine-tuning Framework for Nuclei Segmentation

    Authors: Saiyang Na, Yuzhi Guo, Feng Jiang, Hehuan Ma, Junzhou Huang

    Abstract: In the rapidly evolving field of AI research, foundational models like BERT and GPT have significantly advanced language and vision tasks. The advent of pretrain-prompting models such as ChatGPT and Segmentation Anything Model (SAM) has further revolutionized image segmentation. However, their applications in specialized areas, particularly in nuclei segmentation within medical imaging, reveal a k…

    Submitted 23 January, 2024; originally announced January 2024.

  11. arXiv:2401.01496  [pdf, other]

    eess.IV cs.AI cs.CV

    From Pixel to Slide image: Polarization Modality-based Pathological Diagnosis Using Representation Learning

    Authors: Jia Dong, Yao Yao, Yang Dong, Hui Ma

    Abstract: Thyroid cancer is the most common endocrine malignancy, and accurately distinguishing between benign and malignant thyroid tumors is crucial for developing effective treatment plans in clinical practice. Pathologically, thyroid tumors pose diagnostic challenges due to improper specimen sampling. In this study, we have designed a three-stage model using representation learning to integrate pixel-le…

    Submitted 2 January, 2024; originally announced January 2024.

  12. arXiv:2312.16607  [pdf, other]

    eess.IV cs.CV stat.ML

    A Polarization and Radiomics Feature Fusion Network for the Classification of Hepatocellular Carcinoma and Intrahepatic Cholangiocarcinoma

    Authors: Jia Dong, Yao Yao, Liyan Lin, Yang Dong, Jiachen Wan, Ran Peng, Chao Li, Hui Ma

    Abstract: Classifying hepatocellular carcinoma (HCC) and intrahepatic cholangiocarcinoma (ICC) is a critical step in treatment selection and prognosis evaluation for patients with liver diseases. Traditional histopathological diagnosis poses challenges in this context. In this study, we introduce a novel polarization and radiomics feature fusion network, which combines polarization features obtained from Mu…

    Submitted 27 December, 2023; originally announced December 2023.

  13. arXiv:2312.15863  [pdf, other]

    cs.LG cs.AI cs.RO eess.SY

    PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning

    Authors: Hangyu Mao, Rui Zhao, Ziyue Li, Zhiwei Xu, Hao Chen, Yiqun Chen, Bin Zhang, Zhen Xiao, Junge Zhang, Jiangjin Yin

    Abstract: Designing better deep networks and better reinforcement learning (RL) algorithms are both important for deep RL. This work studies the former. Specifically, the Perception and Decision-making Interleaving Transformer (PDiT) network is proposed, which cascades two Transformers in a very natural way: the perceiving one focuses on the environmental perception by processing the observation at t…

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: Proc. of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2024, full paper with oral presentation). Cover our preliminary study: arXiv:2212.14538

  14. arXiv:2312.08079  [pdf, other]

    cs.CL cs.SD eess.AS

    Extending Whisper with prompt tuning to target-speaker ASR

    Authors: Hao Ma, Zhiyuan Peng, Mingjie Shao, Jing Li, Ju Liu

    Abstract: Target-speaker automatic speech recognition (ASR) aims to transcribe the desired speech of a target speaker from multi-talker overlapped utterances. Most of the existing target-speaker ASR (TS-ASR) methods involve either training from scratch or fully fine-tuning a pre-trained model, leading to significant training costs and becoming inapplicable to large foundation models. This work leverages pro…

    Submitted 11 January, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: ICASSP 2024

  15. arXiv:2310.00413  [pdf, other]

    cs.CV cs.LG eess.IV

    SSIF: Learning Continuous Image Representation for Spatial-Spectral Super-Resolution

    Authors: Gengchen Mai, Ni Lao, Weiwei Sun, Yuchi Ma, Jiaming Song, Chenlin Meng, Hongxu Ma, Jinmeng Rao, Ziyuan Li, Stefano Ermon

    Abstract: Existing digital sensors capture images at fixed spatial and spectral resolutions (e.g., RGB, multispectral, and hyperspectral images), and each combination requires bespoke machine learning models. Neural Implicit Functions partially overcome the spatial resolution challenge by representing an image in a resolution-independent way. However, they still operate at fixed, pre-defined spectral resolu…

    Submitted 30 September, 2023; originally announced October 2023.

    MSC Class: 68T07; 68T45 ACM Class: I.4.10; I.2.10; I.4.6

  16. arXiv:2309.02259  [pdf, ps, other]

    cs.IT eess.SP

    Design of a New CIM-DCSK-Based Ambient Backscatter Communication System

    Authors: Ruipeng Yang, Yi Fang, Pingping Chen, Huan Ma

    Abstract: To improve the data rate in differential chaos shift keying (DCSK) based ambient backscatter communication (AmBC) system, we propose a new AmBC system based on code index modulation (CIM), referred to as CIM-DCSK-AmBC system. In the proposed system, the CIM-DCSK signal transmitted in the direct link is used as the radio frequency source of the backscatter link. The signal format in the backscatter…

    Submitted 5 September, 2023; originally announced September 2023.

  17. arXiv:2308.14562  [pdf, other]

    cs.RO eess.SY

    Data-Efficient Online Learning of Ball Placement in Robot Table Tennis

    Authors: Philip Tobuschat, Hao Ma, Dieter Büchler, Bernhard Schölkopf, Michael Muehlebach

    Abstract: We present an implementation of an online optimization algorithm for hitting a predefined target when returning ping-pong balls with a table tennis robot. The online algorithm optimizes over so-called interception policies, which define the manner in which the robot arm intercepts the ball. In our case, these are composed of the state of the robot arm (position and velocity) at interception time.…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: 7 pages, 6 figures, to be published in proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2023

  18. arXiv:2308.13072  [pdf]

    eess.IV cs.CV

    Full-dose Whole-body PET Synthesis from Low-dose PET Using High-efficiency Denoising Diffusion Probabilistic Model: PET Consistency Model

    Authors: Shaoyan Pan, Elham Abouei, Junbo Peng, Joshua Qian, Jacob F Wynne, Tonghe Wang, Chih-Wei Chang, Justin Roper, Jonathon A Nye, Hui Mao, Xiaofeng Yang

    Abstract: Objective: Positron Emission Tomography (PET) has been a commonly used imaging modality in broad clinical applications. One of the most important tradeoffs in PET imaging is between image quality and radiation dose: high image quality comes with high radiation exposure. Improving image quality is desirable for all clinical applications while minimizing radiation exposure is needed to reduce risk t…

    Submitted 16 April, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

  19. arXiv:2307.07434  [pdf, other]

    cs.CV eess.IV

    Combining multitemporal optical and SAR data for LAI imputation with BiLSTM network

    Authors: W. Zhao, F. Yin, H. Ma, Q. Wu, J. Gomez-Dans, P. Lewis

    Abstract: The Leaf Area Index (LAI) is vital for predicting winter wheat yield. Acquisition of crop conditions via Sentinel-2 remote sensing images can be hindered by persistent clouds, affecting yield predictions. Synthetic Aperture Radar (SAR) provides all-weather imagery, and the ratio between its cross- and co-polarized channels (C-band) shows a high correlation with time series LAI over winter wheat re…

    Submitted 14 July, 2023; originally announced July 2023.

  20. arXiv:2306.12085  [pdf, other]

    cs.CV eess.IV

    HSR-Diff:Hyperspectral Image Super-Resolution via Conditional Diffusion Models

    Authors: Chanyue Wu, Dong Wang, Hanyu Mao, Ying Li

    Abstract: Despite the proven significance of hyperspectral images (HSIs) in performing various computer vision tasks, its potential is adversely affected by the low-resolution (LR) property in the spatial domain, resulting from multiple physical factors. Inspired by recent advancements in deep generative models, we propose an HSI Super-resolution (SR) approach with Conditional Diffusion Models (HSR-Diff) th…

    Submitted 21 June, 2023; originally announced June 2023.

  21. arXiv:2305.19467  [pdf]

    eess.IV cs.CV

    Synthetic CT Generation from MRI using 3D Transformer-based Denoising Diffusion Model

    Authors: Shaoyan Pan, Elham Abouei, Jacob Wynne, Tonghe Wang, Richard L. J. Qiu, Yuheng Li, Chih-Wei Chang, Junbo Peng, Justin Roper, Pretesh Patel, David S. Yu, Hui Mao, Xiaofeng Yang

    Abstract: Magnetic resonance imaging (MRI)-based synthetic computed tomography (sCT) simplifies radiation therapy treatment planning by eliminating the need for CT simulation and error-prone image registration, ultimately reducing patient radiation dose and setup uncertainty. We propose an MRI-to-CT transformer-based denoising diffusion probabilistic model (MC-DDPM) to transform MRI into high-quality sCT to…

    Submitted 30 May, 2023; originally announced May 2023.

  22. arXiv:2305.15189  [pdf, other]

    cs.RO cs.LG eess.SY

    Black-Box vs. Gray-Box: A Case Study on Learning Table Tennis Ball Trajectory Prediction with Spin and Impacts

    Authors: Jan Achterhold, Philip Tobuschat, Hao Ma, Dieter Buechler, Michael Muehlebach, Joerg Stueckler

    Abstract: In this paper, we present a method for table tennis ball trajectory filtering and prediction. Our gray-box approach builds on a physical model. At the same time, we use data to learn parameters of the dynamics model, of an extended Kalman filter, and of a neural model that infers the ball's initial condition. We demonstrate superior prediction performance of our approach over two black-box approac…

    Submitted 12 June, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted for publication at the 5th Annual Conference on Learning for Dynamics and Control (L4DC) 2023 (camera-ready). With supplementary material

  23. arXiv:2305.10000  [pdf, other]

    cs.IT eess.SP

    Over-the-Air Federated Learning in MIMO Cloud-RAN Systems

    Authors: Haoming Ma, Xiaojun Yuan, Zhi Ding

    Abstract: To address the limitations of traditional over-the-air federated learning (OA-FL) such as limited server coverage and low resource utilization, we propose an OA-FL in MIMO cloud radio access network (MIMO Cloud-RAN) framework, where edge devices upload (or download) model parameters to the cloud server (CS) through access points (APs). Specifically, in every training round, there are three stages:…

    Submitted 17 May, 2023; originally announced May 2023.

  24. arXiv:2305.05548  [pdf, ps, other]

    eess.SP cs.LG

    CIT-EmotionNet: CNN Interactive Transformer Network for EEG Emotion Recognition

    Authors: Wei Lu, Hua Ma, Tien-Ping Tan

    Abstract: Emotion recognition using Electroencephalogram (EEG) signals has emerged as a significant research challenge in affective computing and intelligent interaction. However, effectively combining global and local features of EEG signals to improve performance in emotion recognition is still a difficult task. In this study, we propose a novel CNN Interactive Transformer Network for EEG Emotion Recognit…

    Submitted 7 May, 2023; originally announced May 2023.

    Comments: 10 pages,3 tables

  25. arXiv:2305.05433  [pdf, other]

    quant-ph cs.LG eess.SY

    Tomography of Quantum States from Structured Measurements via quantum-aware transformer

    Authors: Hailan Ma, Zhenhong Sun, Daoyi Dong, Chunlin Chen, Herschel Rabitz

    Abstract: Quantum state tomography (QST) is the process of reconstructing the state of a quantum system (mathematically described as a density matrix) through a series of different measurements, which can be solved by learning a parameterized function to translate experimentally measured statistics into physical density matrices. However, the specific structure of quantum measurements for characterizing a q…

    Submitted 17 November, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

  26. arXiv:2305.00385  [pdf]

    eess.IV cs.CV

    Cross-Shaped Windows Transformer with Self-supervised Pretraining for Clinically Significant Prostate Cancer Detection in Bi-parametric MRI

    Authors: Yuheng Li, Jacob Wynne, Jing Wang, Richard L. J. Qiu, Justin Roper, Shaoyan Pan, Ashesh B. Jani, Tian Liu, Pretesh R. Patel, Hui Mao, Xiaofeng Yang

    Abstract: Biparametric magnetic resonance imaging (bpMRI) has demonstrated promising results in prostate cancer (PCa) detection using convolutional neural networks (CNNs). Recently, transformers have achieved competitive performance compared to CNNs in computer vision. Large scale transformers need abundant annotated data for training, which are difficult to obtain in medical imaging. Self-supervised learni…

    Submitted 17 March, 2024; v1 submitted 30 April, 2023; originally announced May 2023.

  27. arXiv:2305.00042  [pdf]

    eess.IV cs.CV

    Cycle-guided Denoising Diffusion Probability Model for 3D Cross-modality MRI Synthesis

    Authors: Shaoyan Pan, Chih-Wei Chang, Junbo Peng, Jiahan Zhang, Richard L. J. Qiu, Tonghe Wang, Justin Roper, Tian Liu, Hui Mao, Xiaofeng Yang

    Abstract: This study aims to develop a novel Cycle-guided Denoising Diffusion Probability Model (CG-DDPM) for cross-modality MRI synthesis. The CG-DDPM deploys two DDPMs that condition each other to generate synthetic images from two different MRI pulse sequences. The two DDPMs exchange random latent noise in the reverse processes, which helps to regularize both DDPMs and generate matching images in two mod…

    Submitted 28 April, 2023; originally announced May 2023.

  28. arXiv:2304.13471  [pdf, other]

    eess.IV cs.CV

    OPDN: Omnidirectional Position-aware Deformable Network for Omnidirectional Image Super-Resolution

    Authors: Xiaopeng Sun, Weiqi Li, Zhenyu Zhang, Qiufang Ma, Xuhan Sheng, Ming Cheng, Haoyu Ma, Shijie Zhao, Jian Zhang, Junlin Li, Li Zhang

    Abstract: 360° omnidirectional images have gained research attention due to their immersive and interactive experience, particularly in AR/VR applications. However, they suffer from lower angular resolution due to being captured by fisheye lenses with the same sensor size for capturing planar images. To solve the above issues, we propose a two-stage framework for 360° omnidirectional image superresolution.…

    Submitted 26 April, 2023; originally announced April 2023.

    Comments: Accepted to CVPRW 2023

  29. arXiv:2304.11374  [pdf, ps, other]

    cs.LG cs.DC cs.NI eess.SY

    Towards Carbon-Neutral Edge Computing: Greening Edge AI by Harnessing Spot and Future Carbon Markets

    Authors: Huirong Ma, Zhi Zhou, Xiaoxi Zhang, Xu Chen

    Abstract: Provisioning dynamic machine learning (ML) inference as a service for artificial intelligence (AI) applications of edge devices faces many challenges, including the trade-off among accuracy loss, carbon emission, and unknown future costs. Besides, many governments are launching carbon emission rights (CER) for operators to reduce carbon emissions further to reverse climate change. Facing these cha…

    Submitted 22 April, 2023; originally announced April 2023.

    Comments: Accepted by IEEE Internet of Things Journal, 2023

  30. arXiv:2303.15944  [pdf, other]

    cs.LG cs.SD eess.AS

    Cluster-Guided Unsupervised Domain Adaptation for Deep Speaker Embedding

    Authors: Haiquan Mao, Feng Hong, Man-wai Mak

    Abstract: Recent studies have shown that pseudo labels can contribute to unsupervised domain adaptation (UDA) for speaker verification. Inspired by the self-training strategies that use an existing classifier to label the unlabeled data for retraining, we propose a cluster-guided UDA framework that labels the target domain data by clustering and combines the labeled source domain data and pseudo-labeled tar…

    Submitted 28 March, 2023; originally announced March 2023.

  31. arXiv:2303.11543  [pdf, other]

    eess.SP

    DeepMA: End-to-end Deep Multiple Access for Wireless Image Transmission in Semantic Communication

    Authors: Wenyu Zhang, Kaiyuan Bai, Sherali Zeadally, Haijun Zhang, Hua Shao, Hui Ma, Victor C. M. Leung

    Abstract: Semantic communication is a new paradigm that exploits deep learning models to enable end-to-end communications processes, and recent studies have shown that it can achieve better noise resiliency compared with traditional communication schemes in a low signal-to-noise (SNR) regime. To achieve multiple access in semantic communication, we propose a deep learning-based multiple access (DeepMA) meth…

    Submitted 27 June, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

  32. arXiv:2303.08671  [pdf, other]

    cs.NI eess.SY

    A Dual-Cluster-Head Based Medium Access Control for Large-Scale UAV Ad-Hoc Networks

    Authors: Xinru Zhao, Zhiqing Wei, Yingying Zou, Hao Ma, Yanpeng Cui, Zhiyong Feng

    Abstract: Unmanned Aerial Vehicle (UAV) ad hoc network has achieved significant growth for its flexibility, extensibility, and high deployability in recent years. The application of clustering scheme for UAV ad hoc network is imperative to enhance the performance of throughput and energy efficiency. In conventional clustering scheme, a single cluster head (CH) is always assigned in each cluster. However, th…

    Submitted 26 February, 2023; originally announced March 2023.

    Comments: 10 pages, 12 figures, journal

  33. arXiv:2302.14312  [pdf, other]

    quant-ph cs.LG eess.SY

    Auxiliary Task-based Deep Reinforcement Learning for Quantum Control

    Authors: Shumin Zhou, Hailan Ma, Sen Kuang, Daoyi Dong

    Abstract: Due to its property of not requiring prior knowledge of the environment, reinforcement learning has significant potential for quantum control problems. In this work, we investigate the effectiveness of continuous control policies based on deep deterministic policy gradient. To solve the sparse reward signal in quantum learning control problems, we propose an auxiliary task-based deep reinforcement…

    Submitted 28 February, 2023; originally announced February 2023.

    Comments: 13 pages, 11 figures

  34. arXiv:2212.07009  [pdf]

    physics.optics eess.IV

    Piston sensing for sparse aperture systems via all-optical diffractive neural network

    Authors: Xiafei Ma, Zongliang Xie, Haotong Ma, Ge Ren

    Abstract: It is a crucial issue to realize real-time piston correction in the area of sparse aperture imaging. This paper introduces an optical diffractive neural network-based piston sensing method, which can achieve light-speed sensing. By using detectable intensity to represent pistons, the proposed method is capable of converting complex amplitude distribution of the imaging optical field into piston va…

    Submitted 29 June, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

    Comments: 5 pages, 6 figures

  35. arXiv:2211.13119  [pdf, other]

    eess.SP eess.SY

    Performance of Cooperative Detection in Joint Communication-Sensing Vehicular Network: A Data Analytic and Stochastic Geometry Approach

    Authors: Hao Ma, Zhiqing Wei, Zening Li, Fan Ning, Xu Chen, Zhiyong Feng

    Abstract: The increasing complexity of urban environments introduces additional uncertainty to the deployment of the autonomous vehicular network. A novel road infrastructure cooperative detection model using Joint Communication and Sensing (JCS) technology is proposed in this article to simultaneously achieve high-efficient communication and obstacle detection for urban autonomous vehicles. To suppress the…

    Submitted 2 November, 2022; originally announced November 2022.

  36. arXiv:2211.12080  [pdf, other]

    cs.SD eess.AS

    Robust Training for Speaker Verification against Noisy Labels

    Authors: Zhihua Fang, Liang He, Hanhan Ma, Xiaochen Guo, Lin Li

    Abstract: The deep learning models used for speaker verification rely heavily on large amounts of data and correct labeling. However, noisy (incorrect) labels often occur, which degrades the performance of the system. In this paper, we propose a novel two-stage learning method to filter out noisy labels from speaker datasets. Since a DNN will first fit data with clean labels, we first train the model with a…

    Submitted 25 May, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: Accepted by INTERSPEECH 2023

  37. arXiv:2211.06769  [pdf, other]

    eess.IV cs.CV

    Realistic Bokeh Effect Rendering on Mobile GPUs, Mobile AI & AIM 2022 challenge: Report

    Authors: Andrey Ignatov, Radu Timofte, Jin Zhang, Feng Zhang, Gaocheng Yu, Zhe Ma, Hongbin Wang, Minsu Kwon, Haotian Qian, Wentao Tong, Pan Mu, Ziping Wang, Guangjing Yan, Brian Lee, Lei Fei, Huaijin Chen, Hyebin Cho, Byeongjun Kwon, Munchurl Kim, Mingyang Qian, Huixin Ma, Yanan Li, Xiaotao Wang, Lei Lei

    Abstract: As mobile cameras with compact optics are unable to produce a strong bokeh effect, lots of interest is now devoted to deep learning-based solutions for this task. In this Mobile AI challenge, the target was to develop an efficient end-to-end AI-based bokeh effect rendering approach that can run on modern smartphone GPUs using TensorFlow Lite. The participants were provided with a large-scale EBB!…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2211.03885; text overlap with arXiv:2105.07809, arXiv:2211.04470, arXiv:2211.05256, arXiv:2211.05910

  38. arXiv:2211.06073  [pdf, other]

    cs.SD cs.CL eess.AS

    SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection

    Authors: Jiangyan Yi, Chenglong Wang, Jianhua Tao, Chu Yuan Zhang, Cunhang Fan, Zhengkun Tian, Haoxin Ma, Ruibo Fu

    Abstract: Many datasets have been designed to further the development of fake audio detection. However, fake utterances in previous datasets are mostly generated by altering timbre, prosody, linguistic content or channel noise of original audio. These datasets leave out a scenario, in which the acoustic scene of an original audio is manipulated with a forged one. It will pose a major threat to our society i…

    Submitted 4 April, 2024; v1 submitted 11 November, 2022; originally announced November 2022.

    Comments: Accepted by Pattern Recognition, 1 April 2024

  39. arXiv:2210.15122  [pdf]

    eess.SP

    Experimental Comparison of SNR and RSSI for LoRa-ESL Based on Machine Clustering and Arithmetic Distribution

    Authors: Malak Abid Ali Khan, Hongbin Ma, Syed Muhammad Aamir, Cekderi Anil Baris

    Abstract: LoRa lacks the sensing capabilities of channel status. Received signal strength indicator (RSSI) decreases due to collision, interference, and near-far effect while for signal-to-noise ratio (SNR), the packets are rejected by decreasing the transmission power (TP) at a higher spreading factor (SF). To overcome these challenges in the case of electric shelf label (ESL) to minimize the dependency on…

    Submitted 13 December, 2022; v1 submitted 26 October, 2022; originally announced October 2022.

  40. arXiv:2210.07553  [pdf, other]

    cs.RO cs.LG eess.SY

    Safe Model-Based Reinforcement Learning with an Uncertainty-Aware Reachability Certificate

    Authors: Dongjie Yu, Wenjun Zou, Yujie Yang, Haitong Ma, Shengbo Eben Li, Jingliang Duan, Jianyu Chen

    Abstract: Safe reinforcement learning (RL) that solves constraint-satisfactory policies provides a promising way to the broader safety-critical applications of RL in real-world problems such as robotics. Among all safe RL approaches, model-based methods reduce training time violations further due to their high sample efficiency. However, lacking safety robustness against the model uncertainties remains an i…

    Submitted 14 October, 2022; originally announced October 2022.

    Comments: 12 pages, 6 figures

  41. arXiv:2209.06823  [pdf]

    cs.CV eess.IV

    DEANet: Decomposition Enhancement and Adjustment Network for Low-Light Image Enhancement

    Authors: Yonglong Jiang, Liangliang Li, Yuan Xue, Hongbing Ma

    Abstract: Images obtained under low-light conditions will seriously affect the quality of the images. Solving the problem of poor low-light image quality can effectively improve the visual quality of images and better improve the usability of computer vision. In addition, it has very important applications in many fields. This paper proposes a DEANet based on Retinex for low-light image enhancement. It comb…

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: 8 pages, 7 figures

  42. arXiv:2209.02604  [pdf, other

    cs.MM cs.AI cs.CV cs.SD eess.AS

    Make Acoustic and Visual Cues Matter: CH-SIMS v2.0 Dataset and AV-Mixup Consistent Module

    Authors: Yihe Liu, Ziqi Yuan, Huisheng Mao, Zhiyun Liang, Wanqiuyue Yang, Yuanzhe Qiu, Tie Cheng, Xiaoteng Li, Hua Xu, Kai Gao

    Abstract: Multimodal sentiment analysis (MSA), which is supposed to improve text-based sentiment analysis with associated acoustic and visual modalities, is an emerging research area due to its potential applications in Human-Computer Interaction (HCI). However, existing research observes that the acoustic and visual modalities contribute much less than the textual modality, termed as text-predominant. Un…

    Submitted 21 August, 2022; originally announced September 2022.

    Comments: 16 pages, 7 figures, accepted by ICMI 2022

  43. arXiv:2208.09646  [pdf, other

    cs.SD cs.AI eess.AS

    An Initial Investigation for Detecting Vocoder Fingerprints of Fake Audio

    Authors: Xinrui Yan, Jiangyan Yi, Jianhua Tao, Chenglong Wang, Haoxin Ma, Tao Wang, Shiming Wang, Ruibo Fu

    Abstract: Many effective attempts have been made for fake audio detection. However, they can only provide detection results but no countermeasures to curb this harm. For many related practical applications, it is also necessary to know which model or algorithm generated the fake audio. Therefore, we propose a new problem for detecting vocoder fingerprints of fake audio. Experiments are conducted on the datasets synthesize…

    Submitted 20 August, 2022; originally announced August 2022.

    Comments: Accepted by ACM Multimedia 2022 Workshop: First International Workshop on Deepfake Detection for Audio Multimedia

  44. arXiv:2208.09618  [pdf, other

    cs.SD cs.AI eess.AS

    Fully Automated End-to-End Fake Audio Detection

    Authors: Chenglong Wang, Jiangyan Yi, Jianhua Tao, Haiyang Sun, Xun Chen, Zhengkun Tian, Haoxin Ma, Cunhang Fan, Ruibo Fu

    Abstract: The existing fake audio detection systems often rely on expert experience to design the acoustic features or manually design the hyperparameters of the network structure. However, artificial adjustment of the parameters can have a relatively obvious influence on the results. It is almost impossible to manually set the best set of parameters. Therefore this paper proposes a fully automated end-toen…

    Submitted 20 August, 2022; originally announced August 2022.

  45. arXiv:2208.05163  [pdf, other

    cs.CV cs.LG eess.IV

    Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization

    Authors: Zhengang Li, Mengshu Sun, Alec Lu, Haoyu Ma, Geng Yuan, Yanyue Xie, Hao Tang, Yanyu Li, Miriam Leeser, Zhangyang Wang, Xue Lin, Zhenman Fang

    Abstract: Vision transformers (ViTs) are emerging with significantly improved accuracy in computer vision tasks. However, their complex architecture and enormous computation/storage demand impose urgent needs for new hardware accelerator design methodology. This work proposes an FPGA-aware automatic ViT acceleration framework based on the proposed mixed-scheme quantization. To the best of our knowledge, thi…

    Submitted 10 August, 2022; originally announced August 2022.

    Comments: Published in FPL2022

  46. arXiv:2207.12308  [pdf, other

    cs.SD eess.AS

    CFAD: A Chinese Dataset for Fake Audio Detection

    Authors: Haoxin Ma, Jiangyan Yi, Chenglong Wang, Xinrui Yan, Jianhua Tao, Tao Wang, Shiming Wang, Ruibo Fu

    Abstract: Fake audio detection is a growing concern and some relevant datasets have been designed for research. However, there is no standard public Chinese dataset under complex conditions. In this paper, we aim to fill in the gap and design a Chinese fake audio detection dataset (CFAD) for studying more generalized detection methods. Twelve mainstream speech-generation techniques are used to generate fake…

    Submitted 18 July, 2023; v1 submitted 12 July, 2022; originally announced July 2022.

    Comments: FAD renamed as CFAD

  47. arXiv:2206.07997  [pdf, ps, other

    eess.SP

    Reconfigurable Intelligent Surface-aided $M$-ary FM-DCSK System: a New Design for Noncoherent Chaos-based Communication

    Authors: Huan Ma, Yi Fang, **** Chen, Yonghui Li

    Abstract: In this paper, we propose two reconfigurable intelligent surface-aided $M$-ary frequency-modulated differential chaos shift keying (RIS-$M$-FM-DCSK) schemes. In scheme I, the RIS is regarded as a transmitter at the source to incorporate the $M$-ary phase-shift-keying ($M$-PSK) symbols into the FM chaotic signal and to reflect the resultant $M$-ary FM chaotic signal toward the destination. The info…

    Submitted 16 June, 2022; originally announced June 2022.

  48. Design of a Reconfigurable Intelligent Surface-Assisted FM-DCSK-SWIPT Scheme with Non-linear Energy Harvesting Model

    Authors: Yi Fang, Yiwei Tao, Huan Ma, Yonghui Li, Mohsen Guizani

    Abstract: In this paper, we propose a reconfigurable intelligent surface (RIS)-assisted frequency-modulated (FM) differential chaos shift keying (DCSK) scheme with simultaneous wireless information and power transfer (SWIPT), called RIS-FM-DCSK-SWIPT scheme, for low-power, low-cost, and high-reliability wireless communication networks. In particular, the proposed scheme is developed under a non-linear energ…

    Submitted 14 March, 2023; v1 submitted 14 May, 2022; originally announced May 2022.

    Comments: accepted by IEEE Transactions on Communications

  49. arXiv:2205.05675  [pdf, other

    cs.CV eess.IV

    NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

    Authors: Yawei Li, Kai Zhang, Radu Timofte, Luc Van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang, et al. (86 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2022 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The task of the challenge was to super-resolve an input image with a magnification factor of $\times$4 based on pairs of low and corresponding high resolution images. The aim was to design a network for single image super-resolution that achieved improvement of e…

    Submitted 11 May, 2022; originally announced May 2022.

    Comments: Validation code of the baseline model is available at https://github.com/ofsoundof/IMDN. Validation of all submitted models is available at https://github.com/ofsoundof/NTIRE2022_ESR

  50. arXiv:2204.08466  [pdf, other

    eess.IV cs.AI cs.CV physics.med-ph

    Robust PCA Unrolling Network for Super-resolution Vessel Extraction in X-ray Coronary Angiography

    Authors: Binjie Qin, Haohao Mao, Yiming Liu, Jun Zhao, Yisong Lv, Yueqi Zhu, Song Ding, Xu Chen

    Abstract: Although robust PCA has been increasingly adopted to extract vessels from X-ray coronary angiography (XCA) images, challenging problems such as inefficient vessel-sparsity modelling, noisy and dynamic background artefacts, and high computational cost still remain unsolved. Therefore, we propose a novel robust PCA unrolling network with sparse feature selection for super-resolution XCA vessel imagi…

    Submitted 23 April, 2022; v1 submitted 16 April, 2022; originally announced April 2022.