Showing 1–50 of 53 results for author: Yan, X

  1. arXiv:2406.07498  [pdf, other]

    cs.SD eess.AS

    RaD-Net 2: A causal two-stage repairing and denoising speech enhancement network with knowledge distillation and complex axial self-attention

    Authors: Mingshuai Liu, Zhuangqi Chen, Xiaopeng Yan, Yuanjun Lv, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie

    Abstract: In real-time speech communication systems, speech signals are often degraded by multiple distortions. Recently, a two-stage Repair-and-Denoising network (RaD-Net) was proposed with superior speech quality improvement in the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. However, failure to use future information and the constrained receptive field of convolution layers limit the system's perfor…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  2. arXiv:2405.20336  [pdf, other]

    cs.CV cs.SD eess.AS

    RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text

    Authors: Jiaben Chen, Xin Yan, Yihang Chen, Siyuan Cen, Qinwei Ma, Haoyu Zhen, Kaizhi Qian, Lie Lu, Chuang Gan

    Abstract: In this work, we introduce a challenging task for simultaneously generating 3D holistic body motions and singing vocals directly from textual lyrics inputs, advancing beyond existing works that typically address these two modalities in isolation. To facilitate this, we first collect the RapVerse dataset, a large dataset containing synchronous rapping vocals, lyrics, and high-quality 3D holistic bo…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Project website: https://vis-www.cs.umass.edu/RapVerse

  3. Misaka: Interactive Swarm Testbed for Smart Grid Distributed Algorithm Test and Evaluation

    Authors: Tingliang Zhang, Haiwang Zhong, Zhenfei Tan, Xinfei Yan

    Abstract: In this paper, we present Misaka, a visualized swarm testbed for smart grid algorithm evaluation, also an extendable open-source open-hardware platform for developing tabletop tangible swarm interfaces. The platform consists of a collection of custom-designed 3-omni-directional-wheel robots, each 10 cm in diameter, high accuracy localization through a microdot pattern overlaid on top of the activi…

    Submitted 25 April, 2024; originally announced April 2024.

    Journal ref: 2020 IEEE/IAS Industrial and Commercial Power System Asia (I&CPS Asia)

  4. arXiv:2404.16476  [pdf, ps, other]

    eess.SP

    A Novel Channel Coding Scheme for Digital Multiple Access Computing

    Authors: Xiaojing Yan, Saeed Razavikia, Carlo Fischione

    Abstract: In this paper, we consider the ChannelComp framework, which facilitates the computation of desired functions by multiple transmitters over a common receiver using digital modulations across a multiple access channel. While ChannelComp currently offers a broad framework for computation by designing digital constellations for over-the-air computation and employing symbol-level encoding, encoding the…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: accepted version to the IEEE 2024 ICC conference

  5. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  6. arXiv:2404.09226  [pdf, other]

    eess.IV cs.CV cs.LG

    Breast Cancer Image Classification Method Based on Deep Transfer Learning

    Authors: Weimin Wang, Min Gao, Mingxuan Xiao, Xu Yan, Yufeng Li

    Abstract: To address the issues of limited samples, time-consuming feature design, and low accuracy in detection and classification of breast cancer pathological images, a breast cancer image classification model algorithm combining deep learning and transfer learning is proposed. This algorithm is based on the DenseNet structure of deep neural networks, and constructs a network model by introducing attenti…

    Submitted 14 April, 2024; originally announced April 2024.

  7. arXiv:2404.08713  [pdf, other]

    eess.IV cs.LG q-bio.QM

    Survival Prediction Across Diverse Cancer Types Using Neural Networks

    Authors: Xu Yan, Weimin Wang, MingXuan Xiao, Yufeng Li, Min Gao

    Abstract: Gastric cancer and colon adenocarcinoma represent widespread and challenging malignancies with high mortality rates and complex treatment landscapes. In response to the critical need for accurate prognosis in cancer patients, the medical community has embraced the 5-year survival rate as a vital metric for estimating patient outcomes. This study introduces a pioneering approach to enhance survival…

    Submitted 11 April, 2024; originally announced April 2024.

  8. arXiv:2404.08279  [pdf, other]

    eess.IV cs.CV cs.LG

    Convolutional neural network classification of cancer cytopathology images: taking breast cancer as an example

    Authors: MingXuan Xiao, Yufeng Li, Xu Yan, Min Gao, Weimin Wang

    Abstract: Breast cancer is a relatively common cancer among gynecological cancers. Its diagnosis often relies on the pathology of cells in the lesion. The pathological diagnosis of breast cancer not only requires professionals and time, but also sometimes involves subjective judgment. To address the challenges of dependence on pathologists' expertise and the time-consuming nature of achieving accurate breast…

    Submitted 12 April, 2024; originally announced April 2024.

  9. arXiv:2404.05217  [pdf, other]

    eess.SY

    Network-Constrained Unit Commitment with Flexible Temporal Resolution

    Authors: Zekuan Yu, Haiwang Zhong, Guangchun Ruan, Xinfei Yan

    Abstract: Modern network-constrained unit commitment (NCUC) bears a heavy computational burden due to the ever-growing model scale. This situation becomes more challenging when detailed operational characteristics, complicated constraints, and multiple objectives are considered. We propose a novel simplification method to determine the flexible temporal resolution for acceleration and near-optimal solutions…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: 11 pages, 10 figures. Accepted by IEEE Transactions on Power Systems

  10. arXiv:2404.05149  [pdf, other]

    cs.ET eess.SP

    Intelligent Reflecting Surface Aided Target Localization With Unknown Transceiver-IRS Channel State Information

    Authors: Taotao Ji, Meng Hua, Xuanhong Yan, Chunguo Li, Yongming Huang, Luxi Yang

    Abstract: Integrating wireless sensing capabilities into base stations (BSs) has become a widespread trend in the future beyond fifth-generation (B5G)/sixth-generation (6G) wireless networks. In this paper, we investigate intelligent reflecting surface (IRS) enabled wireless localization, in which an IRS is deployed to assist a BS in locating a target in its non-line-of-sight (NLoS) region. In particular, w…

    Submitted 7 April, 2024; originally announced April 2024.

  11. arXiv:2402.15738  [pdf, other]

    cs.CR eess.SY

    Privacy-Preserving State Estimation in the Presence of Eavesdroppers: A Survey

    Authors: Xinhao Yan, Guanzhong Zhou, Daniel E. Quevedo, Carlos Murguia, Bo Chen, Hailong Huang

    Abstract: Networked systems are increasingly the target of cyberattacks that exploit vulnerabilities within digital communications, embedded hardware, and software. Arguably, the simplest class of attacks -- and often the first type before launching destructive integrity attacks -- are eavesdropping attacks, which aim to infer information by collecting system data and exploiting it for malicious purposes. A…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: 16 pages, 5 figures, 4 tables

  12. arXiv:2402.09372  [pdf, other]

    eess.IV cs.AI cs.CV

    Deep Rib Fracture Instance Segmentation and Classification from CT on the RibFrac Challenge

    Authors: Jiancheng Yang, Rui Shi, Liang Jin, Xiaoyang Huang, Kaiming Kuang, Donglai Wei, Shixuan Gu, Jianying Liu, Pengfei Liu, Zhizhong Chai, Yongjie Xiao, Hao Chen, Liming Xu, Bang Du, Xiangyi Yan, Hao Tang, Adam Alessio, Gregory Holste, Jiapeng Zhang, Xiaoming Wang, Jianye He, Lixuan Che, Hanspeter Pfister, Ming Li, Bingbing Ni

    Abstract: Rib fractures are a common and potentially severe injury that can be challenging and labor-intensive to detect in CT scans. While there have been efforts to address this field, the lack of large-scale annotated datasets and evaluation benchmarks has hindered the development and validation of deep learning algorithms. To address this issue, the RibFrac Challenge was introduced, providing a benchmar…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: Challenge paper for MICCAI RibFrac Challenge (https://ribfrac.grand-challenge.org/)

  13. arXiv:2402.09101  [pdf, other]

    eess.IV cs.CV

    DestripeCycleGAN: Stripe Simulation CycleGAN for Unsupervised Infrared Image Destriping

    Authors: Shiqi Yang, Hanlin Qin, Shuai Yuan, Xiang Yan, Hossein Rahmani

    Abstract: CycleGAN has been proven to be an advanced approach for unsupervised image restoration. This framework consists of two generators: a denoising one for inference and an auxiliary one for modeling noise to fulfill cycle-consistency constraints. However, when applied to the infrared destriping task, it becomes challenging for the vanilla auxiliary generator to consistently produce vertical noise unde…

    Submitted 14 February, 2024; originally announced February 2024.

  14. arXiv:2402.00028  [pdf, other]

    cs.GR cs.CV eess.IV

    Neural Rendering and Its Hardware Acceleration: A Review

    Authors: Xinkai Yan, Jieting Xu, Yuchi Huo, Hujun Bao

    Abstract: Neural rendering is a new image and video generation method based on deep learning. It combines the deep learning model with the physical knowledge of computer graphics, to obtain a controllable and realistic scene model, and realize the control of scene attributes such as lighting, camera parameters, posture and so on. On the one hand, neural rendering can not only make full use of the advantages…

    Submitted 6 January, 2024; originally announced February 2024.

  15. arXiv:2401.04389  [pdf, other]

    cs.SD eess.AS

    RaD-Net: A Repairing and Denoising Network for Speech Signal Improvement

    Authors: Mingshuai Liu, Zhuangqi Chen, Xiaopeng Yan, Yuanjun Lv, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie

    Abstract: This paper introduces our repairing and denoising network (RaD-Net) for the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. We extend our previous framework based on a two-stage network and propose an upgraded model. Specifically, we replace the repairing network with COM-Net from TEA-PSE. In addition, multi-resolution discriminators and multi-band discriminators are adopted in the training…

    Submitted 9 January, 2024; originally announced January 2024.

    Comments: submitted to ICASSP 2024

  16. arXiv:2401.03697  [pdf, other]

    cs.SD eess.AS

    An audio-quality-based multi-strategy approach for target speaker extraction in the MISP 2023 Challenge

    Authors: Runduo Han, Xiaopeng Yan, Weiming Xu, Pengcheng Guo, Jiayao Sun, He Wang, Quan Lu, Ning Jiang, Lei Xie

    Abstract: This paper describes our audio-quality-based multi-strategy approach for the audio-visual target speaker extraction (AVTSE) task in the Multi-modal Information based Speech Processing (MISP) 2023 Challenge. Specifically, our approach adopts different extraction strategies based on the audio quality, striking a balance between interference removal and speech preservation, which benefits the back-en…

    Submitted 6 March, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  17. arXiv:2312.13523  [pdf]

    physics.med-ph eess.IV

    High-resolution myelin-water fraction and quantitative relaxation mapping using 3D ViSTa-MR fingerprinting

    Authors: Congyu Liao, Xiaozhi Cao, Siddharth Srinivasan Iyer, Sophie Schauman, Zihan Zhou, Xiaoqian Yan, Quan Chen, Zhitao Li, Nan Wang, Ting Gong, Zhe Wu, Hongjian He, Jianhui Zhong, Yang Yang, Adam Kerr, Kalanit Grill-Spector, Kawin Setsompop

    Abstract: Purpose: This study aims to develop a high-resolution whole-brain multi-parametric quantitative MRI approach for simultaneous mapping of myelin-water fraction (MWF), T1, T2, and proton-density (PD), all within a clinically feasible scan time. Methods: We developed 3D ViSTa-MRF, which combined Visualization of Short Transverse relaxation time component (ViSTa) technique with MR Fingerprinting (MR…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: 38 pages, 12 figures and 1 table

    Journal ref: Magnetic Resonance in Medicine 2023

  18. arXiv:2311.04383  [pdf, other]

    cs.RO eess.SY

    Active Collision Avoidance System for E-Scooters in Pedestrian Environment

    Authors: Xuke Yan, Dan Shen

    Abstract: In the dense fabric of urban areas, electric scooters have rapidly become a preferred mode of transportation. As they cater to modern mobility demands, they present significant safety challenges, especially when interacting with pedestrians. In general, e-scooters are suggested to be ridden in bike lanes/sidewalks or share the road with cars at the maximum speed of about 15-20 mph, which is more f…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: Submitted to SAE 2024

  19. arXiv:2310.04715  [pdf, other]

    eess.AS cs.SD

    An Exploration of Task-decoupling on Two-stage Neural Post Filter for Real-time Personalized Acoustic Echo Cancellation

    Authors: Zihan Zhang, Jiayao Sun, Xianjun Xia, Ziqian Wang, Xiaopeng Yan, Yijian Xiao, Lei Xie

    Abstract: Deep learning based techniques have been popularly adopted in acoustic echo cancellation (AEC). Utilization of speaker representation has extended the frontier of AEC, thus attracting many researchers' interest in personalized acoustic echo cancellation (PAEC). Meanwhile, task-decoupling strategies are widely adopted in speech enhancement. To further explore the task-decoupling approach, we propos…

    Submitted 7 October, 2023; originally announced October 2023.

    Comments: accepted to ASRU 2023

  20. arXiv:2309.06780  [pdf, other]

    cs.SD eess.AS

    Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms

    Authors: Chu Yuan Zhang, Jiangyan Yi, Jianhua Tao, Chenglong Wang, Xinrui Yan

    Abstract: Recent strides in neural speech synthesis technologies, while enjoying widespread applications, have nonetheless introduced a series of challenges, spurring interest in the defence against the threat of misuse and abuse. Notably, source attribution of synthesized speech has value in forensics and intellectual property protection, but prior work in this area has certain limitations in scope. To add…

    Submitted 15 June, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

    Comments: Accepted by CCL 2024

  21. arXiv:2308.02776  [pdf, other]

    cs.CV eess.IV

    Dual Degradation-Inspired Deep Unfolding Network for Low-Light Image Enhancement

    Authors: Huake Wang, Xingsong Hou, Xiaoyang Yan

    Abstract: Although low-light image enhancement has made great strides based on deep enhancement models, most of them mainly stress enhancement performance via an elaborated black-box network and rarely explore the physical significance of enhancement models. Towards this issue, we propose a Dual degrAdation-inSpired deep Unfolding network, termed DASUNet, for low-light image enhancement. Specifically,…

    Submitted 4 August, 2023; originally announced August 2023.

    Comments: 12 pages, 13 figures

  22. arXiv:2307.09728  [pdf, other]

    cs.CV eess.IV

    Uncertainty-Driven Multi-Scale Feature Fusion Network for Real-time Image Deraining

    Authors: Ming Tong, Xuefeng Yan, Yongzhen Wang

    Abstract: Visual-based measurement systems are frequently affected by rainy weather due to the degradation caused by rain streaks in captured images, and existing imaging devices struggle to address this issue in real-time. While most efforts leverage deep networks for image deraining and have made progress, their large parameter sizes hinder deployment on resource-constrained devices. Additionally, these d…

    Submitted 18 July, 2023; originally announced July 2023.

  23. arXiv:2305.13774  [pdf, other]

    cs.SD eess.AS

    ADD 2023: the Second Audio Deepfake Detection Challenge

    Authors: Jiangyan Yi, Jianhua Tao, Ruibo Fu, Xinrui Yan, Chenglong Wang, Tao Wang, Chu Yuan Zhang, Xiaohui Zhang, Yan Zhao, Yong Ren, Le Xu, Junzuo Zhou, Hao Gu, Zhengqi Wen, Shan Liang, Zheng Lian, Shuai Nie, Haizhou Li

    Abstract: Audio deepfake detection is an emerging topic in the artificial intelligence community. The second Audio Deepfake Detection Challenge (ADD 2023) aims to spur researchers around the world to build new innovative technologies that can further accelerate and foster research on detecting and analyzing deepfake speech utterances. Different from previous challenges (e.g. ADD 2022), ADD 2023 focuses on s…

    Submitted 23 May, 2023; originally announced May 2023.

  24. arXiv:2304.04106  [pdf, other]

    eess.IV cs.CV

    MedGen3D: A Deep Generative Framework for Paired 3D Image and Mask Generation

    Authors: Kun Han, Yifeng Xiong, Chenyu You, Pooya Khosravi, Shanlin Sun, Xiangyi Yan, James Duncan, Xiaohui Xie

    Abstract: Acquiring and annotating sufficient labeled data is crucial in developing accurate and robust learning-based models, but obtaining such data can be challenging in many medical image segmentation tasks. One promising solution is to synthesize realistic data with ground-truth mask annotations. However, no prior studies have explored generating complete 3D volumetric images with masks. In this paper,…

    Submitted 4 July, 2023; v1 submitted 8 April, 2023; originally announced April 2023.

    Comments: Accepted by MICCAI 2023. Project Page: https://krishan999.github.io/MedGen3D/

  25. arXiv:2303.06811  [pdf, other]

    eess.AS

    The NPU-Elevoc Personalized Speech Enhancement System for ICASSP2023 DNS Challenge

    Authors: Xiaopeng Yan, Yindi Yang, Zhihao Guo, Liangliang Peng, Lei Xie

    Abstract: This paper describes our NPU-Elevoc personalized speech enhancement system (NAPSE) for the 5th Deep Noise Suppression Challenge at ICASSP 2023. Based on the superior two-stage model TEA-PSE 2.0, our system particularly explores better strategy for speaker embedding fusion, optimizes the model training pipeline, and leverages adversarial training and multi-scale loss. According to the results, our…

    Submitted 15 March, 2023; v1 submitted 12 March, 2023; originally announced March 2023.

  26. arXiv:2301.01887  [pdf, other]

    eess.SP cs.HC

    A Novel Exploitative and Explorative GWO-SVM Algorithm for Smart Emotion Recognition

    Authors: Xucun Yan, Zihuai Lin, Zhiyun Lin, Branka Vucetic

    Abstract: Emotion recognition or detection is broadly utilized in patient-doctor interactions for diseases such as schizophrenia and autism and the most typical techniques are speech detection and facial recognition. However, features extracted from these behavior-based emotion recognitions are not reliable since humans can disguise their emotions. Recording voices or tracking facial expressions for a long…

    Submitted 4 January, 2023; originally announced January 2023.

  27. arXiv:2212.12661  [pdf, other]

    eess.SY

    Transmission Congestion Management with Generalized Generation Shift Distribution Factors

    Authors: Shutong Pu, Guangchun Ruan, Xinfei Yan, Haiwang Zhong

    Abstract: A major concern in modern power systems is that the popularity and fluctuating characteristics of renewable energy may cause more and more transmission congestion events. Traditional congestion management modeling involves AC or DC power flow equations, but the former always entails a great amount of computation, and the latter cannot consider voltage amplitude and reactive power. The…

    Submitted 24 December, 2022; originally announced December 2022.

    Comments: 5 pages, 4 figures. Accepted by conference: ICPES 2022

  28. arXiv:2210.06973  [pdf, other]

    eess.SP

    Contrastive Psudo-supervised Classification for Intra-Pulse Modulation of Radar Emitter Signals Using data augmentation

    Authors: HanCong Feng, XinHai Yan, KaiLi Jiang, XinYu Zhao, Bin Tang

    Abstract: The automatic classification of radar waveform is a fundamental technique in electronic countermeasures (ECM). Recent supervised deep learning-based methods have achieved great success in such a classification task. However, those methods require enough labeled samples to work properly, and in many circumstances these are not available. To tackle this problem, in this paper, we propose a three-stage dee…

    Submitted 13 October, 2022; originally announced October 2022.

  29. arXiv:2209.13915  [pdf, ps, other]

    eess.SP

    Joint Optimization of Resource Allocation and Trajectory Control for Mobile Group Users in Fixed-Wing UAV-Enabled Wireless Network

    Authors: Xuezhen Yan, Xuming Fang, Cailian Deng, Xianbin Wang

    Abstract: Owing to the controlling flexibility and cost-effectiveness, fixed-wing unmanned aerial vehicles (UAVs) are expected to serve as flying base stations (BSs) in the air-ground integrated network. By exploiting the mobility of UAVs, controllable coverage can be provided for mobile group users (MGUs) under challenging scenarios or even somewhere without communication infrastructure. However, in such d…

    Submitted 28 September, 2022; originally announced September 2022.

    Comments: 30 pages, 9 figures

  30. arXiv:2208.10489  [pdf, other]

    cs.SD cs.AI eess.AS

    System Fingerprint Recognition for Deepfake Audio: An Initial Dataset and Investigation

    Authors: Xinrui Yan, Jiangyan Yi, Chenglong Wang, Jianhua Tao, Junzuo Zhou, Hao Gu, Ruibo Fu

    Abstract: The rapid progress of deep speech synthesis models has posed significant threats to society such as malicious content manipulation. Therefore, many studies have emerged to detect the so-called deepfake audio. However, existing works focus on the binary detection of real audio and fake audio. In real-world scenarios such as model copyright protection and digital evidence forensics, it is needed to…

    Submitted 15 September, 2023; v1 submitted 21 August, 2022; originally announced August 2022.

    Comments: 13 pages, 4 figures. Submitted to IEEE Transactions on Audio, Speech and Language Processing (TASLP). arXiv admin note: text overlap with arXiv:2208.09646

  31. arXiv:2208.09646  [pdf, other]

    cs.SD cs.AI eess.AS

    An Initial Investigation for Detecting Vocoder Fingerprints of Fake Audio

    Authors: Xinrui Yan, Jiangyan Yi, Jianhua Tao, Chenglong Wang, Haoxin Ma, Tao Wang, Shiming Wang, Ruibo Fu

    Abstract: Many effective attempts have been made for fake audio detection. However, they can only provide detection results but no countermeasures to curb this harm. For many related practical applications, it is also necessary to know which model or algorithm generated the fake audio. Therefore, we propose a new problem: detecting vocoder fingerprints of fake audio. Experiments are conducted on the datasets synthesize…

    Submitted 20 August, 2022; originally announced August 2022.

    Comments: Accepted by ACM Multimedia 2022 Workshop: First International Workshop on Deepfake Detection for Audio Multimedia

  32. arXiv:2207.12308  [pdf, other]

    cs.SD eess.AS

    CFAD: A Chinese Dataset for Fake Audio Detection

    Authors: Haoxin Ma, Jiangyan Yi, Chenglong Wang, Xinrui Yan, Jianhua Tao, Tao Wang, Shiming Wang, Ruibo Fu

    Abstract: Fake audio detection is a growing concern and some relevant datasets have been designed for research. However, there is no standard public Chinese dataset under complex conditions. In this paper, we aim to fill in the gap and design a Chinese fake audio detection dataset (CFAD) for studying more generalized detection methods. Twelve mainstream speech-generation techniques are used to generate fake…

    Submitted 18 July, 2023; v1 submitted 12 July, 2022; originally announced July 2022.

    Comments: FAD renamed as CFAD

  33. arXiv:2203.12787  [pdf, other]

    eess.SP

    Design of an Internet of Things System for Smart Hospitals

    Authors: Jichao Leng, Xucun Yan, Zihuai Lin

    Abstract: With the fast advancement of smart devices and Internet of Things (IoT) technologies, certain established situations are opening up new avenues of exploration. Particularly in the sphere of healthcare, the diverse and big population, the complicated and professional data, and the stringent environmental requirements for certain medical scenes and equipment all impose exceptionally high standards o…

    Submitted 6 April, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

  34. arXiv:2202.08433  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    ADD 2022: the First Audio Deep Synthesis Detection Challenge

    Authors: Jiangyan Yi, Ruibo Fu, Jianhua Tao, Shuai Nie, Haoxin Ma, Chenglong Wang, Tao Wang, Zhengkun Tian, Ye Bai, Cunhang Fan, Shan Liang, Shiming Wang, Shuai Zhang, Xinrui Yan, Le Xu, Zhengqi Wen, Haizhou Li, Zheng Lian, Bin Liu

    Abstract: Audio deepfake detection is an emerging topic, which was included in the ASVspoof 2021. However, the recent shared tasks have not covered many real-life and challenging scenarios. The first Audio Deep synthesis Detection challenge (ADD) was motivated to fill in the gap. The ADD 2022 includes three tracks: low-quality fake audio detection (LF), partially fake audio detection (PF) and audio fake gam…

    Submitted 26 February, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

    Comments: Accepted by ICASSP 2022

  35. arXiv:2202.06284  [pdf, ps, other]

    eess.SP

    Significant Low-dimensional Spectral-temporal Features for Seizure Detection

    Authors: Xucun Yan, Dongping Yang, Zihuai Lin, Branka Vucetic

    Abstract: Seizure onset detection in electroencephalography (EEG) signals is a challenging task due to the non-stereotyped seizure activities as well as their stochastic and non-stationary characteristics in nature. Joint spectral-temporal features are believed to contain sufficient and powerful feature information for absence seizure detection. However, the resulting high-dimensional features involve redun…

    Submitted 13 February, 2022; originally announced February 2022.

  36. arXiv:2201.10083  [pdf, other]

    eess.SP

    A Wearable ECG Monitor for Deep Learning Based Real-Time Cardiovascular Disease Detection

    Authors: Peng Wang, Zihuai Lin, Xucun Yan, Zijiao Chen, Ming Ding, Yang Song, Lu Meng

    Abstract: Cardiovascular disease has become one of the most significant threats endangering human life and health. Recently, Electrocardiogram (ECG) monitoring has been transformed into remote cardiac monitoring by Holter surveillance. However, the widely used Holter can bring a great deal of discomfort and inconvenience to the individuals who carry them. We developed a new wireless ECG patch in this work a…

    Submitted 24 January, 2022; originally announced January 2022.

  37. arXiv:2110.10403  [pdf, other]

    eess.IV cs.CV cs.LG

    AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation

    Authors: Xiangyi Yan, Hao Tang, Shanlin Sun, Haoyu Ma, Deying Kong, Xiaohui Xie

    Abstract: Recent advances in transformer-based models have drawn attention to exploring these techniques in medical image segmentation, especially in conjunction with the U-Net model (or its variants), which has shown great success in medical image segmentation, under both 2D and 3D settings. Current 2D based methods either directly replace convolutional layers with pure transformers or consider a transform…

    Submitted 20 October, 2021; originally announced October 2021.

  38. arXiv:2108.01522  [pdf, other]

    eess.IV

    CSMCNet: Scalable Video Compressive Sensing Reconstruction with Interpretable Motion Estimation

    Authors: Bowen Huang, Xiao Yan, Jinjia Zhou, Yibo Fan

    Abstract: Most deep network methods for compressive sensing reconstruction suffer from the black-box characteristic of DNN. In this paper, a deep neural network with interpretable motion estimation named CSMCNet is proposed. The network is able to realize high-quality reconstruction of video compressive sensing by unfolding the iterative steps of optimization based algorithms. A DNN based, multi-hypothesis…

    Submitted 3 August, 2021; originally announced August 2021.

    Comments: 12 pages, 10 pages, 5 tables

  39. arXiv:2103.01661  [pdf, other]

    eess.AS cs.CL cs.SD

    Incorporating VAD into ASR System by Multi-task Learning

    Authors: Meng Li, Xia Yan, Feng Lin

    Abstract: When we use an end-to-end automatic speech recognition (E2E-ASR) system for real-world applications, a voice activity detection (VAD) system is usually needed to improve the performance and to reduce the computational cost by discarding non-speech parts in the audio. Usually ASR and VAD systems are trained and utilized independently of each other. In this paper, we present a novel multi-task learning…

    Submitted 30 September, 2022; v1 submitted 2 March, 2021; originally announced March 2021.

    Comments: 5 pages, 2 figures

  40. arXiv:2101.02828  [pdf, other]

    eess.SY cs.RO

    Distributionally Consistent Simulation of Naturalistic Driving Environment for Autonomous Vehicle Testing

    Authors: Xintao Yan, Shuo Feng, Haowei Sun, Henry X. Liu

    Abstract: Microscopic traffic simulation provides a controllable, repeatable, and efficient testing environment for autonomous vehicles (AVs). To evaluate AVs' safety performance unbiasedly, the probability distributions of environment statistics in the simulated naturalistic driving environment (NDE) need to be consistent with those from the real-world driving environment. However, although human driving b…

    Submitted 1 July, 2022; v1 submitted 7 January, 2021; originally announced January 2021.

    Comments: 13 pages, 11 figures

  41. arXiv:2010.03780  [pdf, other]

    eess.IV

    CS-MCNet:A Video Compressive Sensing Reconstruction Network with Interpretable Motion Compensation

    Authors: Bowen Huang, Jinjia Zhou, Xiao Yan, Ming'e Jing, Rentao Wan, Yibo Fan

    Abstract: In this paper, a deep neural network with interpretable motion compensation called CS-MCNet is proposed to realize high-quality and real-time decoding of video compressive sensing. Firstly, explicit multi-hypothesis motion compensation is applied in our network to extract correlation information of adjacent frames (as shown in Fig. 1), which improves the recovery performance. And then, a residual mo…

    Submitted 8 October, 2020; originally announced October 2020.

    Comments: 15 pages, ACCV 2020 accepted paper

  42. arXiv:2007.06151  [pdf, other]

    eess.IV cs.CV

    MS-NAS: Multi-Scale Neural Architecture Search for Medical Image Segmentation

    Authors: Xingang Yan, Weiwen Jiang, Yiyu Shi, Cheng Zhuo

    Abstract: The recent breakthroughs of Neural Architecture Search (NAS) have motivated various applications in medical image segmentation. However, most existing works either simply rely on hyper-parameter tuning or stick to a fixed network backbone, thereby limiting the underlying search space for identifying more efficient architectures. This paper presents a Multi-Scale NAS (MS-NAS) framework that is featured w…

    Submitted 12 July, 2020; originally announced July 2020.

  43. An Iterative Graph Spectral Subtraction Method for Speech Enhancement

    Authors: Xue Yan, Zhen Yang, Tingting Wang, Haiyan Guo

    Abstract: In this paper, we investigate the application of graph signal processing (GSP) theory in speech enhancement. We first propose a set of shift operators to construct graph speech signals, and then analyze their spectrum in the graph Fourier domain. By leveraging the differences between the spectrum of graph speech and graph noise signals, we further propose the graph spectral subtraction (GSS) metho…

    Submitted 15 June, 2020; originally announced June 2020.

    Journal ref: SPECOM_SPECOM_2020_15

  44. arXiv:2005.11626  [pdf, other]

    cs.CV cs.LG eess.IV

    ShapeAdv: Generating Shape-Aware Adversarial 3D Point Clouds

    Authors: Kibok Lee, Zhuoyuan Chen, Xinchen Yan, Raquel Urtasun, Ersin Yumer

    Abstract: We introduce ShapeAdv, a novel framework to study shape-aware adversarial perturbations that reflect the underlying shape variations (e.g., geometric deformations and structural differences) in the 3D point cloud space. We develop shape-aware adversarial 3D point cloud attacks by leveraging the learned latent space of a point cloud auto-encoder where the adversarial noise is applied in the latent…

    Submitted 23 May, 2020; originally announced May 2020.

    Comments: 3D Point Clouds, Adversarial Learning

  45. arXiv:2003.08525   

    eess.SP cs.CV

    Extremal Region Analysis based Deep Learning Framework for Detecting Defects

    Authors: Zelin Deng, Xiaolong Yan, Shengjun Zhang, Colleen P. Bailey

    Abstract: A maximally stable extremal region (MSER) analysis-based convolutional neural network (CNN) for a unified defect detection framework is proposed in this paper. Our proposed framework utilizes the generality and stability of MSER to generate the desired defect candidates. Then a specifically trained binary CNN classifier is applied to the defect candidates to produce the final defect set. Defect dataset…

    Submitted 22 May, 2020; v1 submitted 18 March, 2020; originally announced March 2020.

    Comments: Unsatisfied with results

  46. arXiv:2002.04705  [pdf, other]

    eess.IV

    Smart Cameras

    Authors: David J. Brady, Minghao Hu, Chengyu Wang, Xuefei Yan, Lu Fang, Yiwnheng Zhu, Yang Tan, Ming Cheng, Zhan Ma

    Abstract: We review camera architecture in the age of artificial intelligence. Modern cameras use physical components and software to capture, compress and display image data. Over the past 5 years, deep learning solutions have become superior to traditional algorithms for each of these functions. Deep learning enables 10-100x reduction in electrical sensor power per pixel, 10x improvement in depth of field…

    Submitted 11 February, 2020; originally announced February 2020.

  47. arXiv:2002.00529  [pdf, ps, other]

    eess.SP

    Genetic Algorithm Optimized Support Vector Machine in NOMA-Based Satellite Networks with Imperfect CSI

    Authors: Xiaojuan Yan, Kang An, Cheng-Xiang Wang, Wei-Ping Zhu, Yusheng Li, Zhiqiang Feng

    Abstract: With the help of a power-domain non-orthogonal multiple access (NOMA) scheme, satellite networks can simultaneously serve multiple users within a limited time/spectrum resource block. However, the existence of channel estimation errors inevitably degrades the judgment on users' channel state information (CSI) accuracy, thus affecting the user pairing processing and suppressing the superiority of the…

    Submitted 2 February, 2020; originally announced February 2020.

  48. arXiv:2001.08869  [pdf, other]

    cs.CV cs.LG eess.IV

    Nonparametric Structure Regularization Machine for 2D Hand Pose Estimation

    Authors: Yifei Chen, Haoyu Ma, Deying Kong, Xiangyi Yan, Jianbao Wu, Wei Fan, Xiaohui Xie

    Abstract: Hand pose estimation is more challenging than body pose estimation due to severe articulation, self-occlusion and high dexterity of the hand. Current approaches often rely on a popular body pose algorithm, such as the Convolutional Pose Machine (CPM), to learn 2D keypoint features. These algorithms cannot adequately address the unique challenges of hand pose estimation, because they are trained so…

    Submitted 23 January, 2020; originally announced January 2020.

    Comments: The paper has been accepted and will be presented at the 2020 IEEE Winter Conference on Applications of Computer Vision (WACV). The code is freely available at https://github.com/HowieMa/NSRMhand

  49. arXiv:1911.08030  [pdf, other]

    cs.LG cs.NE eess.SP

    Driver Identification Based on Vehicle Telematics Data using LSTM-Recurrent Neural Network

    Authors: Abenezer Girma, Xuyang Yan, Abdollah Homaifar

    Abstract: Despite advancements in vehicle security systems, over the last decade, auto-theft rates have increased, and cyber-security attacks on internet-connected and autonomous vehicles are becoming a new threat. In this paper, a deep learning model is proposed, which can identify drivers from their driving behaviors based on vehicle telematics data. The proposed Long-Short-Term-Memory (LSTM) model predic…

    Submitted 18 November, 2019; originally announced November 2019.

    Comments: IEEE ICTAI 2019

  50. arXiv:1908.10903  [pdf, other]

    eess.IV

    Compressive Sampling for Array Cameras

    Authors: Xuefei Yan, David J. Brady, Jianqiang Wang, Chao Huang, Zian Li, Songsong Yan, Di Liu, Zhan Ma

    Abstract: While design of high performance lenses and image sensors has long been the focus of camera development, the size, weight and power of image data processing components is currently the primary barrier to radical improvements in camera resolution. Here we show that Deep-Learning-Aided Compressive Sampling (DLACS) can reduce operating power on camera-head electronics by 20x. Traditional compressive…

    Submitted 28 August, 2019; originally announced August 2019.