
Showing 1–50 of 51 results for author: Peng, X

Searching in archive eess.
  1. arXiv:2403.09222  [pdf, other]

    eess.SP

    A Robust Semantic Communication System for Image

    Authors: Xiang Peng, Zhijin Qin, Xiaoming Tao, Jianhua Lu, Khaled B. Letaief

    Abstract: Semantic communications have gained significant attention as a promising approach to address the transmission bottleneck, especially with the continuous development of 6G techniques. Distinct from the well investigated physical channel impairments, this paper focuses on semantic impairments in image, particularly those arising from adversarial perturbations. Specifically, we propose a novel metric…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 6 pages

  2. arXiv:2402.15919  [pdf, other]

    cs.CV cs.GR cs.LG eess.IV physics.optics

    Learning to See Through Dazzle

    Authors: Xiaopeng Peng, Erin F. Fleet, Abbie T. Watnik, Grover A. Swartzlander

    Abstract: Machine vision is susceptible to laser dazzle, where intense laser light can blind and distort its perception of the environment through oversaturation or permanent damage to sensor pixels. Here we employ a wavefront-coded phase mask to diffuse the energy of laser light and introduce a sandwich generative adversarial network (SGAN) to restore images from complex image degradations, such as varying…

    Submitted 4 March, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

  3. arXiv:2401.16889  [pdf, other]

    cs.RO cs.AI eess.SY

    Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control

    Authors: Zhongyu Li, Xue Bin Peng, Pieter Abbeel, Sergey Levine, Glen Berseth, Koushil Sreenath

    Abstract: This paper presents a comprehensive study on using deep reinforcement learning (RL) to create dynamic locomotion controllers for bipedal robots. Going beyond focusing on a single locomotion skill, we develop a general control solution that can be used for a range of dynamic bipedal skills, from periodic walking and running to aperiodic jumping and standing. Our RL-based controller incorporates a n…

    Submitted 30 January, 2024; originally announced January 2024.

  4. arXiv:2401.15953  [pdf, other]

    cs.SD eess.AS

    Masked Audio Modeling with CLAP and Multi-Objective Learning

    Authors: Yifei Xin, Xiulian Peng, Yan Lu

    Abstract: Most existing masked audio modeling (MAM) methods learn audio representations by masking and reconstructing local spectrogram patches. However, the reconstruction loss mainly accounts for the signal-level quality of the reconstructed spectrogram and is still limited in extracting high-level audio semantics. In this paper, we propose to enhance the semantic modeling of MAM by distilling cross-modal…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted by Interspeech2023

  5. arXiv:2401.08121  [pdf, other]

    cs.LG cs.AI eess.SY

    CycLight: learning traffic signal cooperation with a cycle-level strategy

    Authors: Gengyue Han, Xiaohan Liu, Xianyue Peng, Hao Wang, Yu Han

    Abstract: This study introduces CycLight, a novel cycle-level deep reinforcement learning (RL) approach for network-level adaptive traffic signal control (NATSC) systems. Unlike most traditional RL-based traffic controllers that focus on step-by-step decision making, CycLight adopts a cycle-level strategy, optimizing cycle length and splits simultaneously using Parameterized Deep Q-Networks (PDQN) algorithm…

    Submitted 16 January, 2024; originally announced January 2024.

  6. arXiv:2312.17024  [pdf, other]

    cs.DS cs.IT eess.IV eess.SP

    Selective Run-Length Encoding

    Authors: Xutan Peng, Yi Zhang, Dejia Peng, Jiafa Zhu

    Abstract: Run-Length Encoding (RLE) is one of the most fundamental tools in data compression. However, its compression power drops significantly if there lacks consecutive elements in the sequence. In extreme cases, the output of the encoder may require more space than the input (aka size inflation). To alleviate this issue, using combinatorics, we quantify RLE's space savings for a given input distribution…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: Accepted at DCC 2024

  7. arXiv:2311.17631  [pdf, other]

    eess.SY cs.CR cs.LG

    Q-learning Based Optimal False Data Injection Attack on Probabilistic Boolean Control Networks

    Authors: Xianlun Peng, Yang Tang, Fangfei Li, Yang Liu

    Abstract: In this paper, we present a reinforcement learning (RL) method for solving optimal false data injection attack problems in probabilistic Boolean control networks (PBCNs) where the attacker lacks knowledge of the system model. Specifically, we employ a Q-learning (QL) algorithm to address this problem. We then propose an improved QL algorithm that not only enhances learning efficiency but also obta…

    Submitted 29 November, 2023; originally announced November 2023.

  8. arXiv:2310.10856  [pdf]

    eess.SY cs.LG cs.MA

    Joint Optimization of Traffic Signal Control and Vehicle Routing in Signalized Road Networks using Multi-Agent Deep Reinforcement Learning

    Authors: Xianyue Peng, Hang Gao, Gengyue Han, Hao Wang, Michael Zhang

    Abstract: Urban traffic congestion is a critical predicament that plagues modern road networks. To alleviate this issue and enhance traffic efficiency, traffic signal control and vehicle routing have proven to be effective measures. In this paper, we propose a joint optimization approach for traffic signal control and vehicle routing in signalized road networks. The objective is to enhance network performan…

    Submitted 16 October, 2023; originally announced October 2023.

  9. arXiv:2310.08981  [pdf, other]

    cs.SD cs.MM eess.AS

    Low-latency Speech Enhancement via Speech Token Generation

    Authors: Huaying Xue, Xiulian Peng, Yan Lu

    Abstract: Existing deep learning based speech enhancement mainly employ a data-driven approach, which leverage large amounts of data with a variety of noise types to achieve noise removal from noisy signal. However, the high dependence on the data limits its generalization on the unseen complex noises in real-life environment. In this paper, we focus on the low-latency scenario and regard speech enhancement…

    Submitted 23 January, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: 5 pages, ICASSP2024(accepted)

  10. arXiv:2309.09776  [pdf, other]

    eess.IV

    MAD: Meta Adversarial Defense Benchmark

    Authors: X. Peng, D. Zhou, G. Sun, J. Shi, L. Wu

    Abstract: Adversarial training (AT) is a prominent technique employed by deep learning models to defend against adversarial attacks, and to some extent, enhance model robustness. However, there are three main drawbacks of the existing AT-based defense methods: expensive computational cost, low generalization ability, and the dilemma between the original model and the defense model. To this end, we propose a…

    Submitted 18 September, 2023; originally announced September 2023.

    Comments: 12 pages, 11 figures, IEEE Transactions on Neural Networks and Learning Systems

  11. arXiv:2308.13789  [pdf]

    eess.SP

    Sensiverse: A dataset for ISAC study

    Authors: Jiajin Luo, Baojian Zhou, Yang Yu, Ping Zhang, Xiaohui Peng, Jianglei Ma, Peiying Zhu, Jianmin Lu, Wen Tong

    Abstract: In order to address the lack of applicable channel models for ISAC research and evaluation, we release Sensiverse, a dataset that can be used for ISAC research. In this paper, we present the method of generating Sensiverse, including the acquisition and formatting of the 3D scene models, the generation of the channel data and associations with Tx/Rx deployment. The file structure and usage of the…

    Submitted 26 August, 2023; originally announced August 2023.

  12. arXiv:2308.10142  [pdf, ps, other]

    eess.IV cs.CV

    Polymerized Feature-based Domain Adaptation for Cervical Cancer Dose Map Prediction

    Authors: Jie Zeng, Zeyu Han, Xingchen Peng, Jianghong Xiao, Peng Wang, Yan Wang

    Abstract: Recently, deep learning (DL) has automated and accelerated the clinical radiation therapy (RT) planning significantly by predicting accurate dose maps. However, most DL-based dose map prediction methods are data-driven and not applicable for cervical cancer where only a small amount of data is available. To address this problem, this paper proposes to transfer the rich knowledge learned from anoth…

    Submitted 19 August, 2023; originally announced August 2023.

    Comments: Accepted and presented in ISBI 2023. To be published in Proceedings

  13. arXiv:2306.08630  [pdf, other]

    eess.IV cs.CV

    High-Dimensional MR Reconstruction Integrating Subspace and Adaptive Generative Models

    Authors: Ruiyang Zhao, Xi Peng, Varun A. Kelkar, Mark A. Anastasio, Fan Lam

    Abstract: We present a novel method that integrates subspace modeling with an adaptive generative image prior for high-dimensional MR image reconstruction. The subspace model imposes an explicit low-dimensional representation of the high-dimensional images, while the generative image prior serves as a spatial constraint on the "contrast-weighted" images or the spatial coefficients of the subspace model. A f…

    Submitted 16 June, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

  14. ABC-KD: Attention-Based-Compression Knowledge Distillation for Deep Learning-Based Noise Suppression

    Authors: Yixin Wan, Yuan Zhou, Xiulian Peng, Kai-Wei Chang, Yan Lu

    Abstract: Noise suppression (NS) models have been widely applied to enhance speech quality. Recently, Deep Learning-Based NS, which we denote as Deep Noise Suppression (DNS), became the mainstream NS method due to its excelling performance over traditional ones. However, DNS models face 2 major challenges for supporting the real-world applications. First, high-performing DNS models are usually large in size…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: This paper was accepted to Interspeech 2023 Main Conference

    Journal ref: Proceedings of INTERSPEECH 2023

  15. arXiv:2304.10780  [pdf, other]

    cs.CV eess.IV

    Omni-Line-of-Sight Imaging for Holistic Shape Reconstruction

    Authors: Binbin Huang, Xingyue Peng, Siyuan Shen, Suan Xia, Ruiqian Li, Yanhua Yu, Yuehan Wang, Shenghua Gao, Wenzheng Chen, Shiying Li, Jingyi Yu

    Abstract: We introduce Omni-LOS, a neural computational imaging method for conducting holistic shape reconstruction (HSR) of complex objects utilizing a Single-Photon Avalanche Diode (SPAD)-based time-of-flight sensor. As illustrated in Fig. 1, our method enables new capabilities to reconstruct near-$360^\circ$ surrounding geometry of an object from a single scan spot. In such a scenario, traditional line-o…

    Submitted 21 April, 2023; originally announced April 2023.

  16. arXiv:2302.13284  [pdf, other]

    cs.SD eess.AS

    Contrast-PLC: Contrastive Learning for Packet Loss Concealment

    Authors: Huaying Xue, Xiulian Peng, Yan Lu

    Abstract: Packet loss concealment (PLC) is challenging in concealing missing contents both plausibly and naturally when there are only limited available context to use. Recently deep-learning based PLC algorithms have demonstrated their superiority over traditional counterparts; but their concealment ability is still mostly limited to a maximum of 120ms loss. Even with strong GAN-based generative models, it…

    Submitted 26 February, 2023; originally announced February 2023.

    Comments: 5 pages, ICASSP 2023(Accepted)

  17. arXiv:2302.13063  [pdf, other]

    eess.AS cs.LG cs.SD

    Time-Variance Aware Real-Time Speech Enhancement

    Authors: Chengyu Zheng, Yuan Zhou, Xiulian Peng, Yuan Zhang, Yan Lu

    Abstract: Time-variant factors often occur in real-world full-duplex communication applications. Some of them are caused by the complex environment such as non-stationary environmental noises and varying acoustic path while some are caused by the communication system such as the dynamic delay between the far-end and near-end signals. Current end-to-end deep neural network (DNN) based methods usually model t…

    Submitted 25 February, 2023; originally announced February 2023.

  18. arXiv:2302.11558  [pdf, other]

    cs.SD eess.AS

    Improving Speech Enhancement via Event-based Query

    Authors: Yifei Xin, Xiulian Peng, Yan Lu

    Abstract: Existing deep learning based speech enhancement (SE) methods either use blind end-to-end training or explicitly incorporate speaker embedding or phonetic information into the SE network to enhance speech quality. In this paper, we perceive speech and noises as different types of sound events and propose an event-based query method for SE. Specifically, representative speech embeddings that can dis…

    Submitted 24 February, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

    Comments: Accepted by ICASSP2023

  19. arXiv:2302.10657  [pdf, other]

    cs.SD cs.MM eess.AS

    DasFormer: Deep Alternating Spectrogram Transformer for Multi/Single-Channel Speech Separation

    Authors: Shuo Wang, Xiangyu Kong, Xiulian Peng, Mahmood Movassagh, Vinod Prakash, Yan Lu

    Abstract: For the task of speech separation, previous study usually treats multi-channel and single-channel scenarios as two research tracks with specialized solutions developed respectively. Instead, we propose a simple and unified architecture - DasFormer (Deep alternating spectrogram transFormer) to handle both of them in the challenging reverberant environments. Unlike frame-wise sequence modeling, each…

    Submitted 14 March, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

    Comments: 5 pages, accepted by ICASSP2023

  20. arXiv:2302.10377  [pdf, other]

    eess.AS cs.SD

    Real-time speech enhancement with dynamic attention span

    Authors: Chengyu Zheng, Yuan Zhou, Xiulian Peng, Yuan Zhang, Yan Lu

    Abstract: For real-time speech enhancement (SE) including noise suppression, dereverberation and acoustic echo cancellation, the time-variance of the audio signals becomes a severe challenge. The causality and memory usage limit that only the historical information can be used for the system to capture the time-variant characteristics. We propose to adaptively change the receptive field according to the inp…

    Submitted 20 February, 2023; originally announced February 2023.

    Comments: ICASSP 2023 (Accepted)

  21. arXiv:2302.09450  [pdf, other]

    cs.RO cs.AI eess.SY

    Robust and Versatile Bipedal Jumping Control through Reinforcement Learning

    Authors: Zhongyu Li, Xue Bin Peng, Pieter Abbeel, Sergey Levine, Glen Berseth, Koushil Sreenath

    Abstract: This work aims to push the limits of agility for bipedal robots by enabling a torque-controlled bipedal robot to perform robust and versatile dynamic jumps in the real world. We present a reinforcement learning framework for training a robot to accomplish a large variety of jumping tasks, such as jumping to different locations and directions. To improve performance on these challenging tasks, we d…

    Submitted 31 May, 2023; v1 submitted 18 February, 2023; originally announced February 2023.

    Comments: Accepted in Robotics: Science and Systems 2023 (RSS 2023). The accompanying video is at https://youtu.be/aAPSZ2QFB-E

  22. arXiv:2212.04148  [pdf, other]

    cs.CV eess.IV

    Relationship Quantification of Image Degradations

    Authors: Wenxin Wang, Boyun Li, Yuanbiao Gou, Peng Hu, Wangmeng Zuo, Xi Peng

    Abstract: In this paper, we study two challenging but less-touched problems in image restoration, namely, i) how to quantify the relationship between image degradations and ii) how to improve the performance of a specific restoration task using the quantified relationship. To tackle the first challenge, we proposed a Degradation Relationship Index (DRI) which is defined as the mean drop rate difference in t…

    Submitted 5 August, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

  23. arXiv:2211.11960  [pdf, other]

    cs.SD cs.LG eess.AS

    Disentangled Feature Learning for Real-Time Neural Speech Coding

    Authors: Xue Jiang, Xiulian Peng, Yuan Zhang, Yan Lu

    Abstract: Recently end-to-end neural audio/speech coding has shown its great potential to outperform traditional signal analysis based audio codecs. This is mostly achieved by following the VQ-VAE paradigm where blind features are learned, vector-quantized and coded. In this paper, instead of blind end-to-end learning, we propose to learn disentangled features for real-time neural speech coding. Specificall…

    Submitted 24 February, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: ICASSP 2023 (Accepted)

  24. arXiv:2210.04435  [pdf, other]

    cs.RO cs.AI eess.SY

    Creating a Dynamic Quadrupedal Robotic Goalkeeper with Reinforcement Learning

    Authors: Xiaoyu Huang, Zhongyu Li, Yanzhen Xiang, Yiming Ni, Yufeng Chi, Yunhao Li, Lizhi Yang, Xue Bin Peng, Koushil Sreenath

    Abstract: We present a reinforcement learning (RL) framework that enables quadrupedal robots to perform soccer goalkeeping tasks in the real world. Soccer goalkeeping using quadrupeds is a challenging problem, that combines highly dynamic locomotion with precise and fast non-prehensile object (ball) manipulation. The robot needs to react to and intercept a potentially flying ball using dynamic locomotion ma…

    Submitted 10 October, 2022; originally announced October 2022.

    Comments: First two authors contributed equally. Accompanying video is at https://youtu.be/iX6OgG67-ZQ

  25. arXiv:2208.01160  [pdf, other]

    cs.RO cs.AI eess.SY

    Hierarchical Reinforcement Learning for Precise Soccer Shooting Skills using a Quadrupedal Robot

    Authors: Yandong Ji, Zhongyu Li, Yinan Sun, Xue Bin Peng, Sergey Levine, Glen Berseth, Koushil Sreenath

    Abstract: We address the problem of enabling quadrupedal robots to perform precise shooting skills in the real world using reinforcement learning. Developing algorithms to enable a legged robot to shoot a soccer ball to a given target is a challenging problem that combines robot motion control and planning into one task. To solve this problem, we need to consider the dynamics limitation and motion stability…

    Submitted 1 August, 2022; originally announced August 2022.

    Comments: Accepted to 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022)

  26. arXiv:2207.08363  [pdf, other]

    cs.SD cs.LG eess.AS

    Latent-Domain Predictive Neural Speech Coding

    Authors: Xue Jiang, Xiulian Peng, Huaying Xue, Yuan Zhang, Yan Lu

    Abstract: Neural audio/speech coding has recently demonstrated its capability to deliver high quality at much lower bitrates than traditional methods. However, existing neural audio/speech codecs employ either acoustic features or learned blind features with a convolutional neural network for encoding, by which there are still temporal redundancies within encoded features. This paper introduces latent-domai…

    Submitted 25 May, 2023; v1 submitted 17 July, 2022; originally announced July 2022.

    Comments: Accepted by IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING (TASLP)

  27. arXiv:2207.03067  [pdf, other]

    cs.SD cs.LG eess.AS

    Cross-Scale Vector Quantization for Scalable Neural Speech Coding

    Authors: Xue Jiang, Xiulian Peng, Huaying Xue, Yuan Zhang, Yan Lu

    Abstract: Bitrate scalability is a desirable feature for audio coding in real-time communications. Existing neural audio codecs usually enforce a specific bitrate during training, so different models need to be trained for each target bitrate, which increases the memory footprint at the sender and the receiver side and transcoding is often needed to support multiple receivers. In this paper, we introduce a…

    Submitted 6 July, 2022; originally announced July 2022.

    Comments: INTERSPEECH 2022(Accepted)

  28. arXiv:2207.01197  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    Multi-Modal Multi-Correlation Learning for Audio-Visual Speech Separation

    Authors: Xiaoyu Wang, Xiangyu Kong, Xiulian Peng, Yan Lu

    Abstract: In this paper we propose a multi-modal multi-correlation learning framework targeting at the task of audio-visual speech separation. Although previous efforts have been extensively put on combining audio and visual modalities, most of them solely adopt a straightforward concatenation of audio and visual features. To exploit the real useful information behind these two modalities, we define two key…

    Submitted 4 July, 2022; originally announced July 2022.

    Comments: 5 pages, accepted by interspeech2022

  29. arXiv:2207.00993  [pdf, other]

    cs.SD cs.MM eess.AS

    Towards Error-Resilient Neural Speech Coding

    Authors: Huaying Xue, Xiulian Peng, Xue Jiang, Yan Lu

    Abstract: Neural audio coding has shown very promising results recently in the literature to largely outperform traditional codecs but limited attention has been paid on its error resilience. Neural codecs trained considering only source coding tend to be extremely sensitive to channel noises, especially in wireless channels with high error rate. In this paper, we investigate how to elevate the error resili…

    Submitted 3 July, 2022; originally announced July 2022.

    Comments: 5 pages, Interspeech 2022(Accepted)

  30. An Indoor Environment Sensing and Localization System via mmWave Phased Array

    Authors: Yifei Sun, Jie Li, Tong Zhang, Rui Wang, Xiaohui Peng, Tony Xiao Han, Haisheng Tan

    Abstract: An indoor layout sensing and localization system in 60GHz millimeter wave (mmWave) band, named mmReality, is elaborated in this paper. The mmReality system consists of one transmitter and one mobile receiver, each with a phased array and a single radio frequency (RF) chain. To reconstruct the room layout, the pilot signal is delivered from the transmitter to the receiver via different pairs of tra…

    Submitted 9 January, 2023; v1 submitted 7 June, 2022; originally announced June 2022.

    Comments: Paper accepted for publication in Journal of Communications and Information Networks, 2022

  31. arXiv:2206.02596  [pdf, other]

    eess.SP

    A Robust Deep Learning Enabled Semantic Communication System for Text

    Authors: Xiang Peng, Zhijin Qin, Danlan Huang, Xiaoming Tao, Jianhua Lu, Guangyi Liu, Chengkang Pan

    Abstract: With the advent of the 6G era, the concept of semantic communication has attracted increasing attention. Compared with conventional communication systems, semantic communication systems are not only affected by physical noise existing in the wireless communication environment, e.g., additional white Gaussian noise, but also by semantic noise due to the source and the nature of deep learning-based…

    Submitted 6 June, 2022; originally announced June 2022.

    Comments: 6 pages

  32. arXiv:2205.10619  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    A Pilot Study of Relating MYCN-Gene Amplification with Neuroblastoma-Patient CT Scans

    Authors: Zihan Zhang, Xiang Xiang, Xuehua Peng, Jianbo Shao

    Abstract: Neuroblastoma is one of the most common cancers in infants, and the initial diagnosis of this disease is difficult. At present, the MYCN gene amplification (MNA) status is detected by invasive pathological examination of tumor samples. This is time-consuming and may have a hidden impact on children. To handle this problem, we adopt multiple machine learning (ML) algorithms to predict the presence…

    Submitted 21 May, 2022; originally announced May 2022.

  33. arXiv:2203.04313  [pdf, other]

    eess.IV cs.CV

    Multi-Scale Adaptive Network for Single Image Denoising

    Authors: Yuanbiao Gou, Peng Hu, Jiancheng Lv, Joey Tianyi Zhou, Xi Peng

    Abstract: Multi-scale architectures have shown effectiveness in a variety of tasks thanks to appealing cross-scale complementarity. However, existing architectures treat different scale features equally without considering the scale-specific characteristics, i.e., the within-scale characteristics are ignored in the architecture design. In this paper, we reveal this missing piece for multi-scale arc…

    Submitted 29 October, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

    Journal ref: the Thirty-Sixth Annual Conference on Neural Information Processing Systems (NeurIPS 2022)

  34. arXiv:2201.09429  [pdf, other]

    cs.SD cs.LG eess.AS

    End-to-End Neural Speech Coding for Real-Time Communications

    Authors: Xue Jiang, Xiulian Peng, Chengyu Zheng, Huaying Xue, Yuan Zhang, Yan Lu

    Abstract: Deep-learning based methods have shown their advantages in audio coding over traditional ones but limited attention has been paid on real-time communications (RTC). This paper proposes the TFNet, an end-to-end neural speech codec with low latency for RTC. It takes an encoder-temporal filtering-decoder paradigm that has seldom been investigated in audio coding. An interleaved structure is proposed…

    Submitted 15 February, 2022; v1 submitted 23 January, 2022; originally announced January 2022.

    Comments: ICASSP 2022 (Accepted)

  35. arXiv:2111.12869  [pdf, other]

    cs.SD eess.AS

    Polyphonic Sound Event Detection Using Capsule Neural Network on Multi-Type-Multi-Scale Time-Frequency Representation

    Authors: Wangkai Jin, Junyu Liu, Jianfeng Ren, Xiangjun Peng

    Abstract: The challenges of polyphonic sound event detection (PSED) stem from the detection of multiple overlapping events in a time series. Recent efforts exploit Deep Neural Networks (DNNs) on Time-Frequency Representations (TFRs) of audio clips as model inputs to mitigate such issues. However, existing solutions often rely on a single type of TFR, which causes under-utilization of input features. To this…

    Submitted 24 November, 2021; originally announced November 2021.

    Comments: Under reviewed in ICASSP 2022

  36. arXiv:2107.11650  [pdf, other]

    eess.IV eess.SP

    Accelerated MRI Reconstruction with Separable and Enhanced Low-Rank Hankel Regularization

    Authors: Xinlin Zhang, Hengfa Lu, Di Guo, Zongying Lai, Huihui Ye, Xi Peng, Bo Zhao, Xiaobo Qu

    Abstract: The combination of the sparse sampling and the low-rank structured matrix reconstruction has shown promising performance, enabling a significant reduction of the magnetic resonance imaging data acquisition time. However, the low-rank structured approaches demand considerable memory consumption and are time-consuming due to a noticeable number of matrix operations performed on the huge-size block H…

    Submitted 24 July, 2021; originally announced July 2021.

    Comments: 17 pages, 17 figures

  37. arXiv:2107.09621  [pdf, ps, other]

    cs.IT cs.LG eess.SP

    Integrated Sensing and Communication from Learning Perspective: An SDP3 Approach

    Authors: Guoliang Li, Shuai Wang, Jie Li, Rui Wang, Fan Liu, Xiaohui Peng, Tony Xiao Han, Chengzhong Xu

    Abstract: Characterizing the sensing and communication performance tradeoff in integrated sensing and communication (ISAC) systems is challenging in the applications of learning-based human motion recognition. This is because of the large experimental datasets and the black-box nature of deep neural networks. This paper presents SDP3, a Simulation-Driven Performance Predictor and oPtimizer, which consists o…

    Submitted 14 February, 2023; v1 submitted 20 July, 2021; originally announced July 2021.

    Comments: 13 pages, 9 figures, 3 tables, submitted to IEEE for possible publication

  38. arXiv:2107.05160  [pdf, other]

    cs.CV eess.IV

    Spatial and Temporal Networks for Facial Expression Recognition in the Wild Videos

    Authors: Shuyi Mao, Xinqi Fan, Xiaojiang Peng

    Abstract: The paper describes our proposed methodology for the seven basic expression classification track of Affective Behavior Analysis in-the-wild (ABAW) Competition 2021. In this task, facial expression recognition (FER) methods aim to classify the correct expression category from a diverse background, but there are several challenges. First, to adapt the model to in-the-wild scenarios, we use the knowl…

    Submitted 11 July, 2021; originally announced July 2021.

  39. arXiv:2104.10378  [pdf, ps, other]

    cs.IT cs.LG eess.SP

    Wireless Sensing With Deep Spectrogram Network and Primitive Based Autoregressive Hybrid Channel Model

    Authors: Guoliang Li, Shuai Wang, Jie Li, Rui Wang, Xiaohui Peng, Tony Xiao Han

    Abstract: Human motion recognition (HMR) based on wireless sensing is a low-cost technique for scene understanding. Current HMR systems adopt support vector machines (SVMs) and convolutional neural networks (CNNs) to classify radar signals. However, whether a deeper learning model could improve the system performance is currently not known. On the other hand, training a machine learning model requires a lar…

    Submitted 21 April, 2021; originally announced April 2021.

    Comments: 12 pages, 5 pages, submitted to IEEE SPAWC 2021

  40. arXiv:2104.03759  [pdf, other]

    eess.AS cs.SD

    Phoneme-based Distribution Regularization for Speech Enhancement

    Authors: Yajing Liu, Xiulian Peng, Zhiwei Xiong, Yan Lu

    Abstract: Existing speech enhancement methods mainly separate speech from noises at the signal level or in the time-frequency domain. They seldom pay attention to the semantic information of a corrupted signal. In this paper, we aim to bridge this gap by extracting phoneme identities to help speech enhancement. Specifically, we propose a phoneme-based distribution regularization (PbDr) for speech enhancemen…

    Submitted 8 April, 2021; originally announced April 2021.

    Comments: ICASSP 2021 (Accepted)

  41. arXiv:2103.14295  [pdf, other]

    cs.RO cs.AI cs.LG eess.SY

    Reinforcement Learning for Robust Parameterized Locomotion Control of Bipedal Robots

    Authors: Zhongyu Li, Xuxin Cheng, Xue Bin Peng, Pieter Abbeel, Sergey Levine, Glen Berseth, Koushil Sreenath

    Abstract: Developing robust walking controllers for bipedal robots is a challenging endeavor. Traditional model-based locomotion controllers require simplifying assumptions and careful modelling; any small errors can result in unstable control. To address these challenges for bipedal locomotion, we present a model-free reinforcement learning framework for training robust locomotion policies in simulation, w…

    Submitted 26 March, 2021; originally announced March 2021.

    Comments: To appear on 2021 International Conference on Robotics and Automation (ICRA 2021)

  42. arXiv:2012.09408  [pdf, other]

    eess.AS cs.SD eess.SP

    Interactive Speech and Noise Modeling for Speech Enhancement

    Authors: Chengyu Zheng, Xiulian Peng, Yuan Zhang, Sriram Srinivasan, Yan Lu

    Abstract: Speech enhancement is challenging because of the diversity of background noise types. Most of the existing methods are focused on modelling the speech rather than the noise. In this paper, we propose a novel idea to model speech and noise simultaneously in a two-branch convolutional neural network, namely SN-Net. In SN-Net, the two branches predict speech and noise, respectively. Instead of inform…

    Submitted 14 April, 2021; v1 submitted 17 December, 2020; originally announced December 2020.

    Comments: AAAI 2021 (Accepted)

  43. arXiv:2009.05236  [pdf, other

    cs.CV cs.LG eess.IV

    An Efficient Quantitative Approach for Optimizing Convolutional Neural Networks

    Authors: Yuke Wang, Boyuan Feng, Xueqiao Peng, Yufei Ding

    Abstract: With the increasing popularity of deep learning, Convolutional Neural Networks (CNNs) have been widely applied in various domains, such as image classification and object detection, and achieve stunning success in terms of their high accuracy over the traditional statistical methods. To exploit the potential of CNN models, a huge amount of research and industry efforts have been devoted to optimiz…

    Submitted 15 September, 2021; v1 submitted 11 September, 2020; originally announced September 2020.

  44. arXiv:2006.06278   

    eess.IV cs.CV

    DSU-net: Dense SegU-net for automatic head-and-neck tumor segmentation in MR images

    Authors: Pin Tang, Chen Zu, Mei Hong, Rui Yan, Xingchen Peng, Jianghong Xiao, Xi Wu, Jiliu Zhou, Lu** Zhou, Yan Wang

    Abstract: Precise and accurate segmentation of the most common head-and-neck tumor, nasopharyngeal carcinoma (NPC), in MRI sheds light on treatment and regulatory decisions making. However, the large variations in the lesion size and shape of NPC, boundary ambiguity, as well as the limited available annotated samples conspire NPC segmentation in MRI towards a challenging task. In this paper, we propose a De…

    Submitted 19 December, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: This research needs to be advanced in the future

  45. arXiv:2005.11684  [pdf]

    eess.SP

    Deep Learning-based Modulation Detection for NOMA Systems

    Authors: Wenwu Xie, Jian Xiao, **xia Yang, Xin Peng, Chao Yu, Peng Zhu

    Abstract: Since the signal with strong power should be demodulated first for successive interference cancellation (SIC) demodulation in non-orthogonal multiple access (NOMA) systems, the base station (BS) should inform the near user terminal (UT), which has allocated higher power, of modulation mode of the far user terminal. To avoid unnecessary signaling overhead in this process, a blind detection algorith…

    Submitted 16 October, 2020; v1 submitted 24 May, 2020; originally announced May 2020.

  46. arXiv:2003.09739  [pdf]

    eess.SP

    New Security Challenges on Machine Learning Inference Engine: Chip Cloning and Model Reverse Engineering

    Authors: Shanshi Huang, Xiaochen Peng, Hongwu Jiang, Yandong Luo, Shimeng Yu

    Abstract: Machine learning inference engine is of great interest to smart edge computing. Compute-in-memory (CIM) architecture has shown significant improvements in throughput and energy efficiency for hardware acceleration. Emerging non-volatile memory technologies offer great potential for instant on and off by dynamic power gating. Inference engine is typically pre-trained by the cloud and then being dep…

    Submitted 21 March, 2020; originally announced March 2020.

  47. arXiv:1912.02549  [pdf, other]

    eess.SP cs.NI

    Deep Anomaly Detection in Packet Payload

    Authors: Jiaxin Liu, Xucheng Song, Yingjie Zhou, Xi Peng, Yanru Zhang, Pei Liu, Dapeng Wu

    Abstract: With the widespread adoption of cloud services, especially the extensive deployment of plenty of Web applications, it is important and challenging to detect anomalies from the packet payload. For example, the anomalies in the packet payload can be expressed as a number of specific strings which may cause attacks. Although some approaches have achieved remarkable progress, they are with limited app…

    Submitted 5 December, 2019; originally announced December 2019.

    Comments: Neurocomputing, 2021

    Journal ref: J. Liu, X. Song, Y. Zhou, X. Peng, Y. Zhang, P. Liu, D. Wu and C. Zhu, Deep Anomaly Detection in Packet Payload, Neurocomputing, 2021

  48. Deformable Non-local Network for Video Super-Resolution

    Authors: Hua Wang, Dewei Su, Chuangchuang Liu, Longcun **, Xianfang Sun, Xinyi Peng

    Abstract: The video super-resolution (VSR) task aims to restore a high-resolution (HR) video frame by using its corresponding low-resolution (LR) frame and multiple neighboring frames. At present, many deep learning-based VSR methods rely on optical flow to perform frame alignment. The final recovery results will be greatly affected by the accuracy of optical flow. However, optical flow estimation cannot be…

    Submitted 21 December, 2019; v1 submitted 23 September, 2019; originally announced September 2019.

    Journal ref: IEEE Access, vol. 7, pp. 177734-177744, 2019

  49. arXiv:1902.11015  [pdf, other]

    eess.SY cs.MA cs.RO

    Mobile Formation Coordination and Tracking Control for Multiple Non-holonomic Vehicles

    Authors: Xiuhui Peng, Zhiyong Sun, Kexin Guo, Zhiyong Geng

    Abstract: This paper addresses forward motion control for trajectory tracking and mobile formation coordination for a group of non-holonomic vehicles on SE(2). Firstly, by constructing an intermediate attitude variable which involves vehicles' position information and desired attitude, the translational and rotational control inputs are designed in two stages to solve the trajectory tracking problem. Second…

    Submitted 28 February, 2019; originally announced February 2019.

  50. arXiv:1810.01248  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    A Lightweight Music Texture Transfer System

    Authors: Xutan Peng, Chen Li, Zhi Cai, Faqiang Shi, Yidan Liu, Jianxin Li

    Abstract: Deep learning researches on the transformation problems for image and text have raised great attention. However, present methods for music feature transfer using neural networks are far from practical application. In this paper, we initiate a novel system for transferring the texture of music, and release it as an open source project. Its core algorithm is composed of a converter which represents…

    Submitted 4 August, 2021; v1 submitted 27 September, 2018; originally announced October 2018.

    Comments: This version (v3) is identical with v1; v2 should no longer be cited in the literature due to incorrect author list