Showing 1–50 of 73 results for author: Du, X

Searching in archive eess.
  1. arXiv:2407.00297  [pdf]

    eess.IV cs.CV

    UADSN: Uncertainty-Aware Dual-Stream Network for Facial Nerve Segmentation

    Authors: Guanghao Zhu, Lin Liu, Jing Zhang, Xiaohui Du, Ruqian Hao, Juanxiu Liu

    Abstract: Facial nerve segmentation is crucial for preoperative path planning in cochlear implantation surgery. Recently, researchers have proposed some segmentation methods, such as atlas-based and deep learning-based methods. However, since the facial nerve is a tubular organ with a diameter of only 1.0-1.5 mm, it is challenging to locate and segment the facial nerve in CT scans. In this work, we propose a…

    Submitted 28 June, 2024; originally announced July 2024.

  2. arXiv:2406.19649  [pdf]

    eess.IV cs.CV

    AstMatch: Adversarial Self-training Consistency Framework for Semi-Supervised Medical Image Segmentation

    Authors: Guanghao Zhu, Jing Zhang, Juanxiu Liu, Xiaohui Du, Ruqian Hao, Yong Liu, Lin Liu

    Abstract: Semi-supervised learning (SSL) has shown considerable potential in medical image segmentation, primarily leveraging consistency regularization and pseudo-labeling. However, many SSL approaches only pay attention to low-level consistency and overlook the significance of pseudo-label reliability. Therefore, in this work, we propose an adversarial self-training consistency framework (AstMatch). First…

    Submitted 28 June, 2024; originally announced June 2024.

  3. arXiv:2406.15222  [pdf]

    eess.IV cs.AI cs.CV

    Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study

    Authors: Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, Jingyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, Zhenpeng Yuan, et al. (15 additional authors not shown)

    Abstract: Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed…

    Submitted 24 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: under peer review

  4. arXiv:2406.04354  [pdf, other]

    eess.AS

    QiandaoEar22: A high quality noise dataset for identifying specific ship from multiple underwater acoustic targets using ship-radiated noise

    Authors: Xiaoyang Du, Feng Hong

    Abstract: Target identification of ship-radiated noise is a crucial area in underwater target recognition. However, there is currently a lack of multi-target ship datasets that accurately represent real-world underwater acoustic conditions. To tackle this issue, we conducted experimental data acquisition, resulting in the release of QiandaoEar22, a comprehensive underwater acoustic multi-target d…

    Submitted 15 May, 2024; originally announced June 2024.

  5. arXiv:2406.04353  [pdf, other]

    eess.AS cs.SD

    Introducing the Brand New QiandaoEar22 Dataset for Specific Ship Identification Using Ship-Radiated Noise

    Authors: Xiaoyang Du, Feng Hong

    Abstract: Target identification of ship-radiated noise is a crucial area in underwater target recognition. However, there is currently a lack of multi-target ship datasets that accurately represent real-world underwater acoustic conditions. To tackle this issue, we release QiandaoEar22, an underwater acoustic multi-target dataset, which can be downloaded from https://ieee-dataport.org/documents/qian…

    Submitted 15 May, 2024; originally announced June 2024.

  6. arXiv:2405.03254  [pdf]

    eess.AS

    Automatic Assessment of Dysarthria Using Audio-visual Vowel Graph Attention Network

    Authors: Xiaokang Liu, Xiaoxia Du, Juan Liu, Rongfeng Su, Manwa Lawrence Ng, Yumei Zhang, Yudong Yang, Shaofeng Zhao, Lan Wang, Nan Yan

    Abstract: Automatic assessment of dysarthria remains a highly challenging task due to high variability in acoustic signals and the limited data. Currently, research on the automatic assessment of dysarthria primarily focuses on two approaches: one that utilizes expert features combined with machine learning, and the other that employs data-driven deep learning methods to extract representations. Research ha…

    Submitted 6 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

    Comments: 10 pages, 7 figures, 7 tables

  7. arXiv:2404.13598  [pdf, other]

    cs.NI eess.SP

    An Integrated Communication and Computing Scheme for Wi-Fi Networks based on Generative AI and Reinforcement Learning

    Authors: Xinyang Du, Xuming Fang

    Abstract: The continuous evolution of future mobile communication systems is heading towards the integration of communication and computing, with Mobile Edge Computing (MEC) emerging as a crucial means of implementing Artificial Intelligence (AI) computation. MEC could enhance the computational performance of wireless edge networks by offloading computing-intensive tasks to MEC servers. However, in edge com…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: This paper has been submitted to GlobeCom 2024 and is currently under review

  8. arXiv:2404.10605  [pdf, other]

    cs.IT eess.SY

    UAV Trajectory Optimization for Sensing Exploiting Target Location Distribution Map

    Authors: Xiangming Du, Shuowen Zhang, Liang Liu

    Abstract: In this paper, we study the trajectory optimization of a cellular-connected unmanned aerial vehicle (UAV) which aims to sense the location of a target while maintaining satisfactory communication quality with the ground base stations (GBSs). In contrast to most existing works which assumed the target's location is known, we focus on a more challenging scenario where the exact location of the targe…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: to appear in IEEE Vehicular Technology Conference (VTC) Spring, 2024

  9. arXiv:2404.06393  [pdf, other]

    cs.SD cs.AI eess.AS

    MuPT: A Generative Symbolic Music Pretrained Transformer

    Authors: Xingwei Qu, Yuelin Bai, Yinghao Ma, Ziya Zhou, Ka Man Lo, Jiaheng Liu, Ruibin Yuan, Lejun Min, Xueling Liu, Tianyu Zhang, Xinrun Du, Shuyue Guo, Yiming Liang, Yizhi Li, Shangda Wu, Junting Zhou, Tianyu Zheng, Ziyang Ma, Fengze Han, Wei Xue, Gus Xia, Emmanouil Benetos, Xiang Yue, Chenghua Lin, Xu Tan, et al. (4 additional authors not shown)

    Abstract: In this paper, we explore the application of Large Language Models (LLMs) to the pre-training of music. While the prevalent use of MIDI in music modeling is well-established, our findings suggest that LLMs are inherently more compatible with ABC Notation, which aligns more closely with their design and strengths, thereby enhancing the model's performance in musical composition. To address the chal…

    Submitted 10 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  10. arXiv:2403.15446  [pdf]

    eess.SP cs.RO

    Shape Sensing for Continuum Robotics using Optoelectronic Sensors with Convex Reflectors

    Authors: Dalia Osman, Xinli Du, Timothy Minton, Yohan Noh

    Abstract: Three-dimensional shape sensing in soft and continuum robotics is a crucial aspect for stable actuation and control in fields such as minimally invasive surgery, as the estimation of complex curvatures while using continuum robotic tools is required to manipulate through fragile paths. This challenge has been addressed using a range of different sensing techniques, for example, Fibre Bragg grating…

    Submitted 17 March, 2024; originally announced March 2024.

  11. arXiv:2402.17785  [pdf, other]

    cs.SD cs.AI eess.AS

    ByteComposer: a Human-like Melody Composition Method based on Language Model Agent

    Authors: Xia Liang, Xingjian Du, Jiaju Lin, Pei Zou, Yuan Wan, Bilei Zhu

    Abstract: Large Language Models (LLMs) have shown encouraging progress in multimodal understanding and generation tasks. However, how to design a human-aligned and interpretable melody composition system is still under-explored. To solve this problem, we propose ByteComposer, an agent framework emulating a human's creative pipeline in four separate steps: "Conception Analysis - Draft Composition - Self-Eval…

    Submitted 6 March, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

  12. arXiv:2401.17133  [pdf, other]

    cs.SD cs.AI cs.CR cs.LG cs.MM eess.AS

    A Proactive and Dual Prevention Mechanism against Illegal Song Covers empowered by Singing Voice Conversion

    Authors: Guangke Chen, Yedi Zhang, Fu Song, Ting Wang, Xiaoning Du, Yang Liu

    Abstract: Singing voice conversion (SVC) automates song covers by converting one singer's singing voice into another target singer's singing voice with the original lyrics and melody. However, it raises serious concerns about copyright and civil rights infringements to multiple entities. This work proposes SongBsAb, the first proactive approach to mitigate unauthorized SVC-based illegal song covers. SongBsAb…

    Submitted 30 January, 2024; originally announced January 2024.

  13. arXiv:2401.08136  [pdf, other]

    eess.SY

    Bias-Compensated State of Charge and State of Health Joint Estimation for Lithium Iron Phosphate Batteries

    Authors: Baozhao Yi, Xinhao Du, Jiawei Zhang, Xiaogang Wu, Qiuhao Hu, Weiran Jiang, Xiaosong Hu, Ziyou Song

    Abstract: Accurate estimation of the state of charge (SOC) and state of health (SOH) is crucial for the safe and reliable operation of batteries. Voltage measurement bias highly affects state estimation accuracy, especially in Lithium Iron Phosphate (LFP) batteries, which are susceptible due to their flat open-circuit voltage (OCV) curves. This work introduces a bias-compensated algorithm to reliably estima…

    Submitted 12 March, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: 9 pages and 8 figures

  14. arXiv:2401.02662  [pdf, other]

    cs.NI eess.SP

    GainNet: Coordinates the Odd Couple of Generative AI and 6G Networks

    Authors: Ning Chen, Jie Yang, Zhipeng Cheng, Xuwei Fan, Zhang Liu, Bangzhen Huang, Yifeng Zhao, Lianfen Huang, Xiaojiang Du, Mohsen Guizani

    Abstract: The rapid expansion of AI-generated content (AIGC) reflects the iteration from assistive AI towards generative AI (GAI) with creativity. Meanwhile, the 6G networks will also evolve from the Internet-of-everything to the Internet-of-intelligence with hybrid heterogeneous network architectures. In the future, the interplay between GAI and the 6G will lead to new opportunities, where GAI can learn th…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: 10 pages, 5 figures, 1 table

  15. arXiv:2312.16014  [pdf, other]

    cs.CV eess.IV

    Passive Non-Line-of-Sight Imaging with Light Transport Modulation

    Authors: Jiarui Zhang, Ruixu Geng, Xiaolong Du, Yan Chen, Houqiang Li, Yang Hu

    Abstract: Passive non-line-of-sight (NLOS) imaging has witnessed rapid development in recent years, due to its ability to image objects that are out of sight. The light transport condition plays an important role in this task since changing the conditions will lead to different imaging models. Existing learning-based NLOS methods usually train independent models for different light transport conditions, whi…

    Submitted 26 March, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

  16. arXiv:2311.04942  [pdf, other]

    eess.IV cs.CV

    CSAM: A 2.5D Cross-Slice Attention Module for Anisotropic Volumetric Medical Image Segmentation

    Authors: Alex Ling Yu Hung, Haoxin Zheng, Kai Zhao, Xiaoxi Du, Kaifeng Pang, Qi Miao, Steven S. Raman, Demetri Terzopoulos, Kyunghyun Sung

    Abstract: A large portion of volumetric medical data, especially magnetic resonance imaging (MRI) data, is anisotropic, as the through-plane resolution is typically much lower than the in-plane resolution. Both 3D and purely 2D deep learning-based segmentation methods are deficient in dealing with such volumetric data since the performance of 3D methods suffers when confronting anisotropic data, and 2D meth…

    Submitted 26 November, 2023; v1 submitted 7 November, 2023; originally announced November 2023.

  17. arXiv:2311.03815  [pdf, other]

    cs.NI eess.SP

    Integrated Sensing, Communication, and Computing for Cost-effective Multimodal Federated Perception

    Authors: Ning Chen, Zhipeng Cheng, Xuwei Fan, Bangzhen Huang, Yifeng Zhao, Lianfen Huang, Xiaojiang Du, Mohsen Guizani

    Abstract: Federated learning (FL) is a classic paradigm of 6G edge intelligence (EI), which alleviates privacy leaks and high communication pressure caused by traditional centralized data processing in the artificial intelligence of things (AIoT). The implementation of multimodal federated perception (MFP) services involves three sub-processes, including sensing-based multimodal data generation, communicati…

    Submitted 7 November, 2023; originally announced November 2023.

  18. arXiv:2310.13882  [pdf]

    eess.SP

    NMR Spectra Denoising with Vandermonde Constraints

    Authors: Di Guo, Runmin Xu, Jinyu Wu, Meijin Lin, Xiaofeng Du, Xiaobo Qu

    Abstract: Nuclear magnetic resonance (NMR) spectroscopy serves as an important tool to analyze chemicals and proteins in bioengineering. However, NMR signals are easily contaminated by noise during data acquisition, which can affect subsequent quantitative analysis. Therefore, denoising NMR signals has been a long-standing concern. In this work, we propose an optimization model-based iterative denoising met…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: 10 pages, 9 figures

  19. arXiv:2310.10159  [pdf, other]

    cs.SD cs.CL eess.AS

    Joint Music and Language Attention Models for Zero-shot Music Tagging

    Authors: Xingjian Du, Zhesong Yu, Jiaju Lin, Bilei Zhu, Qiuqiang Kong

    Abstract: Music tagging is a task to predict the tags of music recordings. However, previous music tagging research primarily focuses on closed-set music tagging tasks which cannot be generalized to new tags. In this work, we propose a zero-shot music tagging system modeled by a joint music and language attention (JMLA) model to address the open-set music tagging problem. The JMLA model consists of an audio…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: Keywords: Music tagging, joint music and language attention models, Music Foundation Model.

  20. arXiv:2308.12770  [pdf, other]

    cs.SD cs.CL eess.AS

    WavMark: Watermarking for Audio Generation

    Authors: Guangyu Chen, Yu Wu, Shujie Liu, Tao Liu, Xiaoyong Du, Furu Wei

    Abstract: Recent breakthroughs in zero-shot voice synthesis have enabled imitating a speaker's voice using just a few seconds of recording while maintaining a high level of realism. Alongside its potential benefits, this powerful technology introduces notable risks, including voice fraud and speaker impersonation. Unlike the conventional approach of solely relying on passive methods for detecting synthetic…

    Submitted 7 January, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

  21. arXiv:2307.08556  [pdf, other]

    stat.ML cs.LG eess.IV

    Machine-Learning-based Colorectal Tissue Classification via Acoustic Resolution Photoacoustic Microscopy

    Authors: Shangqing Tong, Peng Ge, Yanan Jiao, Zhaofu Ma, Ziye Li, Longhai Liu, Feng Gao, Xiaohui Du, Fei Gao

    Abstract: Colorectal cancer is a deadly disease that has become increasingly prevalent in recent years. Early detection is crucial for saving lives, but traditional diagnostic methods such as colonoscopy and biopsy have limitations. Colonoscopy cannot provide detailed information within the tissues affected by cancer, while biopsy involves tissue removal, which can be painful and invasive. In order to impro…

    Submitted 17 July, 2023; originally announced July 2023.

  22. arXiv:2306.01120  [pdf, other]

    eess.SY

    Frequency-dependent Switching Control for Disturbance Attenuation of Linear Systems

    Authors: Jingjing Zhang, Jan Heiland, Peter Benner, Xin Du

    Abstract: The generalized Kalman-Yakubovich-Popov lemma as established by Iwasaki and Hara in 2005 marks a milestone in the analysis and synthesis of linear systems from a finite-frequency perspective. Given a pre-specified frequency band, it allows us to produce passive controllers with excellent in-band disturbance attenuation performance at the expense of some of the out-of-band performance. This paper f…

    Submitted 1 June, 2023; originally announced June 2023.

  23. arXiv:2305.07447  [pdf, other]

    cs.SD eess.AS

    Universal Source Separation with Weakly Labelled Data

    Authors: Qiuqiang Kong, Ke Chen, Haohe Liu, Xingjian Du, Taylor Berg-Kirkpatrick, Shlomo Dubnov, Mark D. Plumbley

    Abstract: Universal source separation (USS) is a fundamental research task for computational auditory scene analysis, which aims to separate mono recordings into individual source tracks. There are three potential challenges awaiting the solution to the audio source separation task. First, previous audio source separation systems mainly focus on separating one or a limited number of specific sources. There…

    Submitted 11 May, 2023; originally announced May 2023.

  24. arXiv:2305.07220  [pdf, other]

    eess.SP

    Physical-layer Adversarial Robustness for Deep Learning-based Semantic Communications

    Authors: Guoshun Nan, Zhichun Li, Jinli Zhai, Qimei Cui, Gong Chen, Xin Du, Xuefei Zhang, Xiaofeng Tao, Zhu Han, Tony Q. S. Quek

    Abstract: End-to-end semantic communications (ESC) rely on deep neural networks (DNN) to boost communication efficiency by only transmitting the semantics of data, showing great potential for high-demand mobile applications. We argue that central to the success of ESC is the robust interpretation of conveyed semantics at the receiver side, especially for security-critical applications such as automatic driv…

    Submitted 11 May, 2023; originally announced May 2023.

    Comments: 17 pages, 28 figures, accepted by IEEE JSAC

  25. arXiv:2303.11692  [pdf, other]

    cs.SD cs.IR eess.AS

    ByteCover3: Accurate Cover Song Identification on Short Queries

    Authors: Xingjian Du, Zijie Wang, Xia Liang, Huidong Liang, Bilei Zhu, Zejun Ma

    Abstract: Deep learning based methods have become a paradigm for cover song identification (CSI) in recent years, where the ByteCover systems have achieved state-of-the-art results on all the mainstream datasets of CSI. However, with the rapid growth of short videos, many real-world applications require matching short music excerpts to full-length music tracks in the database, which is still under-explored and w…

    Submitted 21 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  26. arXiv:2303.02657  [pdf, ps, other]

    cs.LG cs.AI cs.IT cs.NI eess.SP

    Sparsity-Aware Intelligent Massive Random Access Control in Open RAN: A Reinforcement Learning Based Approach

    Authors: Xiao Tang, Sicong Liu, Xiaojiang Du, Mohsen Guizani

    Abstract: Massive random access of devices in the emerging Open Radio Access Network (O-RAN) poses a great challenge to access control and management. Exploiting the bursty nature of the access requests, sparse active user detection (SAUD) is an efficient enabler of access management, but the sparsity might deteriorate in the case of uncoordinated massive access requests. To dynamically…

    Submitted 5 March, 2023; originally announced March 2023.

    Comments: This paper has been submitted to IEEE Journal on Selected Areas in Communications

  27. arXiv:2212.09930  [pdf, other]

    eess.SY

    Frequency-limited H$_2$ Model Order Reduction Based on Relative Error

    Authors: Umair Zulfiqar, Xin Du, Qiuyan Song, Zhi-Hua Xiao, Victor Sreeram

    Abstract: Frequency-limited model order reduction aims to approximate a high-order model with a reduced-order model that maintains high fidelity within a specific frequency range. Beyond this range, a decrease in accuracy is acceptable due to the nature of the problem. The quality of the reduced-order model is typically evaluated using absolute or relative measures of approximation error. Relative error, wh…

    Submitted 24 June, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: arXiv admin note: text overlap with arXiv:2212.08247

  28. arXiv:2212.08247  [pdf, other]

    eess.SY

    Relative Error-based Time-limited H2 Model Order Reduction via Oblique Projection

    Authors: Umair Zulfiqar, Xin Du, Qiuyan Song, Zhi-Hua Xiao, Victor Sreeram

    Abstract: In time-limited model order reduction, a reduced-order approximation of the original high-order model is obtained that accurately approximates the original model within the desired limited time interval. Accuracy outside that time interval is not that important. The error incurred when a reduced-order model is used as a surrogate for the original model can be quantified in absolute or relative ter…

    Submitted 15 December, 2022; originally announced December 2022.

  29. arXiv:2209.11455  [pdf, other]

    eess.IV cs.CV

    Modular Degradation Simulation and Restoration for Under-Display Camera

    Authors: Yang Zhou, Yuda Song, Xin Du

    Abstract: Under-display camera (UDC) provides an elegant solution for full-screen smartphones. However, UDC-captured images suffer from severe degradation since sensors lie under the display. Although this issue can be tackled by image restoration networks, these networks require large-scale image pairs for training. To this end, we propose a modular network dubbed MPGNet trained using the generative advers…

    Submitted 23 September, 2022; originally announced September 2022.

  30. arXiv:2205.05939  [pdf, other]

    eess.SY

    NLOS Error Mitigation Using Weighted Least Squares and Kalman Filter in UWB Positioning

    Authors: Ruixin Fan, Xin Du

    Abstract: In wireless positioning systems, non-line-of-sight (NLOS) is a challenging problem. NLOS causes great ranging bias and location error, so NLOS mitigation is essential for high accuracy positioning. In this letter, we propose the Weighted-Least-Squares Robust Kalman Filter (WLS-RKF) for NLOS identification and mitigation. WLS-RKF employs a hypothesis test based on Mahalanobis distance for NLOS iden…

    Submitted 12 May, 2022; originally announced May 2022.

    Comments: 6 pages, 5 figures

  31. arXiv:2205.05036  [pdf, other]

    cs.NI eess.SP

    Multi-agent Reinforcement Learning for Dynamic Resource Management in 6G in-X Subnetworks

    Authors: Xiao Du, Ting Wang, Qiang Feng, Chenhui Ye, Tao Tao, Yuanming Shi, Mingsong Chen

    Abstract: The 6G network enables a subnetwork-wide evolution, resulting in a "network of subnetworks". However, due to the dynamic mobility of wireless subnetworks, the data transmission of intra-subnetwork and inter-subnetwork will inevitably interfere with each other, which poses a great challenge to radio resource management. Moreover, most of the existing approaches require the instantaneous channel gai…

    Submitted 10 May, 2022; originally announced May 2022.

  32. arXiv:2204.03356  [pdf, other]

    eess.SY

    Alternating Direction Based Sequential Boolean Quadratic Programming Method for Transmit Antenna Selection

    Authors: Shijie Zhu, Xu Du

    Abstract: Wireless mobile communication systems are overhauled roughly once every decade. The application scenarios of the fifth generation mobile communication system (5G) are now under development. Unfortunately, 5G relies on plenty of small base stations with a large number of antennas that consume a lot of energy. In this paper, a novel Boolean variable quadratic programm…

    Submitted 20 September, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

  33. arXiv:2203.14011  [pdf, other]

    eess.SY

    Approximations for Optimal Experimental Design in Power System Parameter Estimation

    Authors: Xu Du, Alexander Engelmann, Timm Faulwasser, Boris Houska

    Abstract: This paper is about computationally tractable methods for power system parameter estimation and Optimal Experiment Design (OED). Here, the main motivation is that OED has the potential to significantly increase the accuracy of power system parameter estimates, for example, if only a few batches of data are available. The problem is, however, that solving the exact OED problem for larger power grid…

    Submitted 16 September, 2022; v1 submitted 26 March, 2022; originally announced March 2022.

  34. arXiv:2202.00874  [pdf, other]

    cs.SD cs.AI cs.IR cs.LG eess.AS

    HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection

    Authors: Ke Chen, Xingjian Du, Bilei Zhu, Zejun Ma, Taylor Berg-Kirkpatrick, Shlomo Dubnov

    Abstract: Audio classification is an important task of mapping audio samples into their corresponding labels. Recently, the transformer model with self-attention mechanisms has been adopted in this field. However, existing audio transformers require large GPU memories and long training time, meanwhile relying on pretrained vision models to achieve high performance, which limits the model's scalability in au…

    Submitted 1 February, 2022; originally announced February 2022.

    Comments: Preprint version for ICASSP 2022, Singapore

  35. arXiv:2112.07891  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data

    Authors: Ke Chen, Xingjian Du, Bilei Zhu, Zejun Ma, Taylor Berg-Kirkpatrick, Shlomo Dubnov

    Abstract: Deep learning techniques for separating audio into different sound sources face several challenges. Standard architectures require training separate models for different types of audio sources. Although some universal separators employ a single model to target multiple sources, they have difficulty generalizing to unseen sources. In this paper, we propose a three-component pipeline to train a univ…

    Submitted 12 February, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

    Comments: Preprint version for Association for the Advancement of Artificial Intelligence Conference, AAAI 2022

  36. arXiv:2110.10755  [pdf, other]

    eess.IV cs.CV

    Toward Real-world Image Super-resolution via Hardware-based Adaptive Degradation Models

    Authors: Rui Ma, Johnathan Czernik, Xian Du

    Abstract: Most single image super-resolution (SR) methods are developed on synthetic low-resolution (LR) and high-resolution (HR) image pairs, which are simulated by a predetermined degradation operation, e.g., bicubic downsampling. However, these methods only learn the inverse process of the predetermined operation, so they fail to super-resolve real-world LR images; the true formulation deviates from…

    Submitted 20 October, 2021; originally announced October 2021.

  37. arXiv:2109.01696  [pdf, other]

    cs.CV cs.LG eess.IV

    Revisiting 3D ResNets for Video Recognition

    Authors: Xianzhi Du, Yeqing Li, Yin Cui, Rui Qian, Jing Li, Irwan Bello

    Abstract: A recent work from Bello shows that training and scaling strategies may be more significant than model architectures for visual recognition. This short note studies effective training and scaling strategies for video recognition models. We propose a simple scaling strategy for 3D ResNets, in combination with improved training strategies and minor architectural changes. The resulting models, termed…

    Submitted 3 September, 2021; originally announced September 2021.

    Comments: 6 pages

  38. arXiv:2107.06185  [pdf]

    eess.SY

    A new method for vehicle system safety design based on data mining with uncertainty modeling

    Authors: ** Du, Binhui Jiang, Feng Zhu

    Abstract: In this research, a new data mining-based design approach has been developed for designing complex mechanical systems such as a crashworthy passenger car with uncertainty modeling. The method allows exploring the big crash simulation dataset to design the vehicle at multi-levels in a top-down manner (main energy absorbing system, components, and geometric features) and derive design rules based on…

    Submitted 12 July, 2021; originally announced July 2021.

    Comments: 38 pages, 21 figures, 6 tables

  39. arXiv:2106.11411  [pdf, other]

    cs.SD eess.AS

    Attention-based cross-modal fusion for audio-visual voice activity detection in musical video streams

    Authors: Yuanbo Hou, Zhesong Yu, Xia Liang, Xingjian Du, Bilei Zhu, Zejun Ma, Dick Botteldooren

    Abstract: Many previous audio-visual voice-related works focus on speech, ignoring the singing voice in the growing number of musical video streams on the Internet. For processing diverse musical video data, voice activity detection is a necessary step. This paper attempts to detect the speech and singing voices of target performers in musical video streams using audiovisual information. To integrate inform…

    Submitted 21 June, 2021; originally announced June 2021.

    Comments: Accepted by INTERSPEECH 2021

  40. arXiv:2106.11277  [pdf]

    cs.LG cs.CV cs.RO eess.IV

    Attention-based Neural Network for Driving Environment Complexity Perception

    Authors: Ce Zhang, Azim Eskandarian, Xuelai Du

    Abstract: Environment perception is crucial for autonomous vehicle (AV) safety. Most existing AV perception algorithms have not studied the surrounding environment complexity and failed to include the environment complexity parameter. This paper proposes a novel attention-based neural network model to predict the complexity level of the surrounding driving environment. The proposed model takes naturalistic…

    Submitted 21 June, 2021; originally announced June 2021.

    Comments: Accepted by 2021 IEEE Intelligent Transportation Systems Conference

  41. arXiv:2102.09971  [pdf, other]

    cs.SD eess.AS

    Speech enhancement with weakly labelled data from AudioSet

    Authors: Qiuqiang Kong, Haohe Liu, Xingjian Du, Li Chen, Rui Xia, Yuxuan Wang

    Abstract: Speech enhancement is a task to improve the intelligibility and perceptual quality of degraded speech signals. Recently, neural network based methods have been applied to speech enhancement. However, many neural network based methods require noisy and clean speech pairs for training. We propose a speech enhancement framework that can be trained with the large-scale weakly labelled AudioSet dataset. We…

    Submitted 19 February, 2021; originally announced February 2021.

    Comments: 5 pages

  42. arXiv:2102.09966  [pdf, ps, other]

    cs.SD eess.AS

    CatNet: music source separation system with mix-audio augmentation

    Authors: Xuchen Song, Qiuqiang Kong, Xingjian Du, Yuxuan Wang

    Abstract: Music source separation (MSS) is the task of separating a music piece into individual sources, such as vocals and accompaniment. Recently, neural network based methods have been applied to address the MSS problem, and can be categorized into spectrogram and time-domain based methods. However, there is a lack of research on using complementary information of spectrogram and time-domain inputs for M…

    Submitted 19 February, 2021; originally announced February 2021.

    Comments: 5 pages

  43. arXiv:2102.03603  [pdf, other]

    eess.SY

    On frequency- and time-limited H2-optimal model order reduction

    Authors: Umair Zulfiqar, Victor Sreeram, Xin Du

    Abstract: In this paper, the problems of frequency-limited and time-limited H2-optimal model order reduction of linear time-invariant systems are considered within the oblique projection framework. It is shown that it is inherently not possible to satisfy all the necessary conditions for the local minimizer in the oblique projection framework. The conditions for exact satisfaction of the optimality conditio…

    Submitted 13 September, 2021; v1 submitted 6 February, 2021; originally announced February 2021.

  44. arXiv:2101.06745  [pdf, other]

    eess.SY

    Frequency-weighted H2-optimal model order reduction via oblique projection

    Authors: Umair Zulfiqar, Victor Sreeram, Mian Ilyas Ahmad, Xin Du

    Abstract: In projection-based model order reduction, a reduced-order approximation of the original full-order system is obtained by projecting it onto a reduced subspace that contains its dominant characteristics. The problem of frequency-weighted H2-optimal model order reduction is to construct a local optimum in terms of the H2-norm of the weighted error transfer function. In this paper, a projection-base…

    Submitted 2 May, 2021; v1 submitted 17 January, 2021; originally announced January 2021.

  45. arXiv:2011.11020  [pdf, other]

    eess.IV cs.CV cs.LG q-bio.BM

    Cryo-ZSSR: multiple-image super-resolution based on deep internal learning

    Authors: Qinwen Huang, Ye Zhou, Xiaochen Du, Reed Chen, Jianyou Wang, Cynthia Rudin, Alberto Bartesaghi

    Abstract: Single-particle cryo-electron microscopy (cryo-EM) is an emerging imaging modality capable of visualizing proteins and macro-molecular complexes at near-atomic resolution. The low electron-doses used to prevent sample radiation damage, result in images where the power of the noise is 100 times greater than the power of the signal. To overcome the low-SNRs, hundreds of thousands of particle project…

    Submitted 22 November, 2020; originally announced November 2020.

    Comments: 11 pages, 4 figures

  46. arXiv:2011.03988  [pdf, other]

    eess.SY

    Online power system parameter estimation and optimal operation

    Authors: Xu Du, Alexander Engelmann, Timm Faulwasser, Boris Houska

    Abstract: The integration of renewables into electrical grids calls for optimization-based control schemes requiring reliable grid models. Classically, parameter estimation and optimization-based control is often decoupled, which leads to high system operation cost in the estimation procedure. The present work proposes a method for simultaneously minimizing grid operation cost and optimally estimating line…

    Submitted 18 March, 2021; v1 submitted 8 November, 2020; originally announced November 2020.

  47. arXiv:2010.14022  [pdf, other]

    cs.SD cs.LG eess.AS

    ByteCover: Cover Song Identification via Multi-Loss Training

    Authors: Xingjian Du, Zhesong Yu, Bilei Zhu, Xiaoou Chen, Zejun Ma

    Abstract: We present in this paper ByteCover, which is a new feature learning method for cover song identification (CSI). ByteCover is built based on the classical ResNet model, and two major improvements are designed to further enhance the capability of the model for CSI. In the first improvement, we introduce the integration of instance normalization (IN) and batch normalization (BN) to build IBN blocks,…

    Submitted 23 April, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

  48. arXiv:2010.13540  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    Contrastive Unsupervised Learning for Audio Fingerprinting

    Authors: Zhesong Yu, Xingjian Du, Bilei Zhu, Zejun Ma

    Abstract: The rise of video-sharing platforms has attracted more and more people to shoot videos and upload them to the Internet. These videos mostly contain a carefully-edited background audio track, where serious speech change, pitch shifting and various types of audio effects may involve, and existing audio identification systems may fail to recognize the audio. To solve this problem, in this paper, we i…

    Submitted 26 October, 2020; originally announced October 2020.

    Comments: 5 pages

  49. arXiv:2009.08798  [pdf, other]

    eess.SP cs.HC cs.LG stat.AP

    Designing Compact Features for Remote Stroke Rehabilitation Monitoring using Wearable Accelerometers

    Authors: Xi Chen, Yu Guan, Jian Qing Shi, Xiu-Li Du, Janet Eyre

    Abstract: Stroke is known as a major global health problem, and for stroke survivors it is key to monitor the recovery levels. However, traditional stroke rehabilitation assessment methods (such as the popular clinical assessment) can be subjective and expensive, and it is also less convenient for patients to visit clinics in a high frequency. To address this issue, in this work based on wearable sensing an…

    Submitted 24 December, 2022; v1 submitted 16 September, 2020; originally announced September 2020.

    Comments: 32 pages, accepted for publication in CCF Transactions on Pervasive Computing and Interaction

  50. arXiv:2005.09848  [pdf]

    eess.SP

    Convolutional Neural Network for Behavioral Modeling and Predistortion of Wideband Power Amplifiers

    Authors: Xin Hu, Zhijun Liu, Xiaofei Yu, Yulong Zhao, Wenhua Chen, Biao Hu, Xuekun Du, Xiang Li, Mohamed Helaoui, Weidong Wang, Fadhel M. Ghannouchi

    Abstract: In this paper, we propose a novel behavior model for wideband PAs using a real-valued time-delay convolutional neural network (RVTDCNN). The input data of the model are sorted and arranged as the graph composed of the in-phase and quadrature (I/Q) components and envelope-dependent terms of current and past signals. We design a pre-designed filter using the convolutional layer to extract the basis…

    Submitted 20 May, 2020; originally announced May 2020.