Showing 1–20 of 20 results for author: Hao, X

Searching in archive eess.
  1. arXiv:2405.06230  [pdf]

    eess.IV

    Fire in SRRN: Next-Gen 3D Temperature Field Reconstruction Technology

    Authors: Shenxiang Feng, Xiaojian Hao, Xiaodong Huang, Pan Pei, Tong Wei, Chenyang Xu

    Abstract: In aerospace and energy engineering, accurate 3D combustion field temperature measurement is critical. The resolution of traditional methods based on algebraic iteration is limited by the initial voxel division. This study introduces a novel method for reconstructing three-dimensional temperature fields using the Spatial Radiation Representation Network (SRRN). This method utilizes the flame therm…

    Submitted 9 May, 2024; originally announced May 2024.

  2. arXiv:2403.09693  [pdf, other]

    eess.SP

    A Constrained Deep Reinforcement Learning Optimization for Reliable Network Slicing in a Blockchain-Secured Low-Latency Wireless Network

    Authors: Xin Hao, Phee Lep Yeoh, Changyang She, Yao Yu, Branka Vucetic, Yonghui Li

    Abstract: Network slicing (NS) is a promising technology that supports diverse requirements for next-generation low-latency wireless communication networks. However, the tampering attack is a rising issue of jeopardizing NS service-provisioning. To resist tampering attacks in NS networks, we propose a novel optimization framework for reliable NS resource allocation in a blockchain-secured low-latency wirele…

    Submitted 16 February, 2024; originally announced March 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2312.08016

  3. arXiv:2310.07284  [pdf, other]

    eess.AS cs.CL

    Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction

    Authors: Xiang Hao, Jibin Wu, Jianwei Yu, Chenglin Xu, Kay Chen Tan

    Abstract: Humans possess an extraordinary ability to selectively focus on the sound source of interest amidst complex acoustic environments, commonly referred to as cocktail party scenarios. In an attempt to replicate this remarkable auditory attention capability in machines, target speaker extraction (TSE) models have been developed. These models leverage the pre-registered cues of the target speaker to ex…

    Submitted 14 October, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: Under review, https://github.com/haoxiangsnr/llm-tse

  4. arXiv:2306.08998  [pdf, other]

    cs.SD cs.CV eess.AS

    Team AcieLee: Technical Report for EPIC-SOUNDS Audio-Based Interaction Recognition Challenge 2023

    Authors: Yuqi Li, Yizhi Luo, Xiaoshuai Hao, Chuanguang Yang, Zhulin An, Dantong Song, Wei Yi

    Abstract: In this report, we describe the technical details of our submission to the EPIC-SOUNDS Audio-Based Interaction Recognition Challenge 2023, by Team "AcieLee" (username: Yuqi_Li). The task is to classify the audio caused by interactions between objects, or from events of the camera wearer. We conducted exhaustive experiments and found learning rate step decay, backbone frozen, label smoothing and f…

    Submitted 15 June, 2023; originally announced June 2023.

  5. arXiv:2305.09302  [pdf, other]

    cs.CV cs.AI eess.AS

    Pink-Eggs Dataset V1: A Step Toward Invasive Species Management Using Deep Learning Embedded Solutions

    Authors: Di Xu, Yang Zhao, Xiang Hao, Xin Meng

    Abstract: We introduce a novel dataset consisting of images depicting pink eggs that have been identified as Pomacea canaliculata eggs, accompanied by corresponding bounding box annotations. The purpose of this dataset is to aid researchers in the analysis of the spread of Pomacea canaliculata species by utilizing deep learning techniques, as well as supporting other investigative pursuits that require visu…

    Submitted 16 May, 2023; originally announced May 2023.

    Report number: 02

  6. arXiv:2303.07621  [pdf, other]

    eess.AS cs.SD

    Two-stage Neural Network for ICASSP 2023 Speech Signal Improvement Challenge

    Authors: Mingshuai Liu, Shubo Lv, Zihan Zhang, Runduo Han, Xiang Hao, Xianjun Xia, Li Chen, Yijian Xiao, Lei Xie

    Abstract: In ICASSP 2023 speech signal improvement challenge, we developed a dual-stage neural model which improves speech signal quality induced by different distortions in a stage-wise divide-and-conquer fashion. Specifically, in the first stage, the speech improvement network focuses on recovering the missing components of the spectrum, while in the second stage, our model aims to further suppress noise,…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  7. arXiv:2212.09019  [pdf, other]

    eess.AS eess.SP

    Fast FullSubNet: Accelerate Full-band and Sub-band Fusion Model for Single-channel Speech Enhancement

    Authors: Xiang Hao, Xiaofei Li

    Abstract: FullSubNet is our recently proposed real-time single-channel speech enhancement network that achieves outstanding performance on the Deep Noise Suppression (DNS) Challenge dataset. A number of variants of FullSubNet have been proposed, but they all focus on the structure design towards better performance and are rarely concerned with computational efficiency. For many speech enhancement applicatio…

    Submitted 6 March, 2023; v1 submitted 18 December, 2022; originally announced December 2022.

  8. arXiv:2203.16054  [pdf, other]

    cs.SD cs.LG eess.AS

    Coarse-to-Fine Recursive Speech Separation for Unknown Number of Speakers

    Authors: Zhenhao Jin, Xiang Hao, Xiangdong Su

    Abstract: The vast majority of speech separation methods assume that the number of speakers is known in advance, hence they are specific to the number of speakers. By contrast, a more realistic and challenging task is to separate a mixture in which the number of speakers is unknown. This paper formulates the speech separation with the unknown number of speakers as a multi-pass source extraction problem and…

    Submitted 30 March, 2022; originally announced March 2022.

  9. arXiv:2111.08857  [pdf, other]

    cs.LG cs.AI cs.MA cs.RO eess.SY

    SEIHAI: A Sample-efficient Hierarchical AI for the MineRL Competition

    Authors: Hangyu Mao, Chao Wang, Xiaotian Hao, Yihuan Mao, Yiming Lu, Chengjie Wu, Jianye Hao, Dong Li, Pingzhong Tang

    Abstract: The MineRL competition is designed for the development of reinforcement learning and imitation learning algorithms that can efficiently leverage human demonstrations to drastically reduce the number of environment interactions needed to solve the complex ObtainDiamond task with sparse rewards. To address the challenge, in this paper, we present SEIHAI, a Sample-ef…

    Submitted 16 November, 2021; originally announced November 2021.

    Comments: The winner solution of NeurIPS 2020 MineRL competition (https://www.aicrowd.com/challenges/neurips-2020-minerl-competition/leaderboards). The paper has been accepted by DAI 2021 (the third International Conference on Distributed Artificial Intelligence)

  10. arXiv:2105.06779  [pdf, other]

    eess.IV cs.CV

    DARNet: Dual-Attention Residual Network for Automatic Diagnosis of COVID-19 via CT Images

    Authors: Jun Shi, Huite Yi, Shulan Ruan, Zhaohui Wang, Xiaoyu Hao, Hong An, Wei Wei

    Abstract: The ongoing global pandemic of Coronavirus Disease 2019 (COVID-19) poses a serious threat to public health and the economy. Rapid and accurate diagnosis of COVID-19 is crucial to prevent the further spread of the disease and reduce its mortality. Chest computed tomography (CT) is an effective tool for the early diagnosis of lung diseases including pneumonia. However, detecting COVID-19 from CT is…

    Submitted 30 August, 2021; v1 submitted 14 May, 2021; originally announced May 2021.

    Comments: 7 pages, 4 figures

  11. arXiv:2010.15521  [pdf, other]

    eess.AS cs.SD eess.SP

    UNetGAN: A Robust Speech Enhancement Approach in Time Domain for Extremely Low Signal-to-noise Ratio Condition

    Authors: Xiang Hao, Xiangdong Su, Zhiyu Wang, Hui Zhang, Batushiren

    Abstract: Speech enhancement at extremely low signal-to-noise ratio (SNR) condition is a very challenging problem and rarely investigated in previous works. This paper proposes a robust speech enhancement approach (UNetGAN) based on U-Net and generative adversarial learning to deal with this problem. This approach consists of a generator network and a discriminator network, which operate directly in the tim…

    Submitted 29 October, 2020; originally announced October 2020.

    Comments: Published in Interspeech 2019

  12. arXiv:2010.15508  [pdf, other]

    eess.AS cs.SD eess.SP

    FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement

    Authors: Xiang Hao, Xiangdong Su, Radu Horaud, Xiaofei Li

    Abstract: This paper proposes a full-band and sub-band fusion model, named as FullSubNet, for single-channel real-time speech enhancement. Full-band and sub-band refer to the models that input full-band and sub-band noisy spectral feature, output full-band and sub-band speech target, respectively. The sub-band model processes each frequency independently. Its input consists of one frequency and several cont…

    Submitted 23 January, 2021; v1 submitted 29 October, 2020; originally announced October 2020.

    Comments: 5 pages, submitted to 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021)

  13. arXiv:2006.16312  [pdf, other]

    cs.LG cs.DS cs.IR eess.SY stat.ML

    Dynamic Knapsack Optimization Towards Efficient Multi-Channel Sequential Advertising

    Authors: Xiaotian Hao, Zhaoqing Peng, Yi Ma, Guan Wang, Junqi Jin, Jianye Hao, Shan Chen, Rongquan Bai, Mingzhou Xie, Miao Xu, Zhenzhe Zheng, Chuan Yu, Han Li, Jian Xu, Kun Gai

    Abstract: In E-commerce, advertising is essential for merchants to reach their target users. The typical objective is to maximize the advertiser's cumulative revenue over a period of time under a budget constraint. In real applications, an advertisement (ad) usually needs to be exposed to the same user multiple times until the user finally contributes revenue (e.g., places an order). However, existing adver…

    Submitted 29 June, 2020; originally announced June 2020.

    Comments: accepted by ICML 2020

  14. arXiv:2006.06196  [pdf, other]

    cs.CV cs.LG eess.IV

    An Edge Information and Mask Shrinking Based Image Inpainting Approach

    Authors: Huali Xu, Xiangdong Su, Meng Wang, Xiang Hao, Guanglai Gao

    Abstract: In the image inpainting task, the ability to repair both high-frequency and low-frequency information in the missing regions has a substantial influence on the quality of the restored image. However, existing inpainting methods usually fail to consider both high-frequency and low-frequency information simultaneously. To solve this problem, this paper proposes edge information and mask shrinking ba…

    Submitted 11 June, 2020; originally announced June 2020.

    Comments: Accepted by ICME2020

  15. SNR-Based Teachers-Student Technique for Speech Enhancement

    Authors: Xiang Hao, Xiangdong Su, Zhiyu Wang, Qiang Zhang, Huali Xu, Guanglai Gao

    Abstract: It is very challenging for speech enhancement methods to achieve robust performance under both high signal-to-noise ratio (SNR) and low SNR simultaneously. In this paper, we propose a method that integrates an SNR-based teachers-student technique and time-domain U-Net to deal with this problem. Specifically, this method consists of multiple teacher models and a student model. We first train the t…

    Submitted 29 October, 2020; v1 submitted 29 May, 2020; originally announced May 2020.

    Comments: Published in 2020 IEEE International Conference on Multimedia and Expo (ICME 2020)

  16. Sub-Band Knowledge Distillation Framework for Speech Enhancement

    Authors: Xiang Hao, Shixue Wen, Xiangdong Su, Yun Liu, Guanglai Gao, Xiaofei Li

    Abstract: In single-channel speech enhancement, methods based on full-band spectral features have been widely studied. However, only a few methods pay attention to non-full-band spectral features. In this paper, we explore a knowledge distillation framework based on sub-band spectral mapping for single-channel speech enhancement. Specifically, we divide the full frequency band into multiple sub-bands and pr…

    Submitted 29 October, 2020; v1 submitted 29 May, 2020; originally announced May 2020.

    Comments: Published in Interspeech 2020

  17. arXiv:2004.14774  [pdf, other]

    cs.CV cs.LG cs.RO eess.IV stat.ML

    IROS 2019 Lifelong Robotic Vision Challenge -- Lifelong Object Recognition Report

    Authors: Qi She, Fan Feng, Qi Liu, Rosa H. M. Chan, Xinyue Hao, Chuanlin Lan, Qihan Yang, Vincenzo Lomonaco, German I. Parisi, Heechul Bae, Eoin Brophy, Baoquan Chen, Gabriele Graffieti, Vidit Goel, Hyonyoung Han, Sathursan Kanagarajah, Somesh Kumar, Siew-Kei Lam, Tin Lun Lam, Liang Ma, Davide Maltoni, Lorenzo Pellegrini, Duvindu Piyasena, Shiliang Pu, Debdoot Sheet, et al. (11 additional authors not shown)

    Abstract: This report summarizes IROS 2019-Lifelong Robotic Vision Competition (Lifelong Object Recognition Challenge) with methods and results from the top 8 finalists (out of over 150 teams). The competition dataset (L)ifel(O)ng (R)obotic V(IS)ion (OpenLORIS) - Object Recognition (OpenLORIS-object) is designed for driving lifelong/continual learning research and application in robotic vision domain, w…

    Submitted 26 April, 2020; originally announced April 2020.

    Comments: 9 pages, 11 figures, 3 tables, accepted into IEEE Robotics and Automation Magazine. arXiv admin note: text overlap with arXiv:1911.06487

  18. arXiv:1909.07554  [pdf, other]

    eess.SP cs.LG stat.ML

    Gated Recurrent Units Learning for Optimal Deployment of Visible Light Communications Enabled UAVs

    Authors: Yining Wang, Mingzhe Chen, Zhaohui Yang, Xue Hao, Tao Luo, Walid Saad

    Abstract: In this paper, the problem of optimizing the deployment of unmanned aerial vehicles (UAVs) equipped with visible light communication (VLC) capabilities is studied. In the studied model, the UAVs can simultaneously provide communications and illumination to service ground users. Ambient illumination increases the interference over VLC links while reducing the illumination threshold of the UAVs. The…

    Submitted 16 September, 2019; originally announced September 2019.

    Comments: This paper has been accepted by the 2019 IEEE Global Communications Conference

  19. arXiv:1905.06312  [pdf, other]

    cs.CV cs.LG eess.IV

    BiRA-Net: Bilinear Attention Net for Diabetic Retinopathy Grading

    Authors: Ziyuan Zhao, Kerui Zhang, Xuejie Hao, Jing Tian, Matthew Chin Heng Chua, Li Chen, Xin Xu

    Abstract: Diabetic retinopathy (DR) is a common retinal disease that leads to blindness. For diagnosis purposes, DR image grading aims to provide automatic DR grade classification, which is not addressed in conventional research methods of binary DR image classification. Small objects in the eye images, like lesions and microaneurysms, are essential to DR grading in medical imaging, but they could easily be…

    Submitted 1 July, 2019; v1 submitted 15 May, 2019; originally announced May 2019.

    Comments: Accepted at ICIP 2019

    Journal ref: 2019 IEEE International Conference on Image Processing (ICIP)

  20. arXiv:1404.1592  [pdf, other]

    math.OC cs.LG eess.SY

    The Power of Online Learning in Stochastic Network Optimization

    Authors: Longbo Huang, Xin Liu, Xiaohong Hao

    Abstract: In this paper, we investigate the power of online learning in stochastic network optimization with unknown system statistics a priori. We are interested in understanding how information and learning can be efficiently incorporated into system control techniques, and what are the fundamental benefits of doing so. We propose two Online Learning-Aided Control techniques, OLAC…

    Submitted 29 July, 2014; v1 submitted 6 April, 2014; originally announced April 2014.