Showing 1–28 of 28 results for author: Bu, H

  1. arXiv:2406.19959  [pdf, other]

    cs.SD eess.AS

    RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization

    Authors: Bing Yang, Changsheng Quan, Yabo Wang, Pengyu Wang, Yujie Yang, Ying Fang, Nian Shao, Hui Bu, Xin Xu, Xiaofei Li

    Abstract: The training of deep learning-based multichannel speech enhancement and source localization systems relies heavily on the simulation of room impulse response and multichannel diffuse noise, due to the lack of large-scale real-recorded datasets. However, the acoustic mismatch between simulated and real-world data could degrade the model performance when applied in real-world scenarios. To bridge t…

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2406.10304  [pdf, other]

    cs.CL

    Enhancing Voice Wake-Up for Dysarthria: Mandarin Dysarthria Speech Corpus Release and Customized System Design

    Authors: Ming Gao, Hang Chen, Jun Du, Xin Xu, Hongxiao Guo, Hui Bu, Jianxing Yang, Ming Li, Chin-Hui Lee

    Abstract: Smart home technology has gained widespread adoption, facilitating effortless control of devices through voice commands. However, individuals with dysarthria, a motor speech disorder, face challenges due to the variability of their speech. This paper addresses the wake-up word spotting (WWS) task for dysarthric individuals, aiming to integrate them into real-world applications. To support this, we…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: to be published in Interspeech 2024

  3. arXiv:2406.07256  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection

    Authors: Rong Gong, Hongfei Xue, Lezhi Wang, Xin Xu, Qisheng Li, Lei Xie, Hui Bu, Shaomei Wu, Jiaming Zhou, Yong Qin, Binbin Zhang, Jun Du, Jia Bin, Ming Li

    Abstract: The rapid advancements in speech technologies over the past two decades have led to human-level performance in tasks like automatic speech recognition (ASR) for fluent speech. However, the efficacy of these models diminishes when applied to atypical speech, such as stuttering. This paper introduces AS-70, the first publicly available Mandarin stuttered speech dataset, which stands out as the large…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  4. arXiv:2405.03644  [pdf, other]

    cs.CR cs.AI

    When LLMs Meet Cybersecurity: A Systematic Literature Review

    Authors: Jie Zhang, Haoyu Bu, Hui Wen, Yu Chen, Lun Li, Hongsong Zhu

    Abstract: The rapid advancements in large language models (LLMs) have opened new avenues across various fields, including cybersecurity, which faces an ever-evolving threat landscape and need for innovative technologies. Despite initial explorations into the application of LLMs in cybersecurity, there is a lack of a comprehensive overview of this research area. This paper bridges this gap by providing a syst…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 36 pages, 7 figures

  5. arXiv:2401.03473  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge

    Authors: He Wang, Pengcheng Guo, Yue Li, Ao Zhang, Jiayao Sun, Lei Xie, Wei Chen, Pan Zhou, Hui Bu, Xin Xu, Binbin Zhang, Zhuo Chen, Jian Wu, Longbiao Wang, Eng Siong Chng, Sun Li

    Abstract: To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge. This challenge collects over 100 hours of multi-channel speech data recorded inside a new energy vehicle and 40 hours…

    Submitted 20 February, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

    Comments: Accepted at ICASSP 2024

  6. arXiv:2401.01735  [pdf, other]

    cs.GT

    Economics Arena for Large Language Models

    Authors: Shangmin Guo, Haoran Bu, Haochuan Wang, Yi Ren, Dianbo Sui, Yuming Shang, Siting Lu

    Abstract: Large language models (LLMs) have been extensively used as the backbones for general-purpose agents, and some economics literature suggests that LLMs are capable of playing various types of economics games. Following these works, to overcome the limitation of evaluating LLMs using static benchmarks, we propose to explore competitive games as an evaluation for LLMs to incorporate multi-players and d…

    Submitted 3 January, 2024; originally announced January 2024.

  7. arXiv:2312.06454  [pdf, other]

    eess.IV cs.CV cs.LG

    Point Transformer with Federated Learning for Predicting Breast Cancer HER2 Status from Hematoxylin and Eosin-Stained Whole Slide Images

    Authors: Bao Li, Zhenyu Liu, Lizhi Shao, Bensheng Qiu, Hong Bu, Jie Tian

    Abstract: Directly predicting human epidermal growth factor receptor 2 (HER2) status from widely available hematoxylin and eosin (HE)-stained whole slide images (WSIs) can reduce technical costs and expedite treatment selection. Accurately predicting HER2 requires large collections of multi-site WSIs. Federated learning enables collaborative training of these WSIs without gigabyte-size WSIs transportation a…

    Submitted 27 February, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

  8. arXiv:2309.13573  [pdf, other]

    cs.SD eess.AS

    The second multi-channel multi-party meeting transcription challenge (M2MeT 2.0): A benchmark for speaker-attributed ASR

    Authors: Yuhao Liang, Mohan Shi, Fan Yu, Yangze Li, Shiliang Zhang, Zhihao Du, Qian Chen, Lei Xie, Yanmin Qian, Jian Wu, Zhuo Chen, Kong Aik Lee, Zhijie Yan, Hui Bu

    Abstract: With the success of the first Multi-channel Multi-party Meeting Transcription challenge (M2MeT), the second M2MeT challenge (M2MeT 2.0) held in ASRU2023 particularly aims to tackle the complex task of speaker-attributed ASR (SA-ASR), which directly addresses the practical and challenging problem of "who spoke what at when" at typical meeting scenario. We particularly established two sub-tr…

    Submitted 5 October, 2023; v1 submitted 24 September, 2023; originally announced September 2023.

    Comments: 8 pages, Accepted by ASRU2023

  9. arXiv:2307.02709  [pdf]

    cs.AI

    Validation of the Practicability of Logical Assessment Formula for Evaluations with Inaccurate Ground-Truth Labels

    Authors: Yongquan Yang, Hong Bu

    Abstract: Logical assessment formula (LAF) is a new theory proposed for evaluations with inaccurate ground-truth labels (IAGTLs) to assess the predictive models for various artificial intelligence applications. However, the practicability of LAF for evaluations with IAGTLs has not yet been validated in real-world practice. In this paper, to address this issue, we applied LAF to tumour segmentation for breas…

    Submitted 5 July, 2023; originally announced July 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2110.11567

  10. arXiv:2306.10805  [pdf]

    physics.med-ph cs.CV eess.IV

    Experts' cognition-driven ensemble deep learning for external validation of predicting pathological complete response to neoadjuvant chemotherapy from histological images in breast cancer

    Authors: Yongquan Yang, Fengling Li, Yani Wei, Yuanyuan Zhao, Jing Fu, Xiuli Xiao, Hong Bu

    Abstract: In breast cancer imaging, there has been a trend to directly predict pathological complete response (pCR) to neoadjuvant chemotherapy (NAC) from histological images based on deep learning (DL). However, it has been a commonly known problem that the constructed DL-based models numerically have better performances in internal validation than in external validation. The primary reason for this situat…

    Submitted 19 June, 2023; originally announced June 2023.

  11. arXiv:2304.07295  [pdf]

    q-bio.QM cs.AI eess.IV

    Experts' cognition-driven safe noisy labels learning for precise segmentation of residual tumor in breast cancer

    Authors: Yongquan Yang, Jie Chen, Yani Wei, Mohammad Alobaidi, Hong Bu

    Abstract: Precise segmentation of residual tumor in breast cancer (PSRTBC) after neoadjuvant chemotherapy is a fundamental key technique in the treatment process of breast cancer. However, achieving PSRTBC is still a challenge, since the breast cancer tissue and tumor cells commonly have complex and varied morphological changes after neoadjuvant chemotherapy, which inevitably increases the difficulty to pro…

    Submitted 12 April, 2023; originally announced April 2023.

  12. arXiv:2211.01585  [pdf, other]

    cs.SD eess.AS

    The ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge (ICSRC): Dataset, Tracks, Baseline and Results

    Authors: Ao Zhang, Fan Yu, Kaixun Huang, Lei Xie, Longbiao Wang, Eng Siong Chng, Hui Bu, Binbin Zhang, Wei Chen, Xin Xu

    Abstract: This paper summarizes the outcomes from the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge (ICSRC). We first address the necessity of the challenge and then introduce the associated dataset collected from a new-energy vehicle (NEV) covering a variety of cockpit acoustic conditions and linguistic contents. We then describe the track arrangement and the baseline system. Specifically, w…

    Submitted 3 November, 2022; originally announced November 2022.

    Comments: Accepted by ISCSLP2022

  13. arXiv:2205.03850  [pdf, other]

    cs.CR cs.LG eess.SY

    SeqNet: An Efficient Neural Network for Automatic Malware Detection

    Authors: Jiawei Xu, Wenxuan Fu, Haoyu Bu, Zhi Wang, Lingyun Ying

    Abstract: Malware continues to evolve rapidly, and more than 450,000 new samples are captured every day, which makes manual malware analysis impractical. However, existing deep learning detection models need manual feature engineering or require high computational overhead for long training processes, which might be laborious to select feature space and difficult to retrain for mitigating model aging. There…

    Submitted 8 May, 2022; originally announced May 2022.

  14. arXiv:2202.03647  [pdf, other]

    cs.SD eess.AS

    Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge

    Authors: Fan Yu, Shiliang Zhang, Pengcheng Guo, Yihui Fu, Zhihao Du, Siqi Zheng, Weilong Huang, Lei Xie, Zheng-Hua Tan, DeLiang Wang, Yanmin Qian, Kong Aik Lee, Zhijie Yan, Bin Ma, Xin Xu, Hui Bu

    Abstract: The ICASSP 2022 Multi-channel Multi-party Meeting Transcription Grand Challenge (M2MeT) focuses on one of the most valuable and the most challenging scenarios of speech technologies. The M2MeT challenge has particularly set up two tracks, speaker diarization (track 1) and multi-speaker automatic speech recognition (ASR) (track 2). Along with the challenge, we released 120 hours of real-recorded Ma…

    Submitted 25 February, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

    Comments: Accepted by ICASSP 2022

  15. One-Step Abductive Multi-Target Learning with Diverse Noisy Samples and Its Application to Tumour Segmentation for Breast Cancer

    Authors: Yongquan Yang, Fengling Li, Yani Wei, Jie Chen, Ning Chen, Mohammad H. Alobaidi, Hong Bu

    Abstract: Recent studies have demonstrated the effectiveness of the combination of machine learning and logical reasoning, including data-driven logical reasoning, knowledge driven machine learning and abductive learning, in inventing advanced technologies for different artificial intelligence applications. One-step abductive multi-target learning (OSAMTL), an approach inspired by abductive learning, via si…

    Submitted 12 April, 2024; v1 submitted 19 October, 2021; originally announced October 2021.

    Comments: The final published version (81 pages)

    Journal ref: Expert Systems with Applications, 2024

  16. arXiv:2110.07393  [pdf, other]

    cs.SD eess.AS

    M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge

    Authors: Fan Yu, Shiliang Zhang, Yihui Fu, Lei Xie, Siqi Zheng, Zhihao Du, Weilong Huang, Pengcheng Guo, Zhijie Yan, Bin Ma, Xin Xu, Hui Bu

    Abstract: Recent development of speech processing, such as speech recognition, speaker diarization, etc., has inspired numerous applications of speech technologies. The meeting scenario is one of the most valuable and, at the same time, most challenging scenarios for the deployment of speech technologies. Specifically, two typical tasks, speaker diarization and multi-speaker automatic speech recognition hav…

    Submitted 25 February, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: Accepted by ICASSP 2022

  17. arXiv:2110.03370  [pdf, other]

    cs.SD cs.CL

    WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition

    Authors: Binbin Zhang, Hang Lv, Pengcheng Guo, Qijie Shao, Chao Yang, Lei Xie, Xin Xu, Hui Bu, Xiaoyu Chen, Chenchen Zeng, Di Wu, Zhendong Peng

    Abstract: In this paper, we present WenetSpeech, a multi-domain Mandarin corpus consisting of 10000+ hours high-quality labeled speech, 2400+ hours weakly labeled speech, and about 10000 hours unlabeled speech, with 22400+ hours in total. We collect the data from YouTube and Podcast, which covers a variety of speaking styles, scenarios, domains, topics, and noisy conditions. An optical character recognition…

    Submitted 23 February, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

  18. arXiv:2104.03603  [pdf, other]

    cs.SD eess.AS

    AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario

    Authors: Yihui Fu, Luyao Cheng, Shubo Lv, Yukai Jv, Yuxiang Kong, Zhuo Chen, Yanxin Hu, Lei Xie, Jian Wu, Hui Bu, Xin Xu, Jun Du, Jingdong Chen

    Abstract: In this paper, we present AISHELL-4, a sizable real-recorded Mandarin speech dataset collected by 8-channel circular microphone array for speech processing in conference scenario. The dataset consists of 211 recorded meeting sessions, each containing 4 to 8 speakers, with a total length of 120 hours. This dataset aims to bridge the advanced research on multi-speaker processing and the practical ap…

    Submitted 10 August, 2021; v1 submitted 8 April, 2021; originally announced April 2021.

    Comments: Accepted by Interspeech 2021

  19. arXiv:2104.00960  [pdf, other]

    eess.AS cs.SD

    INTERSPEECH 2021 ConferencingSpeech Challenge: Towards Far-field Multi-Channel Speech Enhancement for Video Conferencing

    Authors: Wei Rao, Yihui Fu, Yanxin Hu, Xin Xu, Yvkai Jv, Jiangyu Han, Zhongjie Jiang, Lei Xie, Yannan Wang, Shinji Watanabe, Zheng-Hua Tan, Hui Bu, Tao Yu, Shidong Shang

    Abstract: The ConferencingSpeech 2021 challenge is proposed to stimulate research on far-field multi-channel speech enhancement for video conferencing. The challenge consists of two separate tasks: 1) Task 1 is multi-channel speech enhancement with single microphone array and focusing on practical application with real-time requirement and 2) Task 2 is multi-channel speech enhancement with multiple distribu…

    Submitted 2 April, 2021; originally announced April 2021.

    Comments: 5 pages, submitted to INTERSPEECH 2021

  20. arXiv:2011.11879  [pdf]

    eess.IV cs.CV cs.LG

    Blind deblurring for microscopic pathology images using deep learning networks

    Authors: Cheng Jiang, Jun Liao, Pei Dong, Zhaoxuan Ma, De Cai, Guoan Zheng, Yueping Liu, Hong Bu, Jianhua Yao

    Abstract: Artificial Intelligence (AI)-powered pathology is a revolutionary step in the world of digital pathology and shows great promise to increase both diagnosis accuracy and efficiency. However, defocus and motion blur can obscure tissue or cell characteristics hence compromising AI algorithms' accuracy and robustness in analyzing the images. In this paper, we demonstrate a deep-learning-based approach…

    Submitted 23 November, 2020; originally announced November 2020.

  21. arXiv:2011.02198  [pdf, other]

    cs.SD eess.AS

    IEEE SLT 2021 Alpha-mini Speech Challenge: Open Datasets, Tracks, Rules and Baselines

    Authors: Yihui Fu, Zhuoyuan Yao, Weipeng He, Jian Wu, Xiong Wang, Zhanheng Yang, Shimin Zhang, Lei Xie, Dongyan Huang, Hui Bu, Petr Motlicek, Jean-Marc Odobez

    Abstract: The IEEE Spoken Language Technology Workshop (SLT) 2021 Alpha-mini Speech Challenge (ASC) is intended to improve research on keyword spotting (KWS) and sound source location (SSL) on humanoid robots. Many publications report significant improvements in deep learning based KWS and SSL on open source datasets in recent years. For deep learning model training, it is necessary to expand the data cover…

    Submitted 14 November, 2020; v1 submitted 4 November, 2020; originally announced November 2020.

    Comments: Accepted at IEEE SLT 2021

  22. arXiv:2010.11567  [pdf, other]

    cs.SD eess.AS

    AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines

    Authors: Yao Shi, Hui Bu, Xin Xu, Shaoji Zhang, Ming Li

    Abstract: In this paper, we present AISHELL-3, a large-scale and high-fidelity multi-speaker Mandarin speech corpus which could be used to train multi-speaker Text-to-Speech (TTS) systems. The corpus contains roughly 85 hours of emotion-neutral recordings spoken by 218 native Chinese Mandarin speakers. Their auxiliary attributes such as gender, age group and native accents are explicitly marked and provided…

    Submitted 22 April, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

  23. arXiv:2005.08046  [pdf, other]

    eess.AS cs.SD

    The INTERSPEECH 2020 Far-Field Speaker Verification Challenge

    Authors: Xiaoyi Qin, Ming Li, Hui Bu, Wei Rao, Rohan Kumar Das, Shrikanth Narayanan, Haizhou Li

    Abstract: The INTERSPEECH 2020 Far-Field Speaker Verification Challenge (FFSVC 2020) addresses three different research problems under well-defined conditions: far-field text-dependent speaker verification from single microphone array, far-field text-independent speaker verification from single microphone array, and far-field text-dependent speaker verification from distributed microphone arrays. All three…

    Submitted 16 May, 2020; originally announced May 2020.

    Comments: Submitted to INTERSPEECH 2020

  24. arXiv:2002.00387  [pdf, other]

    cs.SD eess.AS

    The FFSVC 2020 Evaluation Plan

    Authors: Xiaoyi Qin, Ming Li, Hui Bu, Rohan Kumar Das, Wei Rao, Shrikanth Narayanan, Haizhou Li

    Abstract: The Far-Field Speaker Verification Challenge 2020 (FFSVC20) is designed to boost the speaker verification research with special focus on far-field distributed microphone arrays under noisy conditions in real scenarios. The objectives of this challenge are to: 1) benchmark the current speech verification technology under this challenging condition, 2) promote the development of new ideas and techno…

    Submitted 4 February, 2020; v1 submitted 2 February, 2020; originally announced February 2020.

  25. arXiv:1912.01231  [pdf, other]

    cs.SD eess.AS

    HI-MIA: A Far-field Text-Dependent Speaker Verification Database and the Baselines

    Authors: Xiaoyi Qin, Hui Bu, Ming Li

    Abstract: This paper presents a far-field text-dependent speaker verification database named HI-MIA. We aim to meet the data requirement for far-field microphone array based speaker verification since most of the publicly available databases are single channel close-talking and text-independent. The database contains recordings of 340 people in rooms designed for the far-field scenario. Recordings are captu…

    Submitted 1 February, 2020; v1 submitted 3 December, 2019; originally announced December 2019.

    Comments: Accepted at ICASSP 2020

  26. arXiv:1904.06026  [pdf]

    cs.CV

    Cycle-Consistent Adversarial GAN: the integration of adversarial attack and defense

    Authors: Lingyun Jiang, Kai Qiao, Ruoxi Qin, Linyuan Wang, Jian Chen, Haibing Bu, Bin Yan

    Abstract: In image classification of deep learning, adversarial examples where inputs intended to add small magnitude perturbations may mislead deep neural networks (DNNs) to incorrect results, which means DNNs are vulnerable to them. Different attack and defense strategies have been proposed to better research the mechanism of deep learning. However, those research in these networks are only for one aspect…

    Submitted 12 April, 2019; originally announced April 2019.

    Comments: 13 pages, 7 tables, 1 figure

  27. arXiv:1808.10583  [pdf, other]

    cs.CL

    AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale

    Authors: Jiayu Du, Xingyu Na, Xuechen Liu, Hui Bu

    Abstract: AISHELL-1 is by far the largest open-source speech corpus available for Mandarin speech recognition research. It was released with a baseline system containing solid training and testing pipelines for Mandarin ASR. In AISHELL-2, 1000 hours of clean read-speech data from iOS is published, which is free for academic usage. On top of AISHELL-2 corpus, an improved recipe is developed and released, con…

    Submitted 12 September, 2018; v1 submitted 30 August, 2018; originally announced August 2018.

  28. arXiv:1709.05522  [pdf, other]

    cs.CL

    AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline

    Authors: Hui Bu, Jiayu Du, Xingyu Na, Bengu Wu, Hao Zheng

    Abstract: An open-source Mandarin speech corpus called AISHELL-1 is released. It is by far the largest corpus which is suitable for conducting the speech recognition research and building speech recognition systems for Mandarin. The recording procedure, including audio capturing devices and environments, is presented in detail. The preparation of the related resources, including transcriptions and lexicon…

    Submitted 16 September, 2017; originally announced September 2017.

    Comments: Oriental COCOSDA 2017