Showing 1–21 of 21 results for author: Bu, H

Searching in archive eess.
  1. arXiv:2406.07256  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection

    Authors: Rong Gong, Hongfei Xue, Lezhi Wang, Xin Xu, Qisheng Li, Lei Xie, Hui Bu, Shaomei Wu, Jiaming Zhou, Yong Qin, Binbin Zhang, Jun Du, Jia Bin, Ming Li

    Abstract: The rapid advancements in speech technologies over the past two decades have led to human-level performance in tasks like automatic speech recognition (ASR) for fluent speech. However, the efficacy of these models diminishes when applied to atypical speech, such as stuttering. This paper introduces AS-70, the first publicly available Mandarin stuttered speech dataset, which stands out as the large…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  2. arXiv:2401.03473  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge

    Authors: He Wang, Pengcheng Guo, Yue Li, Ao Zhang, Jiayao Sun, Lei Xie, Wei Chen, Pan Zhou, Hui Bu, Xin Xu, Binbin Zhang, Zhuo Chen, Jian Wu, Longbiao Wang, Eng Siong Chng, Sun Li

    Abstract: To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge. This challenge collects over 100 hours of multi-channel speech data recorded inside a new energy vehicle and 40 hours…

    Submitted 20 February, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

    Comments: Accepted at ICASSP 2024

  3. arXiv:2312.06454  [pdf, other]

    eess.IV cs.CV cs.LG

    Point Transformer with Federated Learning for Predicting Breast Cancer HER2 Status from Hematoxylin and Eosin-Stained Whole Slide Images

    Authors: Bao Li, Zhenyu Liu, Lizhi Shao, Bensheng Qiu, Hong Bu, Jie Tian

    Abstract: Directly predicting human epidermal growth factor receptor 2 (HER2) status from widely available hematoxylin and eosin (HE)-stained whole slide images (WSIs) can reduce technical costs and expedite treatment selection. Accurately predicting HER2 requires large collections of multi-site WSIs. Federated learning enables collaborative training of these WSIs without gigabyte-size WSIs transportation a…

    Submitted 27 February, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

  4. arXiv:2309.13573  [pdf, other]

    cs.SD eess.AS

    The second multi-channel multi-party meeting transcription challenge (M2MeT 2.0): A benchmark for speaker-attributed ASR

    Authors: Yuhao Liang, Mohan Shi, Fan Yu, Yangze Li, Shiliang Zhang, Zhihao Du, Qian Chen, Lei Xie, Yanmin Qian, Jian Wu, Zhuo Chen, Kong Aik Lee, Zhijie Yan, Hui Bu

    Abstract: With the success of the first Multi-channel Multi-party Meeting Transcription challenge (M2MeT), the second M2MeT challenge (M2MeT 2.0), held at ASRU 2023, particularly aims to tackle the complex task of speaker-attributed ASR (SA-ASR), which directly addresses the practical and challenging problem of "who spoke what at when" in typical meeting scenarios. We particularly established two sub-tr…

    Submitted 5 October, 2023; v1 submitted 24 September, 2023; originally announced September 2023.

    Comments: 8 pages, Accepted by ASRU2023

  5. arXiv:2306.10805  [pdf]

    physics.med-ph cs.CV eess.IV

    Experts' cognition-driven ensemble deep learning for external validation of predicting pathological complete response to neoadjuvant chemotherapy from histological images in breast cancer

    Authors: Yongquan Yang, Fengling Li, Yani Wei, Yuanyuan Zhao, Jing Fu, Xiuli Xiao, Hong Bu

    Abstract: In breast cancer imaging, there has been a trend to directly predict pathological complete response (pCR) to neoadjuvant chemotherapy (NAC) from histological images based on deep learning (DL). However, it has been a commonly known problem that the constructed DL-based models numerically have better performances in internal validation than in external validation. The primary reason for this situat…

    Submitted 19 June, 2023; originally announced June 2023.

  6. arXiv:2304.07295  [pdf]

    q-bio.QM cs.AI eess.IV

    Experts' cognition-driven safe noisy labels learning for precise segmentation of residual tumor in breast cancer

    Authors: Yongquan Yang, Jie Chen, Yani Wei, Mohammad Alobaidi, Hong Bu

    Abstract: Precise segmentation of residual tumor in breast cancer (PSRTBC) after neoadjuvant chemotherapy is a fundamental key technique in the treatment process of breast cancer. However, achieving PSRTBC is still a challenge, since the breast cancer tissue and tumor cells commonly have complex and varied morphological changes after neoadjuvant chemotherapy, which inevitably increases the difficulty to pro…

    Submitted 12 April, 2023; originally announced April 2023.

  7. arXiv:2211.01585  [pdf, other]

    cs.SD eess.AS

    The ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge (ICSRC): Dataset, Tracks, Baseline and Results

    Authors: Ao Zhang, Fan Yu, Kaixun Huang, Lei Xie, Longbiao Wang, Eng Siong Chng, Hui Bu, Binbin Zhang, Wei Chen, Xin Xu

    Abstract: This paper summarizes the outcomes from the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge (ICSRC). We first address the necessity of the challenge and then introduce the associated dataset collected from a new-energy vehicle (NEV) covering a variety of cockpit acoustic conditions and linguistic contents. We then describe the track arrangement and the baseline system. Specifically, w…

    Submitted 3 November, 2022; originally announced November 2022.

    Comments: Accepted by ISCSLP2022

  8. arXiv:2209.05273  [pdf, other]

    eess.AS

    The 2022 Far-field Speaker Verification Challenge: Exploring domain mismatch and semi-supervised learning under the far-field scenario

    Authors: Xiaoyi Qin, Ming Li, Hui Bu, Shrikanth Narayanan, Haizhou Li

    Abstract: FFSVC2022 is the second challenge of far-field speaker verification. FFSVC2022 provides the fully-supervised far-field speaker verification to further explore the far-field scenario and proposes semi-supervised far-field speaker verification. In contrast to FFSVC2020, FFSVC2022 focuses on the single-channel scenario. In addition, a supplementary set for the FFSVC2020 dataset is released this year. T…

    Submitted 15 September, 2022; v1 submitted 12 September, 2022; originally announced September 2022.

  9. arXiv:2205.03850  [pdf, other]

    cs.CR cs.LG eess.SY

    SeqNet: An Efficient Neural Network for Automatic Malware Detection

    Authors: Jiawei Xu, Wenxuan Fu, Haoyu Bu, Zhi Wang, Lingyun Ying

    Abstract: Malware continues to evolve rapidly, and more than 450,000 new samples are captured every day, which makes manual malware analysis impractical. However, existing deep learning detection models need manual feature engineering or require high computational overhead for long training processes, which might be laborious to select feature space and difficult to retrain for mitigating model aging. There…

    Submitted 8 May, 2022; originally announced May 2022.

  10. arXiv:2202.03647  [pdf, other]

    cs.SD eess.AS

    Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge

    Authors: Fan Yu, Shiliang Zhang, Pengcheng Guo, Yihui Fu, Zhihao Du, Siqi Zheng, Weilong Huang, Lei Xie, Zheng-Hua Tan, DeLiang Wang, Yanmin Qian, Kong Aik Lee, Zhijie Yan, Bin Ma, Xin Xu, Hui Bu

    Abstract: The ICASSP 2022 Multi-channel Multi-party Meeting Transcription Grand Challenge (M2MeT) focuses on one of the most valuable and the most challenging scenarios of speech technologies. The M2MeT challenge has particularly set up two tracks, speaker diarization (track 1) and multi-speaker automatic speech recognition (ASR) (track 2). Along with the challenge, we released 120 hours of real-recorded Ma…

    Submitted 25 February, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

    Comments: Accepted by ICASSP 2022

  11. arXiv:2110.07393  [pdf, other]

    cs.SD eess.AS

    M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge

    Authors: Fan Yu, Shiliang Zhang, Yihui Fu, Lei Xie, Siqi Zheng, Zhihao Du, Weilong Huang, Pengcheng Guo, Zhijie Yan, Bin Ma, Xin Xu, Hui Bu

    Abstract: Recent development of speech processing, such as speech recognition, speaker diarization, etc., has inspired numerous applications of speech technologies. The meeting scenario is one of the most valuable and, at the same time, most challenging scenarios for the deployment of speech technologies. Specifically, two typical tasks, speaker diarization and multi-speaker automatic speech recognition hav…

    Submitted 25 February, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: Accepted by ICASSP 2022

  12. arXiv:2104.03603  [pdf, other]

    cs.SD eess.AS

    AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario

    Authors: Yihui Fu, Luyao Cheng, Shubo Lv, Yukai Jv, Yuxiang Kong, Zhuo Chen, Yanxin Hu, Lei Xie, Jian Wu, Hui Bu, Xin Xu, Jun Du, Jingdong Chen

    Abstract: In this paper, we present AISHELL-4, a sizable real-recorded Mandarin speech dataset collected by 8-channel circular microphone array for speech processing in conference scenario. The dataset consists of 211 recorded meeting sessions, each containing 4 to 8 speakers, with a total length of 120 hours. This dataset aims to bridge the advanced research on multi-speaker processing and the practical ap…

    Submitted 10 August, 2021; v1 submitted 8 April, 2021; originally announced April 2021.

    Comments: Accepted by Interspeech 2021

  13. arXiv:2104.01818  [pdf, other]

    eess.AS

    The Multi-speaker Multi-style Voice Cloning Challenge 2021

    Authors: Qicong Xie, Xiaohai Tian, Guanghou Liu, Kun Song, Lei Xie, Zhiyong Wu, Hai Li, Song Shi, Haizhou Li, Fen Hong, Hui Bu, Xin Xu

    Abstract: The Multi-speaker Multi-style Voice Cloning Challenge (M2VoC) aims to provide a common sizable dataset as well as a fair testbed for the benchmarking of the popular voice cloning task. Specifically, we formulate the challenge to adapt an average TTS model to the stylistic target voice with limited data from target speaker, evaluated by speaker identity and style similarity. The challenge consists…

    Submitted 5 April, 2021; originally announced April 2021.

    Comments: has been accepted to ICASSP 2021

  14. arXiv:2104.00960  [pdf, other]

    eess.AS cs.SD

    INTERSPEECH 2021 ConferencingSpeech Challenge: Towards Far-field Multi-Channel Speech Enhancement for Video Conferencing

    Authors: Wei Rao, Yihui Fu, Yanxin Hu, Xin Xu, Yvkai Jv, Jiangyu Han, Zhongjie Jiang, Lei Xie, Yannan Wang, Shinji Watanabe, Zheng-Hua Tan, Hui Bu, Tao Yu, Shidong Shang

    Abstract: The ConferencingSpeech 2021 challenge is proposed to stimulate research on far-field multi-channel speech enhancement for video conferencing. The challenge consists of two separate tasks: 1) Task 1 is multi-channel speech enhancement with single microphone array and focusing on practical application with real-time requirement and 2) Task 2 is multi-channel speech enhancement with multiple distribu…

    Submitted 2 April, 2021; originally announced April 2021.

    Comments: 5 pages, submitted to INTERSPEECH 2021

  15. arXiv:2102.12173  [pdf]

    eess.IV

    Deep learning-based framework for cardiac function assessment in embryonic zebrafish from heart beating videos

    Authors: Amir Mohammad Naderi, Haisong Bu, Jingcheng Su, Mao-Hsiang Huang, Khuong Vo, Ramses Seferino Trigo Torres, J. -C. Chiao, Juhyun Lee, Michael P. H. Lau, Xiaolei Xu, Hung Cao

    Abstract: Zebrafish is a powerful and widely-used model system for a host of biological investigations including cardiovascular studies and genetic screening. Zebrafish are readily assessable during developmental stages; however, the current methods for quantification and monitoring of cardiac functions mostly involve tedious manual work and inconsistent estimations. In this paper, we developed and validate…

    Submitted 24 February, 2021; originally announced February 2021.

  16. arXiv:2011.11879  [pdf]

    eess.IV cs.CV cs.LG

    Blind deblurring for microscopic pathology images using deep learning networks

    Authors: Cheng Jiang, Jun Liao, Pei Dong, Zhaoxuan Ma, De Cai, Guoan Zheng, Yueping Liu, Hong Bu, Jianhua Yao

    Abstract: Artificial Intelligence (AI)-powered pathology is a revolutionary step in the world of digital pathology and shows great promise to increase both diagnosis accuracy and efficiency. However, defocus and motion blur can obscure tissue or cell characteristics hence compromising AI algorithms' accuracy and robustness in analyzing the images. In this paper, we demonstrate a deep-learning-based approach…

    Submitted 23 November, 2020; originally announced November 2020.

  17. arXiv:2011.02198  [pdf, other]

    cs.SD eess.AS

    IEEE SLT 2021 Alpha-mini Speech Challenge: Open Datasets, Tracks, Rules and Baselines

    Authors: Yihui Fu, Zhuoyuan Yao, Weipeng He, Jian Wu, Xiong Wang, Zhanheng Yang, Shimin Zhang, Lei Xie, Dongyan Huang, Hui Bu, Petr Motlicek, Jean-Marc Odobez

    Abstract: The IEEE Spoken Language Technology Workshop (SLT) 2021 Alpha-mini Speech Challenge (ASC) is intended to improve research on keyword spotting (KWS) and sound source location (SSL) on humanoid robots. Many publications report significant improvements in deep learning based KWS and SSL on open source datasets in recent years. For deep learning model training, it is necessary to expand the data cover…

    Submitted 14 November, 2020; v1 submitted 4 November, 2020; originally announced November 2020.

    Comments: Accepted at IEEE SLT 2021

  18. arXiv:2010.11567  [pdf, other]

    cs.SD eess.AS

    AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines

    Authors: Yao Shi, Hui Bu, Xin Xu, Shaoji Zhang, Ming Li

    Abstract: In this paper, we present AISHELL-3, a large-scale and high-fidelity multi-speaker Mandarin speech corpus which could be used to train multi-speaker Text-to-Speech (TTS) systems. The corpus contains roughly 85 hours of emotion-neutral recordings spoken by 218 native Chinese Mandarin speakers. Their auxiliary attributes such as gender, age group and native accents are explicitly marked and provided…

    Submitted 22 April, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

  19. arXiv:2005.08046  [pdf, other]

    eess.AS cs.SD

    The INTERSPEECH 2020 Far-Field Speaker Verification Challenge

    Authors: Xiaoyi Qin, Ming Li, Hui Bu, Wei Rao, Rohan Kumar Das, Shrikanth Narayanan, Haizhou Li

    Abstract: The INTERSPEECH 2020 Far-Field Speaker Verification Challenge (FFSVC 2020) addresses three different research problems under well-defined conditions: far-field text-dependent speaker verification from single microphone array, far-field text-independent speaker verification from single microphone array, and far-field text-dependent speaker verification from distributed microphone arrays. All three…

    Submitted 16 May, 2020; originally announced May 2020.

    Comments: Submitted to INTERSPEECH 2020

  20. arXiv:2002.00387  [pdf, other]

    cs.SD eess.AS

    The FFSVC 2020 Evaluation Plan

    Authors: Xiaoyi Qin, Ming Li, Hui Bu, Rohan Kumar Das, Wei Rao, Shrikanth Narayanan, Haizhou Li

    Abstract: The Far-Field Speaker Verification Challenge 2020 (FFSVC20) is designed to boost the speaker verification research with special focus on far-field distributed microphone arrays under noisy conditions in real scenarios. The objectives of this challenge are to: 1) benchmark the current speech verification technology under this challenging condition, 2) promote the development of new ideas and techno…

    Submitted 4 February, 2020; v1 submitted 2 February, 2020; originally announced February 2020.

  21. arXiv:1912.01231  [pdf, other]

    cs.SD eess.AS

    HI-MIA: A Far-field Text-Dependent Speaker Verification Database and the Baselines

    Authors: Xiaoyi Qin, Hui Bu, Ming Li

    Abstract: This paper presents a far-field text-dependent speaker verification database named HI-MIA. We aim to meet the data requirement for far-field microphone array based speaker verification since most of the publicly available databases are single channel close-talking and text-independent. The database contains recordings of 340 people in rooms designed for the far-field scenario. Recordings are captu…

    Submitted 1 February, 2020; v1 submitted 3 December, 2019; originally announced December 2019.

    Comments: Accepted at ICASSP 2020