Showing 1–16 of 16 results for author: Lv, H

Searching in archive eess.
  1. arXiv:2310.14278  [pdf, other]

    cs.SD cs.CL eess.AS

    Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation

    Authors: Kun Wei, Bei Li, Hang Lv, Quan Lu, Ning Jiang, Lei Xie

    Abstract: Automatic Speech Recognition (ASR) in conversational settings presents unique challenges, including extracting relevant contextual information from previous conversational turns. Due to irrelevant content, error propagation, and redundancy, existing methods struggle to extract longer and more effective contexts. To address this issue, we introduce a novel conversational ASR system, extending the C…

    Submitted 27 April, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

    Comments: TASLP

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024

  2. arXiv:2309.13373  [pdf, other]

    cs.SD cs.LG eess.AS

    Asca: less audio data is more insightful

    Authors: Xiang Li, Junhao Chen, Chao Li, Hongwu Lv

    Abstract: Audio recognition in specialized areas such as birdsong and submarine acoustics faces challenges in large-scale pre-training due to the limitations in available samples imposed by sampling environments and specificity requirements. While the Transformer model excels in audio recognition, its dependence on vast amounts of data becomes restrictive in resource-limited settings. Addressing this, we in…

    Submitted 23 September, 2023; originally announced September 2023.

    Comments: 6 pages, 3 figures

  3. arXiv:2209.08933  [pdf, ps, other]

    eess.IV cs.CV

    Estimating Brain Age with Global and Local Dependencies

    Authors: Yanwu Yang, Xutao Guo, Zhikai Chang, Chenfei Ye, Yang Xiang, Haiyan Lv, Ting Ma

    Abstract: The brain age has been proven to be a phenotype of relevance to cognitive performance and brain disease. Achieving accurate brain age prediction is an essential prerequisite for optimizing the predicted brain-age difference as a biomarker. As a comprehensive biological characteristic, the brain age is hard to be exploited accurately with models using feature engineering and local processing such a…

    Submitted 19 September, 2022; originally announced September 2022.

  4. arXiv:2207.01261  [pdf, other]

    cs.SD eess.AS

    Minimizing Sequential Confusion Error in Speech Command Recognition

    Authors: Zhanheng Yang, Hang Lv, Xiong Wang, Ao Zhang, Lei Xie

    Abstract: Speech command recognition (SCR) has been commonly used on resource constrained devices to achieve hands-free user experience. However, in real applications, confusion among commands with similar pronunciations often happens due to the limited capacity of small models deployed on edge devices, which drastically affects the user experience. In this paper, inspired by the advances of discriminative…

    Submitted 4 July, 2022; originally announced July 2022.

    Comments: Accepted by Interspeech 2022

  5. arXiv:2203.16539  [pdf, other]

    cs.LG eess.SP physics.optics

    Identification of diffracted vortex beams at different propagation distances using deep learning

    Authors: Heng Lv, Yan Guo, Zi-Xiang Yang, Chunling Ding, Wu-Hao Cai, Chenglong You, Rui-Bo **

    Abstract: Orbital angular momentum of light is regarded as a valuable resource in quantum technology, especially in quantum communication and quantum sensing and ranging. However, the OAM state of light is susceptible to undesirable experimental conditions such as propagation distance and phase distortions, which hinders the potential for the realistic implementation of relevant technologies. In this articl…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: 9 pages, 4 figures

    Journal ref: Frontiers in Physics 10, 843932 (2022)

  6. arXiv:2203.15455  [pdf, other]

    cs.SD cs.CL eess.AS

    WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit

    Authors: Binbin Zhang, Di Wu, Zhendong Peng, Xingchen Song, Zhuoyuan Yao, Hang Lv, Lei Xie, Chao Yang, Fu** Pan, Jianwei Niu

    Abstract: Recently, we made available WeNet, a production-oriented end-to-end speech recognition toolkit, which introduces a unified two-pass (U2) framework and a built-in runtime to address the streaming and non-streaming decoding modes in a single model. To further improve ASR performance and facilitate various production requirements, in this paper, we present WeNet 2.0 with four important updates. (1) W…

    Submitted 5 July, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

  7. arXiv:2109.07045  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    Uncertainty Quantification in Medical Image Segmentation with Multi-decoder U-Net

    Authors: Yanwu Yang, Xutao Guo, Yiwei Pan, Pengcheng Shi, Haiyan Lv, Ting Ma

    Abstract: Accurate medical image segmentation is crucial for diagnosis and analysis. However, the models without calibrated uncertainty estimates might lead to errors in downstream analysis and exhibit low levels of robustness. Estimating the uncertainty in the measurement is vital to making definite, informed conclusions. Especially, it is difficult to make accurate predictions on ambiguous areas and focus…

    Submitted 14 September, 2021; originally announced September 2021.

    Comments: MICCAI_QUBIQ challenge, conference, Uncertainty qualification

  8. arXiv:2103.09063  [pdf, other]

    cs.SD eess.AS

    An Asynchronous WFST-Based Decoder For Automatic Speech Recognition

    Authors: Hang Lv, Zhehuai Chen, Hainan Xu, Daniel Povey, Lei Xie, Sanjeev Khudanpur

    Abstract: We introduce asynchronous dynamic decoder, which adopts an efficient A* algorithm to incorporate big language models in the one-pass decoding for large vocabulary continuous speech recognition. Unlike standard one-pass decoding with on-the-fly composition decoder which might induce a significant computation overhead, the asynchronous dynamic decoder has a novel design where it has two fronts, with…

    Submitted 16 March, 2021; originally announced March 2021.

    Comments: 5 pages, 5 figures, ICASSP

  9. arXiv:2102.04488  [pdf, other]

    cs.CL cs.SD eess.AS

    Wake Word Detection with Streaming Transformers

    Authors: Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur

    Abstract: Modern wake word detection systems usually rely on neural networks for acoustic modeling. Transformers has recently shown superior performance over LSTM and convolutional networks in various sequence modeling tasks with their better temporal modeling power. However it is not clear whether this advantage still holds for short-range temporal modeling like wake word detection. Besides, the vanilla Tr…

    Submitted 8 February, 2021; originally announced February 2021.

    Comments: Accepted at IEEE ICASSP 2021. 5 pages, 3 figures

  10. arXiv:2011.09301  [pdf, other]

    cs.SD eess.AS

    Context-aware RNNLM Rescoring for Conversational Speech Recognition

    Authors: Kun Wei, Pengcheng Guo, Hang Lv, Zhen Tu, Lei Xie

    Abstract: Conversational speech recognition is regarded as a challenging task due to its free-style speaking and long-term contextual dependencies. Prior work has explored the modeling of long-range context through RNNLM rescoring with improved performance. To further take advantage of the persisted nature during a conversation, such as topics or speaker turn, we extend the rescoring procedure to a new cont…

    Submitted 18 November, 2020; originally announced November 2020.

  11. arXiv:2009.08047  [pdf, ps, other]

    eess.SP

    Empirical Fourier Decomposition: An Accurate Adaptive Signal Decomposition Method

    Authors: Wei Zhou, Zhongren Feng, Y. F. Xu, Xiongjiang Wang, Hao Lv

    Abstract: Signal decomposition is an effective tool to assist the identification of modal information in time-domain signals. Two signal decomposition methods, including the empirical wavelet transform (EWT) and Fourier decomposition method (FDM), have been developed based on Fourier theory. However, the EWT can suffer from a mode mixing problem for signals with closely-spaced modes and decomposition result…

    Submitted 7 October, 2020; v1 submitted 16 September, 2020; originally announced September 2020.

  12. arXiv:2005.08347  [pdf, other]

    eess.AS cs.CL cs.SD

    Wake Word Detection with Alignment-Free Lattice-Free MMI

    Authors: Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur

    Abstract: Always-on spoken language interfaces, e.g. personal digital assistants, rely on a wake word to start processing spoken input. We present novel methods to train a hybrid DNN/HMM wake word detection system from partially labeled training data, and to use it in on-line applications: (i) we remove the prerequisite of frame-level alignments in the LF-MMI training algorithm, permitting the use of un-tra…

    Submitted 28 July, 2020; v1 submitted 17 May, 2020; originally announced May 2020.

    Comments: Accepted at Interspeech 2020. 5 pages, 3 figures

  13. arXiv:2001.10902  [pdf]

    eess.SP

    RPCA-Based High Resolution Through-the-Wall Human Motion Detection and Classification

    Authors: Qiang An, Shuoguang Wang, Wenji Zhang, Hao Lv, Jianqi Wang, Shiyong Li, Ahmad Hoorfar

    Abstract: Radar based assisted living has received great amount of research interest in recent years. By employing the micro-Doppler features of indoor human motions, accurate recognition and classification of different types of movements become possible. Whereas, most of the existing works are focused only on free space detection, the literature on detection and recognition of human motions in through-the-…

    Submitted 29 January, 2020; originally announced January 2020.

    Comments: 9 pages and 7 figures

  14. arXiv:2001.10449  [pdf]

    eess.SP

    Range-Max Enhanced Ultra-Wideband Micro-Doppler Signatures of Behind Wall Indoor Human Activities

    Authors: Qiang An, Shuoguang Wang, Ahmad Hoorfar, Wenji Zhang, Hao Lv, Shiyong Li, Jianqi Wang

    Abstract: Penetrating detection and recognition of behind wall indoor human activities has drawn great attentions from social security and emergency service department in recent years since intelligent surveillance aforehand could avail the proper decision making before operations being carried out. However, due to the influence of the wall effects, the obtained micro-Doppler signatures would be severely de…

    Submitted 29 January, 2020; v1 submitted 28 January, 2020; originally announced January 2020.

    Comments: 14 pages and 9 figures

  15. arXiv:1912.00414  [pdf]

    eess.SP

    Empirical Fourier Decomposition

    Authors: Wei Zhou, Zhongren Feng, Xiongjiang Wang, Hao Lv

    Abstract: In this paper, a novel decomposition method for non-stationary and nonlinear signals is proposed. This method is inspired by the adaptive wavelet filter bank of the empirical wavelet transform (EWT) and Fourier intrinsic band functions (FIBFs) of the Fourier decomposition method (FDM). Therefore, the proposed approach is entitled as empirical Fourier decomposition (EFD). EFD is defined as the adap…

    Submitted 1 December, 2019; originally announced December 2019.

  16. arXiv:1909.08723  [pdf, other]

    cs.CL cs.SD eess.AS

    Espresso: A Fast End-to-end Neural Speech Recognition Toolkit

    Authors: Yiming Wang, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe, Sanjeev Khudanpur

    Abstract: We present Espresso, an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit fairseq. Espresso supports distributed training across GPUs and computing nodes, and features various decoding approaches commonly employed in ASR, including look-ahead word-based language…

    Submitted 14 October, 2019; v1 submitted 18 September, 2019; originally announced September 2019.

    Comments: Accepted to ASRU 2019