Showing 1–9 of 9 results for author: Xiang, H

  1. arXiv:2404.15294  [pdf]

    eess.SP cs.LG

    Multimodal Physical Fitness Monitoring (PFM) Framework Based on TimeMAE-PFM in Wearable Scenarios

    Authors: Junjie Zhang, Zheming Zhang, Huachen Xiang, Yangquan Tan, Linnan Huo, Fengyi Wang

    Abstract: Physical function monitoring (PFM) plays a crucial role in healthcare especially for the elderly. Traditional assessment methods such as the Short Physical Performance Battery (SPPB) have failed to capture the full dynamic characteristics of physical function. Wearable sensors such as smart wristbands offer a promising solution to this issue. However, challenges exist, such as the computational co…

    Submitted 25 March, 2024; originally announced April 2024.

    Comments: 5 pages, 6 figures

  2. arXiv:2309.07413  [pdf, other]

    cs.CL cs.SD eess.AS

    CPPF: A contextual and post-processing-free model for automatic speech recognition

    Authors: Lei Zhang, Zhengkun Tian, Xiang Chen, Jiaming Sun, Hongyu Xiang, Ke Ding, Guanglu Wan

    Abstract: ASR systems have become increasingly widespread in recent years. However, their textual outputs often require post-processing tasks before they can be practically utilized. To address this issue, we draw inspiration from the multifaceted capabilities of LLMs and Whisper, and focus on integrating multiple ASR text processing tasks related to speech recognition into the ASR model. This integration n…

    Submitted 20 September, 2023; v1 submitted 13 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024

  3. arXiv:2309.04182  [pdf, other]

    cs.SD cs.IR eess.AS

    A Long-Tail Friendly Representation Framework for Artist and Music Similarity

    Authors: Haoran Xiang, Junyu Dai, Xuchen Song, Furao Shen

    Abstract: The investigation of the similarity between artists and music is crucial in music retrieval and recommendation, and addressing the challenge of the long-tail phenomenon is increasingly important. This paper proposes a Long-Tail Friendly Representation Framework (LTFRF) that utilizes neural networks to model the similarity relationship. Our approach integrates music, user, metadata, and relationshi…

    Submitted 8 September, 2023; originally announced September 2023.

  4. arXiv:2304.11526  [pdf, other]

    eess.SY cs.AI

    How to Control Hydrodynamic Force on Fluidic Pinball via Deep Reinforcement Learning

    Authors: Haodong Feng, Yue Wang, Hui Xiang, Zhiyang **, Dixia Fan

    Abstract: Deep reinforcement learning (DRL) for fluidic pinball, three individually rotating cylinders in the uniform flow arranged in an equilaterally triangular configuration, can learn the efficient flow control strategies due to the validity of self-learning and data-driven state estimation for complex fluid dynamic problems. In this work, we present a DRL-based real-time feedback strategy to control th…

    Submitted 22 April, 2023; originally announced April 2023.

  5. arXiv:2211.03284  [pdf, other]

    eess.AS cs.SD

    Peak-First CTC: Reducing the Peak Latency of CTC Models by Applying Peak-First Regularization

    Authors: Zhengkun Tian, Hongyu Xiang, Min Li, Feifei Lin, Ke Ding, Guanglu Wan

    Abstract: The CTC model has been widely applied to many application scenarios because of its simple structure, excellent performance, and fast inference speed. There are many peaks in the probability distribution predicted by the CTC models, and each peak represents a non-blank token. The recognition latency of CTC models can be reduced by encouraging the model to predict peaks earlier. Existing methods to…

    Submitted 15 March, 2023; v1 submitted 6 November, 2022; originally announced November 2022.

    Comments: Accepted by ICASSP 2023 (5 pages, 2 figures)

  6. arXiv:2203.16758  [pdf, other]

    eess.AS cs.CL

    CUSIDE: Chunking, Simulating Future Context and Decoding for Streaming ASR

    Authors: Keyu An, Huahuan Zheng, Zhijian Ou, Hongyu Xiang, Ke Ding, Guanglu Wan

    Abstract: History and future contextual information are known to be important for accurate acoustic modeling. However, acquiring future context brings latency for streaming ASR. In this paper, we propose a new framework - Chunking, Simulating Future Context and Decoding (CUSIDE) for streaming speech recognition. A new simulation module is introduced to recursively simulate the future contextual frames, with…

    Submitted 2 August, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

    Comments: Accepted into INTERSPEECH 2022

  7. arXiv:2005.13326  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    CAT: A CTC-CRF based ASR Toolkit Bridging the Hybrid and the End-to-end Approaches towards Data Efficiency and Low Latency

    Authors: Keyu An, Hongyu Xiang, Zhijian Ou

    Abstract: In this paper, we present a new open source toolkit for speech recognition, named CAT (CTC-CRF based ASR Toolkit). CAT inherits the data-efficiency of the hybrid approach and the simplicity of the E2E approach, providing a full-fledged implementation of CTC-CRFs and complete training and testing scripts for a number of English and Chinese benchmarks. Experiments show CAT obtains state-of-the-art r…

    Submitted 4 August, 2020; v1 submitted 27 May, 2020; originally announced May 2020.

    Comments: Accepted into INTERSPEECH 2020. arXiv admin note: text overlap with arXiv:1911.08747

  8. arXiv:2002.08419  [pdf, ps, other]

    cs.NI eess.SP

    Mode Selection and Resource Allocation in Sliced Fog Radio Access Networks: A Reinforcement Learning Approach

    Authors: Hongyu Xiang, Mugen Peng, Yaohua Sun, Shi Yan

    Abstract: The mode selection and resource allocation in fog radio access networks (F-RANs) have been advocated as key techniques to improve spectral and energy efficiency. In this paper, we investigate the joint optimization of mode selection and resource allocation in uplink F-RANs, where both of the traditional user equipments (UEs) and fog UEs are served by constructed network slice instances. The concer…

    Submitted 13 February, 2020; originally announced February 2020.

  9. arXiv:1911.08747  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    CAT: CRF-based ASR Toolkit

    Authors: Keyu An, Hongyu Xiang, Zhijian Ou

    Abstract: In this paper, we present a new open source toolkit for automatic speech recognition (ASR), named CAT (CRF-based ASR Toolkit). A key feature of CAT is discriminative training in the framework of conditional random field (CRF), particularly with connectionist temporal classification (CTC) inspired state topology. CAT contains a full-fledged implementation of CTC-CRF and provides a complete workflow…

    Submitted 20 November, 2019; originally announced November 2019.

    Comments: Code released at: https://github.com/thu-spmi/cat