
Showing 1–11 of 11 results for author: Qian, S

Searching in archive eess.
  1. arXiv:2309.09180

    eess.AS cs.AI cs.SD

    Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding with Sequence-to-Sequence Architecture

    Authors: Gaobin Yang, Maokui He, Shutong Niu, Ruoyu Wang, Yanyan Yue, Shuangqing Qian, Shilong Wu, Jun Du, Chin-Hui Lee

    Abstract: We propose a novel neural speaker diarization system using memory-aware multi-speaker embedding with sequence-to-sequence architecture (NSD-MS2S), which integrates the strengths of memory-aware multi-speaker embedding (MA-MSE) and sequence-to-sequence (Seq2Seq) architecture, leading to improvement in both efficiency and performance. Next, we further decrease the memory occupation of decoding by in…

    Submitted 26 December, 2023; v1 submitted 17 September, 2023; originally announced September 2023.

    Comments: Accepted by ICASSP 2024

  2. arXiv:2308.15990

    cs.SD eess.AS

    Dual-path Transformer Based Neural Beamformer for Target Speech Extraction

    Authors: Aoqi Guo, Sichong Qian, Baoxiang Li, Dazhi Gao

    Abstract: Neural beamformers, which integrate both pre-separation and beamforming modules, have demonstrated impressive effectiveness in target speech extraction. Nevertheless, the performance of these beamformers is inherently limited by the predictive accuracy of the pre-separation module. In this paper, we introduce a neural beamformer supported by a dual-path transformer. Initially, we employ the cross-…

    Submitted 7 September, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

  3. arXiv:2308.14638

    eess.AS cs.SD

    The USTC-NERCSLIP Systems for the CHiME-7 DASR Challenge

    Authors: Ruoyu Wang, Maokui He, Jun Du, Hengshun Zhou, Shutong Niu, Hang Chen, Yanyan Yue, Gaobin Yang, Shilong Wu, Lei Sun, Yanhui Tu, Haitao Tang, Shuangqing Qian, Tian Gao, Mengzhi Wang, Genshun Wan, Jia Pan, Jianqing Gao, Chin-Hui Lee

    Abstract: This technical report details our submission system to the CHiME-7 DASR Challenge, which focuses on speaker diarization and speech recognition under complex multi-speaker scenarios. It also evaluates how efficiently systems handle diverse array devices. To address these issues, we implemented an end-to-end speaker diarization system and introduced a rectification strategy base…

    Submitted 10 October, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

    Comments: Accepted by 2023 CHiME Workshop, Oral

  4. arXiv:2307.08234

    eess.AS

    Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition

    Authors: Shaoshi Ling, Yuxuan Hu, Shuangbei Qian, Guoli Ye, Yao Qian, Yifan Gong, Ed Lin, Michael Zeng

    Abstract: Most end-to-end (E2E) speech recognition models are composed of encoder and decoder blocks that perform acoustic and language modeling functions. Pretrained large language models (LLMs) have the potential to improve the performance of E2E ASR. However, integrating a pretrained language model into an E2E speech recognition model has shown limited benefits due to the mismatches between text-based LL…

    Submitted 2 August, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

  5. arXiv:2303.11329

    cs.CV cs.SD eess.AS

    Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation

    Authors: Ziyang Chen, Shengyi Qian, Andrew Owens

    Abstract: The images and sounds that we perceive undergo subtle but geometrically consistent changes as we rotate our heads. In this paper, we use these cues to solve a problem we call Sound Localization from Motion (SLfM): jointly estimating camera rotation and localizing sound sources. We learn to solve these tasks solely through self-supervision. A visual model predicts camera rotation from a pair of ima…

    Submitted 21 August, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

    Comments: ICCV 2023. Project site: https://ificl.github.io/SLfM/

  6. arXiv:2204.09229

    eess.SY math.OC

    Estimating probabilistic dynamic origin-destination demands using multi-day traffic data on computational graphs

    Authors: Wei Ma, Sean Qian

    Abstract: System-level decision making in transportation needs to understand day-to-day variation of network flows, which calls for accurate modeling and estimation of probabilistic dynamic travel demand on networks. Most existing studies estimate deterministic dynamic origin-destination (OD) demand, while the day-to-day variation of demand and flow is overlooked. Estimating probabilistic distributions of d…

    Submitted 20 April, 2022; originally announced April 2022.

    Comments: Submitted to Transportation Science

  7. arXiv:2107.11222

    cs.SD eess.AS eess.SP

    Multi-channel Speech Enhancement with 2-D Convolutional Time-frequency Domain Features and a Pre-trained Acoustic Model

    Authors: Quandong Wang, Junnan Wu, Zhao Yan, Sichong Qian, Liyong Guo, Lichun Fan, Weiji Zhuang, Peng Gao, Yujun Wang

    Abstract: We propose a multi-channel speech enhancement approach with a novel two-stage feature fusion method and a pre-trained acoustic model in a multi-task learning paradigm. In the first fusion stage, the time-domain and frequency-domain features are extracted separately. In the time domain, the multi-channel convolution sum (MCS) and the inter-channel convolution differences (ICDs) features are compute…

    Submitted 24 September, 2021; v1 submitted 23 July, 2021; originally announced July 2021.

    Comments: 7 pages, 3 figures, accepted to APSIPA 2021, revised

  8. arXiv:2005.13522

    eess.SP cs.LG stat.ML

    Learning to Recommend Signal Plans under Incidents with Real-Time Traffic Prediction

    Authors: Weiran Yao, Sean Qian

    Abstract: This paper addresses how to recommend optimal signal timing plans in real time under incidents by incorporating domain knowledge developed with the traffic signal timing plans tuned for possible incidents, and by learning from historical data of both traffic and implemented signal timing. The effectiveness of traffic incident management is often limited by the late response time…

    Submitted 20 May, 2020; originally announced May 2020.

    Comments: To be published in Transportation Research Record (2020)

  9. arXiv:1910.02376

    eess.SP cs.CY cs.LG

    High-Resolution Traffic Sensing with Autonomous Vehicles

    Authors: Wei Ma, Sean Qian

    Abstract: Recent decades have witnessed breakthroughs in autonomous vehicles (AVs), and the perception capabilities of AVs have improved dramatically. Various sensors installed on AVs, including but not limited to LiDAR, radar, cameras, and stereovision, will be collecting massive data and perceiving the surrounding traffic states continuously. In fact, a fleet of AVs can serve as floating (or…

    Submitted 6 October, 2019; originally announced October 2019.

    Comments: submitted to Transportation Research Part C: Emerging Technologies

  10. arXiv:1905.05386

    physics.soc-ph eess.SY

    Measuring and reducing the disequilibrium levels of dynamic networks through ride-sourcing vehicle data

    Authors: Wei Ma, Sean Qian

    Abstract: In recent years, transportation systems have been reshaped by ride-sourcing and shared mobility services. Transportation network companies (TNCs) have been collecting high-granularity ride-sourcing vehicle (RV) trajectory data over the past decade, yet it remains unclear how RV data can improve current dynamic network modeling for network traffic management. This paper proposes to statistic…

    Submitted 7 November, 2019; v1 submitted 14 May, 2019; originally announced May 2019.

    Comments: 34 pages, 17 figures, published in Transportation Research Part C: Emerging Technologies

  11. arXiv:1903.04681

    eess.SY cs.LG

    Estimating multi-class dynamic origin-destination demand through a forward-backward algorithm on computational graphs

    Authors: Wei Ma, Xidong Pi, Sean Qian

    Abstract: Transportation networks are unprecedentedly complex, with heterogeneous vehicular flow. Conventionally, vehicle classes are defined by vehicle classifications (such as standard passenger cars and trucks). However, vehicle flow heterogeneity generally stems from many other aspects, e.g., ride-sourcing vehicles versus personal vehicles, human driven vehicles versus connected and automated vehicle…

    Submitted 11 March, 2019; originally announced March 2019.

    Comments: 31 pages, 21 figures, submitted to Transportation Research Part C: Emerging Technologies