Showing 1–16 of 16 results for author: You, Y

Searching in archive eess.
  1. arXiv:2406.18301  [pdf, other]

    eess.AS cs.CL cs.SD

    MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research

    Authors: Song Li, Yongbin You, Xuezhi Wang, Zhengkun Tian, Ke Ding, Guanglu Wan

    Abstract: Recently, multilingual artificial intelligence assistants, exemplified by ChatGPT, have gained immense popularity. As a crucial gateway to human-computer interaction, multilingual automatic speech recognition (ASR) has also garnered significant attention, as evidenced by systems like Whisper. However, the proprietary nature of the training data has impeded researchers' efforts to study multilingua…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted by InterSpeech 2024

  2. arXiv:2402.17043  [pdf, other]

    eess.SY

    Traffic Control via Connected and Automated Vehicles: An Open-Road Field Experiment with 100 CAVs

    Authors: Jonathan W. Lee, Han Wang, Kathy Jang, Amaury Hayat, Matthew Bunting, Arwa Alanqary, William Barbour, Zhe Fu, Xiaoqian Gong, George Gunter, Sharon Hornstein, Abdul Rahman Kreidieh, Nathan Lichtlé, Matthew W. Nice, William A. Richardson, Adit Shah, Eugene Vinitsky, Fangyu Wu, Shengquan Xiang, Sulaiman Almatrudi, Fahd Althukair, Rahul Bhadani, Joy Carpio, Raphael Chekroun, Eric Cheng, et al. (39 additional authors not shown)

    Abstract: The CIRCLES project aims to reduce instabilities in traffic flow, which are naturally occurring phenomena due to human driving behavior. These "phantom jams" or "stop-and-go waves" are a significant source of wasted energy. Toward this goal, the CIRCLES project designed a control system, referred to as the MegaController by the CIRCLES team, that could be deployed in real traffic. Our field experim…

    Submitted 26 February, 2024; originally announced February 2024.

  3. arXiv:2310.11641  [pdf]

    eess.IV cs.AI physics.med-ph

    Cloud-Magnetic Resonance Imaging System: In the Era of 6G and Artificial Intelligence

    Authors: Yirong Zhou, Yanhuang Wu, Yuhan Su, Jing Li, Jianyun Cai, Yongfu You, Di Guo, Xiaobo Qu

    Abstract: Magnetic Resonance Imaging (MRI) plays an important role in medical diagnosis, generating petabytes of image data annually in large hospitals. This voluminous data stream requires a significant amount of network bandwidth and extensive storage infrastructure. Additionally, local data processing demands substantial manpower and hardware investments. Data isolation across different healthcare instit…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: 4 pages, 5 figures, letters

  4. arXiv:2309.09443  [pdf, other]

    eess.AS cs.CL cs.SD

    Enhancing Multilingual Speech Recognition through Language Prompt Tuning and Frame-Level Language Adapter

    Authors: Song Li, Yongbin You, Xuezhi Wang, Ke Ding, Guanglu Wan

    Abstract: Multilingual intelligent assistants, such as ChatGPT, have recently gained popularity. To further expand the applications of multilingual artificial intelligence assistants and facilitate international communication, it is essential to enhance the performance of multilingual speech recognition, which is a crucial component of speech interaction. In this paper, we propose two simple and parameter-e…

    Submitted 19 September, 2023; v1 submitted 17 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP2024

  5. arXiv:2308.16836  [pdf, other]

    cs.SD cs.AI eess.AS

    Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information

    Authors: Shaohuan Zhou, Shun Lei, Weiya You, Deyi Tuo, Yuren You, Zhiyong Wu, Shiyin Kang, Helen Meng

    Abstract: This paper presents an end-to-end high-quality singing voice synthesis (SVS) system that uses bidirectional encoder representation from Transformers (BERT) derived semantic embeddings to improve the expressiveness of the synthesized singing voice. Based on the main architecture of recently proposed VISinger, we put forward several specific designs for expressive singing voice synthesis. First, dif…

    Submitted 31 August, 2023; originally announced August 2023.

  6. arXiv:2211.08530  [pdf, ps, other]

    eess.SY

    Cyber-Attack Event Analysis for EV Charging Stations

    Authors: Mansi Girdhar, Junho Hong, Yongsik You, Tai-Jin Song, Manimaran Govindarasu

    Abstract: Safe and secure electric vehicle charging stations (EVCSs) are important in smart transportation infrastructure. The prevalence of EVCSs has rapidly increased over time in response to the rising demand for EV charging. However, developments in information and communication technologies (ICT) have made the cyber-physical system (CPS) of EVCSs susceptible to cyber-attacks, which might destabilize th…

    Submitted 15 November, 2022; originally announced November 2022.

    Comments: 5 Pages, 2 Figures, 2 Tables, 10 Mathematical Equations, PES GM Conference Paper

  7. arXiv:2112.14888  [pdf, other]

    math.OC cs.GT eess.SY

    Parallel Network Flow Allocation in Repeated Routing Games via LQR Optimal Control

    Authors: Marsalis Gibson, Yiling You, Alexandre Bayen

    Abstract: In this article, we study the repeated routing game problem on a parallel network with affine latency functions on each edge. We cast the game setup in an LQR control-theoretic framework, leveraging the Rosenthal potential formulation. We use control techniques to analyze the convergence of the game dynamics with specific cases that lend themselves to optimal control. We design proper dynamics para…

    Submitted 29 December, 2021; originally announced December 2021.

    Comments: 23 pages, 9 figures, TRB submission

  8. arXiv:2104.08824  [pdf]

    eess.IV

    XCloud-pFISTA: A Medical Intelligence Cloud for Accelerated MRI

    Authors: Yirong Zhou, Chen Qian, Yi Guo, Zi Wang, Jian Wang, Biao Qu, Di Guo, Yongfu You, Xiaobo Qu

    Abstract: Machine learning and artificial intelligence have shown remarkable performance in accelerated magnetic resonance imaging (MRI). Cloud computing technologies have great advantages in building an easily accessible platform to deploy advanced algorithms. In this work, we develop an open-access, easy-to-use and high-performance medical intelligence cloud computing platform (XCloud-pFISTA) to reconstru…

    Submitted 10 June, 2021; v1 submitted 18 April, 2021; originally announced April 2021.

  9. arXiv:2104.02583  [pdf, other]

    eess.SY

    Limitations and Improvements of the Intelligent Driver Model (IDM)

    Authors: Saleh Albeaik, Alexandre Bayen, Maria Teresa Chiri, Xiaoqian Gong, Amaury Hayat, Nicolas Kardous, Alexander Keimer, Sean T. McQuade, Benedetto Piccoli, Yiling You

    Abstract: This contribution analyzes the widely used and well-known "intelligent driver model" (IDM for short), which is a second-order car-following model governed by a system of ordinary differential equations. Although this model was intensively studied in recent years for properly capturing traffic phenomena and driver braking behavior, a rigorous study of the well-posedness has, to our knowledge, never be…

    Submitted 1 April, 2022; v1 submitted 2 April, 2021; originally announced April 2021.

    Comments: 28 pages, 20 Figures

    MSC Class: 34A12; 34A38; 65L05; 65L08

  10. arXiv:2012.14830  [pdf]

    cs.LG eess.IV physics.bio-ph physics.med-ph

    A Sparse Model-inspired Deep Thresholding Network for Exponential Signal Reconstruction -- Application in Fast Biological Spectroscopy

    Authors: Zi Wang, Di Guo, Zhangren Tu, Yihui Huang, Yirong Zhou, Jian Wang, Liubin Feng, Donghai Lin, Yongfu You, Tatiana Agback, Vladislav Orekhov, Xiaobo Qu

    Abstract: Non-uniform sampling is a powerful approach to enable fast acquisition but requires sophisticated reconstruction algorithms. Faithful reconstruction from partially sampled exponentials is highly expected in general signal processing and many applications. Deep learning has shown astonishing potential in this field but many existing problems, such as lack of robustness and explainability, greatly…

    Submitted 17 January, 2022; v1 submitted 29 December, 2020; originally announced December 2020.

    Comments: 30 pages

  11. arXiv:2011.01576  [pdf, other]

    eess.AS cs.SD

    Improving RNN transducer with normalized jointer network

    Authors: Mingkun Huang, Jun Zhang, Meng Cai, Yang Zhang, Jiali Yao, Yongbin You, Yi He, Zejun Ma

    Abstract: Recurrent neural transducer (RNN-T) is a promising end-to-end (E2E) model in automatic speech recognition (ASR). It has shown superior performance compared to traditional hybrid ASR systems. However, training RNN-T from scratch is still challenging. We observe a huge gradient variance during RNN-T training and suspect it hurts the performance. In this work, we analyze the cause of the huge gradien…

    Submitted 3 November, 2020; originally announced November 2020.

  12. arXiv:2011.01570  [pdf, other]

    eess.AS cs.SD

    Dynamic latency speech recognition with asynchronous revision

    Authors: Mingkun Huang, Meng Cai, Jun Zhang, Yang Zhang, Yongbin You, Yi He, Zejun Ma

    Abstract: In this work, we propose an inference technique, asynchronous revision, to unify streaming and non-streaming speech recognition models. Specifically, we achieve dynamic latency with only one model by using arbitrary right context during inference. The model is composed of a stack of convolutional layers for audio encoding. In the inference stage, the history states of encoder and decoder can be asynchr…

    Submitted 3 November, 2020; originally announced November 2020.

  13. arXiv:2009.08973  [pdf, other]

    cs.LG cs.AI cs.RO eess.SY stat.ML

    GRAC: Self-Guided and Self-Regularized Actor-Critic

    Authors: Lin Shao, Yifan You, Mengyuan Yan, Qingyun Sun, Jeannette Bohg

    Abstract: Deep reinforcement learning (DRL) algorithms have successfully been demonstrated on a range of challenging decision making and control tasks. One dominant component of recent deep reinforcement learning algorithms is the target network which mitigates the divergence when learning the Q function. However, target networks can slow down the learning process due to delayed function updates. Our main c…

    Submitted 10 November, 2020; v1 submitted 18 September, 2020; originally announced September 2020.

  14. arXiv:2004.03080  [pdf, other]

    cs.CV eess.IV

    End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection

    Authors: Rui Qian, Divyansh Garg, Yan Wang, Yurong You, Serge Belongie, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger, Wei-Lun Chao

    Abstract: Reliable and accurate 3D object detection is a necessity for safe autonomous driving. Although LiDAR sensors can provide accurate 3D point cloud estimates of the environment, they are also prohibitively expensive for many settings. Recently, the introduction of pseudo-LiDAR (PL) has led to a drastic reduction in the accuracy gap between methods based on LiDAR sensors and those based on cheap stere…

    Submitted 14 May, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

    Comments: Accepted to 2020 Conference on Computer Vision and Pattern Recognition (CVPR 2020)

  15. arXiv:2002.09693  [pdf, other]

    cs.LG cs.IR eess.IV

    Interpretable Crowd Flow Prediction with Spatial-Temporal Self-Attention

    Authors: Haoxing Lin, Weijia Jia, Yongjian You, Yiping Sun

    Abstract: Crowd flow prediction has been increasingly investigated in the field of intelligent urban computing as a fundamental component of urban management systems. The most challenging part of predicting crowd flow is to measure the complicated spatial-temporal dependencies. A prevalent solution employed in current methods is to divide and conquer the spatial and temporal information by various architectures (e.…

    Submitted 22 February, 2020; originally announced February 2020.

    Comments: 7 pages

  16. arXiv:1903.04740  [pdf, other]

    eess.SP

    Sphere Bounding Scheme for Probabilistic Robust Constructive Interference Precoding in MISO Downlink Transmission

    Authors: Yuning You, Gangming Lv

    Abstract: In this letter, we propose a sphere bounding scheme for probabilistic robust constructive interference (CI) power minimizing precoding, to address the imperfect channel state information (CSI) caused by the channel error (CE), which satisfies the known distribution in single-cell multiuser multiple-input single-output (MISO) downlink transmission. In the proposed scheme, we transform the probabilis…

    Submitted 12 March, 2019; originally announced March 2019.

    Comments: 5 pages, 4 figures