Showing 1–47 of 47 results for author: Yao, Z

Searching in archive eess.
  1. arXiv:2406.05790  [pdf, ps, other]

    eess.SP

    Integrated Sensing and Communication for Anti-Jamming with OAM

    Authors: Liping Liang, Wenchi Cheng, Wei Zhang, Zhuohui Yao

    Abstract: Spectrum sharing and the open nature of wireless channels make integrated sensing and communication (ISAC) susceptible to hostile jamming attacks. Due to the intrinsic orthogonality and rich azimuth angle information of orbital angular momentum (OAM), vortex electromagnetic waves with helical phase fronts have shown great potential to achieve high-resolution imaging and strong anti-jamming capabil…

    Submitted 9 June, 2024; originally announced June 2024.

  2. arXiv:2404.17400  [pdf, other]

    cs.CV cs.AI eess.IV

    Spatial-frequency Dual-Domain Feature Fusion Network for Low-Light Remote Sensing Image Enhancement

    Authors: Zishu Yao, Guodong Fan, Jinfu Fan, Min Gan, C. L. Philip Chen

    Abstract: Low-light remote sensing images generally feature high resolution and high spatial complexity, with continuously distributed surface features in space. This continuity in scenes leads to extensive long-range correlations in spatial domains within remote sensing images. Convolutional Neural Networks, which rely on local correlations for long-distance modeling, struggle to establish long-range corre…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 14 pages

  3. GNSS Measurement-Based Context Recognition for Vehicle Navigation using Gated Recurrent Unit

    Authors: Sheng Liu, Zhiqiang Yao, Xuemeng Cao, Xiaowen Cai

    Abstract: In recent years, increasingly demanding requirements have been placed on context-adaptive navigation (CAN). A CAN system achieves seamless navigation in complex environments by recognizing the ambient surroundings of vehicles, and it is crucial to develop a fast, reliable, and robust navigational context recognition (NCR) method to enable CAN systems to operate effectively. Environmental context rec…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 9 pages, 9 figures, 5 tables

    Journal ref: Proceedings of the 36th International Technical Meeting of the Satellite Division of The Institute of Navigation (ION GNSS+ 2023)

  4. arXiv:2404.09729  [pdf]

    eess.SP cs.IT cs.LG stat.ME

    Amplitude-Phase Fusion for Enhanced Electrocardiogram Morphological Analysis

    Authors: Shuaicong Hu, Yanan Wang, Jian Liu, Jingyu Lin, Shengmei Qin, Zhenning Nie, Zhifeng Yao, Wenjie Cai, Cuiwei Yang

    Abstract: Considering the variability of amplitude and phase patterns in electrocardiogram (ECG) signals due to cardiac activity and individual differences, existing entropy-based studies have not fully utilized these two patterns and lack integration. To address this gap, this paper proposes, for the first time, a novel fusion entropy metric, morphological ECG entropy (MEE), specifically designed for ECG mor…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 16 pages, 12 figures

    ACM Class: I.5.2

  5. arXiv:2403.08170  [pdf, other]

    cs.CV eess.IV

    Versatile Defense Against Adversarial Attacks on Image Recognition

    Authors: Haibo Zhang, Zhihua Yao, Kouichi Sakurai

    Abstract: Adversarial attacks present a significant security risk to image recognition tasks. Defending against these attacks in a real-life setting can be compared to the way antivirus software works, with a key consideration being how well the defense can adapt to new and evolving attacks. Another important factor is the resources involved in terms of time and cost for training defense models and updating…

    Submitted 12 March, 2024; originally announced March 2024.

  6. arXiv:2402.00313  [pdf, other]

    cs.LG eess.SY

    Control in Stochastic Environment with Delays: A Model-based Reinforcement Learning Approach

    Authors: Zhiyuan Yao, Ionut Florescu, Chihoon Lee

    Abstract: In this paper we introduce a new reinforcement learning method for control problems in environments with delayed feedback. Specifically, our method employs stochastic planning, whereas previous methods used deterministic planning. This allows us to embed risk preference in the policy optimization problem. We show that this formulation can recover the optimal policy for problems with dete…

    Submitted 31 January, 2024; originally announced February 2024.

    Comments: Under Review

  7. arXiv:2401.09508  [pdf, other]

    eess.IV physics.data-an

    4D-ONIX: A deep learning approach for reconstructing 3D movies from sparse X-ray projections

    Authors: Yuhe Zhang, Zisheng Yao, Robert Klöfkorn, Tobias Ritschel, Pablo Villanueva-Perez

    Abstract: The X-ray flux provided by X-ray free-electron lasers and storage rings offers new spatiotemporal possibilities to study in-situ and operando dynamics, even using single pulses of such facilities. X-ray Multi-Projection Imaging (XMPI) is a novel technique that enables volumetric information using single pulses of such facilities and avoids centrifugal forces induced by state-of-the-art time-resolv…

    Submitted 2 February, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

  8. arXiv:2401.05425  [pdf, other]

    eess.SP cs.LG

    An Unobtrusive and Lightweight Ear-worn System for Continuous Epileptic Seizure Detection

    Authors: Abdul Aziz, Nhat Pham, Neel Vora, Cody Reynolds, Jaime Lehnen, Pooja Venkatesh, Zhuoran Yao, Jay Harvey, Tam Vu, Kan Ding, Phuc Nguyen

    Abstract: Epilepsy is one of the most common neurological diseases globally, affecting around 50 million people worldwide. Fortunately, up to 70 percent of people with epilepsy could live seizure-free if properly diagnosed and treated, and a reliable technique to monitor the onset of seizures could improve the quality of life of patients who are constantly facing the fear of random seizure attacks. The scal…

    Submitted 1 January, 2024; originally announced January 2024.

  9. arXiv:2311.16149  [pdf, other]

    physics.ins-det eess.IV physics.optics

    Development towards high-resolution kHz-speed rotation-free volumetric imaging

    Authors: Eleni Myrto Asimakopoulou, Valerio Bellucci, Sarlota Birnsteinova, Zisheng Yao, Yuhe Zhang, Ilia Petrov, Carsten Deiter, Andrea Mazzolari, Marco Romagnoni, Dusan Korytar, Zdenko Zaprazny, Zuzana Kuglerova, Libor Juha, Bratislav Lukic, Alexander Rack, Liubov Samoylova, Francisco Garcia Moreno, Stephen A Hall, Tillmann Neu, Xiaoyu Liang, Patrik Vagovic, Pablo Villanueva-Perez

    Abstract: X-ray multi-projection imaging (XMPI) provides rotation-free 3D movies of optically opaque samples. The absence of rotation enables superior imaging speed and preserves fragile sample dynamics by avoiding the shear forces introduced by conventional rotary tomography. Here, we present our XMPI observations at the ID19 beamline (ESRF, France) of 3D dynamics in melted aluminum with 1000 frames per se…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: 12 pages, 7 figures

    Journal ref: Opt. Express 32, (2024), 4413-4426

  10. arXiv:2311.03062  [pdf]

    physics.optics cs.LG eess.SP

    Imaging through multimode fibres with physical prior

    Authors: Chuncheng Zhang, Yingjie Shi, Zheyi Yao, Xiubao Sui, Qian Chen

    Abstract: Imaging through perturbed multimode fibres based on deep learning has been widely researched. However, existing methods mainly use target-speckle pairs in different configurations. It is challenging to reconstruct targets without trained networks. In this paper, we propose a physics-assisted, unsupervised, learning-based fibre imaging scheme. The role of the physical prior is to simplify the mappi…

    Submitted 13 November, 2023; v1 submitted 6 November, 2023; originally announced November 2023.

  11. arXiv:2311.02818  [pdf, other]

    cs.LG eess.SP

    Signal Processing Meets SGD: From Momentum to Filter

    Authors: Zhipeng Yao, Guiyuan Fu, Ying Li, Yu Zhang, Dazhou Li, Rui Yu

    Abstract: In deep learning, stochastic gradient descent (SGD) and its momentum-based variants are widely used for optimization, but they typically suffer from slow convergence. Conversely, existing adaptive learning rate optimizers speed up convergence but often compromise generalization. To resolve this issue, we propose a novel optimization method designed to accelerate SGD's convergence without sacrifici…

    Submitted 24 May, 2024; v1 submitted 5 November, 2023; originally announced November 2023.

  12. arXiv:2310.11230  [pdf, other]

    eess.AS cs.LG cs.SD

    Zipformer: A faster and better encoder for automatic speech recognition

    Authors: Zengwei Yao, Liyong Guo, Xiaoyu Yang, Wei Kang, Fangjun Kuang, Yifan Yang, Zengrui Jin, Long Lin, Daniel Povey

    Abstract: The Conformer has become the most popular encoder model for automatic speech recognition (ASR). It adds convolution modules to a transformer to learn both local and global dependencies. In this work we describe a faster, more memory-efficient, and better-performing transformer, called Zipformer. Modeling changes include: 1) a U-Net-like encoder structure where middle stacks operate at lower frame…

    Submitted 9 April, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: Published as a conference paper at ICLR 2024

  13. arXiv:2309.08105  [pdf, other]

    eess.AS cs.SD

    Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context

    Authors: Wei Kang, Xiaoyu Yang, Zengwei Yao, Fangjun Kuang, Yifan Yang, Liyong Guo, Long Lin, Daniel Povey

    Abstract: In this paper, we introduce Libriheavy, a large-scale ASR corpus consisting of 50,000 hours of read English speech derived from LibriVox. To the best of our knowledge, Libriheavy is the largest freely-available corpus of speech with supervisions. Different from other open-sourced datasets that only provide normalized transcriptions, Libriheavy contains richer information such as punctuation, casin…

    Submitted 14 January, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024

  14. arXiv:2309.07414  [pdf, other]

    eess.AS cs.CL cs.SD

    PromptASR for contextualized ASR with controllable style

    Authors: Xiaoyu Yang, Wei Kang, Zengwei Yao, Yifan Yang, Liyong Guo, Fangjun Kuang, Long Lin, Daniel Povey

    Abstract: Prompts are crucial to large language models as they provide context information such as topic or logical relationships. Inspired by this, we propose PromptASR, a framework that integrates prompts in end-to-end automatic speech recognition (E2E ASR) systems to achieve contextualized ASR with controllable style of transcriptions. Specifically, a dedicated text encoder encodes the text prompts and t…

    Submitted 24 January, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

    Comments: Proc. ICASSP 2024

  15. arXiv:2306.06284  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    Everybody Compose: Deep Beats To Music

    Authors: Conghao Shen, Violet Z. Yao, Yixin Liu

    Abstract: This project presents a deep learning approach to generate monophonic melodies based on input beats, allowing even amateurs to create their own music compositions. Three effective methods - LSTM with Full Attention, LSTM with Local Attention, and Transformer with Relative Position Representation - are proposed for this novel task, providing great variation, harmony, and structure in the generated…

    Submitted 9 June, 2023; originally announced June 2023.

    Comments: Accepted MMSys '23

    Journal ref: Proceedings of the 14th Conference on ACM Multimedia Systems (2023)

  16. arXiv:2305.11920  [pdf, other]

    eess.IV physics.optics

    Megahertz X-ray Multi-projection imaging

    Authors: Pablo Villanueva-Perez, Valerio Bellucci, Yuhe Zhang, Sarlota Birnsteinova, Rita Graceffa, Luigi Adriano, Eleni Myrto Asimakopoulou, Ilia Petrov, Zisheng Yao, Marco Romagnoni, Andrea Mazzolari, Romain Letrun, Chan Kim, Jayanath C. P. Koliyadu, Carsten Deiter, Richard Bean, Gabriele Giovanetti, Luca Gelisio, Tobias Ritschel, Adrian Mancuso, Henry N. Chapman, Alke Meents, Tokushi Sato, Patrik Vagovic

    Abstract: X-ray time-resolved tomography is one of the most popular X-ray techniques to probe dynamics in three dimensions (3D). Recent developments in time-resolved tomography opened the possibility of recording kilohertz-rate 3D movies. However, tomography requires rotating the sample with respect to the X-ray beam, which prevents characterization of faster structural dynamics. Here, we present megahertz…

    Submitted 19 May, 2023; originally announced May 2023.

  17. arXiv:2305.11558  [pdf, other]

    eess.AS cs.CL

    Blank-regularized CTC for Frame Skipping in Neural Transducer

    Authors: Yifan Yang, Xiaoyu Yang, Liyong Guo, Zengwei Yao, Wei Kang, Fangjun Kuang, Long Lin, Xie Chen, Daniel Povey

    Abstract: Neural Transducer and connectionist temporal classification (CTC) are popular end-to-end automatic speech recognition systems. Due to their frame-synchronous design, blank symbols are introduced to address the length mismatch between acoustic frames and output tokens, which might bring redundant computation. Previous studies managed to accelerate the training and inference of neural Transducers by…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: Accepted in INTERSPEECH 2023

  18. arXiv:2305.11539  [pdf, other]

    eess.AS

    Delay-penalized CTC implemented based on Finite State Transducer

    Authors: Zengwei Yao, Wei Kang, Fangjun Kuang, Liyong Guo, Xiaoyu Yang, Yifan Yang, Long Lin, Daniel Povey

    Abstract: Connectionist Temporal Classification (CTC) suffers from the latency problem when applied to streaming models. We argue that in CTC lattice, the alignments that can access more future context are preferred during training, thereby leading to higher symbol delay. In this work we propose the delay-penalized CTC which is augmented with latency penalty regularization. We devise a flexible and efficien…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: Accepted in INTERSPEECH 2023

  19. arXiv:2304.11316  [pdf, other]

    physics.optics eess.IV

    Iterative fluctuation ghost imaging

    Authors: Huan Zhao, Xiao-Qian Wang, Chao Gao, Zhuo Yu, Hong Wang, Yu Wang, Li-Dan Gou, Zhi-Hai Yao

    Abstract: We present a new technique, iterative fluctuation ghost imaging (IFGI), which dramatically enhances the resolution of ghost imaging (GI). It is shown that, by exploiting the fluctuation characteristics of the second-order correlation function, imaging information with a narrower point spread function (PSF) than the original information can be obtained. The effects arising from the PSF and the iteration times…

    Submitted 22 April, 2023; originally announced April 2023.

  20. arXiv:2212.03830  [pdf, other]

    cs.AI eess.SY

    A Hierarchical Deep Reinforcement Learning Framework for 6-DOF UCAV Air-to-Air Combat

    Authors: Jiajun Chai, Wenzhang Chen, Yuanheng Zhu, Zong-xin Yao, Dongbin Zhao

    Abstract: Unmanned combat air vehicle (UCAV) combat is a challenging scenario with continuous action space. In this paper, we propose a general hierarchical framework to resolve the within-vision-range (WVR) air-to-air combat problem under six-degrees-of-freedom (6-DOF) dynamics. The core idea is to divide the whole decision process into two loops and use reinforcement learning (RL) to solve them separately…

    Submitted 5 December, 2022; originally announced December 2022.

  21. arXiv:2211.13443  [pdf, other]

    cs.SD eess.AS

    TESSP: Text-Enhanced Self-Supervised Speech Pre-training

    Authors: Zhuoyuan Yao, Shuo Ren, Sanyuan Chen, Ziyang Ma, Pengcheng Guo, Lei Xie

    Abstract: Self-supervised speech pre-training empowers the model with the contextual structure inherent in the speech signal while self-supervised text pre-training empowers the model with linguistic information. Both of them are beneficial for downstream speech tasks such as ASR. However, the distinct pre-training objectives make it challenging to jointly optimize the speech and text representation in the…

    Submitted 24 November, 2022; originally announced November 2022.

    Comments: 9 pages, 4 figures

  22. arXiv:2211.13005  [pdf, other]

    eess.SP cs.LG

    A CNN-Transformer Deep Learning Model for Real-time Sleep Stage Classification in an Energy-Constrained Wireless Device

    Authors: Zongyan Yao, Xilin Liu

    Abstract: This paper proposes a deep learning (DL) model for automatic sleep stage classification based on single-channel EEG data. The DL model features a convolutional neural network (CNN) and transformers. The model was designed to run on energy and memory-constrained devices for real-time operation with local processing. The Fpz-Cz EEG signals from a publicly available Sleep-EDF dataset are used to trai…

    Submitted 20 November, 2022; originally announced November 2022.

  23. arXiv:2211.00508  [pdf, other]

    eess.AS cs.CL cs.SD

    Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation

    Authors: Liyong Guo, Xiaoyu Yang, Quandong Wang, Yuxiang Kong, Zengwei Yao, Fan Cui, Fangjun Kuang, Wei Kang, Long Lin, Mingshuang Luo, Piotr Zelasko, Daniel Povey

    Abstract: Knowledge distillation (KD) is a common approach to improve model performance in automatic speech recognition (ASR), where a student model is trained to imitate the output behaviour of a teacher model. However, traditional KD methods suffer from a teacher-label storage issue, especially when the training corpora are large. Although on-the-fly teacher label generation tackles this issue, the training…

    Submitted 31 October, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2022

  24. arXiv:2211.00490  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Delay-penalized transducer for low-latency streaming ASR

    Authors: Wei Kang, Zengwei Yao, Fangjun Kuang, Liyong Guo, Xiaoyu Yang, Long Lin, Piotr Żelasko, Daniel Povey

    Abstract: In streaming automatic speech recognition (ASR), it is desirable to reduce latency as much as possible while having minimum impact on recognition accuracy. Although a few existing methods are able to achieve this goal, they are difficult to implement due to their dependency on external alignments. In this paper, we propose a simple way to penalize symbol delay in transducer model, so that we can b…

    Submitted 31 October, 2022; originally announced November 2022.

    Comments: Submitted to 2023 IEEE International Conference on Acoustics, Speech and Signal Processing

  25. arXiv:2211.00484  [pdf, ps, other]

    eess.AS cs.CL cs.LG cs.SD

    Fast and parallel decoding for transducer

    Authors: Wei Kang, Liyong Guo, Fangjun Kuang, Long Lin, Mingshuang Luo, Zengwei Yao, Xiaoyu Yang, Piotr Żelasko, Daniel Povey

    Abstract: The transducer architecture is becoming increasingly popular in the field of speech recognition, because it is naturally streaming as well as high in accuracy. One of the drawbacks of transducer is that it is difficult to decode in a fast and parallel way due to an unconstrained number of symbols that can be emitted per time step. In this work, we introduce a constrained version of transducer loss…

    Submitted 31 October, 2022; originally announced November 2022.

    Comments: Submitted to 2023 IEEE International Conference on Acoustics, Speech and Signal Processing

  26. arXiv:2209.15329  [pdf, other]

    cs.CL cs.AI eess.AS

    SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data

    Authors: Ziqiang Zhang, Sanyuan Chen, Long Zhou, Yu Wu, Shuo Ren, Shujie Liu, Zhuoyuan Yao, Xun Gong, Lirong Dai, Jinyu Li, Furu Wei

    Abstract: How to boost speech pre-training with textual data is an unsolved problem due to the fact that speech and text are very different modalities with distinct characteristics. In this paper, we propose a cross-modal Speech and Language Model (SpeechLM) to explicitly align speech and text pre-training with a pre-defined unified discrete representation. Specifically, we introduce two alternative discret…

    Submitted 15 June, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

    Comments: We have corrected the errors in the pre-training data for SpeechLM-P Base models, new results are updated

  27. arXiv:2207.13334  [pdf]

    physics.optics eess.IV

    Fast optical refocusing through multimode fiber bend using Cake-Cutting Hadamard encoding algorithm to improve robustness

    Authors: Chuncheng Zhang, Zheyi Yao, Zhengyue Qin, Guohua Gu, Qian Chen, Zhihua Xie, Guodong Liu, Xiubao Sui

    Abstract: Multimode fibres offer the advantages of high resolution and miniaturization over single mode fibers in the field of optical imaging. However, multimode fibre's imaging is susceptible to perturbations of MMF that can lead to secondary spatial distortions in the transmitted image. Perturbations include random disturbances in the fiber as well as environmental noise. Here, we exploit the fast focusi…

    Submitted 27 July, 2022; originally announced July 2022.

  28. arXiv:2206.13236  [pdf, other]

    eess.AS cs.AI cs.LG

    Pruned RNN-T for fast, memory-efficient ASR training

    Authors: Fangjun Kuang, Liyong Guo, Wei Kang, Long Lin, Mingshuang Luo, Zengwei Yao, Daniel Povey

    Abstract: The RNN-Transducer (RNN-T) framework for speech recognition has been growing in popularity, particularly for deployed real-time ASR systems, because it combines high accuracy with naturally streaming recognition. One of the drawbacks of RNN-T is that its loss function is relatively slow to compute, and can use a lot of memory. Excessive GPU memory usage can make it impractical to use RNN-T loss in…

    Submitted 23 June, 2022; originally announced June 2022.

  29. arXiv:2206.01451  [pdf, other]

    cs.AI cs.DC eess.SY

    Learning Distributed and Fair Policies for Network Load Balancing as Markov Potential Game

    Authors: Zhiyuan Yao, Zihan Ding

    Abstract: This paper investigates the network load balancing problem in data centers (DCs) where multiple load balancers (LBs) are deployed, using the multi-agent reinforcement learning (MARL) framework. The challenges of this problem consist of the heterogeneous processing architecture and dynamic environments, as well as limited and partial observability of each LB agent in distributed networking systems,…

    Submitted 14 October, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

  30. arXiv:2203.15455  [pdf, other]

    cs.SD cs.CL eess.AS

    WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit

    Authors: Binbin Zhang, Di Wu, Zhendong Peng, Xingchen Song, Zhuoyuan Yao, Hang Lv, Lei Xie, Chao Yang, Fuping Pan, Jianwei Niu

    Abstract: Recently, we made available WeNet, a production-oriented end-to-end speech recognition toolkit, which introduces a unified two-pass (U2) framework and a built-in runtime to address the streaming and non-streaming decoding modes in a single model. To further improve ASR performance and facilitate various production requirements, in this paper, we present WeNet 2.0 with four important updates. (1) W…

    Submitted 5 July, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

  31. arXiv:2203.00682  [pdf, other]

    eess.IV physics.data-an physics.med-ph physics.optics

    ONIX: an X-ray deep-learning tool for 3D reconstructions from sparse views

    Authors: Yuhe Zhang, Zisheng Yao, Tobias Ritschel, Pablo Villanueva-Perez

    Abstract: Three-dimensional (3D) X-ray imaging techniques like tomography and confocal microscopy are crucial for academic and industrial applications. These approaches access 3D information by scanning the sample with respect to the X-ray source. However, the scanning process limits the temporal resolution when studying dynamics and is not feasible for some applications, such as surgical guidance in medica…

    Submitted 28 February, 2022; originally announced March 2022.

  32. arXiv:2110.04791  [pdf, other]

    eess.AS cs.LG cs.SD

    Stepwise-Refining Speech Separation Network via Fine-Grained Encoding in High-order Latent Domain

    Authors: Zengwei Yao, Wenjie Pei, Fanglin Chen, Guangming Lu, David Zhang

    Abstract: The crux of single-channel speech separation is how to encode the mixture of signals into such a latent embedding space that the signals from different speakers can be precisely separated. Existing methods for speech separation either transform the speech signals into frequency domain to perform separation or seek to learn a separable embedding space by constructing a latent domain based on convol…

    Submitted 31 January, 2022; v1 submitted 10 October, 2021; originally announced October 2021.

    Comments: Accepted for publication in IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

  33. arXiv:2109.00374  [pdf, other]

    eess.IV cs.CV

    ImageTBAD: A 3D Computed Tomography Angiography Image Dataset for Automatic Segmentation of Type-B Aortic Dissection

    Authors: Zeyang Yao, Jiawei Zhang, Hailong Qiu, Tianchen Wang, Yiyu Shi, Jian Zhuang, Yuhao Dong, Meiping Huang, Xiaowei Xu

    Abstract: Type-B Aortic Dissection (TBAD) is one of the most serious cardiovascular events, characterized by a growing yearly incidence and the severity of disease prognosis. Currently, computed tomography angiography (CTA) has been widely adopted for the diagnosis and prognosis of TBAD. Accurate segmentation of true lumen (TL), false lumen (FL), and false lumen thrombus (FLT) in CTA is crucial for the prec…

    Submitted 1 September, 2021; originally announced September 2021.

  34. A High-Performance, Reconfigurable, Fully Integrated Time-Domain Reflectometry Architecture Using Digital I/Os

    Authors: Zhenyu Xu, Thomas Mauldin, Zheyi Yao, Gerald Hefferman, Tao Wei

    Abstract: Time-domain reflectometry (TDR) is an established means of measuring impedance inhomogeneity of a variety of waveguides, providing critical data necessary to characterize and optimize the performance of high-bandwidth computational and communication systems. However, TDR systems with both the high spatial resolution (sub-cm) and voltage resolution (sub-$\mu$V) required to evaluate high-performance…

    Submitted 1 May, 2021; originally announced May 2021.

    Comments: 8 pages, 8 figures

    Journal ref: February 2021, IEEE Transactions on Instrumentation and Measurement PP(99):1-1

  35. arXiv:2104.04702  [pdf, other]

    cs.SD eess.AS

    Boundary and Context Aware Training for CIF-based Non-Autoregressive End-to-end ASR

    Authors: Fan Yu, Haoneng Luo, Pengcheng Guo, Yuhao Liang, Zhuoyuan Yao, Lei Xie, Yingying Gao, Leijing Hou, Shilei Zhang

    Abstract: Continuous integrate-and-fire (CIF) based models, which use a soft and monotonic alignment mechanism, have been well applied in non-autoregressive (NAR) speech recognition with competitive performance compared with other NAR methods. However, such an alignment learning strategy may suffer from an erroneous acoustic boundary estimation, severely hindering the convergence speed as well as the system…

    Submitted 26 September, 2021; v1 submitted 10 April, 2021; originally announced April 2021.

    Comments: 5 pages, 4 figures

  36. arXiv:2103.16827  [pdf, other]

    eess.AS cs.CL cs.SD

    Integer-only Zero-shot Quantization for Efficient Speech Recognition

    Authors: Sehoon Kim, Amir Gholami, Zhewei Yao, Nicholas Lee, Patrick Wang, Aniruddha Nrusimha, Bohan Zhai, Tianren Gao, Michael W. Mahoney, Kurt Keutzer

    Abstract: End-to-end neural network models achieve improved performance on various automatic speech recognition (ASR) tasks. However, these models perform poorly on edge hardware due to large memory and computation requirements. While quantizing model weights and/or activations to low-precision can be a promising solution, previous research on quantizing ASR models is limited. In particular, the previous ap…

    Submitted 30 January, 2022; v1 submitted 31 March, 2021; originally announced March 2021.

    Journal ref: ICASSP 2022

  37. arXiv:2102.01547  [pdf, other]

    cs.SD cs.CL eess.AS

    WeNet: Production oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit

    Authors: Zhuoyuan Yao, Di Wu, Xiong Wang, Binbin Zhang, Fan Yu, Chao Yang, Zhendong Peng, Xiaoyu Chen, Lei Xie, Xin Lei

    Abstract: In this paper, we propose an open source, production first, and production ready speech recognition toolkit called WeNet in which a new two-pass approach is implemented to unify streaming and non-streaming end-to-end (E2E) speech recognition in a single model. The main motivation of WeNet is to close the gap between the research and the production of E2E speech recognition models. WeNet provides an…

    Submitted 29 December, 2021; v1 submitted 2 February, 2021; originally announced February 2021.

    Comments: 5 pages, 2 figures, 4 tables

  38. New Closed-form Joint Localization and Synchronization using Sequential One-way TOAs

    Authors: Ningyan Guo, Sihao Zhao, Xiao-Ping Zhang, Zheng Yao, Xiaowei Cui, Mingquan Lu

    Abstract: It is an essential technique for the moving user nodes (UNs) with clock offset and clock skew to resolve the joint localization and synchronization (JLAS) problem. Existing iterative maximum likelihood methods using sequential one-way time-of-arrival (TOA) measurements from the anchor nodes' (AN) broadcast signals require a good initial guess and have a computational complexity that grows with the…

    Submitted 14 April, 2022; v1 submitted 29 January, 2021; originally announced February 2021.

  39. Robotic Knee Tracking Control to Mimic the Intact Human Knee Profile Based on Actor-critic Reinforcement Learning

    Authors: Ruofan Wu, Zhikai Yao, Jennie Si, He Huang

    Abstract: We address a state-of-the-art reinforcement learning (RL) control approach to automatically configure robotic prosthesis impedance parameters to enable end-to-end, continuous locomotion intended for transfemoral amputee subjects. Specifically, our actor-critic based RL provides tracking control of a robotic knee prosthesis to mimic the intact knee profile. This is a significant advance from our pr…

    Submitted 22 January, 2021; originally announced January 2021.

  40. arXiv:2101.03487  [pdf, other]

    cs.RO eess.SY

    Reinforcement Learning Enabled Automatic Impedance Control of a Robotic Knee Prosthesis to Mimic the Intact Knee Motion in a Co-Adapting Environment

    Authors: Ruofan Wu, Minhan Li, Zhikai Yao, Jennie Si, He Huang

    Abstract: Automatically configuring a robotic prosthesis to fit its user's needs and physical conditions is a great technical challenge and a roadblock to the adoption of the technology. Previously, we have successfully developed reinforcement learning (RL) solutions toward addressing this issue. Yet, our designs were based on using a subjectively prescribed target motion profile for the robotic knee during…

    Submitted 10 January, 2021; originally announced January 2021.

  41. arXiv:2101.00068  [pdf, other]

    eess.SY

    Toward Reliable Designs of Data-Driven Reinforcement Learning Tracking Control for Euler-Lagrange Systems

    Authors: Zhikai Yao, Jennie Si, Ruofan Wu, Jianyong Yao

    Abstract: This paper addresses reinforcement learning based, direct signal tracking control with an objective of developing mathematically suitable and practically useful design approaches. Specifically, we aim to provide reliable and easy to implement designs in order to reach reproducible neural network-based solutions. Our proposed new design takes advantage of two control design frameworks: a reinforcem…

    Submitted 30 March, 2021; v1 submitted 31 December, 2020; originally announced January 2021.

  42. arXiv:2012.07237  [pdf]

    eess.IV cs.CV cs.LG

    Accurate Cell Segmentation in Digital Pathology Images via Attention Enforced Networks

    Authors: Muyi Sun, Zeyi Yao, Guanhong Zhang

    Abstract: Automatic cell segmentation is an essential step in the pipeline of computer-aided diagnosis (CAD), such as the detection and grading of breast cancer. Accurate segmentation of cells can not only assist the pathologists to make a more precise diagnosis, but also save much time and labor. However, this task suffers from stain variation, cell inhomogeneous intensities, background clutters and cells…

    Submitted 27 December, 2020; v1 submitted 13 December, 2020; originally announced December 2020.

    Comments: 6 pages. Accepted by ICPR2020 in the first round

  43. arXiv:2012.05481  [pdf, other]

    cs.SD cs.CL eess.AS

    Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition

    Authors: Binbin Zhang, Di Wu, Zhuoyuan Yao, Xiong Wang, Fan Yu, Chao Yang, Liyong Guo, Yaguang Hu, Lei Xie, Xin Lei

    Abstract: In this paper, we present a novel two-pass approach to unify streaming and non-streaming end-to-end (E2E) speech recognition in a single model. Our model adopts the hybrid CTC/attention architecture, in which the conformer layers in the encoder are modified. We propose a dynamic chunk-based attention strategy to allow arbitrary right context length. At inference time, the CTC decoder generates n-b…

    Submitted 29 December, 2021; v1 submitted 10 December, 2020; originally announced December 2020.

  44. arXiv:2011.08469  [pdf, other]

    cs.SD cs.CL eess.AS

    Cascade RNN-Transducer: Syllable Based Streaming On-device Mandarin Speech Recognition with a Syllable-to-Character Converter

    Authors: Xiong Wang, Zhuoyuan Yao, Xian Shi, Lei Xie

    Abstract: End-to-end models are favored in automatic speech recognition (ASR) because of their simplified system structure and superior performance. Among these models, recurrent neural network transducer (RNN-T) has achieved significant progress in streaming on-device speech recognition because of its high accuracy and low latency. RNN-T adopts a prediction network to enhance language information, but its la…

    Submitted 17 November, 2020; originally announced November 2020.

    Comments: 7 pages, 3 figures, 5 tables

  45. arXiv:2011.06724  [pdf, other]

    cs.SD eess.AS

    The SLT 2021 children speech recognition challenge: Open datasets, rules and baselines

    Authors: Fan Yu, Zhuoyuan Yao, Xiong Wang, Keyu An, Lei Xie, Zhijian Ou, Bo Liu, Xiulin Li, Guanqiong Miao

    Abstract: Automatic speech recognition (ASR) has been significantly advanced with the use of deep learning and big data. However, improving robustness, including achieving equally good performance on diverse speakers and accents, is still a challenging problem. In particular, the performance of children speech recognition (CSR) still lags behind due to 1) the speech and language characteristics of children's…

    Submitted 16 November, 2020; v1 submitted 12 November, 2020; originally announced November 2020.

    Comments: 7 pages, 3 figures, 3 tables

  46. arXiv:2011.02198  [pdf, other]

    cs.SD eess.AS

    IEEE SLT 2021 Alpha-mini Speech Challenge: Open Datasets, Tracks, Rules and Baselines

    Authors: Yihui Fu, Zhuoyuan Yao, Weipeng He, Jian Wu, Xiong Wang, Zhanheng Yang, Shimin Zhang, Lei Xie, Dongyan Huang, Hui Bu, Petr Motlicek, Jean-Marc Odobez

    Abstract: The IEEE Spoken Language Technology Workshop (SLT) 2021 Alpha-mini Speech Challenge (ASC) is intended to improve research on keyword spotting (KWS) and sound source location (SSL) on humanoid robots. Many publications report significant improvements in deep learning based KWS and SSL on open source datasets in recent years. For deep learning model training, it is necessary to expand the data cover…

    Submitted 14 November, 2020; v1 submitted 4 November, 2020; originally announced November 2020.

    Comments: Accepted at IEEE SLT 2021

  47. A Novel Posture Positioning Method for Multi-Joint Manipulators

    Authors: Zhi-Qiang Yao, Yi-Jue Dai, Qing-Na Li, Dang Xie, Ze-Hui Liu

    Abstract: Safety and automatic control are extremely important when operating manipulators. For large engineering manipulators, the main challenge is to accurately recognize the posture of all arm segments. In classical sensing methods, the accuracy of an inclinometer is easily affected by the elastic deformation in the manipulator's arms. This results in big error accumulations when sensing the angle of jo…

    Submitted 15 July, 2020; originally announced July 2020.

    Comments: 7 pages, 8 figures

    Journal ref: IEEE Sensors Journal, 2020