
Showing 1–28 of 28 results for author: Lei, M

Searching in archive eess.
  1. arXiv:2403.11091  [pdf, other]

    cs.SD cs.CV eess.AS

    Multitask frame-level learning for few-shot sound event detection

    Authors: Liang Zou, Genwei Yan, Ruoyu Wang, Jun Du, Meng Lei, Tian Gao, Xin Fang

    Abstract: This paper focuses on few-shot Sound Event Detection (SED), which aims to automatically recognize and classify sound events with limited samples. However, prevailing methods in few-shot SED predominantly rely on segment-level predictions, which often fail to provide detailed, fine-grained predictions, particularly for events of brief duration. Although frame-level prediction strategies have been…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: 6 pages, 4 figures, conference

  2. arXiv:2402.02775  [pdf]

    physics.optics eess.IV physics.bio-ph

    Instant square lattice structured illumination microscopy: an optimal strategy towards photon-saving and real-time super-resolution observation

    Authors: Tianyu Zhao, Zhaojun Wang, Manming Shu, Jingxiang Zhang, Yansheng Liang, Shaowei Wang, Ming Lei

    Abstract: Over the past decade, structured illumination microscopy (SIM) has found its niche in super-resolution (SR) microscopy due to its fast imaging speed and low excitation intensity. However, due to the significantly higher light dose compared to wide-field microscopy and the time-consuming post-processing procedures, long-term, real-time, super-resolution observation of living cells is still out of r…

    Submitted 5 February, 2024; originally announced February 2024.

  3. arXiv:2312.01423  [pdf, other]

    eess.SP

    Self-Critical Alternate Learning based Semantic Broadcast Communication

    Authors: Zhilin Lu, Rongpeng Li, Ming Lei, Chan Wang, Zhifeng Zhao, Honggang Zhang

    Abstract: Semantic communication (SemCom) has been deemed a promising communication paradigm to break through the bottleneck of traditional communications. Nonetheless, most existing works focus on point-to-point communication scenarios, and their extension to multi-user scenarios is not straightforward due to the cost-inefficiency of directly scaling the JSCC framework to the multi-user co…

    Submitted 3 December, 2023; originally announced December 2023.

  4. arXiv:2311.08188  [pdf, ps, other]

    cs.IT eess.SP

    Fast List Decoding of High-Rate Polar Codes

    Authors: Yang Lu, Ming-Min Zhao, Ming Lei, Min-Jian Zhao

    Abstract: Due to the ability to provide superior error-correction performance, the successive cancellation list (SCL) algorithm is widely regarded as one of the most promising decoding algorithms for polar codes with short-to-moderate code lengths. However, the application of SCL decoding in low-latency communication scenarios is limited due to its sequential nature. To reduce the decoding latency, developi…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: 13 pages, 8 figures

  5. arXiv:2302.02587  [pdf, other]

    eess.SP

    Joint Scattering Environment Sensing and Channel Estimation Based on Non-stationary Markov Random Field

    Authors: Wenkang Xu, Yongbo Xiao, An Liu, Ming Lei, Minjian Zhao

    Abstract: This paper considers an integrated sensing and communication system, where some radar targets also serve as communication scatterers. A location domain channel modeling method is proposed based on the position of targets and scatterers in the scattering environment, and the resulting radar and communication channels exhibit a two-dimensional (2-D) joint burst sparsity. We propose a joint scatterin…

    Submitted 18 July, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    Comments: 15 pages, 13 figures, submitted to IEEE Transactions on Wireless Communications

  6. arXiv:2206.12281  [pdf]

    eess.SP

    Real-time Dual-channel 2 * 2 MIMO Fiber-THz-Fiber Seamless Integration System at 385 GHz and 435 GHz

    Authors: Jiao Zhang, Min Zhu, Bingchang Hua, Mingzheng Lei, Yuancheng Cai, Liang Tian, Yucong Zou, Like Ma, Yongming Huang, Jianjun Yu, Xiaohu You

    Abstract: We demonstrate the first practical real-time dual-channel fiber-THz-fiber 2 * 2 MIMO seamless integration system with a record net data rate of 2 * 103.125 Gb/s at 385 GHz and 435 GHz over two spans of 20 km SSMF and 3 m wireless link.

    Submitted 24 June, 2022; originally announced June 2022.

    Comments: This paper has been accepted by ECOC 2022

  7. arXiv:2204.12115  [pdf, ps, other]

    cs.IT eess.SP

    Fast Successive-Cancellation Decoding of Polar Codes with Sequence Nodes

    Authors: Yang Lu, Ming-Min Zhao, Ming Lei, Min-Jian Zhao

    Abstract: Due to the sequential nature of the successive-cancellation (SC) algorithm, the decoding of polar codes suffers from significant decoding latencies. Fast SC decoding is able to speed up the SC decoding process by implementing parallel decoders at the intermediate levels of the SC decoding tree for some special nodes with specific information and frozen bit patterns. To further improve the paralle…

    Submitted 18 November, 2022; v1 submitted 26 April, 2022; originally announced April 2022.

    Comments: 30 pages, 6 figures, submitted for possible journal publication

  8. arXiv:2202.07816  [pdf, other]

    eess.AS cs.CL cs.SD

    ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech

    Authors: Yi Ren, Ming Lei, Zhiying Huang, Shiliang Zhang, Qian Chen, Zhijie Yan, Zhou Zhao

    Abstract: Expressive text-to-speech (TTS) has become a hot research topic recently, mainly focusing on modeling prosody in speech. Prosody modeling has several challenges: 1) the extracted pitch used in previous prosody modeling works has inevitable errors, which hurts the prosody modeling; 2) different attributes of prosody (e.g., pitch, duration and energy) are dependent on each other and produce the nat…

    Submitted 15 February, 2022; originally announced February 2022.

    Comments: Accepted by ICASSP 2022

  9. arXiv:2111.13694  [pdf, other]

    cs.SD cs.LG eess.AS

    Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information

    Authors: Zhihao Du, Shiliang Zhang, Siqi Zheng, Weilong Huang, Ming Lei

    Abstract: Overlapping speech diarization is always treated as a multi-label classification problem. In this paper, we reformulate this task as a single-label prediction problem by encoding the multi-speaker labels with a power set. Specifically, we propose the speaker embedding-aware neural diarization (SEND) method, which predicts the power set encoded labels according to the similarities between speech feat…

    Submitted 28 November, 2021; originally announced November 2021.

    Comments: Submitted to ICASSP 2022, 5 pages, 2 figures

  10. FedSpeech: Federated Text-to-Speech with Continual Learning

    Authors: Ziyue Jiang, Yi Ren, Ming Lei, Zhou Zhao

    Abstract: Federated learning enables collaborative training of machine learning models under strict privacy restrictions, and federated text-to-speech aims to synthesize natural speech for multiple users with a few audio training samples stored locally on their devices. However, federated text-to-speech faces several challenges: very few training samples from each speaker are available, training samples are a…

    Submitted 22 May, 2023; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: Accepted by IJCAI 2021

    Journal ref: 2021. Main Track. Pages 3829-3835

  11. arXiv:2109.04049  [pdf, other]

    cs.SD cs.AI eess.AS

    BeamTransformer: Microphone Array-based Overlapping Speech Detection

    Authors: Siqi Zheng, Shiliang Zhang, Weilong Huang, Qian Chen, Hongbin Suo, Ming Lei, Jinwei Feng, Zhijie Yan

    Abstract: We propose BeamTransformer, an efficient architecture to leverage beamformer's edge in spatial filtering and transformer's capability in context sequence modeling. BeamTransformer seeks to optimize modeling of sequential relationships among signals from different spatial directions. Overlapping speech detection is one of the tasks where such optimization is favorable. In this paper we effectively ap…

    Submitted 9 September, 2021; originally announced September 2021.

  12. arXiv:2106.09317  [pdf, other]

    cs.CL cs.SD eess.AS

    EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model

    Authors: Chenye Cui, Yi Ren, Jinglin Liu, Feiyang Chen, Rongjie Huang, Ming Lei, Zhou Zhao

    Abstract: Recently, there has been increasing interest in neural speech synthesis. While deep neural networks achieve state-of-the-art results in text-to-speech (TTS) tasks, generating more emotional and expressive speech is becoming a new challenge for researchers due to the scarcity of high-quality emotion speech datasets and the lack of advanced emotional TTS models. In this paper, we…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: Accepted by Interspeech 2021

  13. arXiv:2104.05784  [pdf, other]

    cs.SD eess.AS

    Extremely Low Footprint End-to-End ASR System for Smart Device

    Authors: Zhifu Gao, Yiwu Yao, Shiliang Zhang, Jun Yang, Ming Lei, Ian McLoughlin

    Abstract: Recently, end-to-end (E2E) speech recognition has become popular, since it can integrate the acoustic, pronunciation and language models into a single neural network, which outperforms conventional models. Among E2E approaches, attention-based models, e.g. Transformer, have emerged as being superior. Such models have opened the door to deployment of ASR on smart devices; however, they still suffer…

    Submitted 6 July, 2021; v1 submitted 6 April, 2021; originally announced April 2021.

    Comments: 5 pages, 2 figures, accepted by INTERSPEECH 2021

  14. arXiv:2010.15311  [pdf, other]

    eess.AS cs.SD

    DeviceTTS: A Small-Footprint, Fast, Stable Network for On-Device Text-to-Speech

    Authors: Zhiying Huang, Hao Li, Ming Lei

    Abstract: With the number of smart devices increasing, the demand for on-device text-to-speech (TTS) is growing rapidly. In recent years, many prominent end-to-end TTS methods have been proposed and have greatly improved the quality of synthesized speech. However, to ensure speech quality, most TTS systems depend on large and complex neural network models, and it is hard to deploy these TTS systems on-…

    Submitted 14 January, 2021; v1 submitted 28 October, 2020; originally announced October 2020.

    Comments: 5 pages, 1 figure, Submitted to ICASSP 2021

  15. arXiv:2010.14099  [pdf, other]

    cs.SD eess.AS

    Universal ASR: Unifying Streaming and Non-Streaming ASR Using a Single Encoder-Decoder Model

    Authors: Zhifu Gao, Shiliang Zhang, Ming Lei, Ian McLoughlin

    Abstract: Recently, online end-to-end ASR has gained increasing attention. However, the performance of online systems still lags far behind that of offline systems, with a large gap in recognition quality. For specific scenarios, we can trade off performance against latency and train multiple systems with different delays to match the performance and latency requirements of various application s…

    Submitted 27 October, 2020; originally announced October 2020.

    Comments: 5 pages, 2 figures, submitted to ICASSP 2021

  16. arXiv:2009.04293  [pdf]

    eess.SY eess.SP

    An Infrared Communication System based on Handstand Pendulum

    Authors: Xingchen Li, Changlu Li, Yun Wang, Mengqi Lei

    Abstract: This paper mainly introduces an infrared optical communication system based on a stable handstand pendulum. The system mounts the infrared light-emitting end on a handstand pendulum to make the infrared transmission light path stable and controllable. In this system, 940 nm infrared light is mainly used for audio signal transmission, and a handstand…

    Submitted 9 September, 2020; originally announced September 2020.

  17. arXiv:2006.12761  [pdf, other]

    cs.CV eess.IV

    Benchmarking features from different radiomics toolkits / toolboxes using Image Biomarkers Standardization Initiative

    Authors: Mingxi Lei, Bino Varghese, Darryl Hwang, Steven Cen, Xiaomeng Lei, Afshin Azadikhah, Bhushan Desai, Assad Oberai, Vinay Duddalwar

    Abstract: There is no consensus regarding radiomic feature terminology, the underlying mathematics, or their implementation. As a result, features extracted using different toolboxes cannot be used to build or validate the same model, which limits the generalizability of radiomic results. In this study, the image biomarker standardization initiative (IBSI) established phantom and benchm…

    Submitted 23 June, 2020; originally announced June 2020.

    Comments: 21 pages, 8 figures

  18. arXiv:2006.06240  [pdf, ps, other]

    eess.SP cs.IT cs.LG

    A PDD Decoder for Binary Linear Codes With Neural Check Polytope Projection

    Authors: Yi Wei, Ming-Min Zhao, Min-Jian Zhao, Ming Lei

    Abstract: Linear Programming (LP) is an important decoding technique for binary linear codes. However, the advantages of LP decoding, such as a low error floor and strong theoretical guarantees, come at the cost of high computational complexity and poor performance in the low signal-to-noise ratio (SNR) region. In this letter, we adopt the penalty dual decomposition (PDD) framework and propose a PDD algo…

    Submitted 11 June, 2020; originally announced June 2020.

    Comments: This paper has been accepted for publication in IEEE Wireless Communications Letters

  19. arXiv:2006.01713  [pdf, other]

    cs.SD eess.AS

    SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition

    Authors: Zhifu Gao, Shiliang Zhang, Ming Lei, Ian McLoughlin

    Abstract: End-to-end speech recognition has become popular in recent years, since it can integrate the acoustic, pronunciation and language models into a single neural network. Among end-to-end approaches, attention-based methods have emerged as being superior. One example is the Transformer, which adopts an encoder-decoder architecture. The key improvement introduced by Transformer is the utilization of self-att…

    Submitted 20 May, 2020; originally announced June 2020.

    Comments: Submitted to INTERSPEECH 2020

  20. arXiv:2006.01712  [pdf, other]

    cs.SD eess.AS

    Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition

    Authors: Shiliang Zhang, Zhifu Gao, Haoneng Luo, Ming Lei, Jie Gao, Zhijie Yan, Lei Xie

    Abstract: Recently, streaming end-to-end automatic speech recognition (E2E-ASR) has gained more and more attention. Much effort has been devoted to turning non-streaming attention-based E2E-ASR systems into streaming architectures. In this work, we propose a novel online E2E-ASR system by using Streaming Chunk-Aware Multihead Attention (SCAMA) and a latency control memory equipped self-attention network (LC-SA…

    Submitted 20 May, 2020; originally announced June 2020.

    Comments: Submitted to INTERSPEECH 2020

  21. arXiv:2005.10463  [pdf, other]

    cs.SD cs.CL eess.AS

    Simplified Self-Attention for Transformer-based End-to-End Speech Recognition

    Authors: Haoneng Luo, Shiliang Zhang, Ming Lei, Lei Xie

    Abstract: Transformer models have been introduced into end-to-end speech recognition with state-of-the-art performance on various tasks owing to their superiority in modeling long-term dependencies. However, such improvements are usually obtained through the use of very large neural networks. Transformer models mainly include two submodules: position-wise feedforward layers and self-attention (SAN) layers.…

    Submitted 17 November, 2020; v1 submitted 21 May, 2020; originally announced May 2020.

    Comments: Accepted to SLT 2021

  22. arXiv:2002.07601  [pdf, other]

    cs.IT cs.LG eess.SP stat.ML

    ADMM-based Decoder for Binary Linear Codes Aided by Deep Learning

    Authors: Yi Wei, Ming-Min Zhao, Min-Jian Zhao, Ming Lei

    Abstract: Inspired by the recent advances in deep learning (DL), this work presents a deep neural network aided decoding algorithm for binary linear codes. Based on the concept of deep unfolding, we design a decoding network by unfolding the alternating direction method of multipliers (ADMM)-penalized decoder. In addition, we propose two improved versions of the proposed network. The first one transforms th…

    Submitted 13 February, 2020; originally announced February 2020.

    Comments: 5 pages, 4 figures, accepted for publication in IEEE Communications Letters

  23. arXiv:1906.03814  [pdf, other]

    eess.SP cs.IT cs.LG stat.ML

    Learned Conjugate Gradient Descent Network for Massive MIMO Detection

    Authors: Yi Wei, Ming-Min Zhao, Mingyi Hong, Min-jian Zhao, Ming Lei

    Abstract: In this work, we consider the use of model-driven deep learning techniques for massive multiple-input multiple-output (MIMO) detection. Compared with conventional MIMO systems, massive MIMO promises improved spectral efficiency, coverage and range. Unfortunately, these benefits come at the cost of significantly increased computational complexity. To reduce the complexity of signal detection…

    Submitted 1 June, 2020; v1 submitted 10 June, 2019; originally announced June 2019.

    Comments: Part of this work has been accepted by IEEE ICC 2020

  24. arXiv:1904.10045  [pdf, other]

    eess.AS cs.NE cs.SD

    Automatic Spelling Correction with Transformer for CTC-based End-to-End Speech Recognition

    Authors: Shiliang Zhang, Ming Lei, Zhijie Yan

    Abstract: Connectionist Temporal Classification (CTC) based end-to-end speech recognition systems usually need to incorporate an external language model by using WFST-based decoding in order to achieve promising results. This is even more essential for Mandarin speech recognition, which exhibits a special phenomenon, namely homophones, that causes many substitution errors. The linguistic information introduced b…

    Submitted 27 March, 2019; originally announced April 2019.

    Comments: 6 pages, 5 figures

  25. arXiv:1811.02353  [pdf]

    eess.SP cs.HC cs.LG

    An amplitudes-perturbation data augmentation method in convolutional neural networks for EEG decoding

    Authors: Xian-Rui Zhang, Meng-Ying Lei, Yang Li

    Abstract: A Brain-Computer Interface (BCI) system provides a pathway between humans and the outside world by analyzing brain signals that contain potential neural information. Electroencephalography (EEG) is one of the most commonly used brain signals, and EEG recognition is an important part of BCI systems. Recently, convolutional neural networks (ConvNet) in deep learning are becoming the new cutting edge tools…

    Submitted 6 November, 2018; originally announced November 2018.

  26. arXiv:1810.09119  [pdf]

    eess.SP q-bio.QM

    A Parametric Time Frequency-Conditional Granger Causality Method Using Ultra-regularized Orthogonal Least Squares and Multiwavelets for Dynamic Connectivity Analysis in EEGs

    Authors: Yang Li, Mengying Lei, Weigang Cui, Yuzhu Guo, Hua-Liang Wei

    Abstract: Objective: This study proposes a new parametric TF (time frequency) CGC (conditional Granger causality) method for high precision connectivity analysis over time and frequency in multivariate coupling nonstationary systems, and applies it to scalp and source EEG signals to reveal dynamic interaction patterns in oscillatory neocortical sensorimotor networks. Methods: The Geweke spectral measure is…

    Submitted 22 October, 2018; originally announced October 2018.

  27. arXiv:1803.02445  [pdf, other]

    eess.AS cs.SD

    Linear networks based speaker adaptation for speech synthesis

    Authors: Zhiying Huang, Heng Lu, Ming Lei, Zhijie Yan

    Abstract: Speaker adaptation methods aim to create fair-quality synthesized voice fonts for target speakers when only limited resources are available. Recently, as deep neural network based statistical parametric speech synthesis (SPSS) methods have become dominant in SPSS TTS back-end modeling, speaker adaptation under the neural network based SPSS framework has also become an important task. In this paper, l…

    Submitted 5 March, 2018; originally announced March 2018.

    Comments: 5 pages, 6 figures, accepted by ICASSP 2018

  28. arXiv:1709.07747  [pdf]

    eess.IV physics.optics

    SNR-based adaptive acquisition method for fast Fourier ptychographic microscopy

    Authors: An Pan, Yan Zhang, Maosen Li, Meiling Zhou, Junwei Min, Ming Lei, Baoli Yao

    Abstract: Fourier ptychographic microscopy (FPM) is a computational imaging technique with both high resolution and large field-of-view. However, the effective numerical aperture (NA) achievable with a typical LED panel is ambiguous and usually relies on repeated tests of different illumination NAs. The imaging quality of each raw image usually depends on visual assessment, which is subjective and…

    Submitted 2 October, 2017; v1 submitted 19 September, 2017; originally announced September 2017.

    Comments: 11 pages, 6 figures