
Showing 1–50 of 76 results for author: Yu, M

  1. arXiv:2406.19043  [pdf]

    eess.IV cs.AI cs.CV cs.DB

    CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI

    Authors: Zi Wang, Fanwen Wang, Chen Qin, Jun Lyu, Ouyang Cheng, Shuo Wang, Yan Li, Mengyao Yu, Haoyu Zhang, Kunyuan Guo, Zhang Shi, Qirong Li, Ziqiang Xu, Yajing Zhang, Hao Li, Sha Hua, Binghua Chen, Longyu Sun, Mengting Sun, Qin Li, Ying-Hua Chu, Wenjia Bai, Jing Qin, Xiahai Zhuang, Claudia Prieto, et al. (7 additional authors not shown)

    Abstract: Cardiac magnetic resonance imaging (MRI) has emerged as a clinically gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is highly expected to achieve time-efficient and patient-friendly imaging, and then advanced image reconstruction approaches are required to recover h…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 19 pages, 3 figures, 2 tables

  2. arXiv:2406.11175  [pdf, other]

    cs.SD eess.AS

    SMRU: Split-and-Merge Recurrent-based UNet for Acoustic Echo Cancellation and Noise Suppression

    Authors: Zhihang Sun, Andong Li, Rilin Chen, Hao Zhang, Meng Yu, Yi Zhou, Dong Yu

    Abstract: The proliferation of deep neural networks has spawned the rapid development of acoustic echo cancellation and noise suppression, and plenty of prior arts have been proposed, which yield promising performance. Nevertheless, they rarely consider the deployment generality in different processing scenarios, such as edge devices, and cloud processing. To this end, this paper proposes a general model, t…

    Submitted 16 June, 2024; originally announced June 2024.

  3. arXiv:2406.09589  [pdf, other]

    eess.AS

    Multi-Channel Multi-Speaker ASR Using Target Speaker's Solo Segment

    Authors: Yiwen Shao, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu, Daniel Povey, Sanjeev Khudanpur

    Abstract: In the field of multi-channel, multi-speaker Automatic Speech Recognition (ASR), the task of discerning and accurately transcribing a target speaker's speech within background noise remains a formidable challenge. Traditional approaches often rely on microphone array configurations and the information of the target speaker's location or voiceprint. This study introduces the Solo Spatial Feature (S…

    Submitted 17 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted for presentation at Interspeech 2024

  4. arXiv:2405.19213  [pdf, other]

    eess.SY cs.AI cs.LG cs.NI

    HawkVision: Low-Latency Modeless Edge AI Serving

    Authors: ChonLam Lao, Jiaqi Gao, Ganesh Ananthanarayanan, Aditya Akella, Minlan Yu

    Abstract: The trend of modeless ML inference is increasingly growing in popularity as it hides the complexity of model inference from users and caters to diverse user and application accuracy requirements. Previous work mostly focuses on modeless inference in data centers. To provide low-latency inference, in this paper, we promote modeless inference at the edge. The edge environment introduces additional c…

    Submitted 29 May, 2024; originally announced May 2024.

  5. A Valuation Framework for Customers Impacted by Extreme Temperature-Related Outages

    Authors: Min Gyung Yu, Monish Mukherjee, Shiva Poudela, Sadie R. Bender, Sarmad Hanif, Trevor D. Hardy, Hayden M. Reeve

    Abstract: Extreme temperature outages can lead to not just economic losses but also various non-energy impacts (NEI) due to significant degradation of indoor operating conditions caused by service disruptions. However, existing resilience assessment approaches lack specificity for extreme temperature conditions. They often overlook temperature-related mortality and neglect the customer characteristics and g…

    Submitted 6 May, 2024; originally announced May 2024.

    Journal ref: Appl. Energy.368(2024)123450

  6. arXiv:2405.02504  [pdf, other]

    eess.IV cs.CV

    Functional Imaging Constrained Diffusion for Brain PET Synthesis from Structural MRI

    Authors: Minhui Yu, Mengqi Wu, Ling Yue, Andrea Bozoki, Mingxia Liu

    Abstract: Magnetic resonance imaging (MRI) and positron emission tomography (PET) are increasingly used in multimodal analysis of neurodegenerative disorders. While MRI is broadly utilized in clinical settings, PET is less accessible. Many studies have attempted to use deep generative models to synthesize PET from MRI scans. However, they often suffer from unstable training and inadequately preserve brain f…

    Submitted 8 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  7. arXiv:2403.05901  [pdf, other]

    cs.ET eess.SY

    Unleashing the Power of T1-cells in SFQ Arithmetic Circuits

    Authors: Rassul Bairamkulov, Mingfei Yu, Giovanni De Micheli

    Abstract: Rapid single-flux quantum (RSFQ), a leading cryogenic superconductive electronics (SCE) technology, offers extremely low power dissipation and high speed. However, implementing RSFQ systems at VLSI complexity faces challenges, such as substantial area overhead from gate-level pipelining and path balancing, exacerbated by RSFQ's limited layout density. T1 flip-flop (T1-FF) is an RSFQ logic cell ope…

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: To appear at the 2024 ACM/IEEE Design Automation and Test in Europe, Valencia, Spain, 25-27 March 2024. 2 pages, 1 figure, 1 table

  8. arXiv:2401.06650  [pdf, ps, other]

    eess.SY

    LMI-based robust model predictive control for a quarter car with series active variable geometry suspension

    Authors: Zilin Feng, Anastasis Georgiou, Simos A. Evangelou, Min Yu, Imad M Jaimoukha, Daniele Dini

    Abstract: This paper proposes a robust model predictive control-based solution for the recently introduced series active variable geometry suspension (SAVGS) to improve the ride comfort and road holding of a quarter car. In order to close the gap between the nonlinear multi-body SAVGS model and its linear equivalent, a new uncertain system characterization is proposed that captures unmodeled dynamics, param…

    Submitted 29 January, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: 13 pages, 11 figures, 2 tables, IEEE Transactions on Control Systems Technology

  9. arXiv:2311.14316  [pdf, other]

    eess.SP cs.AI

    Windformer:Bi-Directional Long-Distance Spatio-Temporal Network For Wind Speed Prediction

    Authors: Xuewei Li, Zewen Shang, Zhiqiang Liu, Jian Yu, Wei Xiong, Mei Yu

    Abstract: Wind speed prediction is critical to the management of wind power generation. Due to the large range of wind speed fluctuations and wake effect, there may also be strong correlations between long-distance wind turbines. This difficult-to-extract feature has become a bottleneck for improving accuracy. History and future time information includes the trend of airflow changes, whether this dynamic in…

    Submitted 24 November, 2023; originally announced November 2023.

  10. arXiv:2311.13075  [pdf, other]

    eess.AS

    Deep Audio Zooming: Beamwidth-Controllable Neural Beamformer

    Authors: Meng Yu, Dong Yu

    Abstract: Audio zooming, a signal processing technique, enables selective focusing and enhancement of sound signals from a specified region, attenuating others. While traditional beamforming and neural beamforming techniques, centered on creating a directional array, necessitate the designation of a singular target direction, they often overlook the concept of a field of view (FOV), that defines an angular…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: 6 pages, 5 figures

  11. arXiv:2311.08271  [pdf, other]

    cs.LG cs.IT cs.NI eess.SP

    Mobility-Induced Graph Learning for WiFi Positioning

    Authors: Kyuwon Han, Seung Min Yu, Seong-Lyun Kim, Seung-Woo Ko

    Abstract: A smartphone-based user mobility tracking could be effective in finding his/her location, while the unpredictable error therein due to low specification of built-in inertial measurement units (IMUs) rejects its standalone usage but demands the integration to another positioning technique like WiFi positioning. This paper aims to propose a novel integration technique using a graph neural network ca…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: submitted to a possible IEEE journal

  12. arXiv:2311.05477  [pdf, other]

    eess.IV cs.CV cs.LG

    Using ResNet to Utilize 4-class T2-FLAIR Slice Classification Based on the Cholinergic Pathways Hyperintensities Scale for Pathological Aging

    Authors: Wei-Chun Kevin Tsai, Yi-Chien Liu, Ming-Chun Yu, Chia-Ju Chou, Sui-Hing Yan, Yang-Teng Fan, Yan-Hsiang Huang, Yen-Ling Chiu, Yi-Fang Chuang, Ran-Zan Wang, Yao-Chia Shih

    Abstract: The Cholinergic Pathways Hyperintensities Scale (CHIPS) is a visual rating scale used to assess the extent of cholinergic white matter hyperintensities in T2-FLAIR images, serving as an indicator of dementia severity. However, the manual selection of four specific slices for rating throughout the entire brain is a time-consuming process. Our goal was to develop a deep learning-based model capable…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: 8 pages, 2 figures, 2 tables

  13. arXiv:2310.13177  [pdf]

    eess.SY

    Enhancing Building Energy Efficiency through Advanced Sizing and Dispatch Methods for Energy Storage

    Authors: Min Gyung Yu, Xu Ma, Bowen Huang, Karthik Devaprasad, Fredericka Brown, Di Wu

    Abstract: Energy storage and electrification of buildings hold great potential for future decarbonized energy systems. However, there are several technical and economic barriers that prevent large-scale adoption and integration of energy storage in buildings. These barriers include integration with building control systems, high capital costs, and the necessity to identify and quantify value streams for dif…

    Submitted 19 October, 2023; originally announced October 2023.

  14. arXiv:2310.03608  [pdf, other]

    eess.IV cs.CV

    How Good Are Synthetic Medical Images? An Empirical Study with Lung Ultrasound

    Authors: Menghan Yu, Sourabh Kulhare, Courosh Mehanian, Charles B Delahunt, Daniel E Shea, Zohreh Laverriere, Ishan Shah, Matthew P Horning

    Abstract: Acquiring large quantities of data and annotations is known to be effective for developing high-performing deep learning models, but is difficult and expensive to do in the healthcare context. Adding synthetic training data using generative models offers a low-cost method to deal effectively with the data scarcity challenge, and can also address data imbalance and patient privacy issues. In this s…

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: accepted in Simulation and Synthesis in Medical Imaging (SASHIMI)

  15. arXiv:2309.16049  [pdf, other]

    eess.AS cs.SD eess.SP

    Neural Network Augmented Kalman Filter for Robust Acoustic Howling Suppression

    Authors: Yixuan Zhang, Hao Zhang, Meng Yu, Dong Yu

    Abstract: Acoustic howling suppression (AHS) is a critical challenge in audio communication systems. In this paper, we propose a novel approach that leverages the power of neural networks (NN) to enhance the performance of traditional Kalman filter algorithms for AHS. Specifically, our method involves the integration of NN modules into the Kalman filter, enabling refining reference signal, a key factor in e…

    Submitted 27 September, 2023; originally announced September 2023.

    Comments: Paper in submission

  16. arXiv:2309.16048  [pdf, other]

    eess.AS cs.SD eess.SP

    Advancing Acoustic Howling Suppression through Recursive Training of Neural Networks

    Authors: Hao Zhang, Yixuan Zhang, Meng Yu, Dong Yu

    Abstract: In this paper, we introduce a novel training framework designed to comprehensively address the acoustic howling issue by examining its fundamental formation process. This framework integrates a neural network (NN) module into the closed-loop system during training with signals generated recursively on the fly to closely mimic the streaming process of acoustic howling suppression (AHS). The propose…

    Submitted 27 September, 2023; originally announced September 2023.

    Comments: Paper in submission

  17. arXiv:2309.09028  [pdf, other]

    eess.AS cs.SD

    Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions

    Authors: Heming Wang, Meng Yu, Hao Zhang, Chunlei Zhang, Zhongweiyang Xu, Muqiao Yang, Yixuan Zhang, Dong Yu

    Abstract: Enhancing speech signal quality in adverse acoustic environments is a persistent challenge in speech processing. Existing deep learning based enhancement methods often struggle to effectively remove background noise and reverberation in real-world scenarios, hampering listening experiences. To address these challenges, we propose a novel approach that uses pre-trained generative methods to resynth…

    Submitted 16 September, 2023; originally announced September 2023.

    Comments: Paper in submission

  18. arXiv:2307.03668  [pdf]

    eess.SP

    Using electrical impedance spectroscopy to identify equivalent circuit models of lubricated contacts with complex geometry: in-situ application to mini traction machine

    Authors: Min Yu, Jie Zhang, Arndt Joedicke, Tom Reddyhoff

    Abstract: Electrical contact resistance or capacitance as measured between a lubricated contact has been used in tribometers, partially reflecting the lubrication condition. In contrast, the electrical impedance provides rich information of magnitude and phase, which can be interpreted using equivalent circuit models, enabling more comprehensive measurements, including the variation of lubricant film thickn…

    Submitted 7 July, 2023; originally announced July 2023.

  19. arXiv:2305.02583  [pdf, other]

    eess.AS cs.SD

    Hybrid AHS: A Hybrid of Kalman Filter and Deep Learning for Acoustic Howling Suppression

    Authors: Hao Zhang, Meng Yu, Yuzhong Wu, Tao Yu, Dong Yu

    Abstract: Deep learning has been recently introduced for efficient acoustic howling suppression (AHS). However, the recurrent nature of howling creates a mismatch between offline training and streaming inference, limiting the quality of enhanced speech. To address this limitation, we propose a hybrid method that combines a Kalman filter with a self-attentive recurrent neural network (SARNN) to leverage thei…

    Submitted 4 May, 2023; originally announced May 2023.

    Comments: submitted to INTERSPEECH 2023. arXiv admin note: text overlap with arXiv:2302.09252

  20. arXiv:2305.01637  [pdf, other]

    eess.AS cs.SD

    Deep Learning for Joint Acoustic Echo and Acoustic Howling Suppression in Hybrid Meetings

    Authors: Hao Zhang, Meng Yu, Dong Yu

    Abstract: Hybrid meetings have become increasingly necessary during the post-COVID period and also brought new challenges for solving audio-related problems. In particular, the interplay between acoustic echo and acoustic howling in a hybrid meeting makes the joint suppression of them difficult. This paper proposes a deep learning approach to tackle this problem by formulating a recurrent feedback suppressi…

    Submitted 4 May, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

  21. arXiv:2302.13273  [pdf, other]

    cs.SD cs.MM eess.AS

    Two-Stream Joint-Training for Speaker Independent Acoustic-to-Articulatory Inversion

    Authors: Jianrong Wang, Jinyu Liu, Li Liu, Xuewei Li, Mei Yu, Jie Gao, Qiang Fang

    Abstract: Acoustic-to-articulatory inversion (AAI) aims to estimate the parameters of articulators from speech audio. There are two common challenges in AAI, which are the limited data and the unsatisfactory performance in speaker independent scenario. Most current works focus on extracting features directly from speech and ignoring the importance of phoneme information which may limit the performance of AA…

    Submitted 26 February, 2023; originally announced February 2023.

  22. arXiv:2302.09252  [pdf, other]

    eess.AS cs.SD

    Deep AHS: A Deep Learning Approach to Acoustic Howling Suppression

    Authors: Hao Zhang, Meng Yu, Dong Yu

    Abstract: In this paper, we formulate acoustic howling suppression (AHS) as a supervised learning problem and propose a deep learning approach, called Deep AHS, to address it. Deep AHS is trained in a teacher forcing way which converts the recurrent howling suppression process into an instantaneous speech separation process to simplify the problem and accelerate the model training. The proposed method utili…

    Submitted 17 August, 2023; v1 submitted 18 February, 2023; originally announced February 2023.

    Comments: Accepted for publication in 2023 ICASSP

  23. arXiv:2301.12363  [pdf, other]

    eess.AS cs.SD

    NeuralKalman: A Learnable Kalman Filter for Acoustic Echo Cancellation

    Authors: Yixuan Zhang, Meng Yu, Hao Zhang, Dong Yu, DeLiang Wang

    Abstract: The robustness of the Kalman filter to double talk and its rapid convergence make it a popular approach for addressing acoustic echo cancellation (AEC) challenges. However, the inability to model nonlinearity and the need to tune control parameters cast limitations on such adaptive filtering algorithms. In this paper, we integrate the frequency domain Kalman filter (FDKF) and deep neural networks…

    Submitted 26 December, 2023; v1 submitted 29 January, 2023; originally announced January 2023.

    Comments: The term of the algorithm is renamed because it conflicts with an existing KalmanNet algorithm proposed by Revach et. al. (arXiv:2107.10043); Accepted by ASRU 2023

  24. arXiv:2212.12810  [pdf, other]

    eess.IV cs.CV

    Hybrid Representation Learning for Cognitive Diagnosis in Late-Life Depression Over 5 Years with Structural MRI

    Authors: Lintao Zhang, Lihong Wang, Minhui Yu, Rong Wu, David C. Steffens, Guy G. Potter, Mingxia Liu

    Abstract: Late-life depression (LLD) is a highly prevalent mood disorder occurring in older adults and is frequently accompanied by cognitive impairment (CI). Studies have shown that LLD may increase the risk of Alzheimer's disease (AD). However, the heterogeneity of presentation of geriatric depression suggests that multiple biological mechanisms may underlie it. Current biological research on LLD progress…

    Submitted 24 December, 2022; originally announced December 2022.

  25. arXiv:2212.03997  [pdf, other]

    eess.SY

    Analyzing At-Scale Distribution Grid Response to Extreme Temperatures

    Authors: Sarmad Hanif, Monish Mukherjee, Shiva Poudel, Rohit A Jinsiwale, Min Gyung Yu, Trevor Hardy, Hayden Reeve

    Abstract: Threats against power grids continue to increase, as extreme weather conditions and natural disasters (extreme events) become more frequent. Hence, there is a need for the simulation and modeling of power grids to reflect realistic conditions during extreme events conditions, especially distribution systems. This paper presents a modeling and simulation platform for electric distribution grids whi…

    Submitted 7 December, 2022; originally announced December 2022.

  26. arXiv:2211.12590  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    Deep Neural Mel-Subband Beamformer for In-car Speech Separation

    Authors: Vinay Kothapally, Yong Xu, Meng Yu, Shi-Xiong Zhang, Dong Yu

    Abstract: While current deep learning (DL)-based beamforming techniques have been proved effective in speech separation, they are often designed to process narrow-band (NB) frequencies independently which results in higher computational costs and inference times, making them unsuitable for real-world use. In this paper, we propose DL-based mel-subband spatio-temporal beamformer to perform speech separation…

    Submitted 11 March, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: Accepted to ICASSP 2023

  27. arXiv:2211.10023  [pdf, other]

    cs.CV cs.LG eess.IV

    LiSnowNet: Real-time Snow Removal for LiDAR Point Cloud

    Authors: Ming-Yuan Yu, Ram Vasudevan, Matthew Johnson-Roberson

    Abstract: LiDARs have been widely adopted to modern self-driving vehicles, providing 3D information of the scene and surrounding objects. However, adverse weather conditions still pose significant challenges to LiDARs since point clouds captured during snowfall can easily be corrupted. The resulting noisy point clouds degrade downstream tasks such as mapping. Existing works in de-noising point clouds corru…

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: The paper has been accepted for the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022)

  28. arXiv:2210.17014  [pdf]

    eess.SY physics.optics

    Parametrically driven inertial sensing in chip-scale optomechanical cavities at the thermodynamical limits with extended dynamic range

    Authors: Jaime Gonzalo Flor Flores, Talha Yerebakan, Wenting Wang, Mingbin Yu, Dim-Lee Kwong, Andrey Matsko, Chee Wei Wong

    Abstract: Recent scientific and technological advances have enabled the detection of gravitational waves, autonomous driving, and the proposal of a communications network on the Moon (Lunar Internet or LunaNet). These efforts are based on the measurement of minute displacements and correspondingly the forces or fields transduction, which translate to acceleration, velocity, and position determination for na…

    Submitted 30 October, 2022; originally announced October 2022.

  29. arXiv:2209.07302  [pdf, other]

    cs.SD eess.AS

    MVNet: Memory Assistance and Vocal Reinforcement Network for Speech Enhancement

    Authors: Jianrong Wang, Xiaomin Li, Xuewei Li, Mei Yu, Qiang Fang, Li Liu

    Abstract: Speech enhancement improves speech quality and promotes the performance of various downstream tasks. However, most current speech enhancement work was mainly devoted to improving the performance of downstream automatic speech recognition (ASR), only a relatively small amount of work focused on the automatic speaker verification (ASV) task. In this work, we propose a MVNet consisted of a memory ass…

    Submitted 15 September, 2022; originally announced September 2022.

    Comments: ICONIP 2022

  30. arXiv:2206.06145  [pdf]

    q-bio.MN eess.SY

    Identification of cancer-keeping genes as therapeutic targets by finding network control hubs

    Authors: Xizhe Zhang, Chunyu Pan, Xinru Wei, Meng Yu, Shuangjie Liu, Jun An, Jieping Yang, Baojun Wei, Wenjun Hao, Yang Yao, Yuyan Zhu, Weixiong Zhang

    Abstract: Finding cancer driver genes has been a focal theme of cancer research and clinical studies. One of the recent approaches is based on network structural controllability that focuses on finding a control scheme and driver genes that can steer the cell from an arbitrary state to a designated state. While theoretically sound, this approach is impractical for many reasons, e.g., the control scheme is o…

    Submitted 13 June, 2022; originally announced June 2022.

    Comments: Contact the corresponding authors for supplementary material

  31. arXiv:2205.10401  [pdf, other]

    eess.AS cs.SD

    NeuralEcho: A Self-Attentive Recurrent Neural Network For Unified Acoustic Echo Suppression And Speech Enhancement

    Authors: Meng Yu, Yong Xu, Chunlei Zhang, Shi-Xiong Zhang, Dong Yu

    Abstract: Acoustic echo cancellation (AEC) plays an important role in the full-duplex speech communication as well as the front-end speech enhancement for recognition in the conditions when the loudspeaker plays back. In this paper, we present an all-deep-learning framework that implicitly estimates the second order statistics of echo/noise and target speech, and jointly solves echo and noise suppression th…

    Submitted 20 May, 2022; originally announced May 2022.

    Comments: Submitted to INTERSPEECH 2022

  32. arXiv:2205.00434  [pdf, other]

    cs.CV eess.IV

    Reinforced Swin-Convs Transformer for Underwater Image Enhancement

    Authors: Tingdi Ren, Haiyong Xu, Gangyi Jiang, Mei Yu, Ting Luo

    Abstract: Underwater Image Enhancement (UIE) technology aims to tackle the challenge of restoring the degraded underwater images due to light absorption and scattering. To address problems, a novel U-Net based Reinforced Swin-Convs Transformer for the Underwater Image Enhancement method (URSCT-UIE) is proposed. Specifically, with the deficiency of U-Net based on pure convolutions, we embedded the Swin Trans…

    Submitted 1 May, 2022; originally announced May 2022.

    Comments: Submitted by NeurIPS 2022

  33. arXiv:2203.17068  [pdf, other]

    eess.AS cs.SD

    EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers

    Authors: Soumi Maiti, Yushi Ueda, Shinji Watanabe, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Yong Xu

    Abstract: In this paper, we present a novel framework that jointly performs three tasks: speaker diarization, speech separation, and speaker counting. Our proposed framework integrates speaker diarization based on end-to-end neural diarization (EEND) models, speaker counting with encoder-decoder based attractors (EDA), and speech separation using Conv-TasNet. In addition, we propose a multiple 1x1 convoluti…

    Submitted 15 December, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: Accepted in SLT 2022

  34. arXiv:2203.16037  [pdf, other]

    cs.SD cs.LG eess.AS

    Enhancing Zero-Shot Many to Many Voice Conversion with Self-Attention VAE

    Authors: Ziang Long, Yunling Zheng, Meng Yu, Jack Xin

    Abstract: Variational auto-encoder (VAE) is an effective neural network architecture to disentangle a speech utterance into speaker identity and linguistic content latent embeddings, then generate an utterance for a target speaker from that of a source speaker. This is possible by concatenating the identity embedding of the target speaker and the content embedding of the source speaker uttering a desired se…

    Submitted 22 August, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

  35. arXiv:2203.04162  [pdf, other]

    eess.SY

    Feedforward PID Control of Full-Car with Parallel Active Link Suspension for Improved Chassis Attitude Stabilization

    Authors: Zilin Feng, Min Yu, Simos A. Evangelou, Imad M Jaimoukha, Daniele Dini

    Abstract: PID control is commonly utilized in an active suspension system to achieve desirable chassis attitude, where, due to delays, feedback information has much difficulty regulating the roll and pitch behavior, and stabilizing the chassis attitude, which may result in roll over when the vehicle steers at a large longitudinal velocity. To address the problem of the feedback delays in chassis attitude st…

    Submitted 8 March, 2022; originally announced March 2022.

    Comments: 8 pages, 17 figures, CCTA conference

  36. arXiv:2203.04147  [pdf, other]

    eess.SY

    Mu-synthesis PID Control of Full-Car with Parallel Active Link Suspension Under Variable Payload

    Authors: Zilin Feng, Min Yu, Simos A. Evangelou, Imad M Jaimoukha, Daniele Dini

    Abstract: This paper presents a combined mu-synthesis PID control scheme, employing a frequency separation paradigm, for a recently proposed novel active suspension, the Parallel Active Link Suspension (PALS). The developed mu-synthesis control scheme is superior to the conventional H-infinity control, previously designed for the PALS, in terms of ride comfort and road holding (higher frequency dynamics), w…

    Submitted 8 March, 2022; originally announced March 2022.

    Comments: 13 pages, 24 figures

  37. arXiv:2112.05755  [pdf, other]

    eess.IV cs.CV

    Information Prebuilt Recurrent Reconstruction Network for Video Super-Resolution

    Authors: Shuyun Wang, Ming Yu, Cuihong Xue, Yingchun Guo, Gang Yan

    Abstract: The video super-resolution (VSR) method based on the recurrent convolutional network has strong temporal modeling capability for video sequences. However, the temporal receptive field of different recurrent units in the unidirectional recurrent network is unbalanced. Earlier reconstruction frames receive less spatio-temporal information, resulting in fuzziness or artifacts. Although the bidirectio…

    Submitted 2 February, 2023; v1 submitted 10 December, 2021; originally announced December 2021.

    Comments: 12 pages,9 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  38. arXiv:2111.15016  [pdf, other]

    cs.CL cs.SD eess.AS

    Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization

    Authors: Brian Yan, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe, Dong Yu

    Abstract: Conversational bilingual speech encompasses three types of utterances: two purely monolingual types and one intra-sententially code-switched type. In this work, we propose a general framework to jointly model the likelihoods of the monolingual and code-switch sub-tasks that comprise bilingual speech recognition. By defining the monolingual sub-tasks with label-to-frame synchronization, our joint m…

    Submitted 29 November, 2021; originally announced November 2021.

  39. arXiv:2111.04904  [pdf, other]

    eess.AS cs.SD eess.SP

    Joint Neural AEC and Beamforming with Double-Talk Detection

    Authors: Vinay Kothapally, Yong Xu, Meng Yu, Shi-Xiong Zhang, Dong Yu

    Abstract: Acoustic echo cancellation (AEC) in full-duplex communication systems eliminates acoustic feedback. However, nonlinear distortions induced by audio devices, background noise, reverberation, and double-talk reduce the efficiency of conventional AEC systems. Several hybrid AEC models were proposed to address this, which use deep learning models to suppress residual echo from standard adaptive filter…

    Submitted 27 June, 2022; v1 submitted 8 November, 2021; originally announced November 2021.

    Comments: Accepted in Interspeech 2022

  40. arXiv:2110.05438  [pdf, other]

    cs.NI cs.DS eess.SY

    Zero-CPU Collection with Direct Telemetry Access

    Authors: Jonatan Langlet, Ran Ben Basat, Sivaramakrishnan Ramanathan, Gabriele Oliaro, Michael Mitzenmacher, Minlan Yu, Gianni Antichi

    Abstract: Programmable switches are driving a massive increase in fine-grained measurements. This puts significant pressure on telemetry collectors that have to process reports from many switches. Past research acknowledged this problem by either improving collectors' stack performance or by limiting the amount of data sent from switches. In this paper, we take a different and radical approach: switches are…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: To appear in ACM HotNets 2021

  41. arXiv:2110.04057  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    FAST-RIR: Fast neural diffuse room impulse response generator

    Authors: Anton Ratnarajah, Shi-Xiong Zhang, Meng Yu, Zhenyu Tang, Dinesh Manocha, Dong Yu

    Abstract: We present a neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment. Our FAST-RIR takes rectangular room dimensions, listener and speaker positions, and reverberation time as inputs and generates specular and diffuse reflections for a given acoustic environment. Our FAST-RIR is capable of generating…

    Submitted 5 February, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: Accepted to ICASSP 2022. More results and source code is available at https://anton-jeran.github.io/FRIR/

  42. arXiv:2106.13686  [pdf, other]

    cs.MM cs.SD eess.AS eess.IV

    Cross-Modal Knowledge Distillation Method for Automatic Cued Speech Recognition

    Authors: Jianrong Wang, Ziyue Tang, Xuewei Li, Mei Yu, Qiang Fang, Li Liu

    Abstract: Cued Speech (CS) is a visual communication system for the deaf or hearing impaired people. It combines lip movements with hand cues to obtain a complete phonetic repertoire. Current deep learning based methods on automatic CS recognition suffer from a common problem, which is the data scarcity. Until now, there are only two public single speaker datasets for French (238 sentences) and British Engl…

    Submitted 25 June, 2021; originally announced June 2021.

  43. arXiv:2104.08450  [pdf, other]

    cs.SD cs.AI eess.AS

    MIMO Self-attentive RNN Beamformer for Multi-speaker Speech Separation

    Authors: Xiyun Li, Yong Xu, Meng Yu, Shi-Xiong Zhang, Jiaming Xu, Bo Xu, Dong Yu

    Abstract: Recently, our proposed recurrent neural network (RNN) based all deep learning minimum variance distortionless response (ADL-MVDR) beamformer method yielded superior performance over the conventional MVDR by replacing the matrix inversion and eigenvalue decomposition with two recurrent neural networks. In this work, we present a self-attentive RNN beamformer to further improve our previous RNN-base…

    Submitted 26 April, 2021; v1 submitted 17 April, 2021; originally announced April 2021.

  44. arXiv:2104.01227  [pdf, other]

    eess.AS cs.SD

    MetricNet: Towards Improved Modeling For Non-Intrusive Speech Quality Assessment

    Authors: Meng Yu, Chunlei Zhang, Yong Xu, Shixiong Zhang, Dong Yu

    Abstract: The objective speech quality assessment is usually conducted by comparing received speech signal with its clean reference, while human beings are capable of evaluating the speech quality without any reference, such as in the mean opinion score (MOS) tests. Non-intrusive speech quality assessment has attracted much attention recently due to the lack of access to clean reference signals for objectiv…

    Submitted 2 April, 2021; originally announced April 2021.

    Comments: Submitted to Interspeech 2021

  45. arXiv:2103.16849  [pdf, other]

    eess.AS cs.SD

    TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation

    Authors: Helin Wang, Bo Wu, Lianwu Chen, Meng Yu, Jianwei Yu, Yong Xu, Shi-Xiong Zhang, Chao Weng, Dan Su, Dong Yu

    Abstract: In this paper, we exploit the effective way to leverage contextual information to improve the speech dereverberation performance in real-world reverberant environments. We propose a temporal-contextual attention approach on the deep neural network (DNN) for environment-aware speech dereverberation, which can adaptively attend to the contextual information. More specifically, a FullBand based Tempo…

    Submitted 26 August, 2021; v1 submitted 31 March, 2021; originally announced March 2021.

    Comments: Submitted to Interspeech 2021

  46. arXiv:2103.08781  [pdf, other]

    eess.AS

    Towards Robust Speaker Verification with Target Speaker Enhancement

    Authors: Chunlei Zhang, Meng Yu, Chao Weng, Dong Yu

    Abstract: This paper proposes the target speaker enhancement based speaker verification network (TASE-SVNet), an all neural model that couples target speaker enhancement and speaker embedding extraction for robust speaker verification (SV). Specifically, an enrollment speaker conditioned speech enhancement module is employed as the front-end for extracting target speaker from its mixture with interfering sp…

    Submitted 15 March, 2021; originally announced March 2021.

    Comments: Accepted by IEEE ICASSP 2021

  47. arXiv:2102.07955  [pdf, other]

    eess.AS cs.SD

    Deep Learning based Multi-Source Localization with Source Splitting and its Effectiveness in Multi-Talker Speech Recognition

    Authors: Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu

    Abstract: Multi-source localization is an important and challenging technique for multi-talker conversation analysis. This paper proposes a novel supervised learning method using deep neural networks to estimate the direction of arrival (DOA) of all the speakers simultaneously from the audio mixture. At the heart of the proposal is a source splitting mechanism that creates source-specific intermediate repre…

    Submitted 28 November, 2021; v1 submitted 15 February, 2021; originally announced February 2021.

    Comments: Submitted to Computer Speech & Language

  48. arXiv:2101.01280  [pdf, other]

    cs.SD eess.AS

    Generalized Spatio-Temporal RNN Beamformer for Target Speech Separation

    Authors: Yong Xu, Zhuohuang Zhang, Meng Yu, Shi-Xiong Zhang, Dong Yu

    Abstract: Although the conventional mask-based minimum variance distortionless response (MVDR) could reduce the non-linear distortion, the residual noise level of the MVDR separated speech is still high. In this paper, we propose a spatio-temporal recurrent neural network based beamformer (RNN-BF) for target speech separation. This new beamforming framework directly learns the beamforming weights from the e…

    Submitted 3 April, 2021; v1 submitted 4 January, 2021; originally announced January 2021.

    Comments: Submitted to Interspeech 2021, Demo: https://yongxuustc.github.io/grnnbf/

  49. arXiv:2012.13442  [pdf, other]

    eess.AS cs.SD

    Multi-channel Multi-frame ADL-MVDR for Target Speech Separation

    Authors: Zhuohuang Zhang, Yong Xu, Meng Yu, Shi-Xiong Zhang, Lianwu Chen, Donald S. Williamson, Dong Yu

    Abstract: Many purely neural network based speech separation approaches have been proposed to improve objective assessment scores, but they often introduce nonlinear distortions that are harmful to modern automatic speech recognition (ASR) systems. Minimum variance distortionless response (MVDR) filters are often adopted to remove nonlinear distortions, however, conventional neural mask-based MVDR systems s…

    Submitted 15 November, 2021; v1 submitted 24 December, 2020; originally announced December 2020.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP); Demos available at https://zzhang68.github.io/mcmf-adl-mvdr/

  50. arXiv:2012.07178  [pdf, other]

    eess.AS cs.LG

    Self-supervised Text-independent Speaker Verification using Prototypical Momentum Contrastive Learning

    Authors: Wei Xia, Chunlei Zhang, Chao Weng, Meng Yu, Dong Yu

    Abstract: In this study, we investigate self-supervised representation learning for speaker verification (SV). First, we examine a simple contrastive learning approach (SimCLR) with a momentum contrastive (MoCo) learning framework, where the MoCo speaker embedding system utilizes a queue to maintain a large set of negative examples. We show that better speaker embeddings can be learned by momentum contrasti…

    Submitted 14 February, 2021; v1 submitted 13 December, 2020; originally announced December 2020.

    Comments: Accepted to ICASSP 2021