Showing 1–15 of 15 results for author: Bai, R

Searching in archive eess.
  1. arXiv:2405.08742 [pdf]

    eess.AS cs.SD

    A tunable binaural audio telepresence system capable of balancing immersive and enhanced modes

    Authors: Yicheng Hsu, Mingsian R. Bai

    Abstract: Binaural Audio Telepresence (BAT) aims to encode the acoustic scene at the far end into binaural signals for the user at the near end. BAT encompasses an immense range of applications that can vary between two extreme modes of Immersive BAT (I-BAT) and Enhanced BAT (E-BAT). With I-BAT, our goal is to preserve the full ambience as if we were at the far end, while with E-BAT, our goal is to enhance…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: 5 pages, 4 figures

  2. arXiv:2402.19275 [pdf, other]

    eess.SY cs.LG

    Adaptive Testing Environment Generation for Connected and Automated Vehicles with Dense Reinforcement Learning

    Authors: **gxuan Yang, Ruoxuan Bai, Haoyuan Ji, Yi Zhang, Jianming Hu, Shuo Feng

    Abstract: The assessment of safety performance plays a pivotal role in the development and deployment of connected and automated vehicles (CAVs). A common approach involves designing testing scenarios based on prior knowledge of CAVs (e.g., surrogate models), conducting tests in these scenarios, and subsequently evaluating CAVs' safety performances. However, substantial differences between CAVs and the prio…

    Submitted 29 February, 2024; originally announced February 2024.

  3. arXiv:2401.16850 [pdf]

    eess.AS cs.SD

    Spatial-Temporal Activity-Informed Diarization and Separation

    Authors: Yicheng Hsu, Ssuhan Chen, Mingsian R. Bai

    Abstract: A robust multichannel speaker diarization and separation system is proposed by exploiting the spatio-temporal activity of the speakers. The system is realized in a hybrid architecture that combines the array signal processing units and the deep learning units. For speaker diarization, a spatial coherence matrix across time frames is computed based on the whitened relative transfer functions (wRTFs…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: 13 pages
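
    The abstract of entry 3 mentions a spatial coherence matrix computed across time frames from whitened relative transfer functions (wRTFs). As a rough illustration only, the sketch below computes the pairwise coherence |<v_i, v_j>| between unit-normalized per-frame complex vectors. The paper's whitening step and exact coherence definition are not given here, so the hypothetical `rtf_like` helper and the plain normalization are assumptions. Frames dominated by the same source score close to 1, which is what makes such a matrix useful for grouping frames by speaker.

```python
import cmath
import math


def rtf_like(phase_step, num_mics=3):
    """Hypothetical stand-in for a per-frame RTF: a pure inter-mic phase progression."""
    return [cmath.exp(1j * phase_step * m) for m in range(num_mics)]


def normalize(v):
    """Scale a complex vector to unit Euclidean norm (assumed in place of whitening)."""
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]


def coherence_matrix(frames):
    """Pairwise |<v_i, v_j>| between unit-normalized frame vectors; values lie in [0, 1]."""
    units = [normalize(v) for v in frames]
    n = len(units)
    coh = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Complex inner product with conjugation of the first argument
            inner = sum(a.conjugate() * b for a, b in zip(units[i], units[j]))
            coh[i][j] = abs(inner)
    return coh


# Frames 0 and 1 share a source direction (different gains); frame 2 does not.
frames = [rtf_like(-0.5), [2 * x for x in rtf_like(-0.5)], rtf_like(1.2)]
coh = coherence_matrix(frames)
```

    Because each vector is normalized first, a pure gain change (frame 1 versus frame 0) leaves the coherence at 1; only a change in the inter-microphone phase pattern lowers it.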

  4. arXiv:2311.12706 [pdf]

    eess.AS

    Learning-based Array Configuration-Independent Binaural Audio Telepresence with Scalable Signal Enhancement and Ambience Preservation

    Authors: Yicheng Hsu, Mingsian R. Bai

    Abstract: Audio Telepresence (AT) aims to create an immersive experience of the audio scene at the far end for the user(s) at the near end. The application of AT could encompass scenarios with varying degrees of emphasis on signal enhancement and ambience preservation. It is desirable for an AT system to be scalable between these two extremes. To this end, we propose an array-based Binaural AT (BAT) system…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: 10 pages, 11 figures

  5. arXiv:2311.09655 [pdf, other]

    cs.SD cs.CV eess.AS

    Multi-View Spectrogram Transformer for Respiratory Sound Classification

    Authors: Wentao He, Yuchen Yan, Jianfeng Ren, Ruibin Bai, Xudong Jiang

    Abstract: Deep neural networks have been applied to audio spectrograms for respiratory sound classification. Existing models often treat the spectrogram as a synthetic image while overlooking its physical characteristics. In this paper, a Multi-View Spectrogram Transformer (MVST) is proposed to embed different views of time-frequency characteristics into the vision transformer. Specifically, the proposed MV…

    Submitted 30 May, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: The paper was published at ICASSP 2024

  6. arXiv:2310.12837 [pdf]

    eess.AS

    Deep Beamforming for Speech Enhancement and Speaker Localization with an Array Response-Aware Loss Function

    Authors: Hsinyu Chang, Yicheng Hsu, Mingsian R. Bai

    Abstract: Recent research advances in deep neural network (DNN)-based beamformers have shown great promise for speech enhancement under adverse acoustic conditions. Different network architectures and input features have been explored in estimating beamforming weights. In this paper, we propose a deep beamformer based on an efficient convolutional recurrent network (CRN) trained with a novel ARray RespOnse-…

    Submitted 22 October, 2023; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: 6 pages
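
    Entry 6 concerns DNN-estimated beamforming weights. For reference only, the classical delay-and-sum beamformer below shows how a set of weights is applied in one frequency bin as y = w^H x; it is not the paper's network or its array response-aware loss. The uniform linear array geometry, the 343 m/s speed of sound, and all function names are assumptions chosen for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def ula_steering(freq_hz, angle_rad, num_mics, spacing_m):
    """Far-field steering vector of a uniform linear array (angle measured from broadside)."""
    delay_per_mic = spacing_m * math.sin(angle_rad) / SPEED_OF_SOUND
    return [cmath.exp(-2j * math.pi * freq_hz * m * delay_per_mic) for m in range(num_mics)]


def delay_and_sum_weights(freq_hz, angle_rad, num_mics, spacing_m):
    """w = a / M, giving unit gain (w^H a = 1) in the look direction."""
    a = ula_steering(freq_hz, angle_rad, num_mics, spacing_m)
    return [x / num_mics for x in a]


def apply_weights(w, x):
    """Beamformer output for one time-frequency bin: y = w^H x."""
    return sum(wi.conjugate() * xi for wi, xi in zip(w, x))


# Steer a 4-mic, 5 cm array toward 0.3 rad at 1 kHz.
w = delay_and_sum_weights(1000.0, 0.3, 4, 0.05)
look = apply_weights(w, ula_steering(1000.0, 0.3, 4, 0.05))
off = apply_weights(w, ula_steering(1000.0, -0.9, 4, 0.05))
```

    A learned beamformer replaces the fixed `w` with network-predicted weights per bin, but the weight-application step has the same form.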

  7. arXiv:2305.18865 [pdf, other]

    eess.IV cs.CV

    Elongated Physiological Structure Segmentation via Spatial and Scale Uncertainty-aware Network

    Authors: Yinglin Zhang, Ruiling Xi, Huazhu Fu, Dave Towey, RuiBin Bai, Risa Higashita, Jiang Liu

    Abstract: Robust and accurate segmentation of elongated physiological structures is challenging, especially in ambiguous regions, such as in corneal endothelium microscope images with uneven illumination or fundus images with disease interference. In this paper, we present a spatial and scale uncertainty-aware network (SSU-Net) that fully uses both spatial and scale uncertainty to highlight ambiguous…

    Submitted 30 May, 2023; originally announced May 2023.

  8. arXiv:2304.08887 [pdf]

    eess.AS

    Array Configuration-Agnostic Personal Voice Activity Detection Based on Spatial Coherence

    Authors: Yicheng Hsu, Mingsian R. Bai

    Abstract: Personal voice activity detection (PVAD) has received increased attention due to the growing popularity of personal mobile devices and smart speakers. PVAD is often an integral element of speech enhancement and recognition for these applications, in which lightweight signal processing is only enabled for the target user. However, in real-world scenarios, the detection performance may degrade because of co…

    Submitted 18 April, 2023; originally announced April 2023.

    Comments: Accepted by INTER-NOISE 2023. arXiv admin note: text overlap with arXiv:2211.08748

  9. arXiv:2211.08748 [pdf]

    eess.AS cs.SD

    Array Configuration-Agnostic Personalized Speech Enhancement using Long-Short-Term Spatial Coherence

    Authors: Yicheng Hsu, Yonghan Lee, Mingsian R. Bai

    Abstract: Personalized speech enhancement (PSE) has been a field of active research for suppression of speech-like interferers such as competing speakers or TV dialogues. Compared with single-channel approaches, multichannel PSE systems can be more effective in adverse acoustic conditions by leveraging the spatial information in microphone signals. However, the implementation of multichannel PSEs to accommodate a…

    Submitted 16 November, 2022; originally announced November 2022.

  10. arXiv:2210.11123 [pdf]

    eess.AS cs.SD

    Model-matching Principle Applied to the Design of an Array-based All-neural Binaural Rendering System for Audio Telepresence

    Authors: Yicheng Hsu, Chenghumg Ma, Mingsian R. Bai

    Abstract: Telepresence aims to create an immersive but virtual experience of the audio and visual scene at the far end for users at the near end. In this contribution, we propose an array-based binaural rendering system that converts the array microphone signals into head-related transfer function (HRTF) filtered output signals for headphone rendering. The proposed approach is formulated in light of a m…

    Submitted 6 March, 2023; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: accepted by ICASSP 2023

  11. arXiv:2207.08126 [pdf]

    eess.AS cs.SD

    Multi-channel target speech enhancement based on ERB-scaled spatial coherence features

    Authors: Yicheng Hsu, Yonghan Lee, Mingsian R. Bai

    Abstract: Recently, speech enhancement technologies that are based on deep learning have received considerable research attention. If the spatial information in microphone signals is exploited, microphone arrays can be advantageous under some adverse acoustic conditions compared with single-microphone systems. However, multichannel speech enhancement is often performed in the short-time Fourier transform (S…

    Submitted 17 July, 2022; originally announced July 2022.

    Comments: Accepted by International Congress on Acoustics (ICA) 2022. arXiv admin note: substantial text overlap with arXiv:2112.05686
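
    The title of entry 11 refers to ERB-scaled features. One common way to build such a frequency axis, sketched below as an assumption rather than the paper's exact recipe, uses the Glasberg and Moore ERB-number scale, E(f) = 21.4 log10(1 + 0.00437 f), spaces band edges uniformly in E, and assigns each STFT bin to a band. The band count, sample rate, and bin layout in the example are hypothetical.

```python
import math


def hz_to_erb_number(f_hz):
    """Glasberg & Moore ERB-number scale."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_number_to_hz(e):
    """Inverse of hz_to_erb_number."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437


def erb_band_edges(num_bands, f_max_hz):
    """Band edges (Hz) equally spaced on the ERB-number scale from 0 Hz to f_max_hz."""
    e_max = hz_to_erb_number(f_max_hz)
    return [erb_number_to_hz(e_max * k / num_bands) for k in range(num_bands + 1)]


def assign_bins_to_bands(num_bins, sample_rate_hz, edges):
    """Map each STFT bin (bins 0..num_bins-1 span 0..fs/2) to an ERB band index."""
    bands = []
    for b in range(num_bins):
        f = b * (sample_rate_hz / 2) / (num_bins - 1)
        k = 0
        # Advance while the bin frequency lies at or above the next band's lower edge
        while k < len(edges) - 2 and f >= edges[k + 1]:
            k += 1
        bands.append(k)
    return bands


# Hypothetical setup: 16 kHz audio, 512-point FFT (257 bins), 32 ERB bands.
edges = erb_band_edges(32, 8000.0)
bands = assign_bins_to_bands(257, 16000.0, edges)
```

    Low-frequency bands come out only a few tens of hertz wide while high-frequency bands span many bins, which is the usual motivation for pooling features on an ERB axis instead of per STFT bin.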

  12. arXiv:2206.09728 [pdf]

    eess.AS

    Multi-channel end-to-end neural network for speech enhancement, source localization, and voice activity detection

    Authors: Yuan Chen, Yicheng Hsu, Mingsian R. Bai

    Abstract: Speech enhancement and source localization have been active research areas for several decades, with a wide range of real-world applications. Recently, the Deep Complex Convolution Recurrent network (DCCRN) has yielded impressive enhancement performance for single-channel systems. In this study, a neural beamformer consisting of a beamformer and a novel multi-channel DCCRN is proposed for speech enhanceme…

    Submitted 20 June, 2022; originally announced June 2022.

    Comments: Accepted by ICA 2022

  13. Learning-based personal speech enhancement for teleconferencing by exploiting spatial-spectral features

    Authors: Yicheng Hsu, Yonghan Lee, Mingsian R. Bai

    Abstract: Teleconferencing has become essential during the COVID-19 pandemic. However, in real-world applications, speech quality can deteriorate due to, for example, background interference, noise, or reverberation. To solve this problem, target speech extraction from the mixture signals can be performed with the aid of the user's vocal features. Various features are accounted for in this study's proposed…

    Submitted 29 April, 2022; v1 submitted 10 December, 2021; originally announced December 2021.

    Comments: accepted by ICASSP 2022

  14. arXiv:2006.16312 [pdf, other]

    cs.LG cs.DS cs.IR eess.SY stat.ML

    Dynamic Knapsack Optimization Towards Efficient Multi-Channel Sequential Advertising

    Authors: Xiaotian Hao, Zhaoqing Peng, Yi Ma, Guan Wang, Junqi **, Jianye Hao, Shan Chen, Rongquan Bai, Mingzhou Xie, Miao Xu, Zhenzhe Zheng, Chuan Yu, Han Li, Jian Xu, Kun Gai

    Abstract: In E-commerce, advertising is essential for merchants to reach their target users. The typical objective is to maximize the advertiser's cumulative revenue over a period of time under a budget constraint. In real applications, an advertisement (ad) usually needs to be exposed to the same user multiple times until the user finally contributes revenue (e.g., places an order). However, existing adver…

    Submitted 29 June, 2020; originally announced June 2020.

    Comments: accepted by ICML 2020
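
    Entry 14 frames advertising as revenue maximization under a budget constraint. The paper proposes a dynamic, multi-step formulation; the block below shows only the textbook static 0/1 knapsack dynamic program, included to illustrate budget-constrained selection in its simplest form. The integer costs and values are hypothetical stand-ins for ad exposures and expected revenue.

```python
def knapsack_max_value(costs, values, budget):
    """Classic 0/1 knapsack: maximize total value subject to total cost <= budget.

    costs and budget must be non-negative integers. Returns (best_value, chosen_indices).
    """
    n = len(costs)
    # best[i][b] = best value achievable using the first i items with budget b
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        c, v = costs[i - 1], values[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip item i-1
            if c <= b and best[i - 1][b - c] + v > best[i][b]:
                best[i][b] = best[i - 1][b - c] + v  # option 2: take item i-1
    # Backtrack: an item was taken wherever the table value changed
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= costs[i - 1]
    return best[n][budget], sorted(chosen)


# Hypothetical example: three candidate exposures, budget of 6 units.
result = knapsack_max_value([3, 4, 2], [4.0, 5.0, 3.0], 6)
```

    Here `result` is `(8.0, [1, 2])`: items 1 and 2 cost exactly 6 and beat the cheaper pair {0, 2}, which earns only 7.0. The table costs O(n × budget) time and memory, so this form only suits small integer budgets.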

  15. arXiv:2003.03797 [pdf, other]

    eess.IV cs.CV

    1D Probabilistic Undersampling Pattern Optimization for MR Image Reconstruction

    Authors: Shengke Xue, Ruiliang Bai, Xinyu **

    Abstract: In 3D clinical scenarios, magnetic resonance imaging (MRI) is mainly limited by long scanning times and is vulnerable to motion artifacts from human tissue. Thus, k-space undersampling is used to accelerate MRI acquisition, but it leads to visually poor MR images. Recently, some studies 1) use effective undersampling patterns, or 2) design deep neural networks to improve the quality of resulting ima…

    Submitted 8 January, 2022; v1 submitted 8 March, 2020; originally announced March 2020.

    Comments: Temporary manuscript; to be submitted to IEEE Trans. Med. Imag.
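
    Entry 15 concerns optimizing a 1D probabilistic undersampling pattern in k-space. The sketch below shows only the sampling side under assumed settings: each phase-encoding line k is acquired with probability p_k, the central lines are always kept, and the outer lines share one constant probability chosen so that the expected sampling rate matches a target. The paper optimizes the probabilities instead; the hand-set density and both function names are hypothetical.

```python
import random


def variable_density_probs(num_lines, center_lines, target_rate):
    """Hypothetical density: center fully sampled, constant probability elsewhere,
    set so that the expected fraction of acquired lines equals target_rate."""
    lo = (num_lines - center_lines) // 2
    hi = lo + center_lines
    outer = num_lines - center_lines
    remaining = target_rate * num_lines - center_lines
    p_outer = min(max(remaining / outer, 0.0), 1.0) if outer else 0.0
    return [1.0 if lo <= k < hi else p_outer for k in range(num_lines)]


def sample_1d_mask(probs, seed=None):
    """Draw a binary mask: line k is acquired (1) with probability probs[k]."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for p in probs]


# 256 phase-encoding lines, 16 central lines, 25% expected sampling rate (4x acceleration).
probs = variable_density_probs(256, 16, 0.25)
mask = sample_1d_mask(probs, seed=0)
```

    Multiplying each k-space row by its mask entry simulates the undersampled acquisition; keeping the low-frequency center intact preserves most of the image energy, which is why variable-density patterns are a common starting point.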