Showing 1–16 of 16 results for author: Bai, M

Searching in archive eess.

  1. Multi-Objective Sizing Optimization Method of Microgrid Considering Cost and Carbon Emissions

    Authors: Xiang Zhu, Guangchun Ruan, Hua Geng, Honghai Liu, Mingfei Bai, Chao Peng

    Abstract: Microgrid serves as a promising solution to integrate and manage distributed renewable energy resources. In this paper, we establish a stochastic multi-objective sizing optimization (SMOSO) model for microgrid planning, which fully captures the battery degradation characteristics and the total carbon emissions. The microgrid operator aims to simultaneously maximize the economic benefits and minimi…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Transactions on Industry Applications

  2. arXiv:2405.08742  [pdf]

    eess.AS cs.SD

    A tunable binaural audio telepresence system capable of balancing immersive and enhanced modes

    Authors: Yicheng Hsu, Mingsian R. Bai

    Abstract: Binaural Audio Telepresence (BAT) aims to encode the acoustic scene at the far end into binaural signals for the user at the near end. BAT encompasses an immense range of applications that can vary between two extreme modes of Immersive BAT (I-BAT) and Enhanced BAT (E-BAT). With I-BAT, our goal is to preserve the full ambience as if we were at the far end, while with E-BAT, our goal is to enhance…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: 5 pages, 4 figures

  3. arXiv:2403.03675  [pdf, other]

    cs.IT eess.SP

    ZF Beamforming Tensor Compression for Massive MIMO Fronthaul

    Authors: Libin Zheng, Zihao Wang, Minru Bai, Zhenjie Tan

    Abstract: In the rapidly evolving landscape of 5G and beyond 5G (B5G) mobile cellular communications, efficient data compression and reconstruction strategies become paramount, especially in massive multiple-input multiple-output (MIMO) systems. A critical challenge in these systems is the capacity-limited fronthaul, particularly in the context of the Ethernet-based common public radio interface (eCPRI) con…

    Submitted 6 March, 2024; originally announced March 2024.

  4. arXiv:2401.16850  [pdf]

    eess.AS cs.SD

    Spatial-Temporal Activity-Informed Diarization and Separation

    Authors: Yicheng Hsu, Ssuhan Chen, Mingsian R. Bai

    Abstract: A robust multichannel speaker diarization and separation system is proposed by exploiting the spatio-temporal activity of the speakers. The system is realized in a hybrid architecture that combines the array signal processing units and the deep learning units. For speaker diarization, a spatial coherence matrix across time frames is computed based on the whitened relative transfer functions (wRTFs…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: 13 pages

  5. arXiv:2311.12706  [pdf]

    eess.AS

    Learning-based Array Configuration-Independent Binaural Audio Telepresence with Scalable Signal Enhancement and Ambience Preservation

    Authors: Yicheng Hsu, Mingsian R. Bai

    Abstract: Audio Telepresence (AT) aims to create an immersive experience of the audio scene at the far end for the user(s) at the near end. The application of AT could encompass scenarios with varying degrees of emphasis on signal enhancement and ambience preservation. It is desirable for an AT system to be scalable between these two extremes. To this end, we propose an array-based Binaural AT (BAT) system…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: 10 pages, 11 figures

  6. arXiv:2310.12837  [pdf]

    eess.AS

    Deep Beamforming for Speech Enhancement and Speaker Localization with an Array Response-Aware Loss Function

    Authors: Hsinyu Chang, Yicheng Hsu, Mingsian R. Bai

    Abstract: Recent research advances in deep neural network (DNN)-based beamformers have shown great promise for speech enhancement under adverse acoustic conditions. Different network architectures and input features have been explored in estimating beamforming weights. In this paper, we propose a deep beamformer based on an efficient convolutional recurrent network (CRN) trained with a novel ARray RespOnse-…

    Submitted 22 October, 2023; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: 6 pages

  7. arXiv:2304.08887  [pdf]

    eess.AS

    Array Configuration-Agnostic Personal Voice Activity Detection Based on Spatial Coherence

    Authors: Yicheng Hsu, Mingsian R. Bai

    Abstract: Personal voice activity detection has received increased attention due to the growing popularity of personal mobile devices and smart speakers. PVAD is often an integral element to speech enhancement and recognition for these applications in which lightweight signal processing is only enabled for the target user. However, in real-world scenarios, the detection performance may degrade because of co…

    Submitted 18 April, 2023; originally announced April 2023.

    Comments: Accepted by INTER-NOISE 2023. arXiv admin note: text overlap with arXiv:2211.08748

  8. arXiv:2303.06867  [pdf]

    eess.AS

    Learning-based Robust Speaker Counting and Separation with the Aid of Spatial Coherence

    Authors: Yicheng Hsu, Mingsian Bai

    Abstract: A three-stage approach is proposed for speaker counting and speech separation in noisy and reverberant environments. In the spatial feature extraction, a spatial coherence matrix (SCM) is computed using whitened relative transfer functions (wRTFs) across time frames. The global activity functions of each speaker are estimated from a simplex constructed using the eigenvectors of the SCM, while the…

    Submitted 7 August, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

    Comments: 20 pages, 17 figures

  9. arXiv:2211.08748  [pdf]

    eess.AS cs.SD

    Array Configuration-Agnostic Personalized Speech Enhancement using Long-Short-Term Spatial Coherence

    Authors: Yicheng Hsu, Yonghan Lee, Mingsian R. Bai

    Abstract: Personalized speech enhancement has been a field of active research for suppression of speechlike interferers such as competing speakers or TV dialogues. Compared with single channel approaches, multichannel PSE systems can be more effective in adverse acoustic conditions by leveraging the spatial information in microphone signals. However, the implementation of multichannel PSEs to accommodate a…

    Submitted 16 November, 2022; originally announced November 2022.

  10. arXiv:2210.11123  [pdf]

    eess.AS cs.SD

    Model-matching Principle Applied to the Design of an Array-based All-neural Binaural Rendering System for Audio Telepresence

    Authors: Yicheng Hsu, Chenghumg Ma, Mingsian R. Bai

    Abstract: Telepresence aims to create an immersive but virtual experience of the audio and visual scene at the far end for users at the near end. In this contribution, we propose an array-based binaural rendering system that converts the array microphone signals into the head-related transfer function (HRTF) filtered output signals for headphone-rendering. The proposed approach is formulated in light of a m…

    Submitted 6 March, 2023; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: accepted by ICASSP 2023

  11. arXiv:2207.08126  [pdf]

    eess.AS cs.SD

    Multi-channel target speech enhancement based on ERB-scaled spatial coherence features

    Authors: Yicheng Hsu, Yonghan Lee, Mingsian R. Bai

    Abstract: Recently, speech enhancement technologies that are based on deep learning have received considerable research attention. If the spatial information in microphone signals is exploited, microphone arrays can be advantageous under some adverse acoustic conditions compared with single-microphone systems. However, multichannel speech enhancement is often performed in the short-time Fourier transform (S…

    Submitted 17 July, 2022; originally announced July 2022.

    Comments: Accepted by International Congress on Acoustics (ICA) 2022. arXiv admin note: substantial text overlap with arXiv:2112.05686

  12. arXiv:2206.09728  [pdf]

    eess.AS

    Multi-channel end-to-end neural network for speech enhancement, source localization, and voice activity detection

    Authors: Yuan Chen, Yicheng Hsu, Mingsian R. Bai

    Abstract: Speech enhancement and source localization has been active research for several decades with a wide range of real-world applications. Recently, the Deep Complex Convolution Recurrent network (DCCRN) has yielded impressive enhancement performance for single-channel systems. In this study, a neural beamformer consisting of a beamformer and a novel multi-channel DCCRN is proposed for speech enhanceme…

    Submitted 20 June, 2022; originally announced June 2022.

    Comments: Accepted by ICA2022

  13. arXiv:2205.03594  [pdf]

    eess.AS cs.SD

    Acoustic echo suppression using a learning-based multi-frame minimum variance distortionless response filter

    Authors: Yuefeng Tsai, Yicheng Hsu, Mingsian Bai

    Abstract: Distortion resulting from acoustic echo suppression (AES) is a common issue in full-duplex communication. To address the distortion problem, a multi-frame minimum variance distortionless response (MFMVDR) filtering technique is proposed. The MFMVDR filter with parameter estimation which was used in speech enhancement problems is extended in this study from a deep learning perspective. To alleviate…

    Submitted 7 May, 2022; originally announced May 2022.

    Comments: Submitted to International Workshop on Acoustic Signal Enhancement (IWAENC) 2022

  14. Learning-based personal speech enhancement for teleconferencing by exploiting spatial-spectral features

    Authors: Yicheng Hsu, Yonghan Lee, Mingsian R. Bai

    Abstract: Teleconferencing is becoming essential during the COVID-19 pandemic. However, in real-world applications, speech quality can deteriorate due to, for example, background interference, noise, or reverberation. To solve this problem, target speech extraction from the mixture signals can be performed with the aid of the user's vocal features. Various features are accounted for in this study's proposed…

    Submitted 29 April, 2022; v1 submitted 10 December, 2021; originally announced December 2021.

    Comments: accepted by ICASSP 2022

  15. arXiv:2107.01343  [pdf]

    cs.LG eess.SP

    Short-term probabilistic photovoltaic power forecast based on deep convolutional long short-term memory network and kernel density estimation

    Authors: Mingliang Bai, Xinyu Zhao, Zhenhua Long, Jinfu Liu, Daren Yu

    Abstract: Solar energy is a clean and renewable energy. Photovoltaic (PV) power is an important way to utilize solar energy. Accurate PV power forecast is crucial to the large-scale application of PV power and the stability of electricity grid. This paper proposes a novel method for short-term photovoltaic power forecast using deep convolutional long short-term memory (ConvLSTM) network and kernel density e…

    Submitted 3 July, 2021; originally announced July 2021.

  16. arXiv:1908.04924  [pdf, ps, other]

    cs.LG eess.IV stat.ML

    Tensor-Train Parameterization for Ultra Dimensionality Reduction

    Authors: Mingyuan Bai, S. T. Boris Choy, Xin Song, Junbin Gao

    Abstract: Locality preserving projections (LPP) are a classical dimensionality reduction method based on data graph information. However, LPP is still responsive to extreme outliers. LPP aiming for vectorial data may undermine data structural information when it is applied to multidimensional data. Besides, it assumes the dimension of data to be smaller than the number of instances, which is not suitable fo…

    Submitted 13 August, 2019; originally announced August 2019.