
Showing 1–50 of 56 results for author: Fu, S

Searching in archive eess.
  1. arXiv:2406.18871  [pdf, other]

    eess.AS cs.CL

    DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment

    Authors: Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, He Huang, Boris Ginsburg, Yu-Chiang Frank Wang, Hung-yi Lee

    Abstract: Recent speech language models (SLMs) typically incorporate pre-trained speech models to extend the capabilities of large language models (LLMs). In this paper, we propose a Descriptive Speech-Text Alignment approach that leverages speech captioning to bridge the gap between speech and text modalities, enabling SLMs to interpret and generate comprehensive natural language descriptions, thereby fa…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  2. arXiv:2405.06573  [pdf, other]

    cs.SD cs.AI eess.AS

    An Investigation of Incorporating Mamba for Speech Enhancement

    Authors: Rong Chao, Wen-Huang Cheng, Moreno La Quatra, Sabato Marco Siniscalchi, Chao-Han Huck Yang, Szu-Wei Fu, Yu Tsao

    Abstract: This work aims to study a scalable state-space model (SSM), Mamba, for the speech enhancement (SE) task. We exploit a Mamba-based regression model to characterize speech signals and build an SE system upon Mamba, termed SEMamba. We explore the properties of Mamba by integrating it as the core model in both basic and advanced SE systems, along with utilizing signal-level distances as well as metric…

    Submitted 10 May, 2024; originally announced May 2024.

  3. arXiv:2402.16321  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech

    Authors: Szu-Wei Fu, Kuo-Hsuan Hung, Yu Tsao, Yu-Chiang Frank Wang

    Abstract: Speech quality estimation has recently undergone a paradigm shift from human-hearing expert designs to machine-learning models. However, current models rely mainly on supervised learning, which is time-consuming and expensive for label collection. To solve this problem, we propose VQScore, a self-supervised metric for evaluating speech based on the quantization error of a vector-quantized-variatio…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: Published as a conference paper at ICLR 2024

  4. arXiv:2401.12468  [pdf, ps, other]

    eess.SY

    Minimum observability of probabilistic Boolean networks

    Authors: Jiayi Xu, Shihua Fu, Liyuan Xia, Jianjun Wang

    Abstract: This paper studies the minimum observability of probabilistic Boolean networks (PBNs), the main objective of which is to add the fewest measurements to make an unobservable PBN become observable. First of all, the algebraic form of a PBN is established with the help of semi-tensor product (STP) of matrices. By combining the algebraic forms of two identical PBNs into a parallel system, a method to…

    Submitted 22 January, 2024; originally announced January 2024.

  5. arXiv:2401.01165  [pdf, other]

    cs.LG eess.SP

    Reinforcement Learning for SAR View Angle Inversion with Differentiable SAR Renderer

    Authors: Yanni Wang, Hecheng Jia, Shilei Fu, Huiping Lin, Feng Xu

    Abstract: The electromagnetic inverse problem has long been a research hotspot. This study aims to reverse radar view angles in synthetic aperture radar (SAR) images given a target model. Nonetheless, the scarcity of SAR data, combined with the intricate background interference and imaging mechanisms, limits the applications of existing learning-based approaches. To address these challenges, we propose an in…

    Submitted 2 January, 2024; originally announced January 2024.

  6. arXiv:2311.08878  [pdf, other]

    eess.AS cs.SD

    Multi-objective Non-intrusive Hearing-aid Speech Assessment Model

    Authors: Hsin-Tien Chiang, Szu-Wei Fu, Hsin-Min Wang, Yu Tsao, John H. L. Hansen

    Abstract: Without the need for a clean reference, non-intrusive speech assessment methods have attracted great attention for objective evaluations. While deep learning models have been used to develop non-intrusive speech assessment methods with promising results, there is limited research on hearing-impaired subjects. This study proposes a multi-objective non-intrusive hearing-aid speech assessment model, cal…

    Submitted 15 November, 2023; originally announced November 2023.

  7. arXiv:2309.12766  [pdf, other]

    eess.AS cs.SD

    A Study on Incorporating Whisper for Robust Speech Assessment

    Authors: Ryandhimas E. Zezario, Yu-Wen Chen, Szu-Wei Fu, Yu Tsao, Hsin-Min Wang, Chiou-Shann Fuh

    Abstract: This research introduces an enhanced version of the multi-objective speech assessment model, MOSA-Net+, by leveraging the acoustic features from Whisper, a large-scale weakly supervised model. We first investigate the effectiveness of Whisper in deploying a more robust speech assessment model. After that, we explore combining representations from Whisper and SSL models. The experimental results r…

    Submitted 29 April, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

    Comments: Accepted to IEEE ICME 2024

  8. arXiv:2307.04517  [pdf, other]

    eess.AS

    Study on the Correlation between Objective Evaluations and Subjective Speech Quality and Intelligibility

    Authors: Hsin-Tien Chiang, Kuo-Hsuan Hung, Szu-Wei Fu, Heng-Cheng Kuo, Ming-Hsueh Tsai, Yu Tsao

    Abstract: Subjective tests are the gold standard for evaluating speech quality and intelligibility; however, they are time-consuming and expensive. Thus, objective measures that align with human perceptions are crucial. This study evaluates the correlation between commonly used objective measures and subjective speech quality and intelligibility using a Chinese speech dataset. Moreover, new objective measur…

    Submitted 10 October, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

  9. arXiv:2304.00658  [pdf, other]

    eess.AS

    Improving Meeting Inclusiveness using Speech Interruption Analysis

    Authors: Szu-Wei Fu, Yaran Fan, Yasaman Hosseinkashi, Jayant Gupchup, Ross Cutler

    Abstract: Meetings are a pervasive method of communication within all types of companies and organizations, and using remote collaboration systems to conduct meetings has increased dramatically since the COVID-19 pandemic. However, not all meetings are inclusive, especially in terms of the participation rates among attendees. In a recent large-scale survey conducted at Microsoft, the top suggestion given by…

    Submitted 4 April, 2023; v1 submitted 2 April, 2023; originally announced April 2023.

  10. arXiv:2303.13567  [pdf]

    cs.LG cs.CV eess.IV

    AI Models Close to your Chest: Robust Federated Learning Strategies for Multi-site CT

    Authors: Edward H. Lee, Brendan Kelly, Emre Altinmakas, Hakan Dogan, Maryam Mohammadzadeh, Errol Colak, Steve Fu, Olivia Choudhury, Ujjwal Ratan, Felipe Kitamura, Hernan Chaves, Jimmy Zheng, Mourad Said, Eduardo Reis, Jaekwang Lim, Patricia Yokoo, Courtney Mitchell, Golnaz Houshmand, Marzyeh Ghassemi, Ronan Killeen, Wendy Qiu, Joel Hayden, Farnaz Rafiee, Chad Klochko, Nicholas Bevins, et al. (5 additional authors not shown)

    Abstract: While it is well known that population differences from genetics, sex, race, and environmental factors contribute to disease, AI studies in medicine have largely focused on locoregional patient cohorts with less diverse data sources. Such limitation stems from barriers to large-scale data sharing and ethical concerns over data privacy. Federated learning (FL) is one potential pathway for AI developm…

    Submitted 13 April, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

  11. Differentiable SAR Renderer and SAR Target Reconstruction

    Authors: Shilei Fu, Feng Xu

    Abstract: Forward modeling of wave scattering and radar imaging mechanisms is the key to information extraction from synthetic aperture radar (SAR) images. Like inverse graphics in the optical domain, an inherently-integrated forward-inverse approach would be promising for SAR advanced information retrieval and target reconstruction. This paper presents such an attempt at inverse graphics for SAR imagery. A…

    Submitted 14 May, 2022; originally announced May 2022.

  12. arXiv:2204.03339  [pdf, other]

    eess.AS

    Boosting Self-Supervised Embeddings for Speech Enhancement

    Authors: Kuo-Hsuan Hung, Szu-wei Fu, Huan-Hsin Tseng, Hsin-Tien Chiang, Yu Tsao, Chii-Wann Lin

    Abstract: Self-supervised learning (SSL) representation for speech has achieved state-of-the-art (SOTA) performance on several downstream tasks. However, there remains room for improvement in speech enhancement (SE) tasks. In this study, we used a cross-domain feature to solve the problem that SSL embeddings may lack fine-grained information to regenerate speech signals. By integrating the SSL representatio…

    Submitted 5 July, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: accepted to INTERSPEECH-2022

  13. arXiv:2204.03310  [pdf, other]

    eess.AS cs.LG cs.SD

    MTI-Net: A Multi-Target Speech Intelligibility Prediction Model

    Authors: Ryandhimas E. Zezario, Szu-wei Fu, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

    Abstract: Recently, deep learning (DL)-based non-intrusive speech assessment models have attracted great attention. Many studies report that these DL-based models yield satisfactory assessment performance and good flexibility, but their performance in unseen environments remains a challenge. Furthermore, compared to quality scores, fewer studies have developed deep learning models to estimate intelligibility sco…

    Submitted 30 August, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Accepted to Interspeech 2022

  14. arXiv:2203.17152  [pdf, other]

    cs.SD cs.CL eess.AS

    Perceptual Contrast Stretching on Target Feature for Speech Enhancement

    Authors: Rong Chao, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao

    Abstract: Speech enhancement (SE) performance has improved considerably owing to the use of deep learning models as a base function. Herein, we propose a perceptual contrast stretching (PCS) approach to further improve SE performance. The PCS is derived based on the critical band importance function and is applied to modify the targets of the SE model. Specifically, the contrast of target features is stretc…

    Submitted 15 July, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: Accepted by Interspeech 2022

  15. arXiv:2203.06306  [pdf, other]

    eess.IV

    DURRNet: Deep Unfolded Single Image Reflection Removal Network

    Authors: Jun-Jie Huang, Tianrui Liu, Zhixiong Yang, Shao**g Fu, Wentao Zhao, Pier Luigi Dragotti

    Abstract: The single image reflection removal problem aims to divide a reflection-contaminated image into a transmission image and a reflection image. It is a canonical blind source separation problem and is highly ill-posed. In this paper, we present a novel deep architecture called deep unfolded single image reflection removal network (DURRNet) which makes an attempt to combine the best features from model-ba…

    Submitted 11 March, 2022; originally announced March 2022.

  16. arXiv:2111.05703  [pdf, other]

    eess.AS cs.SD

    OSSEM: one-shot speaker adaptive speech enhancement using meta learning

    Authors: Cheng Yu, Szu-Wei Fu, Tsun-An Hsieh, Yu Tsao, Mirco Ravanelli

    Abstract: Although deep learning (DL) has achieved notable progress in speech enhancement (SE), further research is still required for a DL-based SE system to adapt effectively and efficiently to particular speakers. In this study, we propose a novel meta-learning-based speaker-adaptive SE approach (called OSSEM) that aims to achieve SE model adaptation in a one-shot manner. OSSEM consists of a modified tra…

    Submitted 10 November, 2021; originally announced November 2021.

  17. arXiv:2111.04436  [pdf, other]

    cs.SD cs.LG eess.AS

    SEOFP-NET: Compression and Acceleration of Deep Neural Networks for Speech Enhancement Using Sign-Exponent-Only Floating-Points

    Authors: Yu-Chen Lin, Cheng Yu, Yi-Te Hsu, Szu-Wei Fu, Yu Tsao, Tei-Wei Kuo

    Abstract: Numerous compression and acceleration strategies have achieved outstanding results on classification tasks in various fields, such as computer vision and speech signal processing. Nevertheless, the same strategies have yielded unsatisfactory performance on regression tasks because the nature of these tasks differs from that of classification tasks. In this paper, a novel sign-exponent-only floating-point netwo…

    Submitted 8 November, 2021; originally announced November 2021.

  18. arXiv:2111.02363  [pdf, other]

    eess.AS cs.LG cs.SD

    Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features

    Authors: Ryandhimas E. Zezario, Szu-Wei Fu, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

    Abstract: In this study, we propose a cross-domain multi-objective speech assessment model called MOSA-Net, which can estimate multiple speech assessment metrics simultaneously. Experimental results show that MOSA-Net can improve the linear correlation coefficient (LCC) by 0.026 (0.990 vs 0.964 in seen noise environments) and 0.012 (0.969 vs 0.957 in unseen noise environments) in PESQ prediction, compared t…

    Submitted 23 June, 2022; v1 submitted 3 November, 2021; originally announced November 2021.

  19. arXiv:2110.05866  [pdf]

    cs.SD cs.CL eess.AS

    MetricGAN-U: Unsupervised speech enhancement/ dereverberation based only on noisy/ reverberated speech

    Authors: Szu-Wei Fu, Cheng Yu, Kuo-Hsuan Hung, Mirco Ravanelli, Yu Tsao

    Abstract: Most of the deep learning-based speech enhancement models are learned in a supervised manner, which implies that pairs of noisy and clean speech are required during training. Consequently, many noisy speech recordings captured in daily life cannot be used to train the model. Although certain unsupervised learning frameworks have also been proposed to solve the pair constraint, they still require clean s…

    Submitted 12 October, 2021; originally announced October 2021.

  20. DeepGOMIMO: Deep Learning-Aided Generalized Optical MIMO with CSI-Free Blind Detection

    Authors: Xin Zhong, Chen Chen, Shu Fu, Zhihong Zeng, Min Liu

    Abstract: Generalized optical multiple-input multiple-output (GOMIMO) techniques have been recently shown to be promising for high-speed optical wireless communication (OWC) systems. In this paper, we propose a novel deep learning-aided GOMIMO (DeepGOMIMO) framework for GOMIMO systems, where channel state information (CSI)-free blind detection can be enabled by employing a specially designed deep neural net…

    Submitted 8 October, 2021; originally announced October 2021.

  21. Deep Learning-Aided OFDM-Based Generalized Optical Quadrature Spatial Modulation

    Authors: Chen Chen, Lin Zeng, Xin Zhong, Shu Fu, Min Liu, Pengfei Du

    Abstract: In this paper, we propose an orthogonal frequency division multiplexing (OFDM)-based generalized optical quadrature spatial modulation (GOQSM) technique for multiple-input multiple-output optical wireless communication (MIMO-OWC) systems. Considering the error propagation and noise amplification effects when applying maximum likelihood and maximum ratio combining (ML-MRC)-based detection, we furth…

    Submitted 24 June, 2021; originally announced June 2021.

    Journal ref: IEEE Photonics Journal, 2022

  22. arXiv:2106.04624  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD

    SpeechBrain: A General-Purpose Speech Toolkit

    Authors: Mirco Ravanelli, Titouan Parcollet, Peter Plantinga, Aku Rouhe, Samuele Cornell, Loren Lugosch, Cem Subakan, Nauman Dawalatabad, Abdelwahab Heba, Jianyuan Zhong, Ju-Chieh Chou, Sung-Lin Yeh, Szu-Wei Fu, Chien-Feng Liao, Elena Rastorgueva, François Grondin, William Aris, Hwidong Na, Yan Gao, Renato De Mori, Yoshua Bengio

    Abstract: SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the research and development of neural speech processing technologies by being simple, flexible, user-friendly, and well-documented. This paper describes the core architecture designed to support several tasks of common interest, allowing users to naturally conceive, compare and share novel speech processing…

    Submitted 8 June, 2021; originally announced June 2021.

    Comments: Preprint

  23. Collaborative Multi-Resource Allocation in Terrestrial-Satellite Network Towards 6G

    Authors: Shu Fu, Jie Gao, Lian Zhao

    Abstract: Terrestrial-satellite networks are envisioned to play a significant role in the sixth-generation (6G) wireless networks. In such networks, hot air balloons are useful as they can relay the signals between satellites and ground stations. Most existing works assume that the hot air balloons are deployed at the same height with the same minimum elevation angle to the satellites, which may not be prac…

    Submitted 3 May, 2021; originally announced May 2021.

    Journal ref: IEEE Transactions on Wireless Communications, 2021

  24. arXiv:2104.07539  [pdf, other]

    cs.LG eess.SY

    Multi-Agent Reinforcement Learning Based Coded Computation for Mobile Ad Hoc Computing

    Authors: Baoqian Wang, Junfei Xie, Kejie Lu, Yan Wan, Shengli Fu

    Abstract: Mobile ad hoc computing (MAHC), which allows mobile devices to directly share their computing resources, is a promising solution to address the growing demands for computing resources required by mobile devices. However, offloading a computation task from a mobile device to other mobile devices is a challenging task due to frequent topology changes and link failures because of node mobility, unsta…

    Submitted 15 April, 2021; originally announced April 2021.

  25. arXiv:2104.03538  [pdf]

    cs.SD cs.AI eess.AS

    MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement

    Authors: Szu-Wei Fu, Cheng Yu, Tsun-An Hsieh, Peter Plantinga, Mirco Ravanelli, Xugang Lu, Yu Tsao

    Abstract: The discrepancy between the cost function used for training a speech enhancement model and human auditory perception usually makes the quality of enhanced speech unsatisfactory. Objective evaluation metrics which consider human perception can hence serve as a bridge to reduce the gap. Our previously proposed MetricGAN was designed to optimize objective metrics by connecting the metric with a discr…

    Submitted 4 June, 2021; v1 submitted 8 April, 2021; originally announced April 2021.

    Comments: Accepted by Interspeech 2021

  26. arXiv:2103.12954  [pdf, ps, other]

    math.OC cs.LG eess.SY

    Convergence Analysis of Nonconvex Distributed Stochastic Zeroth-order Coordinate Method

    Authors: Shengjun Zhang, Yunlong Dong, Dong Xie, Lisha Yao, Colleen P. Bailey, Shengli Fu

    Abstract: This paper investigates the stochastic distributed nonconvex optimization problem of minimizing a global cost function formed by the summation of $n$ local cost functions. We solve such a problem by involving zeroth-order (ZO) information exchange. In this paper, we propose a ZO distributed primal-dual coordinate method (ZODIAC) to solve the stochastic optimization problem. Agents approximate thei…

    Submitted 13 October, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

  27. arXiv:2011.04292  [pdf]

    cs.SD cs.LG eess.AS

    STOI-Net: A Deep Learning based Non-Intrusive Speech Intelligibility Assessment Model

    Authors: Ryandhimas E. Zezario, Szu-Wei Fu, Chiou-Shann Fuh, Yu Tsao, Hsin-Min Wang

    Abstract: The calculation of most objective speech intelligibility assessment metrics requires clean speech as a reference. Such a requirement may limit the applicability of these metrics in real-world scenarios. To overcome this limitation, we propose a deep learning-based non-intrusive speech intelligibility assessment model, namely STOI-Net. The input and output of STOI-Net are speech spectral features a…

    Submitted 9 November, 2020; originally announced November 2020.

    Comments: Accepted in APSIPA 2020

  28. arXiv:2010.15174  [pdf, other]

    cs.SD cs.LG eess.AS

    Improving Perceptual Quality by Phone-Fortified Perceptual Loss using Wasserstein Distance for Speech Enhancement

    Authors: Tsun-An Hsieh, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao

    Abstract: Speech enhancement (SE) aims to improve speech quality and intelligibility, which are both related to a smooth transition in speech segments that may carry linguistic information, e.g. phones and syllables. In this study, we propose a novel phone-fortified perceptual loss (PFPL) that takes phonetic information into account for training SE models. To effectively incorporate the phonetic information…

    Submitted 27 April, 2021; v1 submitted 28 October, 2020; originally announced October 2020.

  29. arXiv:2009.11975  [pdf, other]

    cs.CV eess.IV

    CoFF: Cooperative Spatial Feature Fusion for 3D Object Detection on Autonomous Vehicles

    Authors: Jingda Guo, Dominic Carrillo, Sihai Tang, Qi Chen, Qing Yang, Song Fu, Xi Wang, Nannan Wang, Paparao Palacharla

    Abstract: To reduce the amount of transmitted data, feature-map-based fusion has recently been proposed as a practical solution to cooperative 3D object detection by autonomous vehicles. The precision of object detection, however, may require significant improvement, especially for objects that are far away or occluded. To address this critical issue for the safety of autonomous vehicles and human beings, we prop…

    Submitted 24 September, 2020; originally announced September 2020.

  30. arXiv:2008.09264  [pdf, other]

    eess.AS cs.LG cs.SD

    CITISEN: A Deep Learning-Based Speech Signal-Processing Mobile Application

    Authors: Yu-Wen Chen, Kuo-Hsuan Hung, You-Jin Li, Alexander Chao-Fu Kang, Ya-Hsin Lai, Kai-Chun Liu, Szu-Wei Fu, Syu-Siang Wang, Yu Tsao

    Abstract: This study presents a deep learning-based speech signal-processing mobile application known as CITISEN. The CITISEN provides three functions: speech enhancement (SE), model adaptation (MA), and background noise conversion (BNC), allowing CITISEN to be used as a platform for utilizing and evaluating SE models and for flexibly extending the models to address various noise environments and users. For SE, a…

    Submitted 25 April, 2022; v1 submitted 20 August, 2020; originally announced August 2020.

  31. arXiv:2006.11139  [pdf, other]

    eess.AS

    Waveform-based Voice Activity Detection Exploiting Fully Convolutional networks with Multi-Branched Encoders

    Authors: Cheng Yu, Kuo-Hsuan Hung, I-Fan Lin, Szu-Wei Fu, Yu Tsao, Jeih-weih Hung

    Abstract: In this study, we propose an encoder-decoder structured system with fully convolutional networks to implement voice activity detection (VAD) directly on the time-domain waveform. The proposed system processes the input waveform to classify each of its segments as either speech or non-speech. This novel waveform-based VAD algorithm, abbreviated as "WVAD", has two main particularities. First,…

    Submitted 19 June, 2020; originally announced June 2020.

  32. arXiv:2006.10296  [pdf]

    eess.AS cs.LG cs.SD

    Boosting Objective Scores of a Speech Enhancement Model by MetricGAN Post-processing

    Authors: Szu-Wei Fu, Chien-Feng Liao, Tsun-An Hsieh, Kuo-Hsuan Hung, Syu-Siang Wang, Cheng Yu, Heng-Cheng Kuo, Ryandhimas E. Zezario, You-Jin Li, Shang-Yi Chuang, Yen-Ju Lu, Yu Tsao

    Abstract: The Transformer architecture has demonstrated a superior ability compared to recurrent neural networks in many different natural language processing applications. Therefore, our study applies a modified Transformer in a speech enhancement task. Specifically, positional encoding in the Transformer may not be necessary for speech enhancement, and hence, it is replaced by convolutional layers. To fur…

    Submitted 3 March, 2021; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: Accepted by APSIPA 2020

  33. NOMA for Energy-Efficient LiFi-Enabled Bidirectional IoT Communication

    Authors: Chen Chen, Shu Fu, Xin Jian, Min Liu, Xiong Deng, Zhiguo Ding

    Abstract: In this paper, we consider a light fidelity (LiFi)-enabled bidirectional Internet of Things (IoT) communication system, where visible light and infrared light are used in the downlink and uplink, respectively. In order to improve the energy efficiency (EE) of the bidirectional LiFi-IoT system, non-orthogonal multiple access (NOMA) with a quality-of-service (QoS)-guaranteed optimal power allocation…

    Submitted 24 May, 2020; v1 submitted 20 May, 2020; originally announced May 2020.

    Journal ref: IEEE Transactions on Communications, 2021

  34. arXiv:2004.00932  [pdf, other]

    eess.AS cs.SD

    iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning

    Authors: Haoyu Li, Szu-Wei Fu, Yu Tsao, Junichi Yamagishi

    Abstract: The intelligibility of natural speech is seriously degraded when exposed to adverse noisy environments. In this work, we propose a deep learning-based speech modification method to compensate for the intelligibility loss, with the constraint that the root mean square (RMS) level and duration of the speech signal are maintained before and after modifications. Specifically, we utilize an iMetricGAN…

    Submitted 7 April, 2020; v1 submitted 2 April, 2020; originally announced April 2020.

    Comments: 5 pages, Submitted to INTERSPEECH 2020

  35. arXiv:2003.00451  [pdf]

    eess.IV cs.MM

    Weak Texture Information Map Guided Image Super-resolution with Deep Residual Networks

    Authors: Bo Fu, Liyan Wang, Yuechu Wu, Yufeng Wu, Shilin Fu, Yonggong Ren

    Abstract: Single image super-resolution (SISR) is an image processing task which obtains a high-resolution (HR) image from a low-resolution (LR) image. Recently, due to their capability in feature extraction, a series of deep learning methods have brought crucial improvements for SISR. However, we observe that no matter how deep the networks are designed, they usually do not have good generalization…

    Submitted 18 March, 2020; v1 submitted 1 March, 2020; originally announced March 2020.

  36. arXiv:2002.01255  [pdf, other]

    cs.IT cs.NI eess.SP

    Revealing Much While Saying Less: Predictive Wireless for Status Update

    Authors: Zhiyuan Jiang, Zixu Cao, Siyu Fu, Fei Peng, Shan Cao, Shunqing Zhang, Shugong Xu

    Abstract: Wireless communications for status update are becoming increasingly important, especially for machine-type control applications. Existing work has been mainly focused on Age of Information (AoI) optimizations. In this paper, a status-aware predictive wireless interface design, networking and implementation are presented which aim to minimize the status recovery error of a wireless networked system…

    Submitted 4 February, 2020; originally announced February 2020.

    Comments: To appear in IEEE INFOCOM 2020

  37. arXiv:2001.03030  [pdf]

    eess.SP

    Distributed Brillouin frequency shift extraction via a convolutional neural network

    Authors: Yiqing Chang, Hao Wu, Can Zhao, Li Shen, Songnian Fu, Ming Tang

    Abstract: Distributed optical fiber Brillouin sensors detect the temperature and strain along a fiber according to the local Brillouin frequency shift, which is usually calculated from the measured Brillouin spectrum using Lorentzian curve fitting. In addition, cross-correlation, principal component analysis, and machine learning methods have been proposed for the more efficient extraction of Brillouin freque…

    Submitted 9 January, 2020; originally announced January 2020.

  38. arXiv:1912.01319  [pdf, other]

    cs.IT cs.NI eess.SP

    AI-Assisted Low Information Latency Wireless Networking

    Authors: Zhiyuan Jiang, Siyu Fu, Sheng Zhou, Zhisheng Niu, Shunqing Zhang, Shugong Xu

    Abstract: The 5G Phase-2 and beyond wireless systems will focus more on vertical applications such as autonomous driving and industrial Internet-of-things, many of which are categorized as ultra-Reliable Low-Latency Communications (uRLLC). In this article, an alternative view on uRLLC is presented, namely that information latency, which measures the distortion of information resulting from the time lag of its acquisiti…

    Submitted 3 December, 2019; originally announced December 2019.

    Comments: To appear in IEEE Wireless Communications

  39. arXiv:1912.01080  [pdf, other]

    cs.NI eess.SP

    Low-Latency High-Level Data Sharing for Connected and Autonomous Vehicular Networks

    Authors: Qi Chen, Sihai Tang, Jacob Hochstetler, Jingda Guo, Yuan Li, Jinbo Xiong, Qing Yang, Song Fu

    Abstract: Autonomous vehicles can combine their own data with that of other vehicles to enhance their perceptive ability, and thus improve detection accuracy and driving safety. Data sharing among autonomous vehicles, however, is a challenging problem due to the sheer volume of data generated by various types of sensors on the vehicles. In this paper, we propose a low-latency, high-level (L3) data sharing p…

    Submitted 2 December, 2019; originally announced December 2019.

  40. arXiv:1911.09847  [pdf, ps, other]

    eess.AS cs.SD eess.SP

    Time-Domain Multi-modal Bone/air Conducted Speech Enhancement

    Authors: Cheng Yu, Kuo-Hsuan Hung, Syu-Siang Wang, Szu-Wei Fu, Yu Tsao, Jeih-weih Hung

    Abstract: Previous studies have proven that integrating video signals, as a complementary modality, can facilitate improved performance for speech enhancement (SE). However, video clips usually contain large amounts of data and pose a high cost in terms of computational resources and thus may complicate the SE system. As an alternative source, a bone-conducted speech signal has a moderate data size while ma…

    Submitted 17 June, 2020; v1 submitted 21 November, 2019; originally announced November 2019.

    Comments: multi-modal, bone/air-conducted signals, speech enhancement, fully convolutional network

    Journal ref: IEEE Signal Processing Letters, vol. 27, pp. 1035-1039, 2020

  41. arXiv:1910.06585  [pdf, other]

    eess.SP

    Hybrid Beamforming/Combining for Millimeter Wave MIMO: A Machine Learning Approach

    Authors: Jiyun Tao, Jing Xing, Jienan Chen, Chuan Zhang, Shengli Fu

    Abstract: Hybrid beamforming (HB) has emerged as a promising technology to support ultra-high transmission capacity with low complexity in Millimeter Wave (mmWave) multiple-input and multiple-output (MIMO) systems. However, the design of digital and analog beamformers is a challenging task involving non-convex optimization, especially in the multi-user scenario. Recently, the blooming of deep learning research…

    Submitted 3 January, 2020; v1 submitted 15 October, 2019; originally announced October 2019.

    Comments: 5 pages, 4 figures

  42. arXiv:1909.11919  [pdf, other]

    cs.SD eess.AS

    A Study of Joint Effect on Denoising Techniques and Visual Cues to Improve Speech Intelligibility in Cochlear Implant Simulation

    Authors: Rung-Yu Tseng, Tao-Wei Wang, Szu-Wei Fu, Chia-Ying Lee, Yu Tsao

    Abstract: Speech perception is key to verbal communication. For people with hearing loss, the capability to recognize speech is restricted, particularly in noisy environments or in situations without visual cues, such as phone calls, where lip-reading is unavailable. This study aimed to understand the improvement of vocoded speech intelligibility in cochlear implant (CI) simulation through two potential methods:…

    Submitted 18 December, 2020; v1 submitted 26 September, 2019; originally announced September 2019.

  43. arXiv:1909.11912  [pdf]

    cs.SD eess.AS

    Improving the Intelligibility of Electric and Acoustic Stimulation Speech Using Fully Convolutional Networks Based Speech Enhancement

    Authors: Natalie Yu-Hsien Wang, Hsiao-Lan Sharon Wang, Tao-Wei Wang, Szu-Wei Fu, Xugan Lu, Yu Tsao, Hsin-Min Wang

    Abstract: Combined electric and acoustic stimulation (EAS) has demonstrated better speech recognition than conventional cochlear implants (CIs) and has yielded satisfactory performance under quiet conditions. However, when noise is present, both the electric and the acoustic signals may be distorted, resulting in poor recognition performance. To suppress noise effects, speech enhanceme…

    Submitted 26 September, 2019; originally announced September 2019.

  44. arXiv:1909.11909  [pdf, other]

    cs.SD cs.LG eess.AS

    Multichannel Speech Enhancement by Raw Waveform-mapping using Fully Convolutional Networks

    Authors: Chang-Le Liu, Sze-Wei Fu, You-** Li, Jen-Wei Huang, Hsin-Min Wang, Yu Tsao

    Abstract: In recent years, waveform-mapping-based speech enhancement (SE) methods have garnered significant attention. These methods generally use a deep learning model to directly process and reconstruct speech waveforms. Because both the input and output are in waveform format, the waveform-mapping-based SE methods can overcome the distortion caused by imperfect phase estimation, which may be encountered…

    Submitted 24 February, 2020; v1 submitted 26 September, 2019; originally announced September 2019.

    Comments: Accepted to IEEE/ACM Transactions on Audio, Speech and Language Processing

  45. arXiv:1906.01078  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Increasing Compactness Of Deep Learning Based Speech Enhancement Models With Parameter Pruning And Quantization Techniques

    Authors: Jyun-Yi Wu, Cheng Yu, Szu-Wei Fu, Chih-Ting Liu, Shao-Yi Chien, Yu Tsao

    Abstract: Most recent studies on deep-learning-based speech enhancement (SE) have focused on improving denoising performance. However, successful SE applications in real scenarios require a desirable balance between denoising performance and computational cost. In this study, we propose a novel parameter pruning (PP) technique, which removes redundant channels in a neural network. In addition, a paramet…

    Submitted 31 July, 2019; v1 submitted 31 May, 2019; originally announced June 2019.

    Comments: 4 pages, 6 figures

  46. arXiv:1905.04874  [pdf, other]

    cs.SD cs.LG eess.AS

    MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement

    Authors: Szu-Wei Fu, Chien-Feng Liao, Yu Tsao, Shou-De Lin

    Abstract: The adversarial loss in a conditional generative adversarial network (GAN) is not designed to directly optimize the evaluation metrics of a target task and thus may not always guide the generator to produce data with improved metric scores. To overcome this issue, we propose a novel MetricGAN approach that aims to optimize the generator with respect to one or multiple evaluation metrics. Mor…

    Submitted 13 May, 2019; originally announced May 2019.

    Comments: Accepted to the Thirty-sixth International Conference on Machine Learning (ICML) 2019

  47. arXiv:1905.01898  [pdf]

    cs.SD cs.LG eess.AS

    Learning with Learned Loss Function: Speech Enhancement with Quality-Net to Improve Perceptual Evaluation of Speech Quality

    Authors: Szu-Wei Fu, Chien-Feng Liao, Yu Tsao

    Abstract: Utilizing a human-perception-related objective function to train a speech enhancement model has recently become a popular topic. The main reason is that the conventional mean squared error (MSE) loss cannot represent auditory perception well. One of the typical human-perception-related metrics, the perceptual evaluation of speech quality (PESQ), has been proven to provide a high correlat…

    Submitted 14 November, 2019; v1 submitted 6 May, 2019; originally announced May 2019.

    Comments: Accepted by IEEE Signal Processing Letters (SPL)

  48. MOSNet: Deep Learning based Objective Assessment for Voice Conversion

    Authors: Chen-Chou Lo, Szu-Wei Fu, Wen-Chin Huang, Xin Wang, Junichi Yamagishi, Yu Tsao, Hsin-Min Wang

    Abstract: Existing objective evaluation metrics for voice conversion (VC) are not always correlated with human perception. Therefore, training VC models with such criteria may not effectively improve naturalness and similarity of converted speech. In this paper, we propose deep learning-based assessment models to predict human ratings of converted speech. We adopt the convolutional and recurrent neural netw…

    Submitted 14 July, 2021; v1 submitted 17 April, 2019; originally announced April 2019.

    Comments: Accepted to Interspeech 2019

  49. arXiv:1812.05683  [pdf]

    eess.SP physics.optics

    Crosstalk Impacts on Homogeneous Weakly-Coupled Multicore Fiber Based IM/DD System

    Authors: Lin Gan, Jiajun Zhou, Liang Huo, Li Shen, Chen Yang, Weijun Tong, Songnian Fu, Ming Tang, Deming Liu

    Abstract: We numerically discussed crosstalk impacts on homogeneous weakly-coupled multicore fiber based intensity modulation/direct-detection (IM/DD) systems taking into account mean crosstalk power fluctuation, walk-off between cores, laser frequency offset, and laser linewidth.

    Submitted 12 November, 2018; originally announced December 2018.

    Comments: 3 pages, 11 figures

    Journal ref: Asia Communications and Photonics Conference (ACP 2018), Su1D.8

  50. High-speed PAM4-based Optical SDM Interconnects with Directly Modulated Long-wavelength VCSEL

    Authors: Joris Van Kerrebrouck, Xiaodan Pang, Oskars Ozolins, Rui Lin, Aleksejs Udalcovs, Lu Zhang, Haolin Li, Silvia Spiga, Markus-Christian Amann, Lin Gan, Ming Tang, Songnian Fu, Richard Schatz, Gunnar Jacobsen, Sergei Popov, Deming Liu, Weijun Tong, Guy Torfs, Johan Bauwelinck, Jiajia Chen, Xin Yin

    Abstract: This paper reports the demonstration of high-speed PAM-4 transmission using a 1.5-μm single-mode vertical cavity surface emitting laser (SM-VCSEL) over multicore fiber with 7 cores over different distances. We have successfully generated up to 70 Gbaud 4-level pulse amplitude modulation (PAM-4) signals with a VCSEL in optical back-to-back, and transmitted 50 Gbaud PAM-4 signals over both 1-km disp…

    Submitted 13 November, 2018; originally announced December 2018.

    Comments: 7 pages, accepted for publication in Journal of Lightwave Technology (JLT)