Showing 1–50 of 64 results for author: Wu, P

Searching in archive eess.
  1. arXiv:2406.15754  [pdf, other]

    cs.CV cs.CL cs.LG cs.SD eess.AS

    Multimodal Segmentation for Vocal Tract Modeling

    Authors: Rishi Jain, Bohan Yu, Peter Wu, Tejas Prabhune, Gopala Anumanchipalli

    Abstract: Accurate modeling of the vocal tract is necessary to construct articulatory representations for interpretable speech processing and linguistics. However, vocal tract modeling is challenging because many internal articulators are occluded from external motion capture technologies. Real-time magnetic resonance imaging (RT-MRI) allows measuring precise movements of internal articulators during speech…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024

  2. arXiv:2406.12998  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    Articulatory Encodec: Vocal Tract Kinematics as a Codec for Speech

    Authors: Cheol Jun Cho, Peter Wu, Tejas S. Prabhune, Dhruv Agarwal, Gopala K. Anumanchipalli

    Abstract: Vocal tract articulation is a natural, grounded control space of speech production. The spatiotemporal coordination of articulators combined with the vocal source shapes intelligible speech sounds to enable effective spoken communication. Based on this physiological grounding of speech, we propose a new framework of neural encoding-decoding of speech -- articulatory encodec. The articulatory encod…

    Submitted 18 June, 2024; originally announced June 2024.

  3. arXiv:2405.15153  [pdf, other]

    eess.SP

    Optimal Reference Nodes Deployment for Positioning Seafloor Anchor Nodes

    Authors: Wei Huang, Pengfei Wu, Tianhe Xu, Hao Zhang, Kaitao Meng

    Abstract: Seafloor anchor nodes, which form a geodetic network, are designed to provide surface and underwater users with positioning, navigation and timing (PNT) services. Due to the non-uniform distribution of underwater sound speed, accurate positioning of underwater anchor nodes is a challenging task. Traditional anchor node positioning typically uses cross or circular shapes; however, how to optimize the…

    Submitted 23 May, 2024; originally announced May 2024.

  4. arXiv:2404.14132  [pdf, other]

    cs.CV eess.IV

    CRNet: A Detail-Preserving Network for Unified Image Restoration and Enhancement Task

    Authors: Kangzhen Yang, Tao Hu, Kexin Dai, Genggeng Chen, Yu Cao, Wei Dong, Peng Wu, Yanning Zhang, Qingsen Yan

    Abstract: In real-world scenarios, captured images often suffer from blurring, noise, and other forms of image degradation, and due to sensor limitations, people can usually only obtain low dynamic range images. To achieve high-quality images, researchers have attempted various image restoration and enhancement operations on photographs, including denoising, deblurring, and high dynamic range imaging. Howev…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: This paper is accepted by CVPR2024 Workshop, Code: https://github.com/CalvinYang0/CRNet

  5. arXiv:2404.13537  [pdf, other]

    eess.IV cs.CV

    Bracketing Image Restoration and Enhancement with High-Low Frequency Decomposition

    Authors: Genggeng Chen, Kexin Dai, Kangzhen Yang, Tao Hu, Xiangyu Chen, Yongqing Yang, Wei Dong, Peng Wu, Yanning Zhang, Qingsen Yan

    Abstract: In real-world scenarios, due to a series of image degradations, obtaining high-quality, clear content photos is challenging. While significant progress has been made in synthesizing high-quality images, previous methods for image restoration and enhancement often overlooked the characteristics of different degradations. They applied the same structure to address various types of degradation, resul…

    Submitted 24 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: This paper is accepted by CVPR 2024 Workshop, code: https://github.com/chengeng0613/HLNet

  6. arXiv:2312.15668  [pdf, ps, other]

    cs.IT eess.SP

    Air-to-Ground Communications Beyond 5G: UAV Swarm Formation Control and Tracking

    Authors: Xiao Fan, Peiran Wu, Minghua Xia

    Abstract: Unmanned aerial vehicle (UAV) communications have been widely accepted as promising technologies to support air-to-ground communications in the forthcoming sixth-generation (6G) wireless networks. This paper proposes a novel air-to-ground communication model consisting of aerial base stations served by UAVs and terrestrial user equipments (UEs) by integrating the technique of coordinated multi-poi…

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: 14 pages, 9 figures, to appear in IEEE TWC

  7. arXiv:2312.12810  [pdf, other]

    eess.AS cs.SD

    Unconstrained Dysfluency Modeling for Dysfluent Speech Transcription and Detection

    Authors: Jiachen Lian, Carly Feng, Naasir Farooqi, Steve Li, Anshul Kashyap, Cheol Jun Cho, Peter Wu, Robbie Netzorg, Tingle Li, Gopala Krishna Anumanchipalli

    Abstract: Dysfluent speech modeling requires time-accurate and silence-aware transcription at both the word-level and phonetic-level. However, current research in dysfluency modeling primarily focuses on either transcription or detection, and the performance of each aspect remains limited. In this work, we present an unconstrained dysfluency modeling (UDM) approach that addresses both transcription and dete…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: 2023 ASRU

  8. arXiv:2312.09034  [pdf, other]

    eess.AS cs.SD eess.IV

    Fusion of Audio and Visual Embeddings for Sound Event Localization and Detection

    Authors: Davide Berghi, Peipei Wu, **zheng Zhao, Wenwu Wang, Philip J. B. Jackson

    Abstract: Sound event localization and detection (SELD) combines two subtasks: sound event detection (SED) and direction of arrival (DOA) estimation. SELD is usually tackled as an audio-only problem, but visual information has recently been included. Few audio-visual (AV)-SELD works have been published and most employ vision via face/object bounding boxes, or human pose keypoints. In contrast, we explore th…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: ICASSP 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  9. arXiv:2312.01566  [pdf, other]

    physics.med-ph eess.IV

    Coronary Atherosclerotic Plaque Characterization with Photon-counting CT: a Simulation-based Feasibility Study

    Authors: Mengzhou Li, Mingye Wu, Jed Pack, Pengwei Wu, Bruno De Man, Adam Wang, Koen Nieman, Ge Wang

    Abstract: Recent development of photon-counting CT (PCCT) brings great opportunities for plaque characterization with much-improved spatial resolution and spectral imaging capability. While existing coronary plaque PCCT imaging results are based on detectors made of CZT or CdTe materials, deep-silicon photon-counting detectors have unique performance characteristics and promise distinct imaging capabilities…

    Submitted 3 December, 2023; originally announced December 2023.

    Comments: 13 figures, 5 tables

  10. arXiv:2311.09537  [pdf, other]

    cs.SD eess.AS eess.SP

    Future Full-Ocean Deep SSPs Prediction based on Hierarchical Long Short-Term Memory Neural Networks

    Authors: Jiajun Lu, Hao Zhang, Pengfei Wu, Sijia Li, Wei Huang

    Abstract: The spatial-temporal distribution of underwater sound velocity affects the propagation mode of underwater acoustic signals. Therefore, rapid estimation and prediction of underwater sound velocity distribution is crucial for providing underwater positioning, navigation and timing (PNT) services. Currently, sound speed profile (SSP) inversion methods have a faster time response rate compared to dire…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: arXiv admin note: text overlap with arXiv:2310.09522

  11. arXiv:2310.16287  [pdf, other]

    cs.SD cs.GR eess.AS

    Towards Streaming Speech-to-Avatar Synthesis

    Authors: Tejas S. Prabhune, Peter Wu, Bohan Yu, Gopala K. Anumanchipalli

    Abstract: Streaming speech-to-avatar synthesis creates real-time animations for a virtual character from audio data. Accurate avatar representations of speech are important for the visualization of sound in linguistics, phonetics, and phonology, visual feedback to assist second language acquisition, and virtual embodiment for paralyzed patients. Previous works have highlighted the capability of deep articul…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Submitted to ICASSP 2024

  12. arXiv:2310.14778  [pdf, other]

    cs.MM cs.SD eess.AS

    Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions

    Authors: **zheng Zhao, Yong Xu, Xinyuan Qian, Davide Berghi, Peipei Wu, Meng Cui, Jianyuan Sun, Philip J. B. Jackson, Wenwu Wang

    Abstract: Audio-visual speaker tracking has drawn increasing attention over the past few years due to its academic value and wide range of applications. Audio and visual modalities can provide complementary information for localization and tracking. With audio and visual information, the Bayesian-based filter can solve the problem of data association, audio-visual fusion and track management. In this paper, we condu…

    Submitted 17 December, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

  13. arXiv:2310.08251  [pdf, other]

    eess.SP

    Underwater Sound Speed Profile Construction: A Review

    Authors: Wei Huang, Jixuan Zhou, Fan Gao, Jiajun Lu, Sijia Li, Pengfei Wu, Junting Wang, Hao Zhang, Tianhe Xu

    Abstract: Real-time and accurate construction of regional sound speed profiles (SSP) is important for building underwater positioning, navigation, and timing (PNT) systems, as it greatly affects signal propagation modes such as the trajectory. In this paper, we summarize and analyze the current research status in the field of underwater SSP construction, and the mainstream methods include direct SSP measur…

    Submitted 12 October, 2023; originally announced October 2023.

  14. arXiv:2310.02497  [pdf, other]

    cs.SD cs.LG eess.AS

    Towards an Interpretable Representation of Speaker Identity via Perceptual Voice Qualities

    Authors: Robin Netzorg, Bohan Yu, Andrea Guzman, Peter Wu, Luna McNulty, Gopala Anumanchipalli

    Abstract: Unlike other data modalities such as text and vision, speech does not lend itself to easy interpretation. While lay people can understand how to describe an image or sentence via perception, non-expert descriptions of speech often end at high-level demographic information, such as gender or age. In this paper, we propose a possible interpretable representation of speaker identity based on perceptu…

    Submitted 3 October, 2023; originally announced October 2023.

  15. arXiv:2309.07861  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    CiwaGAN: Articulatory information exchange

    Authors: Gašper Beguš, Thomas Lu, Alan Zhou, Peter Wu, Gopala K. Anumanchipalli

    Abstract: Humans encode information into sounds by controlling articulators and decode information from sounds using the auditory apparatus. This paper introduces CiwaGAN, a model of human spoken language acquisition that combines unsupervised articulatory modeling with an unsupervised model of information exchange through the auditory modality. While prior research includes unsupervised articulatory modeli…

    Submitted 14 September, 2023; originally announced September 2023.

  16. arXiv:2308.05262  [pdf, other]

    eess.SP

    Robust Interference Mitigation techniques for Direct Position Estimation

    Authors: Haoqing Li, Shuo Tang, Peng Wu, Pau Closas

    Abstract: Global Navigation Satellite System (GNSS) is pervasive in navigation and positioning applications, where precise position and time referencing estimations are required. Conventional methods for GNSS positioning involve a two-step process, where intermediate measurements such as Doppler shift and time delay of received GNSS signals are computed and then used to solve for the receiver's position. Al…

    Submitted 9 August, 2023; originally announced August 2023.

  17. arXiv:2308.03420  [pdf]

    eess.SY

    A Safe DRL Method for Fast Solution of Real-Time Optimal Power Flow

    Authors: Pengfei Wu, Chen Chen, Dexiang Lai, Jian Zhong

    Abstract: High-level penetration of intermittent renewable energy sources (RESs) has introduced significant uncertainties into modern power systems. In order to rapidly and economically respond to fluctuations in the power system operating state, this paper proposes a safe deep reinforcement learning (SDRL) based method for fast solution of real-time optimal power flow (RT-OPF) problems. The proposed method…

    Submitted 7 August, 2023; originally announced August 2023.

  18. arXiv:2307.16096  [pdf, ps, other]

    cs.IT eess.SP

    D-STAR: Dual Simultaneously Transmitting and Reflecting Reconfigurable Intelligent Surfaces for Joint Uplink/Downlink Transmission

    Authors: Li-Hsiang Shen, Po-Chen Wu, Chia-Jou Ku, Yu-Ting Li, Kai-Ten Feng, Yuanwei Liu, Lajos Hanzo

    Abstract: The joint uplink/downlink (JUD) design of simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RIS) is conceived in support of both uplink (UL) and downlink (DL) users. Furthermore, the dual STAR-RISs (D-STAR) concept is conceived as a promising architecture for 360-degree full-plane service coverage, including UL/DL users located between the base station (BS) and t…

    Submitted 8 February, 2024; v1 submitted 29 July, 2023; originally announced July 2023.

    Comments: Accepted by IEEE TCOM

  19. arXiv:2307.02471  [pdf, other]

    eess.AS

    Deep Speech Synthesis from MRI-Based Articulatory Representations

    Authors: Peter Wu, Tingle Li, Yi**g Lu, Yubin Zhang, Jiachen Lian, Alan W Black, Louis Goldstein, Shinji Watanabe, Gopala K. Anumanchipalli

    Abstract: In this paper, we study articulatory synthesis, a speech synthesis method using human vocal tract information that offers a way to develop efficient, generalizable and interpretable synthesizers. While recent advances have enabled intelligible articulatory synthesis using electromagnetic articulography (EMA), these methods lack critical articulatory information like excitation and nasality, limiti…

    Submitted 5 July, 2023; originally announced July 2023.

  20. arXiv:2306.13558  [pdf, other]

    eess.SP

    One-Bit Spectrum Sensing for Cognitive Radio

    Authors: Pei-Wen Wu, Lei Huang, David Ramírez, Yu-Hang Xiao, Hing Cheung So

    Abstract: Spectrum sensing in cognitive radio necessitates effective monitoring of wide bandwidths, which requires high-rate sampling. Traditional spectrum sensing methods employing high-precision analog-to-digital converters (ADCs) result in increased power consumption and expensive hardware costs. In this paper, we explore blind spectrum sensing utilizing one-bit ADCs. We derive a closed-form detector bas…

    Submitted 23 June, 2023; originally announced June 2023.

  21. arXiv:2306.10359  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Text-Driven Foley Sound Generation With Latent Diffusion Model

    Authors: Yi Yuan, Haohe Liu, Xubo Liu, Xiyuan Kang, Peipei Wu, Mark D. Plumbley, Wenwu Wang

    Abstract: Foley sound generation aims to synthesise the background sound for multimedia content. Previous models usually employ a large development set with labels as input (e.g., single numbers or one-hot vectors). In this work, we propose a diffusion model based system for Foley sound generation with text conditions. To alleviate the data scarcity issue, our model is initially pre-trained with large-scale…

    Submitted 18 September, 2023; v1 submitted 17 June, 2023; originally announced June 2023.

    Comments: Submitted to the DCASE 2023 Workshop; extends and supersedes the previous technical report arXiv:2305.15905

  22. arXiv:2305.17896  [pdf, other]

    eess.SP

    Continuous and Noninvasive Measurement of Arterial Pulse Pressure and Pressure Waveform using an Image-free Ultrasound System

    Authors: Lirui Xu, Pang Wu, Pan Xia, Fanglin Geng, Peng Wang, Xianxiang Chen, Zhenfeng Li, Lidong Du, Shu** Liu, Li Li, Hongbo Chang, Zhen Fang

    Abstract: The local beat-to-beat pulse pressure (PP) and blood pressure waveform of arteries, especially central arteries, are important indicators of the course of cardiovascular diseases (CVDs). Nevertheless, their noninvasive measurement remains a challenge in the clinic. This work presents a three-element image-free ultrasound system with a low-computational method for real-time measurement of l…

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: 13 pages, 12 figures

  23. arXiv:2305.17499  [pdf, other]

    cs.CL cs.MM eess.AS

    CIF-PT: Bridging Speech and Text Representations for Spoken Language Understanding via Continuous Integrate-and-Fire Pre-Training

    Authors: Linhao Dong, Zhecheng An, Peihao Wu, Jun Zhang, Lu Lu, Zejun Ma

    Abstract: Speech or text representations generated by pre-trained models contain modality-specific information that could be combined for benefiting spoken language understanding (SLU) tasks. In this work, we propose a novel pre-training paradigm termed Continuous Integrate-and-Fire Pre-Training (CIF-PT). It relies on a simple but effective frame-to-token alignment: continuous integrate-and-fire (CIF) to bridg…

    Submitted 27 May, 2023; originally announced May 2023.

    Comments: Accepted by ACL 2023 Findings

  24. arXiv:2305.00383  [pdf, other]

    cs.IT eess.SP

    Edge Learning for Large-Scale Internet of Things With Task-Oriented Efficient Communication

    Authors: Haihui Xie, Minghua Xia, Peiran Wu, Shuai Wang, H. Vincent Poor

    Abstract: In Internet of Things (IoT) networks, edge learning for data-driven tasks provides intelligent applications and services. As the network size becomes large, different users may generate distinct datasets. Thus, to suit multiple edge learning tasks for large-scale IoT networks, this paper performs efficient communication under the task-oriented principle by using the collaborative design of wir…

    Submitted 30 April, 2023; originally announced May 2023.

    Comments: 16 pages, 8 figures; accepted for publication in IEEE TWC

  25. arXiv:2302.06774  [pdf, other]

    eess.AS cs.SD

    Speaker-Independent Acoustic-to-Articulatory Speech Inversion

    Authors: Peter Wu, Li-Wei Chen, Cheol Jun Cho, Shinji Watanabe, Louis Goldstein, Alan W Black, Gopala K. Anumanchipalli

    Abstract: To build speech processing methods that can handle speech as naturally as humans, researchers have explored multiple ways of building an invertible mapping from speech to an interpretable space. The articulatory space is a promising inversion target, since this space captures the mechanics of speech production. To this end, we build an acoustic-to-articulatory inversion (AAI) model that leverages…

    Submitted 24 July, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

  26. arXiv:2211.00968  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Internal Language Model Estimation based Adaptive Language Model Fusion for Domain Adaptation

    Authors: Rao Ma, Xiaobo Wu, ** Qiu, Yanan Qin, Haihua Xu, Peihao Wu, Zejun Ma

    Abstract: The deployment environment of ASR models is ever-changing, and the incoming speech can switch across different domains during a session. This poses a challenge for effective domain adaptation when only target-domain text data is available; our objective is to obtain clearly improved performance on the target domain while keeping the degradation on the general domain small. In this paper,…

    Submitted 2 March, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: Accepted by ICASSP 2023

  27. arXiv:2210.15272  [pdf, ps, other]

    eess.AS cs.SD eess.SP

    A Fast and Accurate Pitch Estimation Algorithm Based on the Pseudo Wigner-Ville Distribution

    Authors: Yisi Liu, Peter Wu, Alan W Black, Gopala K. Anumanchipalli

    Abstract: Estimation of fundamental frequency (F0) in voiced segments of speech signals, also known as pitch tracking, plays a crucial role in pitch synchronous speech analysis, speech synthesis, and speech manipulation. In this paper, we capitalize on the high time and frequency resolution of the pseudo Wigner-Ville distribution (PWVD) and propose a new PWVD-based pitch estimation method. We devise an effi…

    Submitted 27 October, 2022; originally announced October 2022.

  28. arXiv:2210.15173  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    Articulation GAN: Unsupervised modeling of articulatory learning

    Authors: Gašper Beguš, Alan Zhou, Peter Wu, Gopala K Anumanchipalli

    Abstract: Generative deep neural networks are widely used for speech synthesis, but most existing models directly generate waveforms or spectral outputs. Humans, however, produce speech by controlling articulators, which results in the production of speech sounds through physical properties of sound propagation. We introduce the Articulatory Generator to the Generative Adversarial Network paradigm, a new un…

    Submitted 12 March, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: ICASSP 2023

    Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing

  29. Evidence of Vocal Tract Articulation in Self-Supervised Learning of Speech

    Authors: Cheol Jun Cho, Peter Wu, Abdelrahman Mohamed, Gopala K. Anumanchipalli

    Abstract: Recent self-supervised learning (SSL) models have proven to learn rich representations of speech, which can readily be utilized by diverse downstream tasks. To understand such utilities, various analyses have been done for speech SSL models to reveal which and how information is encoded in the learned representations. Although the scope of previous analyses is extensive in acoustic, phonetic, and…

    Submitted 20 July, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

  30. arXiv:2209.06337  [pdf, other]

    eess.AS cs.SD q-bio.QM

    Deep Speech Synthesis from Articulatory Representations

    Authors: Peter Wu, Shinji Watanabe, Louis Goldstein, Alan W Black, Gopala K. Anumanchipalli

    Abstract: In the articulatory synthesis task, speech is synthesized from input features containing information about the physical behavior of the human vocal tract. This task provides a promising direction for speech synthesis research, as the articulatory space is compact, smooth, and interpretable. Current works have highlighted the potential for deep learning models to perform articulatory synthesis. How…

    Submitted 13 September, 2022; originally announced September 2022.

  31. arXiv:2208.08433  [pdf, other]

    cs.CR cs.HC cs.LG eess.SP

    Label Flipping Data Poisoning Attack Against Wearable Human Activity Recognition System

    Authors: Abdur R. Shahid, Ahmed Imteaj, Peter Y. Wu, Diane A. Igoche, Tauhidul Alam

    Abstract: Human Activity Recognition (HAR) is the problem of interpreting sensor data as human movement using an efficient machine learning (ML) approach. HAR systems rely on data from untrusted users, making them susceptible to data poisoning attacks. In a poisoning attack, attackers manipulate the sensor readings to contaminate the training set, misleading the HAR to produce erroneous outcomes. This pap…

    Submitted 17 August, 2022; originally announced August 2022.

    Comments: Submitted to IEEE SSCI 2022 Conference

  32. arXiv:2205.04029  [pdf, other]

    cs.SD cs.MM eess.AS

    Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis

    Authors: Jiatong Shi, Shuai Guo, Tao Qian, Nan Huo, Tomoki Hayashi, Yuning Wu, Frank Xu, Xuankai Chang, Huazhe Li, Peter Wu, Shinji Watanabe, Qin **

    Abstract: This paper introduces a new open-source platform named Muskits for end-to-end music processing, which mainly focuses on end-to-end singing voice synthesis (E2E-SVS). Muskits supports state-of-the-art SVS models, including RNN SVS, transformer SVS, and XiaoiceSing. The design of Muskits follows the style of widely-used speech processing toolkits, ESPnet and Kaldi, for data preprocessing, training,…

    Submitted 2 July, 2022; v1 submitted 9 May, 2022; originally announced May 2022.

    Comments: Accepted by Interspeech

  33. arXiv:2112.14633  [pdf, other]

    cs.IT eess.SP

    Bayesian Compressive Channel Estimation for Hybrid Full-Dimensional MIMO Communications

    Authors: Hongqing Huang, Peiran Wu, Minghua Xia

    Abstract: Efficient channel estimation is challenging in full-dimensional multiple-input multiple-output communication systems, particularly in those with hybrid digital-analog architectures. Under a compressive sensing framework, this letter first designs a uniform dictionary based on a spherical Fibonacci grid to represent channels in a sparse domain, yielding smaller angular errors in three-dimensional b…

    Submitted 29 December, 2021; originally announced December 2021.

    Comments: 5 pages, 5 figures, submitted for possible publication

  34. arXiv:2111.15636  [pdf]

    eess.SP cs.AI stat.AP

    Generating gapless land surface temperature with a high spatio-temporal resolution by fusing multi-source satellite-observed and model-simulated data

    Authors: Jun Ma, Huanfeng Shen, Penghai Wu, **gan Wu, Meiling Gao, Chunlei Meng

    Abstract: Land surface temperature (LST) is a key parameter when monitoring land surface processes. However, cloud contamination and the tradeoff between the spatial and temporal resolutions greatly impede the access to high-quality thermal infrared (TIR) remote sensing data. Despite the massive efforts made to solve these dilemmas, it is still difficult to generate LST estimates with concurrent spatial com…

    Submitted 28 November, 2021; originally announced November 2021.

  35. arXiv:2111.03283  [pdf, other]

    eess.SY cs.RO

    Cooperative Transportation of UAVs Without Inter-UAV Communication

    Authors: Pin-Xian Wu, Cheng-Cheng Yang, Teng-Hu Cheng

    Abstract: A leader-follower system is developed for cooperative transportation. To the best of our knowledge, this is the first work in which inter-UAV communication is not required and the reference trajectory of the payload can be modified in real time, so that it can be applied to a dynamically changing environment. To track the modified reference trajectory in real time under the communication-free conditio…

    Submitted 5 November, 2021; originally announced November 2021.

  36. arXiv:2111.01326  [pdf, other]

    eess.AS cs.CL cs.SD

    Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity

    Authors: Peter Wu, Jiatong Shi, Yifan Zhong, Shinji Watanabe, Alan W Black

    Abstract: Speech processing systems currently do not support the vast majority of languages, in part due to the lack of data in low-resource languages. Cross-lingual transfer offers a compelling way to help bridge this digital divide by incorporating high-resource data into low-resource systems. Current cross-lingual algorithms have shown success in text-based tasks and speech-related tasks over some low-re…

    Submitted 1 November, 2021; originally announced November 2021.

  37. arXiv:2110.07840  [pdf, other]

    cs.CL cs.SD eess.AS

    ESPnet2-TTS: Extending the Edge of TTS Research

    Authors: Tomoki Hayashi, Ryuichi Yamamoto, Takenori Yoshimura, Peter Wu, Jiatong Shi, Takaaki Saeki, Yooncheol Ju, Yusuke Yasuda, Shinnosuke Takamichi, Shinji Watanabe

    Abstract: This paper describes ESPnet2-TTS, an end-to-end text-to-speech (E2E-TTS) toolkit. ESPnet2-TTS extends our earlier version, ESPnet-TTS, by adding many new features, including: on-the-fly flexible pre-processing, joint training with neural vocoders, and state-of-the-art TTS models with extensions like full-band E2E text-to-waveform modeling, which simplify the training pipeline and further enhance T…

    Submitted 14 October, 2021; originally announced October 2021.

    Comments: Submitted to ICASSP2022. Demo HP: https://espnet.github.io/icassp2022-tts/

  38. arXiv:2110.04153  [pdf, other]

    eess.AS cs.SD

    Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech

    Authors: Pengfei Wu, Junjie Pan, Chenchang Xu, Junhui Zhang, Lin Wu, Xiang Yin, Zejun Ma

    Abstract: In expressive speech synthesis, there are high requirements for emotion interpretation. However, it is time-consuming to acquire an emotional audio corpus for arbitrary speakers due to their deduction ability. In response to this problem, this paper proposes a cross-speaker emotion transfer method that can realize the transfer of emotions from source speaker to target speaker. A set of emotion tokens…

    Submitted 10 October, 2021; v1 submitted 8 October, 2021; originally announced October 2021.

    Comments: Submitted to ICASSP 2022, 5 pages, 2 figures

  39. arXiv:2110.02515  [pdf, ps, other]

    cs.IT eess.SP

    A Sparsity Adaptive Algorithm to Recover NB-IoT Signal from Legacy LTE Interference

    Authors: Yijia Guo, Wenkun Wen, Peiran Wu, Minghua Xia

    Abstract: As a forerunner in 5G technologies, Narrowband Internet of Things (NB-IoT) will inevitably coexist with the legacy Long-Term Evolution (LTE) system. Thus, it is imperative for NB-IoT to mitigate LTE interference. By virtue of the strong temporal correlation of the NB-IoT signal, this letter develops a sparsity adaptive algorithm to recover the NB-IoT signal from legacy LTE interference, by c…

    Submitted 6 October, 2021; originally announced October 2021.

    Comments: 5 pages, 7 figures, to appear in IEEE Wireless Communications Letters

  40. arXiv:2110.02513  [pdf, ps, other]

    cs.IT eess.SP

    UGV-assisted Wireless Powered Backscatter Communications for Large-Scale IoT Networks

    Authors: Erhu Chen, Peiran Wu, Yik-Chung Wu, Minghua Xia

    Abstract: Wireless powered backscatter communications (WPBC) is capable of implementing ultra-low-power communication, and is thus promising for Internet of Things (IoT) networks. In practice, however, it is challenging to apply WPBC in large-scale IoT networks because of its short communication range. To address this challenge, this paper exploits an unmanned ground vehicle (UGV) to assist WPBC in large-scale…

    Submitted 6 October, 2021; originally announced October 2021.

    Comments: 15 pages, 7 figures, to appear in IEEE Transactions on Wireless Communications

  41. arXiv:2109.04606  [pdf, other]

    cs.RO eess.SY

    Probabilistic Guaranteed Path Planning for Safe Urban Air Mobility Using Chance Constrained RRT

    Authors: Pengcheng Wu, Lin Li, Junfei Xie, Jun Chen

    Abstract: Safety is a critical concern for the success of urban air mobility, especially in dynamic and uncertain environments. This paper proposes a path planning algorithm based on RRT in conjunction with chance constraints in the presence of uncertain obstacles. The chance-constrained formulation for Gaussian distributed obstacles is developed by converting the probabilistic constraints to deterministic…

    Submitted 11 March, 2022; v1 submitted 9 September, 2021; originally announced September 2021.

    Comments: 8 pages, AIAA conference article

    MSC Class: AIAA

  42. arXiv:2108.09229  [pdf]

    physics.med-ph eess.IV

    Using Uncertainty in Deep Learning Reconstruction for Cone-Beam CT of the Brain

    Authors: Pengwei Wu, Alejandro Sisniega, Ali Uneri, Runze Han, Craig Jones, Prasad Vagdargi, Xiaoxuan Zhang, Mark Luciano, William Anderson, Jeffrey Siewerdsen

    Abstract: Contrast resolution beyond the limits of conventional cone-beam CT (CBCT) systems is essential to high-quality imaging of the brain. We present a deep learning reconstruction method (dubbed DL-Recon) that integrates physically principled reconstruction models with DL-based image synthesis based on the statistical uncertainty in the synthesis image. A synthesis network was developed to generate a s…

    Submitted 20 August, 2021; originally announced August 2021.

    Comments: This work was presented at the 16th International Meeting on Fully Three-Dimensional Image Reconstruction in Radiology and Nuclear Medicine (Fully3D), July 19-23, 2021, Leuven, Belgium

  43. arXiv:2108.00256  [pdf, other

    eess.SY

    An Intelligent Energy Management Framework for Hybrid-Electric Propulsion Systems Using Deep Reinforcement Learning

    Authors: Peng Wu, Julius Partridge, Enrico Anderlini, Yuanchang Liu, Richard Bucknall

    Abstract: Hybrid-electric propulsion systems powered by clean energy derived from renewable sources offer a promising approach to decarbonise the world's transportation systems. Effective energy management systems are critical for such systems to achieve optimised operational performance. However, developing an intelligent energy management system for applications such as ships operating in a highly stochas…

    Submitted 31 July, 2021; originally announced August 2021.

  44. arXiv:2107.04189  [pdf, other

    cs.LG eess.SP

    Personalized Federated Learning over non-IID Data for Indoor Localization

    Authors: Peng Wu, Tales Imbiriba, Junha Park, Sunwoo Kim, Pau Closas

    Abstract: Localization and tracking of objects using data-driven methods is a popular topic due to the complexity of characterizing the physics of wireless channel propagation models. In these modeling approaches, data needs to be gathered to accurately train models while users' privacy is maintained. An appealing scheme to cooperatively achieve these goals is known as Federated Learning (F…

    Submitted 21 July, 2021; v1 submitted 8 July, 2021; originally announced July 2021.

  45. arXiv:2105.08819  [pdf, other

    eess.IV cs.CV cs.LG

    Fast and Accurate Quantized Camera Scene Detection on Smartphones, Mobile AI 2021 Challenge: Report

    Authors: Andrey Ignatov, Grigory Malivenko, Radu Timofte, Sheng Chen, Xin Xia, Zhaoyan Liu, Yuwei Zhang, Feng Zhu, Jiashi Li, Xuefeng Xiao, Yuan Tian, Xinglong Wu, Christos Kyrkou, Yixin Chen, Zexin Zhang, Yunbo Peng, Yue Lin, Saikat Dutta, Sourya Dipta Das, Nisarg A. Shah, Himanshu Kumar, Chao Ge, Pei-Lin Wu, **-Hua Du, Andrew Batutin, et al. (6 additional authors not shown)

    Abstract: Camera scene detection is among the most popular computer vision problems on smartphones. While many custom solutions were developed for this task by phone vendors, none of the designed models were publicly available until now. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop quantized deep learning-based camera scene classification solutions th…

    Submitted 17 May, 2021; originally announced May 2021.

    Comments: Mobile AI 2021 Workshop and Challenges: https://ai-benchmark.com/workshops/mai/2021/. arXiv admin note: substantial text overlap with arXiv:2105.08630; text overlap with arXiv:2105.07825, arXiv:2105.07809, arXiv:2105.08629

  46. arXiv:2105.08630  [pdf, other

    eess.IV cs.CV cs.LG

    Fast and Accurate Single-Image Depth Estimation on Mobile Devices, Mobile AI 2021 Challenge: Report

    Authors: Andrey Ignatov, Grigory Malivenko, David Plowman, Samarth Shukla, Radu Timofte, Ziyu Zhang, Yicheng Wang, Zilong Huang, Guozhong Luo, Gang Yu, Bin Fu, Yiran Wang, Xingyi Li, Min Shi, Ke Xian, Zhiguo Cao, **-Hua Du, Pei-Lin Wu, Chao Ge, Jiaoyang Yao, Fangwen Tu, Bo Li, Jung Eun Yoo, Kwanggyoon Seo, Jialei Xu, et al. (13 additional authors not shown)

    Abstract: Depth estimation is an important computer vision problem with many practical applications to mobile devices. While many solutions have been proposed for this task, they are usually very computationally expensive and thus are not applicable for on-device inference. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop an end-to-end deep learning-based d…

    Submitted 17 May, 2021; originally announced May 2021.

    Comments: Mobile AI 2021 Workshop and Challenges: https://ai-benchmark.com/workshops/mai/2021/. arXiv admin note: text overlap with arXiv:2105.07809

  47. arXiv:2101.08919  [pdf, other

    eess.AS cs.CR cs.SD

    Understanding the Tradeoffs in Client-side Privacy for Downstream Speech Tasks

    Authors: Peter Wu, Paul Pu Liang, Jiatong Shi, Ruslan Salakhutdinov, Shinji Watanabe, Louis-Philippe Morency

    Abstract: As users increasingly rely on cloud-based computing services, it is important to ensure that uploaded speech data remains private. Existing solutions rely either on server-side methods or focus on hiding speaker identity. While these approaches reduce certain security concerns, they do not give users client-side control over whether their biometric information is sent to the server. In this paper,…

    Submitted 22 October, 2021; v1 submitted 21 January, 2021; originally announced January 2021.

  48. arXiv:2012.00876  [pdf, other

    cs.CL eess.AS

    Automatically Identifying Language Family from Acoustic Examples in Low Resource Scenarios

    Authors: Peter Wu, Yifan Zhong, Alan W Black

    Abstract: Existing multilingual speech NLP works focus on a relatively small subset of languages, and thus current linguistic understanding of languages predominantly stems from classical approaches. In this work, we propose a method to analyze language similarity using deep learning. Namely, we train a model on the Wilderness dataset and investigate how its latent space compares with classical language fam…

    Submitted 1 December, 2020; originally announced December 2020.

  49. arXiv:2011.02563  [pdf, other

    eess.SY

    Fault-Tolerant Individual Pitch Control of Floating Offshore Wind Turbines via Subspace Predictive Repetitive Control

    Authors: Yichao Liu, Joeri Frederik, Riccardo M. G. Ferrari, ** Wu, Sunwei Li, Jan-Willem van Wingerden

    Abstract: Individual Pitch Control (IPC) is an effective and widely-used strategy to mitigate blade loads in wind turbines. However, conventional IPC fails to cope with blade and actuator faults, and this situation may lead to an emergency shutdown and increased maintenance costs. In this paper, a Fault-Tolerant Individual Pitch Control (FTIPC) scheme is developed to accommodate these faults in Floating Off…

    Submitted 4 November, 2020; originally announced November 2020.

  50. Fast Adaptive Fault Accommodation in Floating Offshore Wind Turbines via Model-Based Fault Diagnosis and Subspace Predictive Repetitive Control

    Authors: Yichao Liu, ** Wu, Riccardo M. G. Ferrari, Jan-Willem van Wingerden

    Abstract: As Floating Offshore Wind Turbines (FOWTs) operate in deep waters and are subjected to stressful wind- and wave-induced loads, they are more prone than their onshore counterparts to faults and failures. In particular, the pitch system may experience Pitch Actuator Stuck (PAS) faults, which result in a complete loss of control authority. In this paper, a novel fast and adaptive solu…

    Submitted 3 November, 2020; originally announced November 2020.

    Comments: IFAC World Congress 2020