Showing 1–40 of 40 results for author: Gao, G

Searching in archive eess.
  1. arXiv:2406.16987  [pdf]

    eess.SP cs.LG

    AI for Equitable Tennis Training: Leveraging AI for Equitable and Accurate Classification of Tennis Skill Levels and Training Phases

    Authors: Gyanna Gao, Hao-Yu Liao, Zhenhong Hu

    Abstract: Numerous studies have demonstrated the manifold benefits of tennis, such as increasing overall physical and mental health. Unfortunately, many children and youth from low-income families are unable to engage in this sport mainly due to financial constraints such as private lesson expenses as well as logistical concerns to and back from such lessons and clinics. While several tennis self-training s…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 21 pages, 9 figures, 1 table

  2. arXiv:2406.16927  [pdf]

    eess.SP

    Anomaly Detection Utilizing a Riemann Metric for Robust Myoelectric Pattern Recognition

    Authors: ZongYe Hu, Ge Gao, Xiang Chen, Xu Zhang

    Abstract: Traditional myoelectric pattern recognition (MPR) systems excel within controlled laboratory environments but they are interfered when confronted with anomaly or novel motions not encountered during the training phase. Utilizing metric ways to distinguish the target and novel motions based on extractors compared to training set is a prevalent idea to alleviate such interference. An innovative meth…

    Submitted 12 June, 2024; originally announced June 2024.

  3. arXiv:2406.16873  [pdf, other]

    eess.SP cs.AI cs.LG cs.RO

    A Survey of Machine Learning Techniques for Improving Global Navigation Satellite Systems

    Authors: Adyasha Mohanty, Grace Gao

    Abstract: Global Navigation Satellite Systems (GNSS)-based positioning plays a crucial role in various applications, including navigation, transportation, logistics, mapping, and emergency services. Traditional GNSS positioning methods are model-based and they utilize satellite geometry and the known properties of satellite signals. However, model-based methods have limitations in challenging environments a…

    Submitted 29 March, 2024; originally announced June 2024.

    Comments: Under consideration for EURASIP Journal on Advances in Signal Processing

  4. arXiv:2406.07061  [pdf, other]

    eess.IV cs.CV

    Triage of 3D pathology data via 2.5D multiple-instance learning to guide pathologist assessments

    Authors: Gan Gao, Andrew H. Song, Fiona Wang, David Brenes, Rui Wang, Sarah S. L. Chow, Kevin W. Bishop, Lawrence D. True, Faisal Mahmood, Jonathan T. C. Liu

    Abstract: Accurate patient diagnoses based on human tissue biopsies are hindered by current clinical practice, where pathologists assess only a limited number of thin 2D tissue slices sectioned from 3D volumetric tissue. Recent advances in non-destructive 3D pathology, such as open-top light-sheet microscopy, enable comprehensive imaging of spatially heterogeneous tissue morphologies, offering the feasibili…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: CVPR CVMI 2024

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 6955-6965

  5. arXiv:2404.12629  [pdf, other]

    eess.SP

    Spreading Code Optimization for Low-Earth Orbit Satellites via Mixed-Integer Convex Programming

    Authors: Alan Yang, Tara Mina, Grace Gao

    Abstract: Optimizing the correlation properties of spreading codes is critical for minimizing inter-channel interference in satellite navigation systems. By improving the codes' correlation sidelobes, we can enhance navigation performance while minimizing the required spreading code lengths. In the case of low earth orbit (LEO) satellite navigation, shorter code lengths (on the order of a hundred) are prefe…

    Submitted 19 April, 2024; originally announced April 2024.

  6. Accelerating Learnt Video Codecs with Gradient Decay and Layer-wise Distillation

    Authors: Tianhao Peng, Ge Gao, Heming Sun, Fan Zhang, David Bull

    Abstract: In recent years, end-to-end learnt video codecs have demonstrated their potential to compete with conventional coding algorithms in term of compression efficiency. However, most learning-based video compression models are associated with high computational complexity and latency, in particular at the decoder side, which limits their deployment in practical applications. In this paper, we present a…

    Submitted 5 December, 2023; originally announced December 2023.

    Report number: 2312.02605

  7. arXiv:2307.04692  [pdf, other]

    eess.SP cs.RO eess.SY

    Spoofing-Resilient LiDAR-GPS Factor Graph Localization with Chimera Authentication

    Authors: Adam Dai, Tara Mina, Ashwin Kanhere, Grace Gao

    Abstract: Many vehicle platforms typically use sensors such as LiDAR or camera for locally-referenced navigation with GPS for globally-referenced navigation. However, due to the unencrypted nature of GPS signals, all civilian users are vulnerable to spoofing attacks, where a malicious spoofer broadcasts fabricated signals and causes the user to track a false position fix. To protect against such GPS spoofi…

    Submitted 10 July, 2023; originally announced July 2023.

  8. HiNeRV: Video Compression with Hierarchical Encoding-based Neural Representation

    Authors: Ho Man Kwan, Ge Gao, Fan Zhang, Andrew Gower, David Bull

    Abstract: Learning-based video compression is currently a popular research topic, offering the potential to compete with conventional standard video codecs. In this context, Implicit Neural Representations (INRs) have previously been used to represent and compress image and video content, demonstrating relatively high decoding speed compared to other methods. However, existing INR-based methods have failed…

    Submitted 26 January, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

  9. arXiv:2301.00657  [pdf, other]

    eess.AS cs.AI cs.CL

    MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset

    Authors: Kailin Liang, Bin Liu, Yifan Hu, Rui Liu, Feilong Bao, Guanglai Gao

    Abstract: Text-to-Speech (TTS) synthesis for low-resource languages is an attractive research issue in academia and industry nowadays. Mongolian is the official language of the Inner Mongolia Autonomous Region and a representative low-resource language spoken by over 10 million people worldwide. However, there is a relative lack of open-source datasets for Mongolian TTS. Therefore, we make public an open-so…

    Submitted 11 December, 2022; originally announced January 2023.

    Comments: Accepted by NCMMSC'2022 (https://ncmmsc2022.ustc.edu.cn/main.htm)

  10. arXiv:2211.00285  [pdf, other]

    eess.SP

    Binary sequence set optimization for CDMA applications via mixed-integer quadratic programming

    Authors: Alan Yang, Tara Mina, Grace Gao

    Abstract: Finding sets of binary sequences with low auto- and cross-correlation properties is a hard combinatorial optimization problem with numerous applications, including multiple-input-multiple-output (MIMO) radar and global navigation satellite systems (GNSS). The sum of squared correlations, sometimes referred to as the integrated sidelobe level (ISL), is a quartic function in the variables and is a c…

    Submitted 14 March, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

    Comments: IEEE ICASSP 2023

  11. arXiv:2210.15364  [pdf, other]

    cs.SD cs.AI eess.AS

    Explicit Intensity Control for Accented Text-to-speech

    Authors: Rui Liu, Haolin Zuo, De Hu, Guanglai Gao, Haizhou Li

    Abstract: Accented text-to-speech (TTS) synthesis seeks to generate speech with an accent (L2) as a variant of the standard version (L1). How to control the intensity of accent in the process of TTS is a very interesting research direction, and has attracted more and more attention. Recent work design a speaker-adversarial loss to disentangle the speaker and accent information, and then adjust the loss weig…

    Submitted 27 October, 2022; originally announced October 2022.

    Comments: 5 pages, 3 figures. Submitted to ICASSP 2023. arXiv admin note: text overlap with arXiv:2209.10804

  12. arXiv:2210.15360  [pdf, other]

    cs.CL cs.SD eess.AS

    FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis

    Authors: Yifan Hu, Rui Liu, Guanglai Gao, Haizhou Li

    Abstract: Conversational Text-to-Speech (TTS) aims to synthesis an utterance with the right linguistic and affective prosody in a conversational context. The correlation between the current utterance and the dialogue history at the utterance level was used to improve the expressiveness of synthesized speech. However, the fine-grained information in the dialogue history at the word level also has an importan…

    Submitted 27 October, 2022; originally announced October 2022.

    Comments: 5 pages, 4 figures, 1 table. Submitted to ICASSP 2023. We release the source code at: https://github.com/walker-hyf/FCTalker

  13. arXiv:2210.09531   

    cs.RO cs.HC eess.SY

    The Brain-Inspired Cooperative Shared Control for Brain-Machine Interface

    Authors: Shengjie Zheng, Ling Liu, Junjie Yang, Lang Qian, Gang Gao, Xin Chen, Wenqi **, Chunshan Deng, Xiaojian Li

    Abstract: In the practical application of brain-machine interface technology, the problem often faced is the low information content and high noise of the neural signals collected by the electrode and the difficulty of decoding by the decoder, which makes it difficult for the robotic to obtain stable instructions to complete the task. The idea based on the principle of cooperative shared control can be achi…

    Submitted 25 June, 2024; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: This article need to update the corrected figure and data

  14. arXiv:2210.03883  [pdf, other]

    cs.CV eess.IV

    Rethinking the Detection Head Configuration for Traffic Object Detection

    Authors: Yi Shi, Jiang Wu, Shixuan Zhao, Gangyao Gao, Tao Deng, Hongmei Yan

    Abstract: Multi-scale detection plays an important role in object detection models. However, researchers usually feel blank on how to reasonably configure detection heads combining multi-scale features at different input resolutions. We find that there are different matching relationships between the object distribution and the detection head at different input resolutions. Based on the instructive findings…

    Submitted 7 October, 2022; originally announced October 2022.

    Comments: 26 pages, 4 figures, 7 tables

  15. arXiv:2209.10848  [pdf, other]

    cs.SD cs.AI eess.AS

    MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline

    Authors: Yifan Hu, Pengkai Yin, Rui Liu, Feilong Bao, Guanglai Gao

    Abstract: This paper introduces a high-quality open-source text-to-speech (TTS) synthesis dataset for Mongolian, a low-resource language spoken by over 10 million people worldwide. The dataset, named MnTTS, consists of about 8 hours of transcribed audio recordings spoken by a 22-year-old professional female Mongolian announcer. It is the first publicly available dataset developed to promote Mongolian TTS ap…

    Submitted 22 September, 2022; originally announced September 2022.

    Comments: Accepted at the 2022 International Conference on Asian Language Processing (IALP2022)

  16. arXiv:2209.10804  [pdf, other]

    cs.SD cs.CL eess.AS

    Controllable Accented Text-to-Speech Synthesis

    Authors: Rui Liu, Berrak Sisman, Guanglai Gao, Haizhou Li

    Abstract: Accented text-to-speech (TTS) synthesis seeks to generate speech with an accent (L2) as a variant of the standard version (L1). Accented TTS synthesis is challenging as L2 is different from L1 in both in terms of phonetic rendering and prosody pattern. Furthermore, there is no easy solution to the control of the accent intensity in an utterance. In this work, we propose a neural TTS architecture,…

    Submitted 22 September, 2022; originally announced September 2022.

    Comments: To be submitted for possible journal publication

  17. arXiv:2206.07229  [pdf, other]

    cs.SD cs.LG eess.AS

    Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning

    Authors: Rui Liu, Berrak Sisman, Björn Schuller, Guanglai Gao, Haizhou Li

    Abstract: Emotion classification of speech and assessment of the emotion strength are required in applications such as emotional text-to-speech and voice conversion. The emotion attribute ranking function based on Support Vector Machine (SVM) was proposed to predict emotion strength for emotional speech corpus. However, the trained ranking function doesn't generalize to new domains, which limits the scope o…

    Submitted 14 June, 2022; originally announced June 2022.

    Comments: To appear in INTERSPEECH 2022. 5 pages, 4 figures. Substantial text overlap with arXiv:2110.03156

  18. arXiv:2204.13873  [pdf, other]

    cs.CV eess.IV

    Multiple Degradation and Reconstruction Network for Single Image Denoising via Knowledge Distillation

    Authors: Juncheng Li, Hanhui Yang, Qiaosi Yi, Faming Fang, Guangwei Gao, Tieyong Zeng, Guixu Zhang

    Abstract: Single image denoising (SID) has achieved significant breakthroughs with the development of deep learning. However, the proposed methods are often accompanied by plenty of parameters, which greatly limits their application scenarios. Different from previous works that blindly increase the depth of the network, we explore the degradation mechanism of the noisy image and propose a lightweight Multip…

    Submitted 29 April, 2022; originally announced April 2022.

    Comments: Accepted by CVPR Workshop 2022

  19. arXiv:2204.07417  [pdf, other]

    cs.RO cs.LG eess.SY

    Safe Reinforcement Learning Using Black-Box Reachability Analysis

    Authors: Mahmoud Selim, Amr Alanwar, Shreyas Kousik, Grace Gao, Marco Pavone, Karl H. Johansson

    Abstract: Reinforcement learning (RL) is capable of sophisticated motion planning and control for robots in uncertain environments. However, state-of-the-art deep RL approaches typically lack safety guarantees, especially when the robot and environment models are unknown. To justify widespread deployment, robots must respect safety constraints without sacrificing performance. Thus, we propose a Black-box Re…

    Submitted 21 November, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

    Comments: This paper is accepted at IEEE Robotics and Automation Letters and International Conference on Robotics and Automation (ICRA)

  20. arXiv:2112.08655  [pdf, other]

    cs.CV eess.IV

    Feature Distillation Interaction Weighting Network for Lightweight Image Super-Resolution

    Authors: Guangwei Gao, Wenjie Li, Juncheng Li, Fei Wu, Huimin Lu, Yi Yu

    Abstract: Convolutional neural networks based single-image super-resolution (SISR) has made great progress in recent years. However, it is difficult to apply these methods to real-world scenarios due to the computational and memory cost. Meanwhile, how to take full advantage of the intermediate features under the constraints of limited parameters and calculations is also a huge challenge. To alleviate these…

    Submitted 11 April, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: 9 pages, 9 figures, 4 tables, AAAI2022

  21. arXiv:2112.02965  [pdf, other]

    eess.SP

    A Novel Full-Polarization SAR Images Ship Detector Based on the Scattering Mechanisms and the Wave Polarization Anisotropy

    Authors: Chuan Zhang, Gui Gao, Linlin Zhang, C. Chen, S. Gao, Libo Yao, Shiquan Gou

    Abstract: Synthetic aperture radar (SAR) is considered being a good option for earth observation with its unique advantages. In this paper, we proposed an adaptive ship detector using full-polarization SAR images. First, by thoroughly investigating the scattering characteristics between ships and their background, and the wave polarization anisotropy, a novel ship detector is proposed by jointing the two ch…

    Submitted 6 December, 2021; v1 submitted 6 December, 2021; originally announced December 2021.

  22. arXiv:2109.14335  [pdf, other]

    eess.IV cs.CV

    A Systematic Survey of Deep Learning-based Single-Image Super-Resolution

    Authors: Juncheng Li, Zehua Pei, Wenjie Li, Guangwei Gao, Longguang Wang, Yingqian Wang, Tieyong Zeng

    Abstract: Single-image super-resolution (SISR) is an important task in image processing, which aims to enhance the resolution of imaging systems. Recently, SISR has made a huge leap and has achieved promising results with the help of deep learning (DL). In this survey, we give an overview of DL-based SISR methods and group them according to their design targets. Specifically, we first introduce the problem…

    Submitted 12 April, 2024; v1 submitted 29 September, 2021; originally announced September 2021.

    Comments: 40 pages, 12 figures

  23. arXiv:2108.01750  [pdf, other]

    eess.SY

    Ellipsotopes: Combining Ellipsoids and Zonotopes for Reachability Analysis and Fault Detection

    Authors: Shreyas Kousik, Adam Dai, Grace Gao

    Abstract: Ellipsoids are a common representation for reachability analysis, because they can be transformed efficiently under affine maps, and allow conservative approximation of Minkowski sums, which let one incorporate uncertainty and linearization error in a dynamical system by expanding the size of the reachable set. Zonotopes, a type of symmetric, convex polytope, are similarly frequently used due to e…

    Submitted 21 June, 2022; v1 submitted 3 August, 2021; originally announced August 2021.

  24. arXiv:2103.14330  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    Guided Training: A Simple Method for Single-channel Speaker Separation

    Authors: Hao Li, Xueliang Zhang, Guanglai Gao

    Abstract: Deep learning has shown a great potential for speech separation, especially for speech and non-speech separation. However, it encounters permutation problem for multi-speaker separation where both target and interference are speech. Permutation Invariant training (PIT) was proposed to solve this problem by permuting the order of the multiple speakers. Another way is to use an anchor speech, a shor…

    Submitted 26 March, 2021; originally announced March 2021.

    Comments: 5 pages

  25. JDSR-GAN: Constructing An Efficient Joint Learning Network for Masked Face Super-Resolution

    Authors: Guangwei Gao, Lei Tang, Fei Wu, Huimin Lu, Jian Yang

    Abstract: With the growing importance of preventing the COVID-19 virus, face images obtained in most video surveillance scenarios are low resolution with mask simultaneously. However, most of the previous face super-resolution solutions can not handle both tasks in one model. In this work, we treat the mask occlusion as image noise and construct a joint and collaborative learning network, called JDSR-GAN, f…

    Submitted 29 January, 2023; v1 submitted 25 March, 2021; originally announced March 2021.

    Comments: IEEE Transactions on Multimedia, 8 pages, 7 figures

  26. Lightweight Image Super-Resolution with Multi-scale Feature Interaction Network

    Authors: Zhengxue Wang, Guangwei Gao, Juncheng Li, Yi Yu, Huimin Lu

    Abstract: Recently, the single image super-resolution (SISR) approaches with deep and complex convolutional neural network structures have achieved promising performance. However, those methods improve the performance at the cost of higher memory consumption, which is difficult to be applied for some mobile devices with limited storage and computing resources. To solve this problem, we present a lightweight…

    Submitted 21 June, 2021; v1 submitted 24 March, 2021; originally announced March 2021.

    Comments: ICME2021, https://ieeexplore.ieee.org/abstract/document/9428136

  27. arXiv:2101.04835  [pdf, other]

    cs.MA cs.RO eess.SY

    GPS Spoofing Mitigation and Timing Risk Analysis in Networked PMUs via Stochastic Reachability

    Authors: Sriramya Bhamidipati, Grace Xingxin Gao

    Abstract: To address PMU vulnerability against spoofing, we propose a set-valued state estimation technique known as Stochastic Reachability-based Distributed Kalman Filter (SR-DKF) that computes secure GPS timing across a network of receivers. Utilizing stochastic reachability, we estimate not only GPS time but also its stochastic reachable set, which is parameterized via probabilistic zonotope (p-Zonotope…

    Submitted 12 January, 2021; originally announced January 2021.

  28. arXiv:2101.02850  [pdf, other]

    eess.SP cs.LG

    Designing Low-Correlation GPS Spreading Codes with a Natural Evolution Strategy Machine Learning Algorithm

    Authors: Tara Yasmin Mina, Grace Xingxin Gao

    Abstract: With the birth of the next-generation GPS III constellation and the upcoming launch of the Navigation Technology Satellite-3 (NTS-3) testing platform to explore future technologies for GPS, we are indeed entering a new era of satellite navigation. Correspondingly, it is time to revisit the design methods of the GPS spreading code families. In this work, we develop a natural evolution strategy (NES…

    Submitted 28 December, 2021; v1 submitted 7 January, 2021; originally announced January 2021.

  29. arXiv:2008.05284  [pdf, other]

    eess.AS cs.CL cs.SD

    Modeling Prosodic Phrasing with Multi-Task Learning in Tacotron-based TTS

    Authors: Rui Liu, Berrak Sisman, Feilong Bao, Guanglai Gao, Haizhou Li

    Abstract: Tacotron-based end-to-end speech synthesis has shown remarkable voice quality. However, the rendering of prosody in the synthesized speech remains to be improved, especially for long sentences, where prosodic phrasing errors can occur frequently. In this paper, we extend the Tacotron-based speech synthesis framework to explicitly model the prosodic phrase breaks. We propose a multi-task learning s…

    Submitted 11 August, 2020; originally announced August 2020.

    Comments: To appear in IEEE Signal Processing Letters (SPL)

  30. arXiv:2008.01490  [pdf, other]

    cs.SD eess.AS

    Expressive TTS Training with Frame and Style Reconstruction Loss

    Authors: Rui Liu, Berrak Sisman, Guanglai Gao, Haizhou Li

    Abstract: We propose a novel training strategy for Tacotron-based text-to-speech (TTS) system to improve the expressiveness of speech. One of the key challenges in prosody modeling is the lack of reference that makes explicit modeling difficult. The proposed technique doesn't require prosody annotations from training data. It doesn't attempt to model prosody explicitly either, but rather encodes the associa…

    Submitted 12 April, 2021; v1 submitted 4 August, 2020; originally announced August 2020.

    Comments: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing

  31. arXiv:2007.04517  [pdf, other]

    eess.SY cs.MA

    Distributed Energy Trading and Scheduling among Microgrids via Multiagent Reinforcement Learning

    Authors: Guanyu Gao, Yonggang Wen, Xiaohu Wu, Ran Wang

    Abstract: The development of renewable energy generation empowers microgrids to generate electricity to supply itself and to trade the surplus on energy markets. To minimize the overall cost, a microgrid must determine how to schedule its energy resources and electrical loads and how to trade with others. The control decisions are influenced by various factors, such as energy storage, renewable energy yield…

    Submitted 8 July, 2020; originally announced July 2020.

  32. arXiv:2006.06196  [pdf, other]

    cs.CV cs.LG eess.IV

    An Edge Information and Mask Shrinking Based Image Inpainting Approach

    Authors: Huali Xu, Xiangdong Su, Meng Wang, Xiang Hao, Guanglai Gao

    Abstract: In the image inpainting task, the ability to repair both high-frequency and low-frequency information in the missing regions has a substantial influence on the quality of the restored image. However, existing inpainting methods usually fail to consider both high-frequency and low-frequency information simultaneously. To solve this problem, this paper proposes edge information and mask shrinking ba…

    Submitted 11 June, 2020; originally announced June 2020.

    Comments: Accepted by ICME2020

  33. SNR-Based Teachers-Student Technique for Speech Enhancement

    Authors: Xiang Hao, Xiangdong Su, Zhiyu Wang, Qiang Zhang, Huali Xu, Guanglai Gao

    Abstract: It is very challenging for speech enhancement methods to achieves robust performance under both high signal-to-noise ratio (SNR) and low SNR simultaneously. In this paper, we propose a method that integrates an SNR-based teachers-student technique and time-domain U-Net to deal with this problem. Specifically, this method consists of multiple teacher models and a student model. We first train the t…

    Submitted 29 October, 2020; v1 submitted 29 May, 2020; originally announced May 2020.

    Comments: Published in 2020 IEEE International Conference on Multimedia and Expo (ICME 2020)

  34. Sub-Band Knowledge Distillation Framework for Speech Enhancement

    Authors: Xiang Hao, Shixue Wen, Xiangdong Su, Yun Liu, Guanglai Gao, Xiaofei Li

    Abstract: In single-channel speech enhancement, methods based on full-band spectral features have been widely studied. However, only a few methods pay attention to non-full-band spectral features. In this paper, we explore a knowledge distillation framework based on sub-band spectral mapping for single-channel speech enhancement. Specifically, we divide the full frequency band into multiple sub-bands and pr…

    Submitted 29 October, 2020; v1 submitted 29 May, 2020; originally announced May 2020.

    Comments: Published in Interspeech 2020

  35. arXiv:2002.00417  [pdf, other]

    eess.AS cs.CL cs.SD

    WaveTTS: Tacotron-based TTS with Joint Time-Frequency Domain Loss

    Authors: Rui Liu, Berrak Sisman, Feilong Bao, Guanglai Gao, Haizhou Li

    Abstract: Tacotron-based text-to-speech (TTS) systems directly synthesize speech from text input. Such frameworks typically consist of a feature prediction network that maps character sequences to frequency-domain acoustic features, followed by a waveform reconstruction algorithm or a neural vocoder that generates the time-domain waveform from acoustic features. As the loss function is usually calculated on…

    Submitted 6 April, 2020; v1 submitted 2 February, 2020; originally announced February 2020.

    Comments: To appear at Odyssey 2020, Tokyo, Japan

  36. arXiv:1911.02839  [pdf, other]

    cs.CL cs.SD eess.AS

    Teacher-Student Training for Robust Tacotron-based TTS

    Authors: Rui Liu, Berrak Sisman, **gdong Li, Feilong Bao, Guanglai Gao, Haizhou Li

    Abstract: While neural end-to-end text-to-speech (TTS) is superior to conventional statistical methods in many ways, the exposure bias problem in the autoregressive models remains an issue to be resolved. The exposure bias problem arises from the mismatch between the training and inference process, that results in unpredictable performance for out-of-domain test data at run-time. To overcome this, we propos…

    Submitted 11 February, 2020; v1 submitted 7 November, 2019; originally announced November 2019.

    Comments: To appear at ICASSP2020, Barcelona, Spain

  37. arXiv:1910.02165  [pdf, other]

    cs.RO cs.CV eess.IV

    SLAM-based Integrity Monitoring Using GPS and Fish-eye Camera

    Authors: Sriramya Bhamidipati, Grace Xingxin Gao

    Abstract: Urban navigation using GPS and fish-eye camera suffers from multipath effects in GPS measurements and data association errors in pixel intensities across image frames. We propose a Simultaneous Localization and Mapping (SLAM)-based Integrity Monitoring (IM) algorithm to compute the position protection levels while accounting for multiple faults in both GPS and vision. We perform graph optimization…

    Submitted 4 October, 2019; originally announced October 2019.

    Comments: 32nd International Technical Meeting of the Satellite Division of the Institute of Navigation, ION GNSS+ 2019, Miami, FL, Sept 2019

  38. arXiv:1908.06846  [pdf]

    eess.SP

    A photonic-assisted method based on the MDA technique for the frequency estimation precision improvement

    Authors: Guangyu Gao, Nai** Liu

    Abstract: A novel photonics-assisted method based on presampling and MDA technique is proposed for significantly improving the frequency estimation precision without introducing other complex algorithms. This method is also compatible with existing FFT-based high-precision estimation algorithms

    Submitted 31 July, 2019; originally announced August 2019.

    Comments: 3 pages, 5 figs, conference paper

  39. arXiv:1901.04693  [pdf, other]

    eess.SY cs.AI

    Energy-Efficient Thermal Comfort Control in Smart Buildings via Deep Reinforcement Learning

    Authors: Guanyu Gao, Jie Li, Yonggang Wen

    Abstract: Heating, Ventilation, and Air Conditioning (HVAC) is extremely energy-consuming, accounting for 40% of total building energy consumption. Therefore, it is crucial to design some energy-efficient building thermal control policies which can reduce the energy consumption of HVAC while maintaining the comfort of the occupants. However, implementing such a policy is challenging, because it involves var…

    Submitted 15 January, 2019; originally announced January 2019.

  40. arXiv:1805.00362  [pdf]

    eess.SP

    A code-free optical undersampling technique for broadband microwave spectrum measurement

    Authors: Guangyu Gao, Xueshuang Xiang, Qijun Liang, Nai** Liu

    Abstract: A novel broadband microwave (MW) spectrum measurement (BMSM) scheme based on code-free optical undersampling and homodyne detection is proposed. The fully analog generation of optical pulses with a far-less-than-Nyquist rate is only through modulating cascaded electrooptical modulators by a single RF tone instead of any high-speed coding sequence modulation. Homodyne detection will reduce the anal…

    Submitted 31 July, 2019; v1 submitted 29 April, 2018; originally announced May 2018.

    Comments: 3 pages and 7 figures