
Showing 1–50 of 91 results for author: Jiao, Y

Searching in archive eess.
  1. arXiv:2406.18840 [pdf]

    eess.IV

    Shorter SPECT Scans Using Self-supervised Coordinate Learning to Synthesize Skipped Projection Views

    Authors: Zongyu Li, Yixuan Jia, Xiaojian Xu, Jason Hu, Jeffrey A. Fessler, Yuni K. Dewaraja

    Abstract: Purpose: This study addresses the challenge of extended SPECT imaging duration under low-count conditions, as encountered in Lu-177 SPECT imaging, by developing a self-supervised learning approach to synthesize skipped SPECT projection views, thus shortening scan times in clinical settings. Methods: We employed a self-supervised coordinate-based learning technique, adapting the neural radiance fie…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 25 pages, 5568 words

  2. arXiv:2406.08806 [pdf, ps, other]

    eess.SY

    Adaptive Cooperative Streaming of Holographic Video Over Wireless Networks: A Proximal Policy Optimization Solution

    Authors: Wanli Wen, Ji** Yan, Yulu Zhang, Zhen Huang, Liang Liang, Yunjian Jia

    Abstract: Adapting holographic video streaming to fluctuating wireless channels is essential to maintain consistent and satisfactory Quality of Experience (QoE) for users, which, however, is a challenging task due to the dynamic and uncertain characteristics of wireless networks. To address this issue, we propose a holographic video cooperative streaming framework designed for a generic wireless network in…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted for publication in IEEE Wireless Communications Letters

  3. arXiv:2406.02133 [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    SimulTron: On-Device Simultaneous Speech to Speech Translation

    Authors: Alex Agranovich, Eliya Nachmani, Oleg Rybakov, Yifan Ding, Ye Jia, Nadav Bar, Heiga Zen, Michelle Tadmor Ramanovich

    Abstract: Simultaneous speech-to-speech translation (S2ST) holds the promise of breaking down communication barriers and enabling fluid conversations across languages. However, achieving accurate, real-time translation through mobile devices remains a major challenge. We introduce SimulTron, a novel S2ST architecture designed to tackle this task. SimulTron is a lightweight direct S2ST model that uses the st…

    Submitted 4 June, 2024; originally announced June 2024.

  4. arXiv:2406.02055 [pdf]

    eess.SY

    Stochastic Carbon Footprint Tracing Methods in Power Systems

    Authors: Jiashuo Hu, Xiao-Ping Zhang, Youwei Jia

    Abstract: As the penetration of distributed energy resources (DER) and renewable energy sources (RES) increases, carbon footprint tracking requires more granular analysis results. Existing carbon footprint tracking methods focus on deterministic steady-state analysis where the high uncertainties of RES cannot be considered. Considering the deficiency of the existing deterministic method, this paper proposes…

    Submitted 4 June, 2024; originally announced June 2024.

  5. arXiv:2404.12598 [pdf, ps, other]

    cs.LG eess.SY q-fin.CP q-fin.PM

    Continuous-time Risk-sensitive Reinforcement Learning via Quadratic Variation Penalty

    Authors: Yanwei Jia

    Abstract: This paper studies continuous-time risk-sensitive reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation with the exponential-form objective. The risk-sensitive objective arises either as the agent's risk attitude or as a distributionally robust approach against the model uncertainty. Owing to the martingale perspective in Jia and Zhou (2023) the risk-…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 49 pages, 2 figures, 1 table

    MSC Class: 62L20; 68T05; 93E03; 93E20; 93E35

  6. arXiv:2403.15156 [pdf, other]

    cs.RO cs.CV eess.SY

    Infrastructure-Assisted Collaborative Perception in Automated Valet Parking: A Safety Perspective

    Authors: Yukuan Jia, Jiawen Zhang, Shimeng Lu, Baokang Fan, Ruiqing Mao, Sheng Zhou, Zhisheng Niu

    Abstract: Environmental perception in Automated Valet Parking (AVP) has been a challenging task due to severe occlusions in parking garages. Although Collaborative Perception (CP) can be applied to broaden the field of view of connected vehicles, the limited bandwidth of vehicular communications restricts its application. In this work, we propose a BEV feature-based CP network architecture for infrastructur…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: 7 pages, 7 figures, 4 tables, accepted by IEEE VTC2024-Spring

  7. arXiv:2403.10622 [pdf, other]

    eess.IV cs.CV

    NeuralOCT: Airway OCT Analysis via Neural Fields

    Authors: Yining Jiao, Amy Oldenburg, Yinghan Xu, Srikamal Soundararajan, Carlton Zdanski, Julia Kimbell, Marc Niethammer

    Abstract: Optical coherence tomography (OCT) is a popular modality in ophthalmology and is also used intravascularly. Our interest in this work is OCT in the context of airway abnormalities in infants and children, where the high resolution of OCT and the fact that it is radiation-free are important. The goal of airway OCT is to provide accurate estimates of airway geometry (in 2D and 3D) to assess airway abn…

    Submitted 15 March, 2024; originally announced March 2024.

  8. arXiv:2402.02694 [pdf, other]

    eess.AS cs.LG cs.SD

    Description on IEEE ICME 2024 Grand Challenge: Semi-supervised Acoustic Scene Classification under Domain Shift

    Authors: Jisheng Bai, Mou Wang, Haohe Liu, Han Yin, Yafei Jia, Siwei Huang, Yutong Du, Dongzhe Zhang, Dongyuan Shi, Woon-Seng Gan, Mark D. Plumbley, Susanto Rahardja, Bin Xiang, Jianfeng Chen

    Abstract: Acoustic scene classification (ASC) is a crucial research problem in computational auditory scene analysis, and it aims to recognize the unique acoustic characteristics of an environment. One of the challenges of the ASC task is the domain shift between training and testing data. Since 2018, ASC challenges have focused on the generalization of ASC models across different recording devices. Althoug…

    Submitted 28 February, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  9. arXiv:2312.14482 [pdf, other]

    eess.SP

    On Smart Morphing Wing Aircraft Robust Adaptive Beamforming

    Authors: Yizhen Jia, Hui Chen, Wen-Qin Wang, Jie Cheng

    Abstract: The smart morphing wing aircraft (SMWA) is a highly adaptable platform that can be widely used for intelligent warfare due to its real-time variable structure. The flexible conformal array (FCA) is a vital detection component of SMWA; when the deformation parameters of FCA are mismatched or array elements are mutually coupled, detection performance degrades. To overcome this problem and en…

    Submitted 22 December, 2023; originally announced December 2023.

    Comments: Conference extended version

  10. arXiv:2309.07141 [pdf]

    eess.SP cs.AI cs.LG

    Design of Recognition and Evaluation System for Table Tennis Players' Motor Skills Based on Artificial Intelligence

    Authors: Zhuo-yong Shi, Ye-tao Jia, Ke-xin Zhang, Ding-han Wang, Long-meng Ji, Yong Wu

    Abstract: With the rapid development of electronic science and technology, the research on wearable devices is constantly updated, but for now, it is not comprehensive for wearable devices to recognize and analyze the movement of specific sports. Based on this, this paper improves wearable devices of table tennis sport, and realizes the pattern recognition and evaluation of table tennis players' motor skill…

    Submitted 4 September, 2023; originally announced September 2023.

    Comments: 34 pages, 16 figures

    MSC Class: 93-01 ACM Class: G.1; H.4

  11. arXiv:2308.13839 [pdf, other]

    cs.RO eess.SY

    A Conflict Resolution Dataset Derived from Argoverse-2: Analysis of the Safety and Efficiency Impacts of Autonomous Vehicles at Intersections

    Authors: Guopeng Li, Yiru Jiao, Simeon C. Calvert, J. W. C. van Lint

    Abstract: As the deployment of autonomous vehicles (AVs) in mixed traffic flow becomes increasingly prevalent, ensuring safe and smooth interactions between AVs and human agents is of critical importance. How road users resolve conflicts at intersections has significant impacts on driving safety and traffic efficiency. These impacts depend on both the behaviours of AVs and humans' reactions to the presence…

    Submitted 9 December, 2023; v1 submitted 26 August, 2023; originally announced August 2023.

    Comments: 20 pages, 16 figures

  12. arXiv:2307.08556 [pdf, other]

    stat.ML cs.LG eess.IV

    Machine-Learning-based Colorectal Tissue Classification via Acoustic Resolution Photoacoustic Microscopy

    Authors: Shangqing Tong, Peng Ge, Yanan Jiao, Zhaofu Ma, Ziye Li, Longhai Liu, Feng Gao, Xiaohui Du, Fei Gao

    Abstract: Colorectal cancer is a deadly disease that has become increasingly prevalent in recent years. Early detection is crucial for saving lives, but traditional diagnostic methods such as colonoscopy and biopsy have limitations. Colonoscopy cannot provide detailed information within the tissues affected by cancer, while biopsy involves tissue removal, which can be painful and invasive. In order to impro…

    Submitted 17 July, 2023; originally announced July 2023.

  13. arXiv:2307.08239 [pdf, other]

    eess.AS

    Dynamic Kernel Convolution Network with Scene-dedicate Training for Sound Event Localization and Detection

    Authors: Siwei Huang, Jianfeng Chen, Jisheng Bai, Yafei Jia, Dongzhe Zhang

    Abstract: DNN-based methods have shown high performance in sound event localization and detection (SELD). However, in real spatial sound scenes, reverberation and the imbalanced presence of various sound events increase the complexity of the SELD task. In this paper, we propose an effective SELD system for real spatial scenes. In our approach, a dynamic kernel convolution module is introduced after the convolutio…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: 11 pages, 6 figures

  14. arXiv:2306.04987 [pdf, other]

    eess.AS cs.SD

    Convolutional Recurrent Neural Network with Attention for 3D Speech Enhancement

    Authors: Han Yin, Jisheng Bai, Mou Wang, Siwei Huang, Yafei Jia, Jianfeng Chen

    Abstract: 3D speech enhancement can effectively improve the auditory experience and plays a crucial role in augmented reality technology. However, traditional convolutional-based speech enhancement methods have limitations in extracting dynamic voice information. In this paper, we incorporate a dual-path recurrent neural network block into the U-Net to iteratively extract dynamic audio information in both t…

    Submitted 19 November, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: Published in the IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC 2023)

  15. arXiv:2305.18921 [pdf, other]

    eess.SY cs.HC cs.RO

    Large Car-following Data Based on Lyft level-5 Open Dataset: Following Autonomous Vehicles vs. Human-driven Vehicles

    Authors: Guopeng Li, Yiru Jiao, Victor L. Knoop, Simeon C. Calvert, J. W. C. van Lint

    Abstract: Car-Following (CF), as a fundamental driving behaviour, has significant influences on the safety and efficiency of traffic flow. Investigating how human drivers react differently when following autonomous vs. human-driven vehicles (HV) is thus critical for mixed traffic flow. Research in this field can be expedited with trajectory datasets collected by Autonomous Vehicles (AVs). However, trajector…

    Submitted 21 November, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: 6 pages, 9 figures

  16. arXiv:2305.00579 [pdf, other]

    eess.SY

    RAPID: Autonomous Multi-Agent Racing using Constrained Potential Dynamic Games

    Authors: Yixuan Jia, Maulik Bhatt, Negar Mehr

    Abstract: In this work, we consider the problem of autonomous racing with multiple agents where agents must interact closely and influence each other to compete. We model interactions among agents through a game-theoretical framework and propose an efficient algorithm for tractably solving the resulting game in real time. More specifically, we capture interactions among multiple agents through a constrained…

    Submitted 30 April, 2023; originally announced May 2023.

    Comments: 8 pages

  17. arXiv:2303.10510 [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    A Deep Learning System for Domain-specific Speech Recognition

    Authors: Yanan Jia

    Abstract: As human-machine voice interfaces provide easy access to increasingly intelligent machines, many state-of-the-art automatic speech recognition (ASR) systems have been proposed. However, commercial ASR systems usually perform poorly on domain-specific speech, especially under low-resource settings. The author works with pre-trained DeepSpeech2 and Wav2Vec2 acoustic models to develop benefit-specifi…

    Submitted 27 September, 2023; v1 submitted 18 March, 2023; originally announced March 2023.

    Comments: 4th International Conference on Natural Language Processing and Computational Linguistics (NLPCL 2023)

  18. arXiv:2301.06304 [pdf]

    eess.IV cs.CV

    LYSTO: The Lymphocyte Assessment Hackathon and Benchmark Dataset

    Authors: Yiping Jiao, Jeroen van der Laak, Shadi Albarqouni, Zhang Li, Tao Tan, Abhir Bhalerao, Jiabo Ma, Jiamei Sun, Johnathan Pocock, Josien P. W. Pluim, Navid Alemi Koohbanani, Raja Muhammad Saad Bashir, Shan E Ahmed Raza, Sibo Liu, Simon Graham, Suzanne Wetstein, Syed Ali Khurram, Thomas Watson, Nasir Rajpoot, Mitko Veta, Francesco Ciompi

    Abstract: We introduce LYSTO, the Lymphocyte Assessment Hackathon, which was held in conjunction with the MICCAI 2019 Conference in Shenzhen (China). The competition required participants to automatically assess the number of lymphocytes, in particular T-cells, in histopathological images of colon, breast, and prostate cancer stained with CD3 and CD8 immunohistochemistry. Differently from other challenges se…

    Submitted 13 April, 2023; v1 submitted 16 January, 2023; originally announced January 2023.

    Comments: will be submitted to IEEE-JBHI

    MSC Class: 68T07 ACM Class: I.4.9; I.5.4; I.2.1

  19. arXiv:2212.06299 [pdf]

    eess.IV cs.CV cs.LG

    Interpretable Diabetic Retinopathy Diagnosis based on Biomarker Activation Map

    Authors: Pengxiao Zang, Tristan T. Hormel, Jie Wang, Yukun Guo, Steven T. Bailey, Christina J. Flaxel, David Huang, Thomas S. Hwang, Yali Jia

    Abstract: Deep learning classifiers provide the most accurate means of automatically diagnosing diabetic retinopathy (DR) based on optical coherence tomography (OCT) and its angiography (OCTA). The power of these models is attributable in part to the inclusion of hidden layers that provide the complexity required to achieve a desired task. However, hidden layers also render algorithm outputs difficult to in…

    Submitted 26 June, 2023; v1 submitted 12 December, 2022; originally announced December 2022.

    Comments: This paper has been accepted by IEEE TBME

    ACM Class: I.2.0; I.4.0; J.3

  20. arXiv:2211.14830 [pdf, other]

    eess.IV cs.CV

    Medical Image Segmentation Review: The success of U-Net

    Authors: Reza Azad, Ehsan Khodapanah Aghdam, Amelie Rauland, Yiwei Jia, Atlas Haddadi Avval, Afshin Bozorgpour, Sanaz Karimijafarbigloo, Joseph Paul Cohen, Ehsan Adeli, Dorit Merhof

    Abstract: Automatic medical image segmentation is a crucial topic in the medical domain and consequently a critical counterpart in the computer-aided diagnosis paradigm. U-Net is the most widespread image segmentation architecture due to its flexibility, optimized modular design, and success in all medical image modalities. Over the years, the U-Net model has attracted tremendous attention from academic and indu…

    Submitted 27 November, 2022; originally announced November 2022.

    Comments: Submitted to the IEEE Transactions on Pattern Analysis and Machine Intelligence Journal

  21. Energy-Efficient Driving in Connected Corridors via Minimum Principle Control: Vehicle-in-the-Loop Experimental Verification in Mixed Fleets

    Authors: Tyler Ard, Longxiang Guo, Jihun Han, Yunyi Jia, Ardalan Vahidi, Dominik Karbowski

    Abstract: Connected and automated vehicles (CAVs) can plan and actuate control that explicitly considers performance, system safety, and actuation constraints in a manner more efficient than their human-driven counterparts. In particular, eco-driving is enabled through connected exchange of information from signalized corridors that share their upcoming signal phase and timing (SPaT). This is accomplished i…

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: 13 Figures

  22. arXiv:2211.00115 [pdf, other]

    cs.CL cs.SD eess.AS

    Textless Direct Speech-to-Speech Translation with Discrete Speech Representation

    Authors: Xinjian Li, Ye Jia, Chung-Cheng Chiu

    Abstract: Research on speech-to-speech translation (S2ST) has progressed rapidly in recent years. Many end-to-end systems have been proposed and show advantages over conventional cascade systems, which are often composed of recognition, translation and synthesis sub-systems. However, most of the end-to-end systems still rely on intermediate textual supervision during training, which makes it infeasible to w…

    Submitted 31 October, 2022; originally announced November 2022.

  23. arXiv:2210.12361 [pdf]

    eess.IV cs.CV

    MS-DCANet: A Novel Segmentation Network For Multi-Modality COVID-19 Medical Images

    Authors: Xiaoyu Pan, Huazheng Zhu, **glong Du, Guangtao Hu, Baoru Han, Yuanyuan Jia

    Abstract: The Coronavirus Disease 2019 (COVID-19) pandemic has increased the public health burden and brought profound disaster to humans. Given the particular characteristics of COVID-19 medical images (blurred boundaries, low contrast, and varied infection sites), some researchers have improved accuracy by adding more complexity. However, they overlook the complexity of lesions, which hinders their ability to c…

    Submitted 19 July, 2023; v1 submitted 22 October, 2022; originally announced October 2022.

    Comments: 21 pages, 13 figures, 9 tables

    Journal ref: J Multidiscip Healthc. 2023;16:2023-2043

  24. arXiv:2210.07749   

    eess.AS cs.SD

    LeVoice ASR Systems for the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge

    Authors: Yan Jia, Mi Hong, **gyu Hou, Kailong Ren, Sifan Ma, ** Wang, Fangzhen Peng, Yinglin Ji, Lin Yang, Junjie Wang

    Abstract: This paper describes the LeVoice automatic speech recognition systems for Track 2 of the Intelligent Cockpit Speech Recognition Challenge 2022. Track 2 is a speech recognition task without limits on model size. Our main points include deep learning based speech enhancement, text-to-speech based speech generation, training data augmentation via various techniques and speech recognition model fusi…

    Submitted 16 October, 2022; v1 submitted 14 October, 2022; originally announced October 2022.

    Comments: There are experimental errors

  25. arXiv:2209.13786 [pdf, other]

    cs.LG eess.SP

    A Parameter-free Nonconvex Low-rank Tensor Completion Model for Spatiotemporal Traffic Data Recovery

    Authors: Yang He, Yuheng Jia, Liyang Hu, Chengchuan An, Zhenbo Lu, Jingxin Xia

    Abstract: Traffic data chronically suffer from missing values and corruption, leading to reduced accuracy and utility in subsequent Intelligent Transportation System (ITS) applications. Noticing the inherent low-rank property of traffic data, numerous studies formulated missing traffic data recovery as a low-rank tensor completion (LRTC) problem. Due to the non-convexity and discreteness of the rank minimization…

    Submitted 27 September, 2022; originally announced September 2022.

    Comments: 10 pages, 7 figures

  26. arXiv:2209.11451 [pdf, other]

    cs.IT eess.SY

    FIAT: Fine-grained Information Audit for Trustless Transborder Data Flow

    Authors: Shuhao Zheng, Yanxi Lin, Yang Yu, Ye Yuan, Yongzheng Jia, Xue Liu

    Abstract: Auditing the information leakage of latent sensitive features during the transborder data flow has attracted sufficient attention from global digital regulators. However, a technical approach for the audit practice is missing due to two technical challenges. Firstly, there is a lack of theory and tools for measuring the information of sensitive latent features in a dataset. Secondly, the tra…

    Submitted 10 February, 2023; v1 submitted 23 September, 2022; originally announced September 2022.

    Comments: 10 pages, 6 figures, 1 table

  27. arXiv:2208.13183 [pdf, other]

    cs.SD eess.AS

    Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks

    Authors: Lev Finkelstein, Heiga Zen, Norman Casagrande, Chun-an Chan, Ye Jia, Tom Kenter, Alexey Petelin, Jonathan Shen, Vincent Wan, Yu Zhang, Yonghui Wu, Rob Clark

    Abstract: Transfer tasks in text-to-speech (TTS) synthesis - where one or more aspects of the speech of one set of speakers are transferred to another set of speakers that do not feature these aspects originally - remain challenging. One of the challenges is that models that have high-quality transfer capabilities can have issues in stability, making them impractical for user-facing critical tasks. T…

    Submitted 28 August, 2022; originally announced August 2022.

    Comments: To be published in Interspeech 2022

  28. DOLPHINS: Dataset for Collaborative Perception enabled Harmonious and Interconnected Self-driving

    Authors: Ruiqing Mao, Jingyu Guo, Yukuan Jia, Yuxuan Sun, Sheng Zhou, Zhisheng Niu

    Abstract: Vehicle-to-Everything (V2X) network has enabled collaborative perception in autonomous driving, which is a promising solution to the fundamental defect of stand-alone intelligence including blind zones and long-range perception. However, the lack of datasets has severely blocked the development of collaborative perception algorithms. In this work, we release DOLPHINS: Dataset for cOllaborative Per…

    Submitted 15 July, 2022; originally announced July 2022.

  29. arXiv:2203.13339 [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Leveraging unsupervised and weakly-supervised data to improve direct speech-to-speech translation

    Authors: Ye Jia, Yifan Ding, Ankur Bapna, Colin Cherry, Yu Zhang, Alexis Conneau, Nobuyuki Morioka

    Abstract: End-to-end speech-to-speech translation (S2ST) without relying on intermediate text representations is a rapidly emerging frontier of research. Recent works have demonstrated that the performance of such direct S2ST systems is approaching that of conventional cascade S2ST when trained on comparable datasets. However, in practice, the performance of direct S2ST is bounded by the availability of pai…

    Submitted 27 June, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: Interspeech 2022

  30. arXiv:2203.00508 [pdf, ps, other]

    cs.IT eess.SP

    Reconfigurable Intelligent Surface-Aided Spectrum Sharing Coexisting with Multiple Primary Networks

    Authors: Zhong Tian, Zhengchuan Chen, Min Wang, Yunjian Jia, Wanli Wen

    Abstract: Considering the spectrum sharing system (SSS) coexisting with multiple primary networks, we have employed a well-designed reconfigurable intelligent surface (RIS) to control the radio environments of wireless channels and relieve the scarcity of the spectrum resource in this work. Specifically, the enhancement of the spectral efficiency of the secondary user in the considered SSS is decomposed int…

    Submitted 4 November, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

  31. arXiv:2201.03713 [pdf, other]

    cs.CL cs.SD eess.AS

    CVSS Corpus and Massively Multilingual Speech-to-Speech Translation

    Authors: Ye Jia, Michelle Tadmor Ramanovich, Quan Wang, Heiga Zen

    Abstract: We introduce CVSS, a massively multilingual-to-English speech-to-speech translation (S2ST) corpus, covering sentence-level parallel S2ST pairs from 21 languages into English. CVSS is derived from the Common Voice speech corpus and the CoVoST 2 speech-to-text translation (ST) corpus, by synthesizing the translation text from CoVoST 2 into speech using state-of-the-art TTS systems. Two versions of t…

    Submitted 26 June, 2022; v1 submitted 10 January, 2022; originally announced January 2022.

    Comments: LREC 2022

  32. arXiv:2201.00167 [pdf, other]

    cs.SD eess.AS

    Generating Adversarial Samples For Training Wake-up Word Detection Systems Against Confusing Words

    Authors: Haoxu Wang, Yan Jia, Zeqing Zhao, Xuyang Wang, Junjie Wang, Ming Li

    Abstract: Wake-up word detection models are widely used in real life, but suffer from severe performance degradation when encountering adversarial samples. In this paper, we discuss the concept of confusing words in adversarial samples. Confusing words, which are commonly encountered, are words that sound similar to the predefined keywords. To enhance the wake word detection system's robustne…

    Submitted 1 January, 2022; originally announced January 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2011.01460

  33. arXiv:2112.15354 [pdf, ps, other]

    cs.IT eess.SP

    Statistical Device Activity Detection for OFDM-based Massive Grant-Free Access

    Authors: Yuhang Jia, Ying Cui, Wuyang Jiang

    Abstract: Existing works on grant-free access, proposed to support massive machine-type communication (mMTC) for the Internet of things (IoT), mainly concentrate on narrow band systems under flat fading. However, little is known about massive grant-free access for wideband systems under frequency-selective fading. This paper investigates massive grant-free access in a wideband system under frequency-selecti…

    Submitted 31 December, 2021; originally announced December 2021.

    Comments: 30 pages, 7 figures, to be submitted to IEEE Transactions on Wireless Communications

  34. arXiv:2112.13369 [pdf, other]

    cs.RO eess.SP

    Stop Line Aided Cooperative Positioning of Connected Vehicles

    Authors: Xingqi Wang, Chaoyang Jiang, Shuxuan Sheng, Yanjie Xu, Yifei Jia

    Abstract: This paper develops a stop line aided cooperative positioning framework for connected vehicles, which creatively utilizes the location of the stop-line to achieve the positioning enhancement for a vehicular ad-hoc network (VANET) in intersection scenarios via Vehicle-to-Vehicle (V2V) communication. Firstly, a self-positioning correction scheme for the first stopped vehicle is presented, which appl…

    Submitted 26 December, 2021; originally announced December 2021.

  35. arXiv:2111.14486 [pdf, other]

    cs.LG eess.SP stat.ML

    Just Least Squares: Binary Compressive Sampling with Low Generative Intrinsic Dimension

    Authors: Yuling Jiao, Dingwei Li, Min Liu, Xiangliang Lu, Yuanyuan Yang

    Abstract: In this paper, we consider recovering $n$-dimensional signals from $m$ binary measurements corrupted by noise and sign flips under the assumption that the target signals have low generative intrinsic dimension, i.e., the target signals can be approximately generated via an $L$-Lipschitz generator $G: \mathbb{R}^k\rightarrow\mathbb{R}^{n}, k\ll n$. Although the binary measurements model is highly…

    Submitted 29 November, 2021; originally announced November 2021.

  36. arXiv:2107.08661 [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Translatotron 2: High-quality direct speech-to-speech translation with voice preservation

    Authors: Ye Jia, Michelle Tadmor Ramanovich, Tal Remez, Roi Pomerantz

    Abstract: We present Translatotron 2, a neural direct speech-to-speech translation model that can be trained end-to-end. Translatotron 2 consists of a speech encoder, a linguistic decoder, an acoustic synthesizer, and a single attention module that connects them together. Experimental results on three datasets consistently show that Translatotron 2 outperforms the original Translatotron by a large margin on…

    Submitted 17 May, 2022; v1 submitted 19 July, 2021; originally announced July 2021.

    Comments: ICML 2022

  37. Improving the expressiveness of neural vocoding with non-affine Normalizing Flows

    Authors: Adam Gabryś, Yunlong Jiao, Viacheslav Klimkov, Daniel Korzekwa, Roberto Barra-Chicote

    Abstract: This paper proposes a general enhancement to the Normalizing Flows (NF) used in neural vocoding. As a case study, we improve expressive speech vocoding with a revamped Parallel Wavenet (PW). Specifically, we propose to extend the affine transformation of PW to the more expressive invertible non-affine function. The greater expressiveness of the improved PW leads to better-perceived signal quality…

    Submitted 16 June, 2021; originally announced June 2021.

    Comments: Accepted to Interspeech 2021, 5 pages, 3 figures

  38. arXiv:2106.02934 [pdf, other]

    cs.SD eess.AS

    Lightweight Dual-channel Target Speaker Separation for Mobile Voice Communication

    Authors: Yuanyuan Bao, Yanze Xu, Na Xu, Wen**g Yang, Hongfeng Li, Shicong Li, Yongtao Jia, Fei Xiang, **cheng He, Ming Li

    Abstract: Nowadays, there is a strong need to deploy the target speaker separation (TSS) model on mobile devices with a limitation of the model size and computational complexity. To better perform TSS for mobile voice communication, we first make a dual-channel dataset based on a specific scenario, LibriPhone. Specifically, to better mimic the real-case scenario, instead of simulating from the single-channe…

    Submitted 5 June, 2021; originally announced June 2021.

  39. arXiv:2105.08280 [pdf, other]

    eess.SY

    Peer-to-Peer Energy Cooperation in Building Community over A Lossy Network

    Authors: Cheng Lyu, Youwei Jia, Zhao Xu

    Abstract: Energy management of buildings is of vital importance for the urban low-carbon transition. This paper proposes a sustainable energy cooperation framework for the building community by communication-efficient peer-to-peer transaction. Firstly, the energy cooperation of buildings is formulated as a social welfare maximization problem, in which buildings may directly trade energy with neighbors. In a…

    Submitted 19 June, 2021; v1 submitted 18 May, 2021; originally announced May 2021.

    Comments: 5 pages, 6 figures, accepted to IEEE PESGM 2021, Best Paper Award

  40. arXiv:2104.08114 [pdf, other]

    physics.geo-ph cond-mat.mtrl-sci eess.IV

    AI-driven Bayesian inference of statistical microstructure descriptors from finite-frequency waves

    Authors: Wouter Klessens, Ivan Vasconcelos, Yang Jiao

    Abstract: The ability to image materials at the microscale from long-wavelength wave data is a major challenge to the geophysical, engineering and medical fields. Here, we present a framework to constrain microstructure geometry and properties from long-scale waves. To realistically quantify microstructures we use two-point statistics, from which we derive scale-dependent effective wave properties - wavespe…

    Submitted 16 April, 2021; originally announced April 2021.

  41. arXiv:2104.04993 [pdf, other]

    eess.AS

    The DKU System Description for The Interspeech 2021 Auto-KWS Challenge

    Authors: Yechen Wang, Yan Jia, Murong Ma, Zexin Cai, Ming Li

    Abstract: This paper introduces the system submitted by the DKU-SMIIP team for the Auto-KWS 2021 Challenge. Our implementation consists of a two-stage keyword spotting system based on query-by-example spoken term detection and a speaker verification system. We employ two different detection algorithms in our proposed keyword spotting system. The first stage adopts subsequence dynamic time warping for templa…

    Submitted 11 April, 2021; originally announced April 2021.

    Comments: 5 pages, 1 figure, submitted to INTERSPEECH

  42. arXiv:2104.04819  [pdf]

    eess.SY

    Real-time Operation Optimization of Microgrids with Battery Energy Storage System: A Tube-based Model Predictive Control Approach

    Authors: Cheng Lyu, Youwei Jia, Zhao Xu

    Abstract: Battery energy storage systems (ESS) are widely used in microgrids to complement high shares of renewable generation. However, the real-time energy management of microgrids with battery ESS is challenging in two respects: 1) the evolution of the battery energy level is coupled across time; 2) uncertainties unavoidably arise in forecasting renewable generation. In this paper, a tube-based model pred…

    Submitted 10 April, 2021; originally announced April 2021.

    Comments: 8 pages, 12 figures

  43. arXiv:2103.15060  [pdf, other]

    cs.CL cs.SD eess.AS

    PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS

    Authors: Ye Jia, Heiga Zen, Jonathan Shen, Yu Zhang, Yonghui Wu

    Abstract: This paper introduces PnG BERT, a new encoder model for neural TTS. It augments the original BERT model by taking both phoneme and grapheme representations of text as input, as well as the word-level alignment between them. It can be pre-trained on a large text corpus in a self-supervised manner, and fine-tuned in a TTS task. Experimental results show that a neural TTS model usin…

    Submitted 7 June, 2021; v1 submitted 28 March, 2021; originally announced March 2021.

    Comments: Accepted to Interspeech 2021

  44. arXiv:2103.14574  [pdf, other]

    cs.SD eess.AS

    Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling

    Authors: Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang, Ye Jia, RJ Skerry-Ryan, Yonghui Wu

    Abstract: This paper introduces Parallel Tacotron 2, a non-autoregressive neural text-to-speech model with a fully differentiable duration model which does not require supervised duration signals. The duration model is based on a novel attention mechanism and an iterative reconstruction loss based on Soft Dynamic Time Warping. With it, this model can learn token-frame alignments as well as token durations automatica…

    Submitted 29 August, 2021; v1 submitted 26 March, 2021; originally announced March 2021.

    Comments: Submitted to INTERSPEECH 2021

  45. arXiv:2102.07000  [pdf]

    eess.SY

    Adaptive Optimization of Autonomous Vehicle Computational Resources for Performance and Energy Improvement

    Authors: Saurabh Jambotkar, Longxiang Guo, Yunyi Jia

    Abstract: Autonomous vehicles usually consume a large amount of computational power for their operations, especially for sensing and perception tasks that rely on artificial intelligence algorithms. Such computation may not only consume a significant amount of energy but also cause performance issues when the onboard computational resources are limited. To address this issue, this paper proposes an adaptive o…

    Submitted 30 July, 2021; v1 submitted 13 February, 2021; originally announced February 2021.

    Comments: 7 pages

  46. arXiv:2102.01106  [pdf, other]

    eess.AS cs.CL cs.SD

    Universal Neural Vocoding with Parallel WaveNet

    Authors: Yunlong Jiao, Adam Gabrys, Georgi Tinchev, Bartosz Putrycz, Daniel Korzekwa, Viacheslav Klimkov

    Abstract: We present a universal neural vocoder based on Parallel WaveNet, with an additional conditioning network called Audio Encoder. Our universal vocoder offers real-time high-quality speech synthesis on a wide range of use cases. We tested it on 43 internal speakers of diverse age and gender, speaking 20 languages in 17 unique styles, of which 7 voices and 5 styles were not exposed during training. We…

    Submitted 15 February, 2021; v1 submitted 1 February, 2021; originally announced February 2021.

    Comments: 5 pages, 2 figures. Accepted to ICASSP 2021

  47. arXiv:2101.01935  [pdf, other]

    eess.AS

    The 2020 Personalized Voice Trigger Challenge: Open Database, Evaluation Metrics and the Baseline Systems

    Authors: Yan Jia, Xingming Wang, Xiaoyi Qin, Yin** Zhang, Xuyang Wang, Junjie Wang, Ming Li

    Abstract: The 2020 Personalized Voice Trigger Challenge (PVTC2020) addresses two different research problems in a unified setup: joint wake-up word detection with speaker verification on close-talking single-microphone data and on far-field multi-channel microphone array data. Specifically, the second task poses an additional cross-channel matching challenge on top of the far-field condition. To simulate the real-li…

    Submitted 6 January, 2021; originally announced January 2021.

  48. arXiv:2012.10239  [pdf]

    eess.IV physics.optics q-bio.QM

    Computational interference microscopy enabled by deep learning

    Authors: Yuheng Jiao, Yuchen R. He, Mikhail E. Kandel, Xiaojun Liu, Wenlong Lu, Gabriel Popescu

    Abstract: Quantitative phase imaging (QPI) has been widely applied in characterizing cells and tissues. Spatial light interference microscopy (SLIM) is a highly sensitive QPI method, due to its partially coherent illumination and common path interferometry geometry. However, its acquisition rate is limited because of the four-frame phase-shifting scheme. On the other hand, off-axis methods like diffraction…

    Submitted 17 December, 2020; originally announced December 2020.

  49. arXiv:2011.01460  [pdf, other]

    cs.LG cs.SD eess.AS

    Training Wake Word Detection with Synthesized Speech Data on Confusion Words

    Authors: Yan Jia, Zexin Cai, Murong Ma, Zeqing Zhao, Xuyang Wang, Junjie Wang, Ming Li

    Abstract: Confusing words are commonly encountered in real-life keyword spotting applications and cause severe performance degradation, owing to complex spoken terms and various kinds of words that sound similar to the predefined keywords. To enhance the wake word detection system's robustness in such scenarios, we investigate two data augmentation setups for training end-to-end KWS systems. One is invo…

    Submitted 2 November, 2020; originally announced November 2020.

    Comments: Submitted to ICASSP 2021

  50. arXiv:2010.11439  [pdf, other]

    cs.SD eess.AS

    Parallel Tacotron: Non-Autoregressive and Controllable TTS

    Authors: Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang, Ye Jia, Ron Weiss, Yonghui Wu

    Abstract: Although neural end-to-end text-to-speech models can synthesize highly natural speech, there is still room for improvement in their efficiency and naturalness. This paper proposes a non-autoregressive neural text-to-speech model augmented with a variational autoencoder-based residual encoder. This model, called Parallel Tacotron, is highly parallelizable during both training and inference, a…

    Submitted 22 October, 2020; originally announced October 2020.