Showing 1–50 of 68 results for author: Bai, Y

  1. arXiv:2406.18558  [pdf, other]

    cs.CV eess.IV

    BAISeg: Boundary Assisted Weakly Supervised Instance Segmentation

    Authors: Tengbo Wang, Yu Bai

    Abstract: How to extract instance-level masks without instance-level supervision is the main challenge of weakly supervised instance segmentation (WSIS). Popular WSIS methods estimate a displacement field (DF) via learning inter-pixel relations and perform clustering to identify instances. However, the resulting instance centroids are inherently unstable and vary significantly across different clustering al…

    Submitted 27 May, 2024; originally announced June 2024.

  2. arXiv:2406.04840  [pdf, other]

    cs.SD eess.AS

    TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking

    Authors: Junzuo Zhou, Jiangyan Yi, Tao Wang, Jianhua Tao, Ye Bai, Chu Yuan Zhang, Yong Ren, Zhengqi Wen

    Abstract: Various threats posed by the progress in text-to-speech (TTS) have prompted the need to reliably trace synthesized speech. However, contemporary approaches to this task involve adding watermarks to the audio separately after generation, a process that hurts both speech quality and watermark imperceptibility. In addition, these approaches are limited in robustness and flexibility. To address these…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  3. arXiv:2405.18435  [pdf, other]

    eess.IV cs.CV

    QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge

    Authors: Hongwei Bran Li, Fernando Navarro, Ivan Ezhov, Amirhossein Bayat, Dhritiman Das, Florian Kofler, Suprosanna Shit, Diana Waldmannstetter, Johannes C. Paetzold, Xiaobin Hu, Benedikt Wiestler, Lucas Zimmer, Tamaz Amiranashvili, Chinmay Prabhakar, Christoph Berger, Jonas Weidner, Michelle Alonso-Basant, Arif Rashid, Ujjwal Baid, Wesam Adel, Deniz Ali, Bhakti Baheti, Yingbin Bai, Ishaan Bhatt, Sabri Can Cetindag, et al. (55 additional authors not shown)

    Abstract: Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the de…

    Submitted 24 June, 2024; v1 submitted 19 March, 2024; originally announced May 2024.

    Comments: initial technical report

  4. GroupedMixer: An Entropy Model with Group-wise Token-Mixers for Learned Image Compression

    Authors: Daxin Li, Yuanchao Bai, Kai Wang, Junjun Jiang, Xianming Liu, Wen Gao

    Abstract: Transformer-based entropy models have gained prominence in recent years due to their superior ability to capture long-range dependencies in probability distribution estimation compared to convolution-based methods. However, previous transformer-based entropy models suffer from a sluggish coding process due to pixel-wise autoregression or duplicated computation during inference. In this paper, we p…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE TCSVT

  5. arXiv:2404.11275  [pdf, other]

    cs.SD eess.AS

    Jointly Recognizing Speech and Singing Voices Based on Multi-Task Audio Source Separation

    Authors: Ye Bai, Chenxing Li, Hao Li, Yuanyuan Zhao, Xiaorui Wang

    Abstract: In short video and live broadcasts, speech, singing voice, and background music often overlap and obscure each other. This complexity creates difficulties in structuring and recognizing the audio content, which may impair subsequent ASR and music understanding applications. This paper proposes a multi-task audio source separation (MTASS) based ASR model called JRSV, which Jointly Recognizes Speech…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by ICME 2024

  6. arXiv:2404.06393  [pdf, other]

    cs.SD cs.AI eess.AS

    MuPT: A Generative Symbolic Music Pretrained Transformer

    Authors: Xingwei Qu, Yuelin Bai, Yinghao Ma, Ziya Zhou, Ka Man Lo, Jiaheng Liu, Ruibin Yuan, Lejun Min, Xueling Liu, Tianyu Zhang, Xinrun Du, Shuyue Guo, Yiming Liang, Yizhi Li, Shangda Wu, Junting Zhou, Tianyu Zheng, Ziyang Ma, Fengze Han, Wei Xue, Gus Xia, Emmanouil Benetos, Xiang Yue, Chenghua Lin, Xu Tan, et al. (4 additional authors not shown)

    Abstract: In this paper, we explore the application of Large Language Models (LLMs) to the pre-training of music. While the prevalent use of MIDI in music modeling is well-established, our findings suggest that LLMs are inherently more compatible with ABC Notation, which aligns more closely with their design and strengths, thereby enhancing the model's performance in musical composition. To address the chal…

    Submitted 10 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  7. arXiv:2403.17392  [pdf, other]

    cs.RO eess.SY nlin.AO

    Natural-artificial hybrid swarm: Cyborg-insect group navigation in unknown obstructed soft terrain

    Authors: Yang Bai, Phuoc Thanh Tran Ngoc, Huu Duoc Nguyen, Duc Long Le, Quang Huy Ha, Kazuki Kai, Yu Xiang See To, Yaosheng Deng, Jie Song, Naoki Wakamiya, Hirotaka Sato, Masaki Ogura

    Abstract: Navigating multi-robot systems in complex terrains has always been a challenging task. This is due to the inherent limitations of traditional robots in collision avoidance, adaptation to unknown environments, and sustained energy efficiency. In order to overcome these limitations, this research proposes a solution by integrating living insects with miniature electronic controllers to enable roboti…

    Submitted 27 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  8. arXiv:2403.10585  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Solving General Noisy Inverse Problem via Posterior Sampling: A Policy Gradient Viewpoint

    Authors: Haoyue Tang, Tian Xie, Aosong Feng, Hanyu Wang, Chenyang Zhang, Yang Bai

    Abstract: Solving image inverse problems (e.g., super-resolution and inpainting) requires generating a high fidelity image that matches the given input (the low-resolution image or the masked image). By using the input image as guidance, we can leverage a pretrained diffusion generative model to solve a wide range of image inverse tasks without task specific model fine-tuning. To precisely estimate the guid…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted and to Appear, AISTATS 2024

  9. arXiv:2401.14007  [pdf, other]

    eess.IV cs.CV

    Semantic Ensemble Loss and Latent Refinement for High-Fidelity Neural Image Compression

    Authors: Daxin Li, Yuanchao Bai, Kai Wang, Junjun Jiang, Xianming Liu

    Abstract: Recent advancements in neural compression have surpassed traditional codecs in PSNR and MS-SSIM measurements. However, at low bit-rates, these methods can introduce visually displeasing artifacts, such as blurring, color shifting, and texture loss, thereby compromising perceptual quality of images. To address these issues, this study presents an enhanced neural compression method designed for opti…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: 7 pages, 4 figures

  10. arXiv:2310.14270  [pdf, other]

    eess.AS cs.SD

    Diffusion-Based Adversarial Purification for Speaker Verification

    Authors: Yibo Bai, Xiao-Lei Zhang

    Abstract: Recently, automatic speaker verification (ASV) based on deep learning is easily contaminated by adversarial attacks, which is a new type of attack that injects imperceptible perturbations to audio signals so as to make ASV produce wrong decisions. This poses a significant threat to the security and reliability of ASV systems. To address this issue, we propose a Diffusion-Based Adversarial Purifica…

    Submitted 24 October, 2023; v1 submitted 22 October, 2023; originally announced October 2023.

  11. arXiv:2309.10740  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation

    Authors: Yatong Bai, Trung Dang, Dung Tran, Kazuhito Koishida, Somayeh Sojoudi

    Abstract: Diffusion models are instrumental in text-to-audio (TTA) generation. Unfortunately, they suffer from slow inference due to an excessive number of queries to the underlying denoising network per generation. To address this bottleneck, we introduce ConsistencyTTA, a framework requiring only a single non-autoregressive network query, thereby accelerating TTA by hundreds of times. We achieve so by pro…

    Submitted 24 June, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

  12. arXiv:2309.05908  [pdf, other]

    eess.SY

    Reset Controller Synthesis by Reach-avoid Analysis for Delay Hybrid Systems

    Authors: Han Su, Jiyu Zhu, Shenghua Feng, Yunjun Bai, Bin Gu, Jiang Liu, Mengfei Yang, Naijun Zhan

    Abstract: A reset controller plays a crucial role in designing hybrid systems. It restricts the initial set and redefines the reset map associated with discrete transitions, in order to guarantee the system to achieve its objective. Reset controller synthesis, together with feedback controller synthesis and switching logic controller synthesis, provides a correct-by-construction approach to designing hybrid…

    Submitted 27 May, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: 15 pages, 10 figures

  13. arXiv:2309.05906  [pdf, other]

    eess.SY

    Correct-by-Construction for Hybrid Systems by Synthesizing Reset Controller

    Authors: Jiang Liu, Han Su, Yunjun Bai, Bin Gu, Bai Xue, Mengfei Yang, Naijun Zhan

    Abstract: Controller synthesis, including reset controller, feedback controller, and switching logic controller, provides an essential mechanism to guarantee the correctness and reliability of hybrid systems in a correct-by-construction manner. Unfortunately, reset controller synthesis is still in an infant stage in the literature, although it makes theoretical and practical significance. In this paper, we…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: 26 pages, 8 figures

  14. arXiv:2307.15980  [pdf, other]

    cs.LG eess.SY

    Initial State Interventions for Deconfounded Imitation Learning

    Authors: Samuel Pfrommer, Yatong Bai, Hyunin Lee, Somayeh Sojoudi

    Abstract: Imitation learning suffers from causal confusion. This phenomenon occurs when learned policies attend to features that do not causally influence the expert actions but are instead spuriously correlated. Causally confused agents produce low open-loop supervised loss but poor closed-loop performance upon deployment. We consider the problem of masking observed confounders in a disentangled representa…

    Submitted 11 August, 2023; v1 submitted 29 July, 2023; originally announced July 2023.

    Comments: 62nd IEEE Conference on Decision and Control

  15. arXiv:2306.16710  [pdf]

    cs.CL cs.SD eess.AS

    Automatic Speech Recognition of Non-Native Child Speech for Language Learning Applications

    Authors: Simone Wills, Yu Bai, Cristian Tejedor-Garcia, Catia Cucchiarini, Helmer Strik

    Abstract: Voicebots have provided a new avenue for supporting the development of language skills, particularly within the context of second language learning. Voicebots, though, have largely been geared towards native adult speakers. We sought to assess the performance of two state-of-the-art ASR systems, Wav2Vec2.0 and Whisper AI, with a view to developing a voicebot that can support children acquiring a f…

    Submitted 29 June, 2023; originally announced June 2023.

    Comments: Published on SLATE 2023, Esmad, Politecnico Do Porto, Portugal, 26-28 June, 2023, pp: 11:1-11:8

    Journal ref: 12th Symposium on Languages, Applications and Technologies (SLATE 2023) (p. 7:1-7:8)

  16. Visual-Aware Text-to-Speech

    Authors: Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao, Tao Mei

    Abstract: Dynamically synthesizing talking speech that actively responds to a listening head is critical during the face-to-face interaction. For example, the speaker could take advantage of the listener's facial expression to adjust the tones, stressed syllables, or pauses. In this work, we present a new visual-aware text-to-speech (VA-TTS) task to synthesize speech conditioned on both textual inputs and s…

    Submitted 21 June, 2023; originally announced June 2023.

    Comments: accepted as oral and top 3% paper by ICASSP 2023

    Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023, 1-5

  17. arXiv:2306.04190  [pdf]

    cs.CL cs.LG cs.SD eess.AS

    An ASR-Based Tutor for Learning to Read: How to Optimize Feedback to First Graders

    Authors: Yu Bai, Cristian Tejedor-Garcia, Ferdy Hubers, Catia Cucchiarini, Helmer Strik

    Abstract: The interest in employing automatic speech recognition (ASR) in applications for reading practice has been growing in recent years. In a previous study, we presented an ASR-based Dutch reading tutor application that was developed to provide instantaneous feedback to first-graders learning to read. We saw that ASR has potential at this stage of the reading process, as the results suggested that pup…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: Published (double-blind peer-reviewed) on SPECOM 2021

    Journal ref: In: Karpov A., Potapova R. (eds) Speech and Computer. SPECOM 2021. Lecture Notes in Computer Science, vol 12997. Springer, Cham

  18. arXiv:2306.02982  [pdf, other]

    cs.CL eess.AS

    PolyVoice: Language Models for Speech to Speech Translation

    Authors: Qianqian Dong, Zhiying Huang, Qiao Tian, Chen Xu, Tom Ko, Yunlong Zhao, Siyuan Feng, Tang Li, Kexin Wang, Xuxin Cheng, Fengpeng Yue, Ye Bai, Xi Chen, Lu Lu, Zejun Ma, Yuping Wang, Mingxuan Wang, Yuxuan Wang

    Abstract: We propose PolyVoice, a language model-based framework for speech-to-speech translation (S2ST) system. Our framework consists of two language models: a translation language model and a speech synthesis language model. We use discretized speech units, which are generated in a fully unsupervised way, and thus our framework can be used for unwritten languages. For the speech synthesis part, we adopt…

    Submitted 13 June, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

  19. arXiv:2306.01232  [pdf, other]

    eess.IV cs.CV

    Deep Reinforcement Learning Framework for Thoracic Diseases Classification via Prior Knowledge Guidance

    Authors: Weizhi Nie, Chen Zhang, Dan Song, Lina Zhao, Yunpeng Bai, Keliang Xie, Anan Liu

    Abstract: The chest X-ray is often utilized for diagnosing common thoracic diseases. In recent years, many approaches have been proposed to handle the problem of automatic diagnosis based on chest X-rays. However, the scarcity of labeled data for related diseases still poses a huge challenge to an accurate diagnosis. In this paper, we focus on the thorax disease diagnostic problem and propose a novel deep r…

    Submitted 1 June, 2023; originally announced June 2023.

  20. arXiv:2305.12072  [pdf, other]

    eess.IV cs.CV

    Chest X-ray Image Classification: A Causal Perspective

    Authors: Weizhi Nie, Chen Zhang, Dan Song, Lina Zhao, Yunpeng Bai, Keliang Xie, Anan Liu

    Abstract: The chest X-ray (CXR) is one of the most common and easy-to-get medical tests used to diagnose common diseases of the chest. Recently, many deep learning-based methods have been proposed that are capable of effectively classifying CXRs. Even though these techniques have worked quite well, it is difficult to establish whether what these algorithms actually learn is the cause-and-effect link between…

    Submitted 19 May, 2023; originally announced May 2023.

  21. arXiv:2305.12070  [pdf, other]

    eess.IV cs.CV

    Instrumental Variable Learning for Chest X-ray Classification

    Authors: Weizhi Nie, Chen Zhang, Dan Song, Yunpeng Bai, Keliang Xie, Anan Liu

    Abstract: The chest X-ray (CXR) is commonly employed to diagnose thoracic illnesses, but the challenge of achieving accurate automatic diagnosis through this method persists due to the complex relationship between pathology. In recent years, various deep learning-based approaches have been suggested to tackle this problem but confounding factors such as image resolution or noise problems often damage model…

    Submitted 19 May, 2023; originally announced May 2023.

  22. arXiv:2305.07278  [pdf, ps, other]

    eess.SP

    Deep Learning for Asynchronous Massive Access with Data Frame Length Diversity

    Authors: Yanna Bai, Wei Chen, Bo Ai, Petar Popovski

    Abstract: Grant-free non-orthogonal multiple access has been regarded as a viable approach to accommodate access for a massive number of machine-type devices with small data packets. The sporadic activation of the devices creates a multiuser setup where it is suitable to use compressed sensing in order to detect the active devices and decode their data. We consider asynchronous access of machine-type device…

    Submitted 12 May, 2023; originally announced May 2023.

  23. A High-Performance Accelerator for Super-Resolution Processing on Embedded GPU

    Authors: Wenqian Zhao, Qi Sun, Yang Bai, Wenbo Li, Haisheng Zheng, Bei Yu, Martin D. F. Wong

    Abstract: Recent years have witnessed impressive progress in super-resolution (SR) processing. However, its real-time inference requirement sets a challenge not only for the model design but also for the on-chip implementation. In this paper, we implement a full-stack SR acceleration framework on embedded GPU devices. The special dictionary learning algorithm used in SR models was analyzed in detail and acc…

    Submitted 15 March, 2023; originally announced March 2023.

  24. arXiv:2301.12048  [pdf, other]

    cs.CV eess.IV

    Making Reconstruction-based Method Great Again for Video Anomaly Detection

    Authors: Yizhou Wang, Can Qin, Yue Bai, Yi Xu, Xu Ma, Yun Fu

    Abstract: Anomaly detection in videos is a significant yet challenging problem. Previous approaches based on deep neural networks employ either reconstruction-based or prediction-based approaches. Nevertheless, existing reconstruction-based methods 1) rely on old-fashioned convolutional autoencoders and are poor at modeling temporal dependency; 2) are prone to overfit the training samples, leading to indist…

    Submitted 27 January, 2023; originally announced January 2023.

    Comments: Accepted by ICDM 2022

  25. arXiv:2301.10314  [pdf, other]

    cs.HC cs.SD eess.AS

    WhisperWand: Simultaneous Voice and Gesture Tracking Interface

    Authors: Yang Bai, Irtaza Shahid, Harshvardhan Takawale, Nirupam Roy

    Abstract: This paper presents the design and implementation of WhisperWand, a comprehensive voice and motion tracking interface for voice assistants. Distinct from prior works, WhisperWand is a precise tracking interface that can co-exist with the voice interface on low sampling rate voice assistants. Taking handwriting as a specific application, it can also capture natural strokes and the individualized st…

    Submitted 24 January, 2023; originally announced January 2023.

  26. arXiv:2209.14539  [pdf, other]

    eess.SY physics.app-ph physics.optics

    Transmission Model for Resonant Beam SWIPT with Telescope Internal Modulator

    Authors: Wen Fang, Yunfeng Bai, Qingwen Liu, Shengli Zhou

    Abstract: To satisfy the long-range and energy self-sustaining communication needs of electronic devices in the Internet of Things (IoT), we introduce a simultaneous wireless information and power transfer (SWIPT) system using the resonant beam that incorporates a telescope modulator inside a cavity for suppressing diffraction losses. We theoretically analyze power transfer in the resonant beam system with…

    Submitted 29 September, 2022; originally announced September 2022.

  27. arXiv:2209.08326  [pdf, other]

    eess.AS cs.CL

    Parameter-Efficient Conformers via Sharing Sparsely-Gated Experts for End-to-End Speech Recognition

    Authors: Ye Bai, Jie Li, Wenjing Han, Hao Ni, Kaituo Xu, Zhuo Zhang, Cheng Yi, Xiaorui Wang

    Abstract: While transformers and their variant conformers show promising performance in speech recognition, the parameterized property leads to much memory cost during training and inference. Some works use cross-layer weight-sharing to reduce the parameters of the model. However, the inevitable loss of capacity harms the model performance. To address this issue, this paper proposes a parameter-efficient co…

    Submitted 17 September, 2022; originally announced September 2022.

    Comments: accepted in INTERSPEECH 2022

  28. arXiv:2209.05951  [pdf, ps, other]

    cs.IT eess.SP

    Data-Driven Compressed Sensing for Massive Wireless Access

    Authors: Yanna Bai, Wei Chen, Feifei Sun, Bo Ai, Petar Popovski

    Abstract: The central challenge in massive machine-type communications (mMTC) is to connect a large number of uncoordinated devices through a limited spectrum. The typical mMTC communication pattern is sporadic, with short packets. This could be exploited in grant-free random access in which the activity detection, channel estimation, and data recovery are formulated as a sparse recovery problem and solved…

    Submitted 28 September, 2022; v1 submitted 13 September, 2022; originally announced September 2022.

    Comments: in IEEE Communication Magazine vol:60, iss:11, 2022

  29. arXiv:2209.04847  [pdf, other]

    eess.IV cs.CV

    Deep Lossy Plus Residual Coding for Lossless and Near-lossless Image Compression

    Authors: Yuanchao Bai, Xianming Liu, Kai Wang, Xiangyang Ji, Xiaolin Wu, Wen Gao

    Abstract: Lossless and near-lossless image compression is of paramount importance to professional users in many technical fields, such as medicine, remote sensing, precision engineering and scientific research. But despite rapidly growing research interests in learning-based image compression, no published method offers both lossless and near-lossless modes. In this paper, we propose a unified and powerful…

    Submitted 10 January, 2024; v1 submitted 11 September, 2022; originally announced September 2022.

    Comments: manuscript accepted by TPAMI, source code: https://github.com/BYchao100/Deep-Lossy-Plus-Residual-Coding

  30. arXiv:2204.08187  [pdf, other]

    math.OC eess.SY

    Securing Signal-free Intersections against Strategic Jamming Attacks: A Macroscopic Approach

    Authors: Yumeng Bai, Saurabh Amin, Xudong Wang, Li Jin

    Abstract: We consider the security-by-design of a signal-free intersection for connected and autonomous vehicles in the face of strategic jamming attacks. We use a fluid model to characterize macroscopic traffic flow through the intersection, where the saturation rate is derived from a vehicle coordination algorithm. We model jamming attacks as sudden increase in communication latency induced on vehicle-to-…

    Submitted 18 September, 2022; v1 submitted 18 April, 2022; originally announced April 2022.

    Comments: Accepted by 2022 IEEE Conference on Decision and Control (CDC)

  31. arXiv:2204.03329  [pdf]

    cs.RO eess.SY

    Information-driven Path Planning for Hybrid Aerial Underwater Vehicles

    Authors: Zheng Zeng, Chengke Xiong, Xinyi Yuan, Yulin Bai, Yufei Jin, Di Lu, Lian Lian

    Abstract: This paper presents a novel Rapidly-exploring Adaptive Sampling Tree (RAST) algorithm for the adaptive sampling mission of a hybrid aerial underwater vehicle (HAUV) in an air-sea 3D environment. This algorithm innovatively combines the tournament-based point selection sampling strategy, the information heuristic search process and the framework of Rapidly-exploring Random Tree (RRT) algorithm. Hen…

    Submitted 8 April, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

  32. arXiv:2203.02291  [pdf, other]

    cs.CV cs.SD eess.AS

    Freeform Body Motion Generation from Speech

    Authors: Jing Xu, Wei Zhang, Yalong Bai, Qibin Sun, Tao Mei

    Abstract: People naturally conduct spontaneous body motions to enhance their speeches while giving talks. Body motion generation from speech is inherently difficult due to the non-deterministic mapping from speech to body motions. Most existing works map speech to motion in a deterministic way by conditioning on certain styles, leading to sub-optimal results. Motivated by studies in linguistics, we decompos…

    Submitted 4 March, 2022; originally announced March 2022.

  33. arXiv:2202.08433  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    ADD 2022: the First Audio Deep Synthesis Detection Challenge

    Authors: Jiangyan Yi, Ruibo Fu, Jianhua Tao, Shuai Nie, Haoxin Ma, Chenglong Wang, Tao Wang, Zhengkun Tian, Ye Bai, Cunhang Fan, Shan Liang, Shiming Wang, Shuai Zhang, Xinrui Yan, Le Xu, Zhengqi Wen, Haizhou Li, Zheng Lian, Bin Liu

    Abstract: Audio deepfake detection is an emerging topic, which was included in the ASVspoof 2021. However, the recent shared tasks have not covered many real-life and challenging scenarios. The first Audio Deep synthesis Detection challenge (ADD) was motivated to fill in the gap. The ADD 2022 includes three tracks: low-quality fake audio detection (LF), partially fake audio detection (PF) and audio fake gam…

    Submitted 26 February, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

    Comments: Accepted by ICASSP 2022

  34. arXiv:2112.09300  [pdf, other]

    cs.CV eess.IV

    Towards End-to-End Image Compression and Analysis with Transformers

    Authors: Yuanchao Bai, Xu Yang, Xianming Liu, Junjun Jiang, Yaowei Wang, Xiangyang Ji, Wen Gao

    Abstract: We propose an end-to-end image compression and analysis model with Transformers, targeting to the cloud-based image classification application. Instead of placing an existing Transformer-based image classification model directly after an image codec, we aim to redesign the Vision Transformer (ViT) model to perform image classification from the compressed features and facilitate image compression w…

    Submitted 16 December, 2021; originally announced December 2021.

    Comments: Accepted by AAAI 2022; Code: https://github.com/BYchao100/Towards-Image-Compression-and-Analysis-with-Transformers

  35. arXiv:2109.04960  [pdf, other]

    eess.IV cs.CV

    Automatic Displacement and Vibration Measurement in Laboratory Experiments with A Deep Learning Method

    Authors: Yongsheng Bai, Ramzi M. Abduallah, Halil Sezen, Alper Yilmaz

    Abstract: This paper proposes a pipeline to automatically track and measure displacement and vibration of structural specimens during laboratory experiments. The latest Mask Regional Convolutional Neural Network (Mask R-CNN) can locate the targets and monitor their movement from videos recorded by a stationary camera. To improve precision and remove the noise, techniques such as Scale-invariant Feature Tran…

    Submitted 10 September, 2021; originally announced September 2021.

    Journal ref: IEEE Sensors 2021

  36. arXiv:2108.00004  [pdf, ps, other]

    cs.ET cs.IT eess.SP physics.optics

    Long-Range Optical Wireless Information and Power Transfer

    Authors: Yunfeng Bai, Qingwen Liu, Riqing Chen, Qingqing Zhang, Wei Wang

    Abstract: Simultaneous wireless information and power transfer (SWIPT) is a remarkable technology to support both the data and the energy transfer in the era of Internet of Things (IoT). In this paper, we proposed a long-range optical wireless information and power transfer system utilizing retro-reflectors, a gain medium, a telescope internal modulator to form the resonant beam, achieving high-power and hi…

    Submitted 6 July, 2022; v1 submitted 29 July, 2021; originally announced August 2021.

  37. arXiv:2107.14458  [pdf, ps, other]

    cs.ET cs.IT eess.SP

    High-Efficiency Resonant Beam Charging and Communication

    Authors: Yunfeng Bai, Qingwen Liu, Xin Wang, Yudan Gou, Bin Zhou, Zhiyong Bu

    Abstract: With the development of Internet of Things (IoT), demands of power and data for IoT devices increase drastically. In order to resolve the supply-demand contradiction, simultaneous wireless information and power transfer (SWIPT) has been envisioned as an enabling technology by providing high-power energy transfer and high-rate data delivering concurrently. In this paper, we introduce a high-efficie…

    Submitted 4 January, 2024; v1 submitted 30 July, 2021; originally announced July 2021.

  38. arXiv:2105.13174  [pdf, other]

    eess.SP

    Charging A Smartphone Over the Air: The Resonant Beam Charging Method

    Authors: Qingwen Liu, Mingqing Xiong, Mingqing Liu, Qingwei Jiang, Wen Fang, Yunfeng Bai

    Abstract: Wireless charging for mobile Internet of Things (IoT) devices such as smartphones is extremely difficult. To reduce energy dissipation during wireless transmission in mobile scenarios, laser or narrow radio beams with sophisticated tracking control are typically required. However, reaching the necessary tracking accuracy and reliability is really difficult. In this paper, inspired by the features…

    Submitted 12 January, 2022; v1 submitted 24 May, 2021; originally announced May 2021.

  39. Continual Learning for Fake Audio Detection

    Authors: Haoxin Ma, Jiangyan Yi, Jianhua Tao, Ye Bai, Zhengkun Tian, Chenglong Wang

    Abstract: Fake audio attack becomes a major threat to the speaker verification system. Although current detection approaches have achieved promising results on dataset-specific scenarios, they encounter difficulties on unseen spoofing data. Fine-tuning and retraining from scratch have been applied to incorporate new data. However, fine-tuning leads to performance degradation on previous data. Retraining tak…

    Submitted 15 April, 2021; originally announced April 2021.

    Comments: 5 pages, conference

    Journal ref: Proc. Interspeech 2021, 886-890

  40. arXiv:2104.03617  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    Half-Truth: A Partially Fake Audio Detection Dataset

    Authors: Jiangyan Yi, Ye Bai, Jianhua Tao, Haoxin Ma, Zhengkun Tian, Chenglong Wang, Tao Wang, Ruibo Fu

    Abstract: Diverse promising datasets have been designed to hold back the development of fake audio detection, such as ASVspoof databases. However, previous datasets ignore an attacking situation, in which the hacker hides some small fake clips in real speech audio. This poses a serious threat since that it is difficult to distinguish the small fake clip from the whole speech utterance. Therefore, this paper…

    Submitted 15 December, 2023; v1 submitted 8 April, 2021; originally announced April 2021.

    Comments: accepted by Interspeech 2021

  41. arXiv:2104.02882  [pdf, other]

    eess.AS cs.CL cs.SD

    FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization

    Authors: Zhengkun Tian, Jiangyan Yi, Ye Bai, Jianhua Tao, Shuai Zhang, Zhengqi Wen

    Abstract: Transducer-based models, such as RNN-Transducer and transformer-transducer, have achieved great success in speech recognition. A typical transducer model decodes the output sequence conditioned on the current acoustic state and previously predicted tokens step by step. Statistically, the number of blank tokens in the prediction results accounts for nearly 90% of all tokens. It takes a lot of comp…

    Submitted 6 April, 2021; originally announced April 2021.

    Comments: Submitted to INTERSPEECH2021

  42. TSNAT: Two-Step Non-Autoregressvie Transformer Models for Speech Recognition

    Authors: Zhengkun Tian, Jiangyan Yi, Jianhua Tao, Ye Bai, Shuai Zhang, Zhengqi Wen, Xuefei Liu

    Abstract: The autoregressive (AR) models, such as attention-based encoder-decoder models and RNN-Transducer, have achieved great success in speech recognition. They predict the output sequence conditioned on the previous tokens and acoustic encoded states, which is inefficient on GPUs. The non-autoregressive (NAR) models can get rid of the temporal dependency between the output tokens and predict the entire…

    Submitted 3 April, 2021; originally announced April 2021.

    Comments: Submitted to Interspeech2021

  43. arXiv:2103.17015  [pdf, other]

    eess.IV cs.CV

    Learning Scalable $\ell_\infty$-constrained Near-lossless Image Compression via Joint Lossy Image and Residual Compression

    Authors: Yuanchao Bai, Xianming Liu, Wangmeng Zuo, Yaowei Wang, Xiangyang Ji

    Abstract: We propose a novel joint lossy image and residual compression framework for learning $\ell_\infty$-constrained near-lossless image compression. Specifically, we obtain a lossy reconstruction of the raw image through lossy image compression and uniformly quantize the corresponding residual to satisfy a given tight $\ell_\infty$ error bound. Suppose that the error bound is zero, i.e., lossless image…

    Submitted 31 March, 2021; originally announced March 2021.

    Comments: Accepted by CVPR 2021; Code: https://github.com/BYchao100/Scalable-Near-lossless-Image-Compression

  44. arXiv:2103.15858  [pdf, other]

    eess.IV cs.CV

    CateNorm: Categorical Normalization for Robust Medical Image Segmentation

    Authors: Junfei Xiao, Lequan Yu, Zongwei Zhou, Yutong Bai, Lei Xing, Alan Yuille, Yuyin Zhou

    Abstract: Batch normalization (BN) uniformly shifts and scales the activations based on the statistics of a batch of images. However, the intensity distribution of the background pixels often dominates the BN statistics because the background accounts for a large proportion of the entire image. This paper focuses on enhancing BN with the intensity distribution of foreground pixels, the one that really matte…

    Submitted 4 August, 2022; v1 submitted 29 March, 2021; originally announced March 2021.

    Comments: Accepted by MICCAI 2022 Workshop on Domain Adaptation and Representation Transfer (DART)

  45. arXiv:2103.11565  [pdf, other]

    eess.SY

    Switching Controller Synthesis for Delay Hybrid Systems under Perturbations

    Authors: Yunjun Bai, Ting Gan, Li Jiao, Bican Xia, Bai Xue, Naijun Zhan

    Abstract: Delays are ubiquitous in modern hybrid systems, which exhibit both continuous and discrete dynamical behaviors. Induced by signal transmission, conversion, the nature of plants, and so on, delays may appear either in the continuous evolution of a hybrid system such that the evolution depends not only on the present state but also on its execution history, or in the discrete switching between its d…

    Submitted 21 March, 2021; originally announced March 2021.

  46. arXiv:2102.07594  [pdf, other]

    cs.CL cs.AI eess.AS

    Fast End-to-End Speech Recognition via Non-Autoregressive Models and Cross-Modal Knowledge Transferring from BERT

    Authors: Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang

    Abstract: Attention-based encoder-decoder (AED) models have achieved promising performance in speech recognition. However, because the decoder predicts text tokens (such as characters or words) in an autoregressive manner, it is difficult for an AED model to predict all tokens in parallel. This makes the inference speed relatively slow. We believe that because the encoder already captures the whole speech u…

    Submitted 29 August, 2021; v1 submitted 15 February, 2021; originally announced February 2021.

    Comments: 14 pages, 7 figures

  47. arXiv:2012.10533  [pdf, other]

    eess.IV cs.CV

    Atlas-ISTN: Joint Segmentation, Registration and Atlas Construction with Image-and-Spatial Transformer Networks

    Authors: Matthew Sinclair, Andreas Schuh, Karl Hahn, Kersten Petersen, Ying Bai, James Batten, Michiel Schaap, Ben Glocker

    Abstract: Deep learning models for semantic segmentation are able to learn powerful representations for pixel-wise predictions, but are sensitive to noise at test time and do not guarantee a plausible topology. Image registration models on the other hand are able to warp known topologies to target images as a means of segmentation, but typically require large amounts of training data, and have not widely be…

    Submitted 18 December, 2020; originally announced December 2020.

    Comments: 33 pages, 15 figures

  48. arXiv:2011.03098  [pdf]

    cs.CV cs.LG eess.IV

    End-to-end Deep Learning Methods for Automated Damage Detection in Extreme Events at Various Scales

    Authors: Yongsheng Bai, Halil Sezen, Alper Yilmaz

    Abstract: Robust Mask R-CNN (Mask Regional Convolutional Neural Network) methods are proposed and tested for automatic detection of cracks on structures or their components that may be damaged during extreme events, such as earthquakes. We curated a new dataset with 2,021 labeled images for training and validation and aimed to find end-to-end deep neural networks for crack detection in the field. With dat…

    Submitted 5 November, 2020; originally announced November 2020.

  49. arXiv:2010.14798  [pdf, other]

    cs.SD cs.CL eess.AS

    Decoupling Pronunciation and Language for End-to-end Code-switching Automatic Speech Recognition

    Authors: Shuai Zhang, Jiangyan Yi, Zhengkun Tian, Ye Bai, Jianhua Tao, Zhengqi Wen

    Abstract: Despite the recent significant advances witnessed in end-to-end (E2E) ASR systems for code-switching, hunger for audio-text paired data limits the further improvement of the models' performance. In this paper, we propose a decoupled transformer model to use monolingual paired data and unpaired text data to alleviate the problem of code-switching data shortage. The model is decoupled into two parts:…

    Submitted 28 October, 2020; originally announced October 2020.

    Comments: 5 pages, 1 figure

  50. arXiv:2010.14791  [pdf, other]

    eess.AS

    One In A Hundred: Select The Best Predicted Sequence from Numerous Candidates for Streaming Speech Recognition

    Authors: Zhengkun Tian, Jiangyan Yi, Ye Bai, Jianhua Tao, Shuai Zhang, Zhengqi Wen

    Abstract: The RNN-Transducers and improved attention-based encoder-decoder models are widely applied to streaming speech recognition. Compared with these two end-to-end models, the CTC model is more efficient in training and inference. However, it cannot capture the linguistic dependencies between the output tokens. Inspired by the success of two-pass end-to-end models, we introduce a transformer decoder an…

    Submitted 3 April, 2021; v1 submitted 28 October, 2020; originally announced October 2020.