Showing 1–50 of 129 results for author: Gong, Y

Searching in archive eess.
  1. arXiv:2406.18625  [pdf, other]

    cs.SD cs.AI eess.AS

    Automatic Prediction of Amyotrophic Lateral Sclerosis Progression using Longitudinal Speech Transformer

    Authors: Liming Wang, Yuan Gong, Nauman Dawalatabad, Marco Vilela, Katerina Placek, Brian Tracey, Yishu Gong, Alan Premasiri, Fernando Vieira, James Glass

    Abstract: Automatic prediction of amyotrophic lateral sclerosis (ALS) disease progression provides a more efficient and objective alternative than manual approaches. We propose ALS longitudinal speech transformer (ALST), a neural network-based automatic predictor of ALS disease progression from longitudinal speech recordings of ALS patients. By taking advantage of high-quality pretrained speech features and…

    Submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2406.14931  [pdf, other]

    eess.SP

    Multi-beam Training for Near-field Communications in High-frequency Bands

    Authors: Cong Zhou, Changsheng You, Zixuan Huang, Shuo Shi, Yi Gong, Chan-Byoung Chae, Kaibin Huang

    Abstract: In this paper, we study efficient multi-beam training design for near-field communications to reduce the beam training overhead of conventional single-beam training methods. In particular, the array-division based multi-beam training method, which is widely used in far-field communications, cannot be directly applied to the near-field scenario, since different sub-arrays may observe different user…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: In this paper, a novel near-field multi-beam training scheme is proposed by sparsely activating a portion of antennas to form a sparse linear array

  3. arXiv:2406.13205  [pdf]

    eess.IV cs.CV

    Application of Computer Deep Learning Model in Diagnosis of Pulmonary Nodules

    Authors: Yutian Yang, Hongjie Qiu, Yulu Gong, Xiaoyi Liu, Yang Lin, Muqing Li

    Abstract: The 3D simulation model of the lung was established by using the reconstruction method. A computer aided pulmonary nodule detection model was constructed. The process iterates over the images to refine the lung nodule recognition model based on neural networks. It is integrated with 3D virtual modeling technology to improve the interactivity of the system, so as to achieve intelligent recognition…

    Submitted 19 June, 2024; originally announced June 2024.

    MSC Class: 68T10; 92C50

  4. arXiv:2406.11158  [pdf, other]

    eess.SY

    Dynamic Modeling and Control for an Offshore Semisubmersible Floating Wind Turbine

    Authors: Yingjie Gong, Qinmin Yang, Hua Geng, Wenchao Meng, Lin Wang

    Abstract: Floating wind turbines (FWTs) hold significant potential for the exploitation of offshore renewable energy resources. Nevertheless, prior to the construction of FWTs, it is imperative to tackle several critical challenges, especially the issue of performance degradation under combined wind and wave loads. This study initiates with the development of a simplified nonlinear dynamical model for a sem…

    Submitted 16 June, 2024; originally announced June 2024.

  5. arXiv:2406.10082  [pdf, other]

    eess.AS cs.CV cs.SD

    Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation

    Authors: Andrew Rouditchenko, Yuan Gong, Samuel Thomas, Leonid Karlinsky, Hilde Kuehne, Rogerio Feris, James Glass

    Abstract: Audio-Visual Speech Recognition (AVSR) uses lip-based video to improve performance in noise. Since videos are harder to obtain than audio, the video training data of AVSR models is usually limited to a few thousand hours. In contrast, speech models such as Whisper are trained with hundreds of thousands of hours of data, and thus learn a better speech-to-text decoder. The huge training data differe…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024. Code https://github.com/roudimit/whisper-flamingo

  6. arXiv:2405.05446  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG eess.IV

    GDGS: Gradient Domain Gaussian Splatting for Sparse Representation of Radiance Fields

    Authors: Yuanhao Gong

    Abstract: The 3D Gaussian splatting methods are getting popular. However, they work directly on the signal, leading to a dense representation of the signal. Even with some techniques such as pruning or distillation, the results are still dense. In this paper, we propose to model the gradient of the original signal. The gradients are much sparser than the original signal. Therefore, the gradients use much le…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2404.09105

  7. arXiv:2404.19087  [pdf, other]

    cs.RO cs.AI cs.LG eess.SY

    Deep Reinforcement Learning for Advanced Longitudinal Control and Collision Avoidance in High-Risk Driving Scenarios

    Authors: Dianwei Chen, Yaobang Gong, Xianfeng Yang

    Abstract: Existing Advanced Driver Assistance Systems primarily focus on the vehicle directly ahead, often overlooking potential risks from following vehicles. This oversight can lead to ineffective handling of high risk situations, such as high speed, closely spaced, multi vehicle scenarios where emergency braking by one vehicle might trigger a pile up collision. To overcome these limitations, this study i…

    Submitted 29 April, 2024; originally announced April 2024.

  8. arXiv:2404.09105  [pdf, other]

    cs.CV cs.AI cs.GR eess.IV

    EGGS: Edge Guided Gaussian Splatting for Radiance Fields

    Authors: Yuanhao Gong

    Abstract: The Gaussian splatting methods are getting popular. However, their loss function only contains the $\ell_1$ norm and the structural similarity between the rendered and input images, without considering the edges in these images. It is well-known that the edges in an image provide important information. Therefore, in this paper, we propose an Edge Guided Gaussian Splatting (EGGS) method that levera…

    Submitted 22 April, 2024; v1 submitted 13 April, 2024; originally announced April 2024.

  9. arXiv:2404.07121  [pdf, other]

    cs.IT eess.SP

    Digital Over-the-Air Computation: Achieving High Reliability via Bit-Slicing

    Authors: Jiawei Liu, Yi Gong, Kaibin Huang

    Abstract: 6G mobile networks aim to realize ubiquitous intelligence at the network edge via distributed learning, sensing, and data analytics. Their common operation is to aggregate high-dimensional data, which causes a communication bottleneck that cannot be resolved using traditional orthogonal multi-access schemes. A promising solution, called over-the-air computation (AirComp), exploits channels' wavefo…

    Submitted 10 April, 2024; originally announced April 2024.

  10. arXiv:2403.16212  [pdf, other]

    eess.IV cs.CV cs.LG

    Leveraging Deep Learning and Xception Architecture for High-Accuracy MRI Classification in Alzheimer Diagnosis

    Authors: Shaojie Li, Haichen Qu, Xinqi Dong, Bo Dang, Hengyi Zang, Yulu Gong

    Abstract: Exploring the application of deep learning technologies in the field of medical diagnostics, Magnetic Resonance Imaging (MRI) provides a unique perspective for observing and diagnosing complex neurodegenerative diseases such as Alzheimer Disease (AD). With advancements in deep learning, particularly in Convolutional Neural Networks (CNNs) and the Xception network architecture, we are now able to a…

    Submitted 24 March, 2024; originally announced March 2024.

  11. arXiv:2403.14775  [pdf, ps, other]

    cs.IT eess.SP

    RIS-Aided Cooperative Mobile Edge Computing: Computation Efficiency Maximization via Joint Uplink and Downlink Resource Allocation

    Authors: Zhenrong Liu, Zongze Li, Yi Gong, Yik-Chung Wu

    Abstract: In mobile edge computing (MEC) systems, the wireless channel condition is a critical factor affecting both the communication power consumption and computation rate of the offloading tasks. This paper exploits the idea of cooperative transmission and employing reconfigurable intelligent surface (RIS) in MEC to improve the channel condition and maximize computation efficiency (CE). The resulting pro…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: This paper has been accepted for publication in IEEE Transactions on Wireless Communications

  12. arXiv:2403.14244  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Isotropic Gaussian Splatting for Real-Time Radiance Field Rendering

    Authors: Yuanhao Gong, Lantao Yu, Guanghui Yue

    Abstract: The 3D Gaussian splatting method has drawn a lot of attention, thanks to its high performance in training and high quality of the rendered image. However, it uses anisotropic Gaussian kernels to represent the scene. Although such anisotropic kernels have advantages in representing the geometry, they lead to difficulties in terms of computation, such as splitting or merging two kernels. In this pap…

    Submitted 21 March, 2024; originally announced March 2024.

  13. arXiv:2402.15939  [pdf]

    eess.IV cs.LG

    Deep Separable Spatiotemporal Learning for Fast Dynamic Cardiac MRI

    Authors: Zi Wang, Min Xiao, Yirong Zhou, Chengyan Wang, Naiming Wu, Yi Li, Yiwen Gong, Shufu Chang, Yinyin Chen, Liuhong Zhu, Jianjun Zhou, Congbo Cai, He Wang, Di Guo, Guang Yang, Xiaobo Qu

    Abstract: Dynamic magnetic resonance imaging (MRI) plays an indispensable role in cardiac diagnosis. To enable fast imaging, the k-space data can be undersampled but the image reconstruction poses a great challenge of high-dimensional processing. This challenge leads to necessitate extensive training data in many deep learning reconstruction methods. This work proposes a novel and efficient approach, levera…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: 10 pages, 11 figures, 3 tables

  14. arXiv:2402.06630  [pdf]

    physics.soc-ph eess.SP

    The Dilemma of Standardizing Indoor Photovoltaic Characterisation: Embracing Diversity for Powering the IoT

    Authors: Zacharie Jehl Li-Kao, Kunal J. Tiwari, Sergio Giraldo, Marcel Placidi, Yuancai Gong, Arindam Basak, Taizo Kobayashi, Jon Major, Edgardo Saucedo

    Abstract: In this viewpoint contribution, we argue that the emerging landscape of indoor photovoltaics poses unique challenges that transcend the capabilities of a singular standard, unlike what the community has become accustomed with the success of the AM1.x standard for outdoor application. We aim at illustrating the pitfalls associated with a one-size-fits-all approach to standardisation, emphasising th…

    Submitted 14 January, 2024; originally announced February 2024.

    Comments: References are currently missing

  15. arXiv:2401.15354  [pdf, other]

    eess.IV cs.CV

    DeepGI: An Automated Approach for Gastrointestinal Tract Segmentation in MRI Scans

    Authors: Ye Zhang, Yulu Gong, Dongji Cui, Xinrui Li, Xinyu Shen

    Abstract: Gastrointestinal (GI) tract cancers pose a global health challenge, demanding precise radiotherapy planning for optimal treatment outcomes. This paper introduces a cutting-edge approach to automate the segmentation of GI tract regions in magnetic resonance imaging (MRI) scans. Leveraging advanced deep learning architectures, the proposed model integrates Inception-V4 for initial classification, UN…

    Submitted 27 January, 2024; originally announced January 2024.

  16. arXiv:2401.10345  [pdf, other]

    eess.IV

    Attack and Defense Analysis of Learned Image Compression

    Authors: Tianyu Zhu, Heming Sun, Xiankui Xiong, Xuanpeng Zhu, Yong Gong, Minge Jing, Yibo Fan

    Abstract: Learned image compression (LIC) is becoming more and more popular these years with its high efficiency and outstanding compression quality. Still, the practicality against modified inputs added with specific noise could not be ignored. White-box attacks such as FGSM and PGD use only gradient to compute adversarial images that mislead LIC models to output unexpected results. Our experiments compare…

    Submitted 27 March, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

  17. arXiv:2401.08887  [pdf, ps, other]

    cs.SD cs.AI cs.CL eess.AS

    NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription

    Authors: Alon Vinnikov, Amir Ivry, Aviv Hurvitz, Igor Abramovski, Sharon Koubi, Ilya Gurvich, Shai Pe`er, Xiong Xiao, Benjamin Martinez Elizalde, Naoyuki Kanda, Xiaofei Wang, Shalev Shaer, Stav Yagev, Yossi Asher, Sunit Sivasankaran, Yifan Gong, Min Tang, Huaming Wang, Eyal Krupka

    Abstract: We introduce the first Natural Office Talkers in Settings of Far-field Audio Recordings ("NOTSOFAR-1") Challenge alongside datasets and baseline system. The challenge focuses on distant speaker diarization and automatic speech recognition (DASR) in far-field meeting scenarios, with single-channel and known-geometry multi-channel tracks, and serves as a launch platform for two new datasets: First…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: preprint

  18. arXiv:2312.14448  [pdf, other]

    cs.NI eess.SP

    Quantum-Assisted Joint Caching and Power Allocation for Integrated Satellite-Terrestrial Networks

    Authors: Yu Zhang, Yanmin Gong, Lei Fan, Yu Wang, Zhu Han, Yuanxiong Guo

    Abstract: Low earth orbit (LEO) satellite network can complement terrestrial networks for achieving global wireless coverage and improving delay-sensitive Internet services. This paper proposes an integrated satellite-terrestrial network (ISTN) architecture to provide ground users with seamless and reliable content delivery services. For optimal service provisioning in this architecture, we formulate an opt…

    Submitted 22 December, 2023; originally announced December 2023.

  19. arXiv:2312.13683  [pdf, other]

    eess.SP cs.IT

    Joint Channel Estimation and Cooperative Localization for Near-Field Ultra-Massive MIMO

    Authors: Ruoxiao Cao, Hengtao He, Xianghao Yu, Shenghui Song, Kaibin Huang, Jun Zhang, Yi Gong, Khaled B. Letaief

    Abstract: The next-generation (6G) wireless networks are expected to provide not only seamless and high data-rate communications, but also ubiquitous sensing services. By providing vast spatial degrees of freedom (DoFs), ultra-massive multiple-input multiple-output (UM-MIMO) technology is a key enabler for both sensing and communications in 6G. However, the adoption of UM-MIMO leads to a shift from the far…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: Submit to JSAC

  20. arXiv:2311.09850  [pdf, other]

    cs.IT eess.SP

    Semantic-Relay-Aided Text Transmission: Placement Optimization and Bandwidth Allocation

    Authors: Tianyu Liu, Changsheng You, Zeyang Hu, Chenyu Wu, Yi Gong, Kaibin Huang

    Abstract: Semantic communication has emerged as a promising technology to break the Shannon limit by extracting the meaning of source data and sending relevant semantic information only. However, some mobile devices may have limited computation and storage resources, which renders it difficult to deploy and implement the resource-demanding deep learning based semantic encoder/decoder. To tackle this challen…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: 6 pages, 4 figures, accepted for IEEE Global Communication Conference (GLOBECOM) 2023 Workshop

  21. arXiv:2310.01342  [pdf, other]

    cs.IT eess.SP

    Near-field Integrated Sensing and Communication: Opportunities and Challenges

    Authors: Jiayi Cong, Changsheng You, Jiapeng Li, Li Chen, Beixiong Zheng, Yuanwei Liu, Wen Wu, Yi Gong, Shi Jin, Rui Zhang

    Abstract: With the extremely large-scale array (XL-array) deployed in future wireless systems, wireless communication and sensing are expected to operate in the radiative near-field region, which needs to be characterized by the spherical rather than planar wavefronts. Unlike most existing works that considered far-field integrated sensing and communication (ISAC), we study in this article the new near-field…

    Submitted 17 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: This work is submitted to IEEE for possible publication

  22. arXiv:2309.14405  [pdf, other]

    cs.SD cs.AI eess.AS

    Joint Audio and Speech Understanding

    Authors: Yuan Gong, Alexander H. Liu, Hongyin Luo, Leonid Karlinsky, James Glass

    Abstract: Humans are surrounded by audio signals that include both speech and non-speech sounds. The recognition and understanding of speech and non-speech audio events, along with a profound comprehension of the relationship between them, constitute fundamental cognitive capabilities. For the first time, we build a machine learning model, called LTU-AS, that has a conceptually similar universal audio perce…

    Submitted 10 December, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: Accepted at ASRU 2023. Code, dataset, and pretrained models are at https://github.com/yuangongnd/ltu. Interactive demo at https://huggingface.co/spaces/yuangongfdu/ltu-2

  23. arXiv:2309.11161  [pdf, other]

    cs.IT eess.SP

    Beamforming Design for RIS-Aided THz Wideband Communication Systems

    Authors: Yihang Jiang, Ziqin Zhou, Xiaoyang Li, Yi Gong

    Abstract: Benefiting from tens of GHz of bandwidth, terahertz (THz) communications has become a promising technology for future 6G networks. However, the conventional hybrid beamforming architecture based on frequency-independent phase-shifters is not able to cope with the beam split effect (BSE) in THz massive multiple-input multiple-output (MIMO) systems. Despite some work introducing the frequency-depend…

    Submitted 21 September, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

  24. arXiv:2309.07369  [pdf, other]

    eess.AS cs.CL cs.SD

    Hybrid Attention-based Encoder-decoder Model for Efficient Language Model Adaptation

    Authors: Shaoshi Ling, Guoli Ye, Rui Zhao, Yifan Gong

    Abstract: Attention-based encoder-decoder (AED) speech recognition model has been widely successful in recent years. However, the joint optimization of acoustic model and language model in end-to-end manner has created challenges for text adaptation. In particular, effectively, quickly and inexpensively adapting text has become a primary concern for deploying AED systems in industry. To address this issue,…

    Submitted 13 September, 2023; originally announced September 2023.

  25. arXiv:2309.01875  [pdf, other]

    cs.CV cs.LG cs.MM cs.PF eess.IV

    Gradient Domain Diffusion Models for Image Synthesis

    Authors: Yuanhao Gong

    Abstract: Diffusion models are getting popular in generative image and video synthesis. However, due to the diffusion process, they require a large number of steps to converge. To tackle this issue, in this paper, we propose to perform the diffusion process in the gradient domain, where the convergence becomes faster. There are two reasons. First, thanks to the Poisson equation, the gradient domain is mathe…

    Submitted 4 September, 2023; originally announced September 2023.

  26. arXiv:2308.10009  [pdf, other]

    eess.SP

    Realizing In-Memory Baseband Processing for Ultra-Fast and Energy-Efficient 6G

    Authors: Qunsong Zeng, Jiawei Liu, Mingrui Jiang, Jun Lan, Yi Gong, Zhongrui Wang, Yida Li, Can Li, Jim Ignowski, Kaibin Huang

    Abstract: To support emerging applications ranging from holographic communications to extended reality, next-generation mobile wireless communication systems require ultra-fast and energy-efficient baseband processors. Traditional complementary metal-oxide-semiconductor (CMOS)-based baseband processors face two challenges in transistor scaling and the von Neumann bottleneck. To address these challenges, in-…

    Submitted 19 August, 2023; originally announced August 2023.

    Comments: arXiv admin note: text overlap with arXiv:2205.03561

  27. arXiv:2307.16332  [pdf]

    eess.AS

    Pre-training End-to-end ASR Models with Augmented Speech Samples Queried by Text

    Authors: Eric Sun, Jinyu Li, Jian Xue, Yifan Gong

    Abstract: In end-to-end automatic speech recognition system, one of the difficulties for language expansion is the limited paired speech and text training data. In this paper, we propose a novel method to generate augmented samples with unpaired speech feature segments and text data for model pre-training, which has the advantage of low cost without using additional speech data. When mixing 20,000 hours aug…

    Submitted 30 July, 2023; originally announced July 2023.

  28. arXiv:2307.08234  [pdf, other]

    eess.AS

    Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition

    Authors: Shaoshi Ling, Yuxuan Hu, Shuangbei Qian, Guoli Ye, Yao Qian, Yifan Gong, Ed Lin, Michael Zeng

    Abstract: Most end-to-end (E2E) speech recognition models are composed of encoder and decoder blocks that perform acoustic and language modeling functions. Pretrained large language models (LLMs) have the potential to improve the performance of E2E ASR. However, integrating a pretrained language model into an E2E speech recognition model has shown limited benefits due to the mismatches between text-based LL…

    Submitted 2 August, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

  29. Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers

    Authors: Yuan Gong, Sameer Khurana, Leonid Karlinsky, James Glass

    Abstract: In this paper, we focus on Whisper, a recent automatic speech recognition model trained with a massive 680k hour labeled speech corpus recorded in diverse conditions. We first show an interesting finding that while Whisper is very robust against real-world background sounds (e.g., music), its audio representation is actually not noise-invariant, but is instead highly correlated to non-speech sound…

    Submitted 6 July, 2023; originally announced July 2023.

    Comments: Accepted at Interspeech 2023. Code at https://github.com/yuangongnd/whisper-at

    Journal ref: Proceedings of Interspeech 2023

  30. arXiv:2307.00307  [pdf, other]

    eess.IV

    Spatio-Temporal Classification of Lung Ventilation Patterns using 3D EIT Images: A General Approach for Individualized Lung Function Evaluation

    Authors: Shuzhe Chen, Li Li, Zhichao Lin, Ke Zhang, Ying Gong, Lu Wang, Xu Wu, Maokun Li, Yuanlin Song, Fan Yang, Shenheng Xu

    Abstract: The Pulmonary Function Test (PFT) is an widely utilized and rigorous classification test for lung function evaluation, serving as a comprehensive tool for lung diagnosis. Meanwhile, Electrical Impedance Tomography (EIT) is a rapidly advancing clinical technique that visualizes conductivity distribution induced by ventilation. EIT provides additional spatial and temporal information on lung ventila…

    Submitted 1 July, 2023; originally announced July 2023.

  31. arXiv:2305.10790  [pdf, other]

    eess.AS cs.SD

    Listen, Think, and Understand

    Authors: Yuan Gong, Hongyin Luo, Alexander H. Liu, Leonid Karlinsky, James Glass

    Abstract: The ability of artificial intelligence (AI) systems to perceive and comprehend audio signals is crucial for many applications. Although significant progress has been made in this area since the development of AudioSet, most existing models are designed to map audio inputs to pre-defined, discrete sound label sets. In contrast, humans possess the ability to not only classify sounds into general cat…

    Submitted 19 February, 2024; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted at ICLR 2024. Code, dataset, and models are available at https://github.com/YuanGongND/ltu. The interactive demo is at https://huggingface.co/spaces/yuangongfdu/ltu

  32. arXiv:2305.00428  [pdf, ps, other]

    cs.IT eess.SP

    STAR-RIS-Aided Mobile Edge Computing: Computation Rate Maximization with Binary Amplitude Coefficients

    Authors: Zhenrong Liu, Zongze Li, Miaowen Wen, Yi Gong, Yik-Chung Wu

    Abstract: In this paper, simultaneously transmitting and reflecting (STAR) reconfigurable intelligent surface (RIS) is investigated in the multi-user mobile edge computing (MEC) system to improve the computation rate. Compared with traditional RIS-aided MEC, STAR-RIS extends the service coverage from half-space to full-space and provides new flexibility for improving the computation rate for end users. Howe…

    Submitted 30 April, 2023; originally announced May 2023.

  33. arXiv:2304.07512  [pdf, other]

    eess.AS

    Soft Label Coding for End-to-end Sound Source Localization With Ad-hoc Microphone Arrays

    Authors: Linfeng Feng, Yijun Gong, Xiao-Lei Zhang

    Abstract: Recently, an end-to-end two-dimensional sound source localization algorithm with ad-hoc microphone arrays formulates the sound source localization problem as a classification problem. The algorithm divides the target indoor space into a set of local areas, and predicts the local area where the speaker locates. However, the local areas are encoded by one-hot code, which may lose the connections bet…

    Submitted 15 April, 2023; originally announced April 2023.

    Comments: 4 pages, 2 figures, conference

  34. arXiv:2303.10691  [pdf, other]

    eess.SP

    Multi-Channel Attentive Feature Fusion for Radio Frequency Fingerprinting

    Authors: Yuan Zeng, Yi Gong, Jiawei Liu, Shangao Lin, Zidong Han, Ruoxiao Cao, Kaibin Huang, Khaled Ben Letaief

    Abstract: Radio frequency fingerprinting (RFF) is a promising device authentication technique for securing the Internet of things. It exploits the intrinsic and unique hardware impairments of the transmitters for RF device identification. In real-world communication systems, hardware impairments across transmitters are subtle, which are difficult to model explicitly. Recently, due to the superior performanc…

    Submitted 23 June, 2023; v1 submitted 19 March, 2023; originally announced March 2023.

  35. arXiv:2303.00786  [pdf]

    cs.CL eess.AS

    Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training

    Authors: Eric Sun, Jinyu Li, Yuxuan Hu, Yimeng Zhu, Long Zhou, Jian Xue, Peidong Wang, Linquan Liu, Shujie Liu, Edward Lin, Yifan Gong

    Abstract: We propose gated language experts and curriculum training to enhance multilingual transformer transducer models without requiring language identification (LID) input from users during inference. Our method incorporates a gating mechanism and LID loss, enabling transformer experts to learn language-specific information. By combining gated transformer experts with shared transformer layers, we const…

    Submitted 7 July, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

  36. arXiv:2302.08525  [pdf, ps, other]

    eess.SP

    Computation and Privacy Protection for Satellite-Ground Digital Twin Networks

    Authors: Yongkang Gong, Haipeng Yao, Xiaonan Liu, Mehdi Bennis, Arumugam Nallanathan, Zhu Han

    Abstract: Satellite-ground integrated digital twin networks (SGIDTNs) are regarded as innovative network architectures for reducing network congestion, enabling nearly-instant data mapping from the physical world to digital systems, and offering ubiquitous intelligence services to terrestrial users. However, the challenges, such as the pricing policy, the stochastic task arrivals, the time-varying satellite…

    Submitted 16 February, 2023; originally announced February 2023.

  37. arXiv:2212.12214  [pdf, ps, other]

    eess.SP

    Trainable Proximal Gradient Descent Based Channel Estimation for mmWave Massive MIMO Systems

    Authors: Peicong Zheng, Xuantao Lyu, Yi Gong

    Abstract: In this letter, we address the problem of millimeter-Wave channel estimation in massive MIMO communication systems. Leveraging the sparsity of the mmWave channel in the beamspace, we formulate the estimation problem as a sparse signal recovery problem. To this end, we propose a deep learning based trainable proximal gradient descent network (TPGD-Net). The TPGD-Net unfolds the iterative proximal g…

    Submitted 6 March, 2023; v1 submitted 23 December, 2022; originally announced December 2022.

  38. arXiv:2212.00697  [pdf, ps, other]

    cs.IT eess.SP

    Simultaneously Transmitting and Reflecting RIS-Aided Mobile Edge Computing: Computation Rate Maximization

    Authors: Zhenrong Liu, Zongze Li, Miaowen Wen, Yi Gong, Yik-Chung Wu

    Abstract: In this paper, the novel simultaneously transmitting and reflecting (STAR) reconfigurable intelligent surface (RIS), which enables full-space coverage on users located on both sides of the surface, is investigated in the multi-user mobile edge computing (MEC) system. A computation rate maximization problem is formulated via the joint design of the STAR-RIS phase shifts, reflection and transmission…

    Submitted 1 December, 2022; originally announced December 2022.

  39. arXiv:2210.10265  [pdf, other]

    eess.AS cs.SD

    Deep Learning Based Stage-wise Two-dimensional Speaker Localization with Large Ad-hoc Microphone Arrays

    Authors: Shupei Liu, Linfeng Feng, Yijun Gong, Chengdong Liang, Chen Zhang, Xiao-Lei Zhang, Xuelong Li

    Abstract: While deep-learning-based speaker localization has shown advantages in challenging acoustic environments, it often yields only direction-of-arrival (DOA) cues rather than precise two-dimensional (2D) coordinates. To address this, we propose a novel deep-learning-based 2D speaker localization method leveraging ad-hoc microphone arrays, where an ad-hoc microphone array is composed of randomly distri…

    Submitted 1 April, 2024; v1 submitted 18 October, 2022; originally announced October 2022.

  40. arXiv:2210.08484  [pdf, other]

    eess.AS cs.SD

    End-to-end Two-dimensional Sound Source Localization With Ad-hoc Microphone Arrays

    Authors: Yijun Gong, Shupei Liu, Xiao-Lei Zhang

    Abstract: Conventional sound source localization methods are mostly based on a single microphone array that consists of multiple microphones. They are usually formulated as the estimation of the direction of arrival problem. In this paper, we propose a deep-learning-based end-to-end sound source localization method with ad-hoc microphone arrays, where an ad-hoc microphone array is a set of randomly distribu…

    Submitted 16 October, 2022; originally announced October 2022.

    Comments: 6 pages, 4 figures, conference

  41. arXiv:2210.07839  [pdf, other]

    cs.MM cs.CV cs.SD eess.AS

    Contrastive Audio-Visual Masked Autoencoder

    Authors: Yuan Gong, Andrew Rouditchenko, Alexander H. Liu, David Harwath, Leonid Karlinsky, Hilde Kuehne, James Glass

    Abstract: In this paper, we first extend the recent Masked Auto-Encoder (MAE) model from a single modality to audio-visual multi-modalities. Subsequently, we propose the Contrastive Audio-Visual Masked Auto-Encoder (CAV-MAE) by combining contrastive learning and masked data modeling, two major self-supervised learning frameworks, to learn a joint and coordinated audio-visual representation. Our experiments…

    Submitted 11 April, 2023; v1 submitted 2 October, 2022; originally announced October 2022.

    Comments: Accepted at ICLR 2023 as a notable top 25% paper. Code and pretrained models are at https://github.com/yuangongnd/cav-mae

  42. arXiv:2208.00061  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    UAVM: Towards Unifying Audio and Visual Models

    Authors: Yuan Gong, Alexander H. Liu, Andrew Rouditchenko, James Glass

    Abstract: Conventional audio-visual models have independent audio and video branches. In this work, we unify the audio and visual branches by designing a Unified Audio-Visual Model (UAVM). The UAVM achieves a new state-of-the-art audio-visual event classification accuracy of 65.8% on VGGSound. More interestingly, we also find a few intriguing properties of UAVM that the modality-independent counterparts do…

    Submitted 15 February, 2023; v1 submitted 29 July, 2022; originally announced August 2022.

    Comments: Published in Signal Processing Letters. Code at https://github.com/YuanGongND/uavm

    Journal ref: IEEE Signal Processing Letters, vol. 29, pp. 2437-2441, 2022

  43. arXiv:2207.12577  [pdf, other]

    cs.CV cs.AR cs.LG eess.IV

    Compiler-Aware Neural Architecture Search for On-Mobile Real-time Super-Resolution

    Authors: Yushu Wu, Yifan Gong, Pu Zhao, Yanyu Li, Zheng Zhan, Wei Niu, Hao Tang, Minghai Qin, Bin Ren, Yanzhi Wang

    Abstract: Deep learning-based super-resolution (SR) has gained tremendous popularity in recent years because of its high image quality performance and wide application scenarios. However, prior methods typically suffer from large amounts of computations and huge power consumption, causing difficulties for real-time inference, especially on resource-limited platforms such as mobile devices. To mitigate this,…

    Submitted 25 July, 2022; originally announced July 2022.

  44. arXiv:2206.13135  [pdf]

    cs.CL cs.SD eess.AS

    TALCS: An Open-Source Mandarin-English Code-Switching Corpus and a Speech Recognition Baseline

    Authors: Chengfei Li, Shuhao Deng, Yao** Wang, Guang**g Wang, Yaguang Gong, Changbin Chen, **feng Bai

    Abstract: This paper introduces a new corpus of Mandarin-English code-switching speech recognition--TALCS corpus, suitable for training and evaluating code-switching speech recognition systems. TALCS corpus is derived from real online one-to-one English teaching scenes in TAL education group, which contains roughly 587 hours of speech sampled at 16 kHz. To the best of our knowledge, TALCS corpus is the largest wel…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: accepted by INTERSPEECH 2022

  45. arXiv:2205.03561  [pdf

    eess.SP

    Realizing Ultra-Fast and Energy-Efficient Baseband Processing Using Analogue Resistive Switching Memory

    Authors: Qunsong Zeng, Jiawei Liu, Jun Lan, Yi Gong, Zhongrui Wang, Yida Li, Kaibin Huang

    Abstract: To support emerging applications ranging from holographic communications to extended reality, next-generation mobile wireless communication systems require ultra-fast and energy-efficient (UFEE) baseband processors. Traditional complementary metal-oxide-semiconductor (CMOS)-based baseband processors face two challenges in transistor scaling and the von Neumann bottleneck. To address these challeng…

    Submitted 7 May, 2022; originally announced May 2022.

  46. Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition

    Authors: Yuan Gong, ** Yu, James Glass

    Abstract: Recognizing human non-speech vocalizations is an important task and has broad applications such as automatic sound transcription and health condition monitoring. However, existing datasets have a relatively small number of vocal sound samples or noisy labels. As a consequence, state-of-the-art audio event classification models may not perform well in detecting human vocal sounds. To support resear…

    Submitted 17 June, 2022; v1 submitted 6 May, 2022; originally announced May 2022.

    Comments: Accepted at ICASSP 2022. Dataset and code at https://github.com/YuanGongND/vocalsound. Interactive Colab demo at https://colab.research.google.com/github/YuanGongND/vocalsound/blob/main/colab/VocalSound.ipynb

  47. Transformer-Based Multi-Aspect Multi-Granularity Non-Native English Speaker Pronunciation Assessment

    Authors: Yuan Gong, Ziyi Chen, Iek-Heng Chu, Peng Chang, James Glass

    Abstract: Automatic pronunciation assessment is an important technology to help self-directed language learners. While pronunciation quality has multiple aspects including accuracy, fluency, completeness, and prosody, previous efforts typically only model one aspect (e.g., accuracy) at one granularity (e.g., at the phoneme-level). In this work, we explore modeling multi-aspect pronunciation assessment at mu…

    Submitted 6 May, 2022; originally announced May 2022.

    Comments: Accepted at ICASSP 2022. Code at https://github.com/YuanGongND/gopt. Interactive Colab demo at https://colab.research.google.com/github/YuanGongND/gopt/blob/master/colab/GOPT_GPU.ipynb

  48. arXiv:2205.02194  [pdf, other]

    eess.SP

    Intelligent Reflecting Surface Aided Mobile Edge Computing With Binary Offloading: Energy Minimization for IoT Devices

    Authors: Yizhen Yang, Yi Gong, Yik-Chung Wu

    Abstract: Mobile edge computing (MEC) is envisioned as a promising technique to support computation-intensive and time-critical applications in the future Internet of Things (IoT) era. However, the uplink transmission performance will be highly impacted by the hostile wireless channel, the low bandwidth, and the low transmission power of IoT devices. Recently, intelligent reflecting surface (IRS) has drawn much…

    Submitted 4 May, 2022; originally announced May 2022.

  49. arXiv:2204.08958  [pdf, other]

    cs.CV eess.IV

    MANIQA: Multi-dimension Attention Network for No-Reference Image Quality Assessment

    Authors: Sidi Yang, Tianhe Wu, Shuwei Shi, Shanshan Lao, Yuan Gong, Mingdeng Cao, Jiahao Wang, Yujiu Yang

    Abstract: No-Reference Image Quality Assessment (NR-IQA) aims to assess the perceptual quality of images in accordance with human subjective perception. Unfortunately, existing NR-IQA methods are far from meeting the needs of predicting accurate quality scores on GAN-based distortion images. To this end, we propose Multi-dimension Attention Network for no-reference Image Quality Assessment (MANIQA) to impro…

    Submitted 20 April, 2022; v1 submitted 19 April, 2022; originally announced April 2022.

  50. arXiv:2203.07996  [pdf, other]

    cs.SD cs.AI cs.CL cs.CV eess.AS

    Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition

    Authors: Xichen Pan, Peiyu Chen, Yichen Gong, Helong Zhou, Xinbing Wang, Zhouhan Lin

    Abstract: Training Transformer-based models demands a large amount of data, while obtaining aligned and labelled data in multimodality is rather cost-demanding, especially for audio-visual speech recognition (AVSR). Thus it makes a lot of sense to make use of unlabelled unimodal data. On the other side, although the effectiveness of large-scale self-supervised learning is well established in both audio and…

    Submitted 26 March, 2022; v1 submitted 24 February, 2022; originally announced March 2022.

    Comments: ACL 2022 Main Conference