Showing 1–50 of 126 results for author: Huang, K

  1. arXiv:2406.14931  [pdf, other]

    eess.SP

    Multi-beam Training for Near-field Communications in High-frequency Bands

    Authors: Cong Zhou, Changsheng You, Zixuan Huang, Shuo Shi, Yi Gong, Chan-Byoung Chae, Kaibin Huang

    Abstract: In this paper, we study efficient multi-beam training design for near-field communications to reduce the beam training overhead of conventional single-beam training methods. In particular, the array-division based multi-beam training method, which is widely used in far-field communications, cannot be directly applied to the near-field scenario, since different sub-arrays may observe different user…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: In this paper, a novel near-field multi-beam training scheme is proposed by sparsely activating a portion of antennas to form a sparse linear array

  2. arXiv:2406.07890  [pdf, other]

    eess.AS cs.CL cs.LG

    Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions

    Authors: Anfeng Xu, Kevin Huang, Tiantian Feng, Lue Shen, Helen Tager-Flusberg, Shrikanth Narayanan

    Abstract: Speech foundation models, trained on vast datasets, have opened unique opportunities in addressing challenging low-resource speech understanding, such as child speech. In this work, we explore the capabilities of speech foundation models on child-adult speaker diarization. We show that exemplary foundation models can achieve 39.5% and 62.3% relative reductions in Diarization Error Rate and Speaker…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024

  3. arXiv:2406.05806  [pdf, other]

    cs.CL cs.SD eess.AS

    Do Prompts Really Prompt? Exploring the Prompt Understanding Capability of Whisper

    Authors: Chih-Kai Yang, Kuan-Po Huang, Hung-yi Lee

    Abstract: This research explores the interaction between Whisper, a high-performing speech recognition model, and information in prompts. Our results unexpectedly show that Whisper may not fully grasp textual prompts as anticipated. Additionally, we find that performance improvement is not guaranteed even with stronger adherence to the topic information in textual prompts. It is also noted that English prom…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: In progress

  4. arXiv:2406.02963  [pdf, other]

    cs.SD eess.AS

    Dataset-Distillation Generative Model for Speech Emotion Recognition

    Authors: Fabian Ritter-Gutierrez, Kuan-Po Huang, Jeremy H. M Wong, Dianwen Ng, Hung-yi Lee, Nancy F. Chen, Eng Siong Chng

    Abstract: Deep learning models for speech rely on large datasets, presenting computational challenges. Yet, performance hinges on training data size. Dataset Distillation (DD) aims to learn a smaller dataset without much performance degradation when training with it. DD has been investigated in computer vision but not yet in speech. This paper presents the first approach for DD to speech targeting Speech Em…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  5. arXiv:2405.17007  [pdf, other]

    eess.SP

    Waveforms for Computing Over the Air

    Authors: Ana Pérez-Neira, Marc Martinez-Gost, Alphan Şahin, Saeed Razavikia, Carlo Fischione, Kaibin Huang

    Abstract: Over-the-air computation (AirComp) leverages the signal-superposition characteristic of wireless multiple access channels to perform mathematical computations. Initially introduced to enhance communication reliability in interference channels and wireless sensor networks, AirComp has more recently found applications in task-oriented communications, namely, for wireless distributed learning and in…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Feature article submitted at the IEEE Signal Processing Magazine

  6. arXiv:2405.16516  [pdf, other]

    eess.IV cs.CV

    Memory-efficient High-resolution OCT Volume Synthesis with Cascaded Amortized Latent Diffusion Models

    Authors: Kun Huang, Xiao Ma, Yuhan Zhang, Na Su, Songtao Yuan, Yong Liu, Qiang Chen, Huazhu Fu

    Abstract: Optical coherence tomography (OCT) image analysis plays an important role in the field of ophthalmology. Current successful analysis models rely on available large datasets, which can be challenging to obtain for certain tasks. The use of deep generative models to create realistic data emerges as a promising approach. However, due to limitations in hardware resources, it is still difficult t…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: Provisionally accepted for medical image computing and computer-assisted intervention (MICCAI) 2024

  7. arXiv:2405.08800  [pdf]

    eess.SY

    Estimation of Participation Factors for Power System Oscillation from Measurements

    Authors: Tianwei Xia, Zhe Yu, Kai Sun, Di Shi, Kaiyang Huang

    Abstract: In a power system, when the participation factors of generators are computed to rank their participation in an oscillatory mode, a model-based approach is conventionally used on the linearized system model by means of the corresponding right and left eigenvectors. This paper proposes a new approach for estimating participation factors directly from measurement data on generator responses under…

    Submitted 14 May, 2024; originally announced May 2024.

  8. arXiv:2404.17973  [pdf, other]

    cs.IT eess.SP

    Over-the-Air Fusion of Sparse Spatial Features for Integrated Sensing and Edge AI over Broadband Channels

    Authors: Zhiyan Liu, Qiao Lan, Kaibin Huang

    Abstract: The 6G mobile networks are differentiated from 5G by two new usage scenarios - distributed sensing and edge AI. Their natural integration, termed integrated sensing and edge AI (ISEA), promises to create a platform for enabling environment perception to make intelligent decisions and take real-time actions. A basic operation in ISEA is for a fusion center to acquire and fuse features of spatial se…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Submitted to IEEE for possible publication

  9. Cost-effective company response policy for product co-creation in company-sponsored online community

    Authors: Jiamin Hu, Lu-Xing Yang, Xiaofan Yang, Kaifan Huang, Gang Li, Yong Xiang

    Abstract: Product co-creation based on a company-sponsored online community has become a paradigm of developing new products collaboratively with customers. In such a product co-creation campaign, the sponsoring company needs to interact intensively with active community members about the design scheme of the product. We call the collection of the rates of the company's response to active community member…

    Submitted 14 April, 2024; originally announced April 2024.

  10. arXiv:2404.07121  [pdf, other]

    cs.IT eess.SP

    Digital Over-the-Air Computation: Achieving High Reliability via Bit-Slicing

    Authors: Jiawei Liu, Yi Gong, Kaibin Huang

    Abstract: 6G mobile networks aim to realize ubiquitous intelligence at the network edge via distributed learning, sensing, and data analytics. Their common operation is to aggregate high-dimensional data, which causes a communication bottleneck that cannot be resolved using traditional orthogonal multi-access schemes. A promising solution, called over-the-air computation (AirComp), exploits channels' wavefo…

    Submitted 10 April, 2024; originally announced April 2024.

  11. arXiv:2404.06806  [pdf, other]

    cs.IT eess.SP

    Near-Optimal Channel Estimation for Dense Array Systems

    Authors: Mingyao Cui, Zijian Zhang, Linglong Dai, Kaibin Huang

    Abstract: By deploying a large number of antennas with sub-half-wavelength spacing in a compact space, dense array systems (DASs) can fully unleash the multiplexing-and-diversity gains of limited apertures. To acquire these gains, accurate channel state information acquisition is necessary but challenging due to the large antenna numbers. To overcome this obstacle, this paper reveals that exploiting the high…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 19 pages, 10 figures

  12. arXiv:2403.11693  [pdf, other]

    cs.IT eess.SP

    Beamforming Design for Semantic-Bit Coexisting Communication System

    Authors: Maojun Zhang, Guangxu Zhu, Richeng Jin, Xiaoming Chen, Qingjiang Shi, Caijun Zhong, Kaibin Huang

    Abstract: Semantic communication (SemCom) is emerging as a key technology for future sixth-generation (6G) systems. Unlike traditional bit-level communication (BitCom), SemCom directly optimizes performance at the semantic level, leading to superior communication efficiency. Nevertheless, the task-oriented nature of SemCom renders it challenging to completely replace BitCom. Consequently, it is desired to c…

    Submitted 22 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: Submitted to IEEE for possible publication

  13. arXiv:2403.07954  [pdf, other]

    cs.LG eess.SP

    Optimizing Polynomial Graph Filters: A Novel Adaptive Krylov Subspace Approach

    Authors: Keke Huang, Wencai Cao, Hoang Ta, Xiaokui Xiao, Pietro Liò

    Abstract: Graph Neural Networks (GNNs), known as spectral graph filters, find a wide range of applications in web networks. To bypass eigendecomposition, polynomial graph filters are proposed to approximate graph filters by leveraging various polynomial bases for filter training. However, no existing studies have explored the diverse polynomial graph filters from a unified perspective for optimization. In…

    Submitted 20 May, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  14. arXiv:2403.07338  [pdf, ps, other]

    cs.IT cs.MM eess.SP

    D$^2$-JSCC: Digital Deep Joint Source-channel Coding for Semantic Communications

    Authors: Jianhao Huang, Kai Yuan, Chuan Huang, Kaibin Huang

    Abstract: Semantic communications (SemCom) have emerged as a new paradigm for supporting sixth-generation applications, where semantic features of data are transmitted using artificial intelligence algorithms to attain high communication efficiencies. Most existing SemCom techniques utilize deep neural networks (DNNs) to implement analog source-channel mappings, which are incompatible with existing digital…

    Submitted 14 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  15. arXiv:2401.00273  [pdf, ps, other]

    eess.AS cs.CL

    Investigating Zero-Shot Generalizability on Mandarin-English Code-Switched ASR and Speech-to-text Translation of Recent Foundation Models with Self-Supervision and Weak Supervision

    Authors: Chih-Kai Yang, Kuan-Po Huang, Ke-Han Lu, Chun-Yi Kuan, Chi-Yuan Hsiao, Hung-yi Lee

    Abstract: This work evaluated several cutting-edge large-scale foundation models based on self-supervision or weak supervision, including SeamlessM4T, SeamlessM4T v2, and Whisper-large-v3, on three code-switched corpora. We found that self-supervised models can achieve performances close to the supervised model, indicating the effectiveness of multilingual self-supervised pre-training. We also observed that…

    Submitted 30 December, 2023; originally announced January 2024.

    Comments: Submitted to ICASSP 2024 Self-supervision in Audio, Speech and Beyond workshop

  16. arXiv:2312.13683  [pdf, other]

    eess.SP cs.IT

    Joint Channel Estimation and Cooperative Localization for Near-Field Ultra-Massive MIMO

    Authors: Ruoxiao Cao, Hengtao He, Xianghao Yu, Shenghui Song, Kaibin Huang, Jun Zhang, Yi Gong, Khaled B. Letaief

    Abstract: The next-generation (6G) wireless networks are expected to provide not only seamless and high data-rate communications, but also ubiquitous sensing services. By providing vast spatial degrees of freedom (DoFs), ultra-massive multiple-input multiple-output (UM-MIMO) technology is a key enabler for both sensing and communications in 6G. However, the adoption of UM-MIMO leads to a shift from the far…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: Submitted to JSAC

  17. arXiv:2312.12611  [pdf]

    eess.SY math.DS math.NA

    A Semi-Analytical Approach for State-Space Electromagnetic Transient Simulation Using the Differential Transformation

    Authors: Min Xiong, Kaiyang Huang, Yang Liu, Rui Yao, Kai Sun, Feng Qiu

    Abstract: Electromagnetic transient (EMT) simulation is a crucial tool for power system dynamic analysis because of its detailed component modeling and high simulation accuracy. However, it suffers from computational burdens for large power grids since a tiny time step is typically required for accuracy. This paper proposes an efficient and accurate semi-analytical approach for state-space EMT simulations o…

    Submitted 19 December, 2023; originally announced December 2023.

  18. arXiv:2312.12153  [pdf, other]

    cs.SD eess.AS

    Noise robust distillation of self-supervised speech models via correlation metrics

    Authors: Fabian Ritter-Gutierrez, Kuan-Po Huang, Dianwen Ng, Jeremy H. M. Wong, Hung-yi Lee, Eng Siong Chng, Nancy F. Chen

    Abstract: Compared to large speech foundation models, small distilled models exhibit degraded noise robustness. The student's robustness can be improved by introducing noise at the inputs during pre-training. Despite this, using the standard distillation loss still yields a student with degraded performance. Thus, this paper proposes improving student robustness via distillation with correlation metrics. Te…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: 6 pages

  19. arXiv:2312.09760  [pdf, other]

    eess.AS cs.SD

    U2-KWS: Unified Two-pass Open-vocabulary Keyword Spotting with Keyword Bias

    Authors: Ao Zhang, Pan Zhou, Kaixun Huang, Yong Zou, Ming Liu, Lei Xie

    Abstract: Open-vocabulary keyword spotting (KWS), which allows users to customize keywords, has attracted increasing interest. However, existing methods based on acoustic models and post-processing train the acoustic model with ASR training criteria to model all phonemes, making the acoustic model under-optimized for the KWS task. To solve this problem, we propose a novel unified two-pass open-vocabu…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted by ASRU2023

  20. arXiv:2312.09576  [pdf, other]

    eess.IV cs.CV

    SegRap2023: A Benchmark of Organs-at-Risk and Gross Tumor Volume Segmentation for Radiotherapy Planning of Nasopharyngeal Carcinoma

    Authors: Xiangde Luo, Jia Fu, Yunxin Zhong, Shuolin Liu, Bing Han, Mehdi Astaraki, Simone Bendazzoli, Iuliana Toma-Dasu, Yiwen Ye, Ziyang Chen, Yong Xia, Yanzhou Su, Jin Ye, Junjun He, Zhaohu Xing, Hongqiu Wang, Lei Zhu, Kaixiang Yang, Xin Fang, Zhiwei Wang, Chan Woong Lee, Sang Joon Park, Jaehee Chun, Constantin Ulrich, Klaus H. Maier-Hein, et al. (17 additional authors not shown)

    Abstract: Radiation therapy is a primary and effective NasoPharyngeal Carcinoma (NPC) treatment strategy. The precise delineation of Gross Tumor Volumes (GTVs) and Organs-At-Risk (OARs) is crucial in radiation treatment, directly impacting patient prognosis. Previously, the delineation of GTVs and OARs was performed by experienced radiation oncologists. Recently, deep learning has achieved promising results…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: A challenge report of SegRap2023 (organized in conjunction with MICCAI2023)

  21. arXiv:2312.01644  [pdf]

    eess.IV cs.CV

    TMSR: Tiny Multi-path CNNs for Super Resolution

    Authors: Chia-Hung Liu, Tzu-Hsin Hsieh, Kuan-Yu Huang, Pei-Yin Chen

    Abstract: In this paper, we propose a tiny multi-path CNN-based Super-Resolution (SR) method, called TMSR. We mainly refer to some tiny CNN-based SR methods, under 5k parameters. The main contribution of the proposed method is the improved multi-path learning and self-defined activation function. The experimental results show that TMSR obtains competitive image quality (i.e. PSNR and SSIM) compared to the r…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: 5 pages, 7 figures, published in the IEEE Eurasia Conference on IoT, Communication and Engineering proceedings 2023

  22. arXiv:2311.18177  [pdf, other]

    cs.LG cs.SI eess.SP

    An Effective Universal Polynomial Basis for Spectral Graph Neural Networks

    Authors: Keke Huang, Pietro Liò

    Abstract: Spectral Graph Neural Networks (GNNs), also referred to as graph filters, have gained increasing prevalence for heterophily graphs. Optimal graph filters rely on Laplacian eigendecomposition for Fourier transform. In an attempt to avert the prohibitive computations, numerous polynomial filters leveraging distinct polynomials have been proposed to approximate the desired graph filters. However, p…

    Submitted 5 March, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

  23. arXiv:2311.09850  [pdf, other]

    cs.IT eess.SP

    Semantic-Relay-Aided Text Transmission: Placement Optimization and Bandwidth Allocation

    Authors: Tianyu Liu, Changsheng You, Zeyang Hu, Chenyu Wu, Yi Gong, Kaibin Huang

    Abstract: Semantic communication has emerged as a promising technology to break the Shannon limit by extracting the meaning of source data and sending relevant semantic information only. However, some mobile devices may have limited computation and storage resources, which renders it difficult to deploy and implement the resource-demanding deep learning based semantic encoder/decoder. To tackle this challen…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: 6 pages, 4 figures, accepted for IEEE Global Communication Conference (GLOBECOM) 2023 Workshop

  24. arXiv:2311.09028  [pdf, other]

    cs.IT eess.SP

    Integrating Sensing, Communication, and Power Transfer: Multiuser Beamforming Design

    Authors: Ziqin Zhou, Xiaoyang Li, Guangxu Zhu, Jie Xu, Kaibin Huang, Shuguang Cui

    Abstract: In the sixth-generation (6G) networks, massive low-power devices are expected to sense the environment and deliver tremendous data. To enhance the radio resource efficiency, the integrated sensing and communication (ISAC) technique exploits the sensing and communication functionalities of signals, while the simultaneous wireless information and power transfer (SWIPT) technique utilizes the same signa…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: This paper has been submitted to IEEE for possible publication

  25. arXiv:2311.07986  [pdf, other]

    cs.IT eess.SP

    On the View-and-Channel Aggregation Gain in Integrated Sensing and Edge AI

    Authors: Xu Chen, Khaled B. Letaief, Kaibin Huang

    Abstract: Sensing and edge artificial intelligence (AI) are two key features of the sixth-generation (6G) mobile networks. Their natural integration, termed integrated sensing and edge AI (ISEA), is envisioned to automate wide-ranging Internet-of-Things (IoT) applications. To achieve a high sensing accuracy, multi-view features are uploaded to an edge server for aggregation and inference using an AI model. T…

    Submitted 27 April, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: 34 pages, 8 figures

  26. arXiv:2311.02911  [pdf, other]

    eess.SP eess.SY

    Goal-Oriented Wireless Communication Resource Allocation for Cyber-Physical Systems

    Authors: Cheng Feng, Kedi Zheng, Yi Wang, Kaibin Huang, Qixin Chen

    Abstract: The proliferation of novel industrial applications at the wireless edge, such as smart grids and vehicle networks, demands the advancement of cyber-physical systems. The performance of CPSs is closely linked to the last-mile wireless communication networks, which often become bottlenecks due to their inherent limited resources. Current CPS operations often treat wireless communication networks as…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: Submitted to IEEE ComSoc journal for possible publications. Copyright may be transferred without notice, after which this version may no longer be accessible

  27. arXiv:2310.09466  [pdf, ps, other]

    cs.IT eess.SP

    Robust Anti-jamming Communications with DMA-Based Reconfigurable Heterogeneous Array

    Authors: Kaizhi Huang, Wenyu Jiang, Yajun Chen, Liang Jin, Qingqing Wu, Xiaoling Hu

    Abstract: In future commercial and military communication systems, anti-jamming remains a critical issue. Existing homogeneous or heterogeneous arrays with limited degrees of freedom (DoF) and high consumption are unable to meet the requirements of communication in rapidly changing and intense jamming environments. To address these challenges, we propose a reconfigurable heterogeneous array (RHA) arch…

    Submitted 13 October, 2023; originally announced October 2023.

  28. arXiv:2310.09053  [pdf, other]

    cs.RO cs.AI eess.SY

    DATT: Deep Adaptive Trajectory Tracking for Quadrotor Control

    Authors: Kevin Huang, Rwik Rana, Alexander Spitzer, Guanya Shi, Byron Boots

    Abstract: Precise arbitrary trajectory tracking for quadrotors is challenging due to unknown nonlinear dynamics, trajectory infeasibility, and actuation limits. To tackle these challenges, we present Deep Adaptive Trajectory Tracking (DATT), a learning-based approach that can precisely track arbitrary, potentially infeasible trajectories in the presence of large disturbances in the real world. DATT builds o…

    Submitted 13 December, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

  29. arXiv:2310.04657  [pdf, other]

    eess.AS cs.SD

    Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition

    Authors: Kaixun Huang, Ao Zhang, Binbin Zhang, Tianyi Xu, Xingchen Song, Lei Xie

    Abstract: The attention-based deep contextual biasing method has been demonstrated to effectively improve the recognition performance of end-to-end automatic speech recognition (ASR) systems on given contextual phrases. However, unlike shallow fusion methods that directly bias the posterior of the ASR model, deep biasing methods implicitly integrate contextual information, making it challenging to control t…

    Submitted 6 October, 2023; originally announced October 2023.

    Comments: Accepted by ASRU2023

  30. arXiv:2310.03018  [pdf, other]

    eess.AS cs.CL cs.SD

    Zero Resource Code-switched Speech Benchmark Using Speech Utterance Pairs For Multiple Spoken Languages

    Authors: Kuan-Po Huang, Chih-Kai Yang, Yu-Kuan Fu, Ewan Dunbar, Hung-yi Lee

    Abstract: We introduce a new zero resource code-switched speech benchmark designed to directly assess the code-switching capabilities of self-supervised speech encoders. We showcase a baseline system of language modeling on discrete units to demonstrate how the code-switching abilities of speech encoders can be assessed in a zero-resource manner. Our experiments encompass a variety of well-known speech enco…

    Submitted 18 March, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted by ICASSP 2024 (v2)

  31. arXiv:2310.02971  [pdf, other]

    eess.AS cs.CL eess.SP

    Prompting and Adapter Tuning for Self-supervised Encoder-Decoder Speech Model

    Authors: Kai-Wei Chang, Ming-Hsin Chen, Yun-Ping Lin, Jing Neng Hsu, Paul Kuo-Ming Huang, Chien-yu Huang, Shang-Wen Li, Hung-yi Lee

    Abstract: Prompting and adapter tuning have emerged as efficient alternatives to fine-tuning (FT) methods. However, existing studies on speech prompting focused on classification tasks and failed on more complex sequence generation tasks. Besides, adapter tuning is primarily applied with a focus on encoder-only self-supervised models. Our experiments show that prompting on Wav2Seq, a self-supervised encoder…

    Submitted 14 November, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted to IEEE ASRU 2023

  32. arXiv:2310.01867  [pdf, other]

    eess.AS cs.SD

    Audio-visual child-adult speaker classification in dyadic interactions

    Authors: Anfeng Xu, Kevin Huang, Tiantian Feng, Helen Tager-Flusberg, Shrikanth Narayanan

    Abstract: Interactions involving children span a wide range of important domains from learning to clinical diagnostic and therapeutic contexts. Automated analyses of such interactions are motivated by the need to seek accurate insights and offer scale and robustness across diverse and wide-ranging conditions. Identifying the speech segments belonging to the child is a critical step in such modeling. Convent…

    Submitted 9 October, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: In review for ICASSP 2024, 5 pages

  33. arXiv:2309.17185  [pdf, other]

    cs.IT eess.SP

    Meta Reinforcement Learning for Fast Spectrum Sharing in Vehicular Networks

    Authors: Kai Huang, Le Liang, Shi Jin, Geoffrey Ye Li

    Abstract: In this paper, we investigate the problem of fast spectrum sharing in vehicle-to-everything communication. In order to improve the spectrum efficiency of the whole system, the spectrum of vehicle-to-infrastructure links is reused by vehicle-to-vehicle links. To this end, we model it as a problem of deep reinforcement learning and tackle it with proximal policy optimization. A considerable number o…

    Submitted 29 September, 2023; originally announced September 2023.

    Comments: This paper has been accepted by China Communications

  34. arXiv:2309.16937  [pdf, other]

    cs.CL cs.SD eess.AS

    SSHR: Leveraging Self-supervised Hierarchical Representations for Multilingual Automatic Speech Recognition

    Authors: Hongfei Xue, Qijie Shao, Kaixun Huang, Peikun Chen, Jie Liu, Lei Xie

    Abstract: Multilingual automatic speech recognition (ASR) systems have garnered attention for their potential to extend language coverage globally. While self-supervised learning (SSL) models, like MMS, have demonstrated their effectiveness in multilingual ASR, it is worth noting that various layers' representations potentially contain distinct information that has not been fully leveraged. In this study, w…

    Submitted 27 April, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: 5 pages, 2 figures. Accepted by ICME 2024

  35. arXiv:2309.12200  [pdf, other]

    eess.SP cs.LG cs.SI

    A Variational Auto-Encoder Enabled Multi-Band Channel Prediction Scheme for Indoor Localization

    Authors: Ruihao Yuan, Kaixuan Huang, Pan Yang, Shunqing Zhang

    Abstract: Indoor localization is in increasing demand for various cutting-edge technologies, such as virtual/augmented reality and smart homes. Traditional model-based localization suffers from significant computational overhead, so fingerprint localization is getting increasing attention, which needs lower computation cost after the fingerprint database is built. However, the accuracy of indoor localiza…

    Submitted 19 September, 2023; originally announced September 2023.

  36. Slimmed optical neural networks with multiplexed neuron sets and a corresponding backpropagation training algorithm

    Authors: Yi-Feng Liu, Rui-Yao Ren, Dai-Bao Hou, Hai-Zhong Weng, Bo-Wen Wang, Ke-Jie Huang, Xing Lin, Feng Liu, Chen-Hui Li, Chao-Yuan Jin

    Abstract: Due to their intrinsic capabilities in parallel signal processing, optical neural networks (ONNs) have attracted extensive interest recently as a potential alternative to electronic artificial neural networks (ANNs) with reduced power consumption and low latency. Preliminary confirmation of the parallelism in optical computing has been widely done by applying the technology of wavelength division…

    Submitted 13 December, 2023; v1 submitted 27 August, 2023; originally announced August 2023.

    Journal ref: Liu YF, Ren RY, Hou DB, Weng HZ, Wang BW, Huang KJ, Lin X, Liu F, Li CH, Jin CY. Slimmed Optical Neural Networks with Multiplexed Neuron Sets and a Corresponding Backpropagation Training Algorithm. Intell. Comput. 2024;3:Article 0070

  37. arXiv:2308.10009  [pdf, other]

    eess.SP

    Realizing In-Memory Baseband Processing for Ultra-Fast and Energy-Efficient 6G

    Authors: Qunsong Zeng, Jiawei Liu, Mingrui Jiang, Jun Lan, Yi Gong, Zhongrui Wang, Yida Li, Can Li, Jim Ignowski, Kaibin Huang

    Abstract: To support emerging applications ranging from holographic communications to extended reality, next-generation mobile wireless communication systems require ultra-fast and energy-efficient baseband processors. Traditional complementary metal-oxide-semiconductor (CMOS)-based baseband processors face two challenges in transistor scaling and the von Neumann bottleneck. To address these challenges, in-…

    Submitted 19 August, 2023; originally announced August 2023.

    Comments: arXiv admin note: text overlap with arXiv:2205.03561

  38. arXiv:2308.06432  [pdf, other]

    eess.IV cs.CV cs.LG

    Learn Single-horizon Disease Evolution for Predictive Generation of Post-therapeutic Neovascular Age-related Macular Degeneration

    Authors: Yuhan Zhang, Kun Huang, Mingchao Li, Songtao Yuan, Qiang Chen

    Abstract: Most of the existing disease prediction methods in the field of medical image processing fall into two classes, namely image-to-category predictions and image-to-parameter predictions. Few works have focused on image-to-image predictions. Different from multi-horizon predictions in other fields, ophthalmologists prefer to show more confidence in single-horizon predictions due to the low tolerance…

    Submitted 11 August, 2023; originally announced August 2023.

  39. arXiv:2308.04598  [pdf, other]

    cs.CV eess.IV

    1st Place Solution for CVPR2023 BURST Long Tail and Open World Challenges

    Authors: Kaer Huang

    Abstract: Currently, Video Instance Segmentation (VIS) aims at segmenting and categorizing objects in videos from a closed set of training categories that contain only a few dozen categories, lacking the ability to handle diverse objects in real-world videos. With the release of the TAO and BURST datasets, we have the opportunity to research VIS in long-tailed and open-world scenarios. Traditional VIS methods are eva…

    Submitted 8 August, 2023; originally announced August 2023.

  40. Successive Pose Estimation and Beam Tracking for mmWave Vehicular Communication Systems

    Authors: Cen Liu, Guangxu Zhu, Fan Liu, Yuanwei Liu, Kaibin Huang

    Abstract: Millimeter wave (mmWave) radar sensing-aided communication in vehicular mobile communication systems is investigated. To alleviate the beam training overhead under high mobility scenarios, a successive pose estimation and beam tracking (SPEBT) scheme is proposed to facilitate mmWave communications with the assistance of mmWave radar sensing. The proposed SPEBT scheme first resorts to a Fast C…

    Submitted 5 August, 2023; v1 submitted 29 July, 2023; originally announced July 2023.

    Comments: An extended version of a conference submission. 7 pages, 5 figures

    Journal ref: IEEE Global Communications Conference Workshops (GC Wkshps) 2023

  41. arXiv:2307.14350  [pdf, other]

    eess.SP math.OC

    Joint Batching and Scheduling for High-Throughput Multiuser Edge AI with Asynchronous Task Arrivals

    Authors: Yihan Cang, Ming Chen, Kaibin Huang

    Abstract: In this paper, we study joint batching and (task) scheduling to maximise the throughput (i.e., the number of completed tasks) under the practical assumptions of heterogeneous task arrivals and deadlines. The design aims to optimise the number of batches, their starting time instants, and the task-batch association that determines batch sizes. The joint optimisation problem is complex due to multip…

    Submitted 15 July, 2023; originally announced July 2023.

    Comments: Due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract here is shorter than that in the PDF file

  42. arXiv:2307.05362  [pdf, other]

    eess.SP cs.LG

    SleepEGAN: A GAN-enhanced Ensemble Deep Learning Model for Imbalanced Classification of Sleep Stages

    Authors: Xuewei Cheng, Ke Huang, Yi Zou, Shujie Ma

    Abstract: Deep neural networks have played an important role in automatic sleep stage classification because of their strong representation and in-model feature transformation abilities. However, class imbalance and individual heterogeneity which typically exist in raw EEG signals of sleep data can significantly affect the classification performance of any machine learning algorithms. To solve these two pro…

    Submitted 3 July, 2023; originally announced July 2023.

    Comments: 20 pages, 6 figures

  43. arXiv:2306.17580  [pdf, other]

    cs.NI eess.SP

    Timely and Massive Communication in 6G: Pragmatics, Learning, and Inference

    Authors: Deniz Gündüz, Federico Chiariotti, Kaibin Huang, Anders E. Kalør, Szymon Kobus, Petar Popovski

    Abstract: 5G has expanded the traditional focus of wireless systems to embrace two new connectivity types: ultra-reliable low latency and massive communication. The technology context at the dawn of 6G is different from the past one for 5G, primarily due to the growing intelligence at the communicating nodes. This has driven the set of relevant communication problems beyond reliable transmission towards sem…

    Submitted 26 September, 2023; v1 submitted 30 June, 2023; originally announced June 2023.

    Comments: Submitted for publication to IEEE BITS (revised version preprint)

  44. arXiv:2306.06603  [pdf, ps, other]

    cs.IT cs.LG eess.SP

    Task-Oriented Integrated Sensing, Computation and Communication for Wireless Edge AI

    Authors: Hong Xing, Guangxu Zhu, Dongzhu Liu, Haifeng Wen, Kaibin Huang, Kaishun Wu

    Abstract: With the advent of emerging IoT applications such as autonomous driving, digital-twin and metaverse etc. featuring massive data sensing, analyzing and inference as well critical latency in beyond 5G (B5G) networks, edge artificial intelligence (AI) has been proposed to provide high-performance computation of a conventional cloud down to the network edge. Recently, convergence of wireless sensing,…

    Submitted 11 June, 2023; originally announced June 2023.

    Comments: 18 pages, 6 figures, submitted for possible journal publication

  45. arXiv:2306.00804  [pdf, other]

    cs.SD cs.CL eess.AS

    Adaptive Contextual Biasing for Transducer Based Streaming Speech Recognition

    Authors: Tianyi Xu, Zhanheng Yang, Kaixun Huang, Pengcheng Guo, Ao Zhang, Biao Li, Changru Chen, Chao Li, Lei Xie

    Abstract: By incorporating additional contextual information, deep biasing methods have emerged as a promising solution for speech recognition of personalized words. However, for real-world voice assistants, always biasing on such personalized words with high prediction scores can significantly degrade the performance of recognizing common words. To address this issue, we propose an adaptive contextual bias…

    Submitted 15 August, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

  46. arXiv:2305.12493  [pdf, other]

    eess.AS cs.CL cs.SD

    Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network

    Authors: Kaixun Huang, Ao Zhang, Zhanheng Yang, Pengcheng Guo, Bingshen Mu, Tianyi Xu, Lei Xie

    Abstract: Contextual information plays a crucial role in speech recognition technologies and incorporating it into the end-to-end speech recognition models has drawn immense interest recently. However, previous deep bias methods lacked explicit supervision for bias tasks. In this study, we introduce a contextual phrase prediction network for an attention-based deep bias method. This network predicts context…

    Submitted 12 July, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: Accepted by Interspeech 2023

  47. arXiv:2304.14302  [pdf]

    physics.app-ph eess.SY physics.optics

    In-memory photonic dot-product engine with electrically programmable weight banks

    Authors: Wen Zhou, Bowei Dong, Nikolaos Farmakidis, Xuan Li, Nathan Youngblood, Kairan Huang, Yuhan He, C. David Wright, Wolfram H. P. Pernice, Harish Bhaskaran

    Abstract: Electronically reprogrammable photonic circuits based on phase-change chalcogenides present an avenue to resolve the von-Neumann bottleneck; however, implementation of such hybrid photonic-electronic processing has not achieved computational success. Here, we achieve this milestone by demonstrating an in-memory photonic-electronic dot-product engine, one that decouples electronic programming of ph…

    Submitted 27 April, 2023; originally announced April 2023.

  48. arXiv:2303.10691  [pdf, other]

    eess.SP

    Multi-Channel Attentive Feature Fusion for Radio Frequency Fingerprinting

    Authors: Yuan Zeng, Yi Gong, Jiawei Liu, Shangao Lin, Zidong Han, Ruoxiao Cao, Kaibin Huang, Khaled Ben Letaief

    Abstract: Radio frequency fingerprinting (RFF) is a promising device authentication technique for securing the Internet of things. It exploits the intrinsic and unique hardware impairments of the transmitters for RF device identification. In real-world communication systems, hardware impairments across transmitters are subtle, which are difficult to model explicitly. Recently, due to the superior performanc…

    Submitted 23 June, 2023; v1 submitted 19 March, 2023; originally announced March 2023.

  49. arXiv:2303.00873  [pdf, other]

    math.OC eess.SY

    State estimation for control: an approach for output-feedback stochastic MPC

    Authors: Mohammad S. Ramadan, Robert R. Bitmead, Ke Huang

    Abstract: The paper provides a new approach to the determination of a single state value for stochastic output feedback problems using paradigms from Model Predictive Control, particularly the distinction between open-loop and closed-loop control and between deterministic optimal control and stochastic optimal control. The State Selection Algorithm is presented and relies on given dynamics and constraints,…

    Submitted 1 March, 2023; originally announced March 2023.

    Comments: 15 pages, 10 figures, 2 algorithms

  50. arXiv:2302.12757  [pdf, other]

    eess.AS cs.CL cs.SD

    Ensemble knowledge distillation of self-supervised speech models

    Authors: Kuan-Po Huang, Tzu-hsun Feng, Yu-Kuan Fu, Tsu-Yuan Hsu, Po-Chieh Yen, Wei-Cheng Tseng, Kai-Wei Chang, Hung-yi Lee

    Abstract: Distilled self-supervised models have shown competitive performance and efficiency in recent years. However, there is a lack of experience in jointly distilling multiple self-supervised speech models. In our work, we performed Ensemble Knowledge Distillation (EKD) on various self-supervised speech models such as HuBERT, RobustHuBERT, and WavLM. We tried two different aggregation techniques, layerw…

    Submitted 24 February, 2023; originally announced February 2023.

    Comments: Accepted by ICASSP 2023