Showing 1–50 of 78 results for author: Sha, Y

Searching in archive eess.
  1. arXiv:2406.09589  [pdf, other]

    eess.AS

    Multi-Channel Multi-Speaker ASR Using Target Speaker's Solo Segment

    Authors: Yiwen Shao, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu, Daniel Povey, Sanjeev Khudanpur

    Abstract: In the field of multi-channel, multi-speaker Automatic Speech Recognition (ASR), the task of discerning and accurately transcribing a target speaker's speech within background noise remains a formidable challenge. Traditional approaches often rely on microphone array configurations and the information of the target speaker's location or voiceprint. This study introduces the Solo Spatial Feature (S… ▽ More

    Submitted 17 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted for presentation at Interspeech 2024

  2. arXiv:2406.00492  [pdf, other]

    eess.IV cs.CV cs.LG

    SAM-VMNet: Deep Neural Networks For Coronary Angiography Vessel Segmentation

    Authors: Xueying Zeng, Baixiang Huang, Yu Luo, Guangyu Wei, Songyan He, Yushuang Shao

    Abstract: Coronary artery disease (CAD) is one of the most prevalent diseases in the cardiovascular field and one of the major contributors to death worldwide. Computed Tomography Angiography (CTA) images are regarded as the authoritative standard for the diagnosis of coronary artery disease, and by performing vessel segmentation and stenosis detection on CTA images, physicians are able to diagnose coronary… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  3. arXiv:2405.16889  [pdf]

    eess.SP

    Extraction of In-Phase and Quadrature Components by Time-Encoding Sampling

    Authors: Y. H. Shao, S. Y. Chen, H. Z. Yang, F. Xi, H. Hong, Z. Liu

    Abstract: Time encoding machine (TEM) is a biologically-inspired scheme to perform signal sampling using timing. In this paper, we study its application to the sampling of bandpass signals. We propose an integrate-and-fire TEM scheme by which the in-phase (I) and quadrature (Q) components are extracted through reconstruction. We design the TEM according to the signal bandwidth and amplitude instead of upper… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 30 pages, 8 figures

  4. arXiv:2405.11493  [pdf, other]

    cs.CV cs.IT eess.SP

    Point Cloud Compression with Implicit Neural Representations: A Unified Framework

    Authors: Hongning Ruan, Yulin Shao, Qianqian Yang, Liang Zhao, Dusit Niyato

    Abstract: Point clouds have become increasingly vital across various applications thanks to their ability to realistically depict 3D objects and scenes. Nevertheless, effectively compressing unstructured, high-precision point cloud data remains a significant challenge. In this paper, we present a pioneering point cloud compression framework capable of handling both geometry and attribute components. Unlike… ▽ More

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 6 Pages, 6 Figures, submitted to IEEE ICCC

  5. arXiv:2405.09698  [pdf, other]

    eess.SP

    A Deep Joint Source-Channel Coding Scheme for Hybrid Mobile Multi-hop Networks

    Authors: Chenghong Bian, Yulin Shao, Deniz Gündüz

    Abstract: Efficient data transmission across mobile multi-hop networks that connect edge devices to core servers presents significant challenges, particularly due to the variability in link qualities between wireless and wired segments. This variability necessitates a robust transmission scheme that transcends the limitations of existing deep joint source-channel coding (DeepJSCC) strategies, which often st… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: Submitted to possible IEEE journal

  6. arXiv:2405.09552  [pdf, other]

    eess.IV cs.AI cs.CV

    ODFormer: Semantic Fundus Image Segmentation Using Transformer for Optic Nerve Head Detection

    Authors: Jiayi Wang, Yi-An Mao, Xiaoyu Ma, Sicen Guo, Yuting Shao, Xiao Lv, Wenting Han, Mark Christopher, Linda M. Zangwill, Yanlong Bi, Rui Fan

    Abstract: Optic nerve head (ONH) detection has been a crucial area of study in ophthalmology for years. However, the significant discrepancy between fundus image datasets, each generated using a single type of fundus camera, poses challenges to the generalizability of ONH detection approaches developed based on semantic segmentation networks. Despite the numerous recent advancements in general-purpose seman… ▽ More

    Submitted 2 June, 2024; v1 submitted 15 April, 2024; originally announced May 2024.

  7. arXiv:2404.16376  [pdf, ps, other]

    cs.IT cs.MA eess.SY

    A Hypergraph Approach to Distributed Broadcast

    Authors: Qi Cao, Yulin Shao, Fan Yang

    Abstract: This paper explores the distributed broadcast problem within the context of network communications, a critical challenge in decentralized information dissemination. We put forth a novel hypergraph-based approach to address this issue, focusing on minimizing the number of broadcasts to ensure comprehensive data sharing among all network users. A key contribution of our work is the establishment of… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

  8. arXiv:2404.09500  [pdf]

    physics.optics eess.IV

    On-chip Real-time Hyperspectral Imager with Full CMOS Resolution Enabled by Massively Parallel Neural Network

    Authors: Junren Wen, Haiqi Gao, Weiming Shi, Shuaibo Feng, Lingyun Hao, Yujie Liu, Liang Xu, Yuchuan Shao, Yueguang Zhang, Weidong Shen, Chenying Yang

    Abstract: Traditional spectral imaging methods are constrained by the time-consuming scanning process, limiting the application in dynamic scenarios. One-shot spectral imaging based on reconstruction has been a hot research topic recently and the primary challenges still lie in both efficient fabrication techniques suitable for mass production and the high-speed, high-accuracy reconstruction algorithm for r… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

  9. arXiv:2404.00510  [pdf, other]

    cs.CV eess.IV

    Denoising Low-dose Images Using Deep Learning of Time Series Images

    Authors: Yang Shao, Toshie Yaguchi, Toshiaki Tanigaki

    Abstract: Digital image devices have been widely applied in many fields, including scientific imaging, recognition of individuals, and remote sensing. As the application of these imaging technologies to autonomous driving and measurement, image noise generated when observation cannot be performed with a sufficient dose has become a major problem. Machine learning denoise technology is expected to be the sol… ▽ More

    Submitted 30 March, 2024; originally announced April 2024.

  10. arXiv:2403.13615  [pdf, other]

    cs.IT eess.SP

    MIMO Channel as a Neural Function: Implicit Neural Representations for Extreme CSI Compression in Massive MIMO Systems

    Authors: Haotian Wu, Maojun Zhang, Yulin Shao, Krystian Mikolajczyk, Deniz Gündüz

    Abstract: Acquiring and utilizing accurate channel state information (CSI) can significantly improve transmission performance, thereby holding a crucial role in realizing the potential advantages of massive multiple-input multiple-output (MIMO) technology. Current prevailing CSI feedback approaches improve precision by employing advanced deep-learning methods to learn representative CSI features for a subse… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

    MSC Class: 94A24 ACM Class: E.4

  11. arXiv:2403.10613  [pdf, other]

    eess.SP cs.IT

    Process-and-Forward: Deep Joint Source-Channel Coding Over Cooperative Relay Networks

    Authors: Chenghong Bian, Yulin Shao, Haotian Wu, Emre Ozfatura, Deniz Gunduz

    Abstract: This paper introduces an innovative deep joint source-channel coding (DeepJSCC) approach to image transmission over a cooperative relay channel. The relay either amplifies and forwards a scaled version of its received signal, referred to as DeepJSCC-AF, or leverages neural networks to extract relevant features about the source signal before forwarding it to the destination, which we call DeepJSCC-… ▽ More

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Submitted for possible IEEE journal

  12. arXiv:2403.00321  [pdf, other]

    cs.IT cs.LG eess.SP eess.SY

    DEEP-IoT: Downlink-Enhanced Efficient-Power Internet of Things

    Authors: Yulin Shao

    Abstract: At the heart of the Internet of Things (IoT) -- a domain witnessing explosive growth -- the imperative for energy efficiency and the extension of device lifespans has never been more pressing. This paper presents DEEP-IoT, a revolutionary communication paradigm poised to redefine how IoT devices communicate. Through a pioneering "listen more, transmit less" strategy, DEEP-IoT challenges and transf… ▽ More

    Submitted 1 March, 2024; originally announced March 2024.

  13. arXiv:2401.05182  [pdf, other]

    cs.IT eess.SP

    Integrated Sensing and Communication with Reconfigurable Distributed Antenna and Reflecting Surface: Joint Beamforming and Mode Selection

    Authors: Pingping Zhang, Jintao Wang, Yulin Shao, Shaodan Ma

    Abstract: This paper presents a new integrated sensing and communication (ISAC) framework, leveraging the recent advancements of reconfigurable distributed antenna and reflecting surface (RDARS). RDARS is a programmable surface structure comprising numerous elements, each of which can be flexibly configured to operate either in a reflection mode, resembling a passive reconfigurable intelligent surface (RIS)… ▽ More

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: 13 pages, 9 figures

  14. arXiv:2401.00658  [pdf, other]

    cs.IT cs.LG cs.MM eess.SP

    Point Cloud in the Air

    Authors: Yulin Shao, Chenghong Bian, Li Yang, Qianqian Yang, Zhaoyang Zhang, Deniz Gunduz

    Abstract: Acquisition and processing of point clouds (PCs) is a crucial enabler for many emerging applications reliant on 3D spatial data, such as robot navigation, autonomous vehicles, and augmented reality. In most scenarios, PCs acquired by remote sensors must be transmitted to an edge server for fusion, segmentation, or inference. Wireless transmission of PCs not only puts on increased burden on the alr… ▽ More

    Submitted 31 December, 2023; originally announced January 2024.

  15. arXiv:2311.14780  [pdf, other]

    eess.IV physics.optics

    Wavelength-multiplexed Multi-mode EUV Reflection Ptychography based on Automatic-Differentiation

    Authors: Yifeng Shao, Sven Weerdenburg, Jacob Seifert, H. Paul Urbach, Allard P. Mosk, Wim Coene

    Abstract: Ptychographic extreme ultraviolet (EUV) diffractive imaging has emerged as a promising candidate for the next-generation metrology solutions in the semiconductor industry, as it can image wafer samples in reflection geometry at the nanoscale. This technique has surged attention recently, owing to the significant progress in high-harmonic generation (HHG) EUV sources and advancements in both hardwa… ▽ More

    Submitted 24 November, 2023; originally announced November 2023.

  16. arXiv:2311.07580  [pdf, other]

    eess.IV physics.comp-ph physics.optics

    Noise-robust latent vector reconstruction in ptychography using deep generative models

    Authors: Jacob Seifert, Yifeng Shao, Allard P. Mosk

    Abstract: Computational imaging is increasingly vital for a broad spectrum of applications, ranging from biological to material sciences. This includes applications where the object is known and sufficiently sparse, allowing it to be described with a reduced number of parameters. When no explicit parameterization is available, a deep generative model can be trained to represent an object in a low-dimensiona… ▽ More

    Submitted 14 January, 2024; v1 submitted 18 October, 2023; originally announced November 2023.

    Journal ref: Opt. Express 32(1), 1020-1033 (2024)

  17. arXiv:2311.07028  [pdf, other]

    eess.SP

    A Hybrid Joint Source-Channel Coding Scheme for Mobile Multi-hop Networks

    Authors: Chenghong Bian, Yulin Shao, Deniz Gunduz

    Abstract: We propose a novel hybrid joint source-channel coding (JSCC) scheme for robust image transmission over multi-hop networks. In the considered scenario, a mobile user wants to deliver an image to its destination over a mobile cellular network. We assume a practical setting, where the links between the nodes belonging to the mobile core network are stable and of high quality, while the link between t… ▽ More

    Submitted 7 February, 2024; v1 submitted 12 November, 2023; originally announced November 2023.

    Comments: Accepted to IEEE International Conference on Communications (ICC), 2024. Source code will be released soon

  18. arXiv:2311.00146  [pdf, other]

    eess.AS cs.AI

    RIR-SF: Room Impulse Response Based Spatial Feature for Target Speech Recognition in Multi-Channel Multi-Speaker Scenarios

    Authors: Yiwen Shao, Shi-Xiong Zhang, Dong Yu

    Abstract: Automatic speech recognition (ASR) on multi-talker recordings is challenging. Current methods using 3D spatial data from multi-channel audio and visual cues focus mainly on direct waves from the target speaker, overlooking reflection wave impacts, which hinders performance in reverberant environments. Our research introduces RIR-SF, a novel spatial feature based on room impulse response (RIR) that… ▽ More

    Submitted 11 June, 2024; v1 submitted 31 October, 2023; originally announced November 2023.

    Comments: Accepted for presentation at Interspeech 2024

  19. arXiv:2310.16367  [pdf, other]

    eess.AS

    UniX-Encoder: A Universal $X$-Channel Speech Encoder for Ad-Hoc Microphone Array Speech Processing

    Authors: Zili Huang, Yiwen Shao, Shi-Xiong Zhang, Dong Yu

    Abstract: The speech field is evolving to solve more challenging scenarios, such as multi-channel recordings with multiple simultaneous talkers. Given the many types of microphone setups out there, we present the UniX-Encoder. It's a universal encoder designed for multiple tasks, and worked with any microphone array, in both solo and multi-talker environments. Our research enhances previous multi-channel sp… ▽ More

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: Submitted to ICASSP 2024

  20. arXiv:2310.03901  [pdf, other]

    eess.AS cs.SD

    Challenges and Insights: Exploring 3D Spatial Features and Complex Networks on the MISP Dataset

    Authors: Yiwen Shao

    Abstract: Multi-channel multi-talker speech recognition presents formidable challenges in the realm of speech processing, marked by issues such as background noise, reverberation, and overlapping speech. Overcoming these complexities requires leveraging contextual cues to separate target speech from a cacophonous mix, enabling accurate recognition. Among these cues, the 3D spatial feature has emerged as a c… ▽ More

    Submitted 5 October, 2023; originally announced October 2023.

  21. arXiv:2309.00470  [pdf, other]

    cs.IT eess.IV

    Deep Joint Source-Channel Coding for Adaptive Image Transmission over MIMO Channels

    Authors: Haotian Wu, Yulin Shao, Chenghong Bian, Krystian Mikolajczyk, Deniz Gündüz

    Abstract: This paper introduces a vision transformer (ViT)-based deep joint source and channel coding (DeepJSCC) scheme for wireless image transmission over multiple-input multiple-output (MIMO) channels, denoted as DeepJSCC-MIMO. We consider DeepJSCC-MIMO for adaptive image transmission in both open-loop and closed-loop MIMO systems. The novel DeepJSCC-MIMO architecture surpasses the classical separation-b… ▽ More

    Submitted 7 May, 2024; v1 submitted 1 September, 2023; originally announced September 2023.

    Comments: arXiv admin note: text overlap with arXiv:2210.15347

    MSC Class: 94A24 ACM Class: E.4

  22. arXiv:2308.02436  [pdf, other]

    eess.IV physics.comp-ph physics.optics

    Maximum-likelihood estimation in ptychography in the presence of Poisson-Gaussian noise statistics

    Authors: Jacob Seifert, Yifeng Shao, Rens van Dam, Dorian Bouchet, Tristan van Leeuwen, Allard P. Mosk

    Abstract: Optical measurements often exhibit mixed Poisson-Gaussian noise statistics, which hampers image quality, particularly under low signal-to-noise ratio (SNR) conditions. Computational imaging falls short in such situations when solely Poissonian noise statistics are assumed. In response to this challenge, we define a loss function that explicitly incorporates this mixed noise nature. By using maximu… ▽ More

    Submitted 11 October, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

    Comments: Contains main and supplementary documents

    Journal ref: Opt. Lett. 48, 6027-6030 (2023)

  23. arXiv:2307.07319  [pdf, other]

    eess.SP

    The Power of Large Language Models for Wireless Communication System Development: A Case Study on FPGA Platforms

    Authors: Yuyang Du, Soung Chang Liew, Kexin Chen, Yulin Shao

    Abstract: Large language models (LLMs) have garnered significant attention across various research disciplines, including the wireless communication community. There have been several heated discussions on the intersection of LLMs and wireless technologies. While recent studies have demonstrated the ability of LLMs to generate hardware description language (HDL) code for simple computation tasks, developing… ▽ More

    Submitted 7 November, 2023; v1 submitted 14 July, 2023; originally announced July 2023.

  24. arXiv:2306.09101  [pdf, other]

    cs.IT eess.SP

    Transformer-aided Wireless Image Transmission with Channel Feedback

    Authors: Haotian Wu, Yulin Shao, Emre Ozfatura, Krystian Mikolajczyk, Deniz Gündüz

    Abstract: This paper presents a novel wireless image transmission paradigm that can exploit feedback from the receiver, called DeepJSCC-ViT-f. We consider a block feedback channel model, where the transmitter receives noiseless/noisy channel output feedback after each block. The proposed scheme employs a single encoder to facilitate transmission over multiple blocks, refining the receiver's estimation at ea… ▽ More

    Submitted 14 February, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

    MSC Class: 94A24 ACM Class: E.4

  25. arXiv:2306.09014  [pdf, other]

    eess.IV

    Geometric Wide-Angle Camera Calibration: A Review and Comparative Study

    Authors: Jianzhu Huai, Yuan Zhuang, Yuxin Shao, Grzegorz Jozkow, Binliang Wang, Yijia He, Alper Yilmaz

    Abstract: Wide-angle cameras are widely used in photogrammetry and autonomous systems which rely on the accurate metric measurements derived from images. To find the geometric relationship between incoming rays and image pixels, geometric camera calibration (GCC) has been actively developed. Aiming to provide practical calibration guidelines, this work surveys the existing GCC tools and evaluates the repres… ▽ More

    Submitted 27 March, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: 18 pages, 12 figures

  26. arXiv:2306.08730  [pdf, other]

    eess.SP cs.MM

    Wireless Point Cloud Transmission

    Authors: Chenghong Bian, Yulin Shao, Deniz Gunduz

    Abstract: 3D point cloud is a three-dimensional data format generated by LiDARs and depth sensors, and is being increasingly used in a large variety of applications. This paper presents a novel solution called SEmantic Point cloud Transmission (SEPT), for the transmission of point clouds over wireless channels with limited bandwidth. At the transmitter, SEPT encodes the point cloud via an iterative downsamp… ▽ More

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: 7 pages

  27. arXiv:2305.13161  [pdf, ps, other]

    eess.SP cs.IT

    DeepJSCC-l++: Robust and Bandwidth-Adaptive Wireless Image Transmission

    Authors: Chenghong Bian, Yulin Shao, Deniz Gunduz

    Abstract: This paper presents a novel vision transformer (ViT) based deep joint source channel coding (DeepJSCC) scheme, dubbed DeepJSCC-l++, which can be adaptive to multiple target bandwidth ratios as well as different channel signal-to-noise ratios (SNRs) using a single model. To achieve this, we train the proposed DeepJSCC-l++ model with different bandwidth ratios and SNRs, which are fed to the model as… ▽ More

    Submitted 30 November, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted to IEEE Global Communications Conference 2023. Code available at https://github.com/aprilbian/deepjscc-lplusplus

  28. arXiv:2305.11651  [pdf, other]

    cs.IT cs.MA cs.PF eess.SY

    Channel Cycle Time: A New Measure of Short-term Fairness

    Authors: Pengfei Shen, Yulin Shao, Haoyuan Pan, Lu Lu, Yonina C. Eldar

    Abstract: This paper puts forth a new metric, dubbed channel cycle time (CCT), to measure the short-term fairness of communication networks. CCT characterizes the average duration between two consecutive successful transmissions of a user, during which all other users successfully accessed the channel at least once. In contrast to existing short-term fairness measures, CCT provides more comprehensive insigh… ▽ More

    Submitted 14 October, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

  29. arXiv:2305.04395  [pdf, other]

    cs.IT eess.SP

    Optical Integrated Sensing and Communication

    Authors: Runxin Zhang, Yulin Shao, Menghan Li, Lu Lu

    Abstract: This paper explores a new paradigm of optical integrated sensing and communication (O-ISAC). Our investigation reveals that optical communication and optical sensing are two inherently complementary technologies. On the one hand, optical communication provides the necessary illumination for optical sensing. On the other hand, optical sensing provides environmental information for optical communica… ▽ More

    Submitted 23 May, 2023; v1 submitted 7 May, 2023; originally announced May 2023.

  30. arXiv:2212.06596  [pdf, other]

    cs.IT eess.SP

    Broadband Digital Over-the-Air Computation for Wireless Federated Edge Learning

    Authors: Lizhao You, Xinbo Zhao, Rui Cao, Yulin Shao, Liqun Fu

    Abstract: This paper presents the first orthogonal frequency-division multiplexing(OFDM)-based digital over-the-air computation (AirComp) system for wireless federated edge learning, where multiple edge devices transmit model data simultaneously using non-orthogonal OFDM subcarriers, and the edge server aggregates data directly from the superimposed signal. Existing analog AirComp systems often assume perfe… ▽ More

    Submitted 5 July, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

    Comments: 20 pages. arXiv admin note: text overlap with arXiv:2111.10508

  31. arXiv:2211.06705  [pdf, ps, other]

    cs.IT eess.SP

    Deep Joint Source-Channel Coding Over Cooperative Relay Networks

    Authors: Chenghong Bian, Yulin Shao, Haotian Wu, Deniz Gunduz

    Abstract: This paper presents a novel deep joint source-channel coding (DeepJSCC) scheme for image transmission over a half-duplex cooperative relay channel. Specifically, we apply DeepJSCC to two basic modes of cooperative communications, namely amplify-and-forward (AF) and decode-and-forward (DF). In DeepJSCC-AF, the relay simply amplifies and forwards its received signal. In DeepJSCC-DF, on the other han… ▽ More

    Submitted 18 March, 2024; v1 submitted 12 November, 2022; originally announced November 2022.

    Comments: Accepted to IEEE International Conference on Machine Learning for Communication and Networking (ICMLCN) 2024, code available via this link https://github.com/aprilbian/Relay_JSCC

  32. arXiv:2211.01730  [pdf, other]

    cs.IT cs.AI cs.LG eess.SP

    Feedback is Good, Active Feedback is Better: Block Attention Active Feedback Codes

    Authors: Emre Ozfatura, Yulin Shao, Amin Ghazanfari, Alberto Perotti, Branislav Popovic, Deniz Gunduz

    Abstract: Deep neural network (DNN)-assisted channel coding designs, such as low-complexity neural decoders for existing codes, or end-to-end neural-network-based auto-encoder designs are gaining interest recently due to their improved performance and flexibility; particularly for communication scenarios in which high-performing structured code designs do not exist. Communication in the presence of feedback… ▽ More

    Submitted 3 November, 2022; originally announced November 2022.

  33. arXiv:2210.16985  [pdf, other]

    eess.SP cs.IT

    Space-time design for deep joint source channel coding of images Over MIMO channels

    Authors: Chenghong Bian, Yulin Shao, Haotian Wu, Deniz Gunduz

    Abstract: We propose novel deep joint source-channel coding (DeepJSCC) algorithms for wireless image transmission over multi-input multi-output (MIMO) Rayleigh fading channels, when channel state information (CSI) is available only at the receiver. We consider two different schemes; one exploiting the spatial diversity and the other exploiting the spatial multiplexing gain of the MIMO channel, respectively.… ▽ More

    Submitted 20 June, 2023; v1 submitted 30 October, 2022; originally announced October 2022.

    Comments: Accepted to SPAWC 2023, 5 pages

  34. arXiv:2210.15347  [pdf, other]

    cs.IT eess.IV

    Vision Transformer for Adaptive Image Transmission over MIMO Channels

    Authors: Haotian Wu, Yulin Shao, Chenghong Bian, Krystian Mikolajczyk, Deniz Gündüz

    Abstract: This paper presents a vision transformer (ViT) based joint source and channel coding (JSCC) scheme for wireless image transmission over multiple-input multiple-output (MIMO) systems, called ViT-MIMO. The proposed ViT-MIMO architecture, in addition to outperforming separation-based benchmarks, can flexibly adapt to different channel conditions without requiring retraining. Specifically, exploiting… ▽ More

    Submitted 27 October, 2022; originally announced October 2022.

    MSC Class: 94A24 ACM Class: E.4

  35. arXiv:2208.08342  [pdf, other]

    cs.IT cs.LG eess.IV eess.SP

    Semantic Communications with Discrete-time Analog Transmission: A PAPR Perspective

    Authors: Yulin Shao, Deniz Gunduz

    Abstract: Recent progress in deep learning (DL)-based joint source-channel coding (DeepJSCC) has led to a new paradigm of semantic communications. Two salient features of DeepJSCC-based semantic communications are the exploitation of semantic-aware features directly from the source signal, and the discrete-time analog transmission (DTAT) of these features. Compared with traditional digital communications, s… ▽ More

    Submitted 29 December, 2022; v1 submitted 17 August, 2022; originally announced August 2022.

    Comments: Keywords: semantic communication, DeepJSCC, discrete-time analog transmission, PAPR

  36. arXiv:2207.14631  [pdf, other]

    cs.IT eess.SP

    Phase Code Discovery for Pulse Compression Radar: A Genetic Algorithm Approach

    Authors: Xinyan Xie, Runxin Zhang, Yulin Shao, Lu Lu

    Abstract: Discovering sequences with desired properties has long been an interesting intellectual pursuit. In pulse compression radar (PCR), discovering phase codes with low aperiodic autocorrelations is essential for a good estimation performance. The design of phase code, however, is mathematically non-trivial as the aperiodic autocorrelation properties of a sequence are intractable to characterize. In th… ▽ More

    Submitted 29 July, 2022; originally announced July 2022.

    Comments: Keywords: Genetic algorithm, pulse compression radar, phase code, mismatched receiver, signal-to-clutter ratio

  37. arXiv:2207.06309  [pdf, other]

    cs.IT cs.NI eess.SY

    Dynamic gNodeB Sleep Control for Energy-Conserving 5G Radio Access Network

    Authors: Pengfei Shen, Yulin Shao, Qi Cao, Lu Lu

    Abstract: 5G radio access network (RAN) is consuming much more energy than legacy RAN due to the denser deployments of gNodeBs (gNBs) and higher single-gNB power consumption. In an effort to achieve an energy-conserving RAN, this paper develops a dynamic on-off switching paradigm, where the ON/OFF states of gNBs can be dynamically configured according to the evolvements of the associated users. We formulate… ▽ More

    Submitted 13 July, 2022; originally announced July 2022.

    Comments: Keywords: Base station sleep control, 5G, radio access network, Markov decision process, greedy policy, index policy

  38. arXiv:2207.03605  [pdf, other]

    cs.LG cs.IT cs.MA eess.SY

    Learning-based Autonomous Channel Access in the Presence of Hidden Terminals

    Authors: Yulin Shao, Yucheng Cai, Taotao Wang, Ziyang Guo, Peng Liu, Jiajun Luo, Deniz Gunduz

    Abstract: We consider the problem of autonomous channel access (AutoCA), where a group of terminals tries to discover a communication strategy with an access point (AP) via a common wireless channel in a distributed fashion. Due to the irregular topology and the limited communication range of terminals, a practical challenge for AutoCA is the hidden terminal problem, which is notorious in wireless networks… ▽ More

    Submitted 2 December, 2022; v1 submitted 7 July, 2022; originally announced July 2022.

    Comments: Keywords: multiple channel access, hidden terminal, multi-agent deep reinforcement learning, Wi-Fi, proximal policy optimization

  39. arXiv:2206.09457  [pdf, other]

    cs.IT cs.AI cs.LG eess.SP

    All you need is feedback: Communication with block attention feedback codes

    Authors: Emre Ozfatura, Yulin Shao, Alberto Perotti, Branislav Popovic, Deniz Gunduz

    Abstract: Deep learning based channel code designs have recently gained interest as an alternative to conventional coding algorithms, particularly for channels for which existing codes do not provide effective solutions. Communication over a feedback channel is one such problem, for which promising results have recently been obtained by employing various deep learning architectures. In this paper, we introd… ▽ More

    Submitted 5 October, 2022; v1 submitted 19 June, 2022; originally announced June 2022.

  40. arXiv:2206.08864  [pdf, other]

    cs.LG cs.MM cs.SD eess.AS

    Avoid Overfitting User Specific Information in Federated Keyword Spotting

    Authors: Xin-Chun Li, Jin-Lin Tang, Shaoming Song, Bingshuai Li, Yinchuan Li, Yunfeng Shao, Le Gan, De-Chuan Zhan

    Abstract: Keyword spotting (KWS) aims to discriminate a specific wake-up word from other signals precisely and efficiently for different users. Recent works utilize various deep networks to train KWS models with all users' speech data centralized without considering data privacy. Federated KWS (FedKWS) could serve as a solution without directly sharing users' data. However, the small amount of data, differe… ▽ More

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: Accepted by Interspeech 2022

  41. Channel-Adaptive Wireless Image Transmission with OFDM

    Authors: Haotian Wu, Yulin Shao, Krystian Mikolajczyk, Deniz Gündüz

    Abstract: We present a learning-based channel-adaptive joint source and channel coding (CA-JSCC) scheme for wireless image transmission over multipath fading channels. The proposed method is an end-to-end autoencoder architecture with a dual-attention mechanism employing orthogonal frequency division multiplexing (OFDM) transmission. Unlike the previous works, our approach is adaptive to channel-gain and no… ▽ More

    Submitted 8 September, 2022; v1 submitted 4 May, 2022; originally announced May 2022.

    Comments: IEEE Wireless Communications Letters

    MSC Class: 94A24 ACM Class: E.4

  42. arXiv:2204.03851  [pdf, other]

    eess.AS cs.CR cs.SD

    Defense against Adversarial Attacks on Hybrid Speech Recognition using Joint Adversarial Fine-tuning with Denoiser

    Authors: Sonal Joshi, Saurabh Kataria, Yiwen Shao, Piotr Zelasko, Jesus Villalba, Sanjeev Khudanpur, Najim Dehak

    Abstract: Adversarial attacks are a threat to automatic speech recognition (ASR) systems, and it becomes imperative to propose defenses to protect them. In this paper, we perform experiments to show that K2 conformer hybrid ASR is strongly affected by white-box adversarial attacks. We propose three defenses--denoiser pre-processor, adversarially fine-tuning ASR model, and adversarially fine-tuning joint mod…

    Submitted 8 April, 2022; originally announced April 2022.

    Comments: Submitted to Interspeech 2022

  43. arXiv:2203.09932  [pdf, other]

    eess.SP

    Efficient FFT Computation in IFDMA Transceivers

    Authors: Yuyang Du, Soung Chang Liew, Yulin Shao

    Abstract: Interleaved Frequency Division Multiple Access (IFDMA) has the salient advantage of lower Peak-to-Average Power Ratio (PAPR) than its competitors like Orthogonal FDMA (OFDMA). A recent research effort put forth a new IFDMA transceiver design significantly less complex than conventional IFDMA transceivers. The new IFDMA transceiver design reduces the complexity by exploiting a certain correspondenc…

    Submitted 5 March, 2022; originally announced March 2022.

  44. arXiv:2203.01429   

    cs.SD eess.AS

    SMTNet: Hierarchical cavitation intensity recognition based on sub-main transfer network

    Authors: Yu Sha, Johannes Faber, Shuiping Gou, Bo Liu, Wei Li, Stefan Schramm, Horst Stoecker, Thomas Steckenreiter, Domagoj Vnucec, Nadine Wetzstein, Andreas Widl, Kai Zhou

    Abstract: With the rapid development of smart manufacturing, data-driven machinery health management has been of growing attention. In situations where some classes are more difficult to be distinguished compared to others and where classes might be organised in a hierarchy of categories, current DL methods can not work well. In this study, a novel hierarchical cavitation intensity recognition framework usi…

    Submitted 12 July, 2023; v1 submitted 1 March, 2022; originally announced March 2022.

    Comments: we need update this paper

  45. A multi-task learning for cavitation detection and cavitation intensity recognition of valve acoustic signals

    Authors: Yu Sha, Johannes Faber, Shuiping Gou, Bo Liu, Wei Li, Stefan Schramm, Horst Stoecker, Thomas Steckenreiter, Domagoj Vnucec, Nadine Wetzstein, Andreas Widl, Kai Zhou

    Abstract: With the rapid development of smart manufacturing, data-driven machinery health management has received a growing attention. As one of the most popular methods in machinery health management, deep learning (DL) has achieved remarkable successes. However, due to the issues of limited samples and poor separability of different cavitation states of acoustic signals, which greatly hinder the eventual…

    Submitted 20 April, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

    Comments: arXiv admin note: text overlap with arXiv:2202.13226

    Journal ref: Engineering Applications of Artificial Intelligence, 113 (2022), 104904

  46. Regional-Local Adversarially Learned One-Class Classifier Anomalous Sound Detection in Global Long-Term Space

    Authors: Yu Sha, Johannes Faber, Shuiping Gou, Bo Liu, Wei Li, Stefan Schramm, Horst Stoecker, Thomas Steckenreiter, Domagoj Vnucec, Nadine Wetzstein, Andreas Widl, Kai Zhou

    Abstract: Anomalous sound detection (ASD) is one of the most significant tasks of mechanical equipment monitoring and maintaining in complex industrial systems. In practice, it is vital to precisely identify abnormal status of the working mechanical system, which can further facilitate the failure troubleshooting. In this paper, we propose a multi-pattern adversarial learning one-class classification framew…

    Submitted 26 February, 2022; originally announced February 2022.

    Journal ref: KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 2022

  47. An acoustic signal cavitation detection framework based on XGBoost with adaptive selection feature engineering

    Authors: Yu Sha, Johannes Faber, Shuiping Gou, Bo Liu, Wei Li, Stefan Schramm, Horst Stoecker, Thomas Steckenreiter, Domagoj Vnucec, Nadine Wetzstein, Andreas Widl, Kai Zhou

    Abstract: Valves are widely used in industrial and domestic pipeline systems. However, during their operation, they may suffer from the occurrence of the cavitation, which can cause loud noise, vibration and damage to the internal components of the valve. Therefore, monitoring the flow status inside valves is significantly beneficial to prevent the additional cost induced by cavitation. In this paper, a nov…

    Submitted 1 March, 2022; v1 submitted 26 February, 2022; originally announced February 2022.

    Journal ref: Measurement 192 (2022), 110897

  48. arXiv:2202.03433  [pdf, other]

    eess.IV cs.CV

    A Coarse-to-fine Morphological Approach With Knowledge-based Rules and Self-adapting Correction for Lung Nodules Segmentation

    Authors: Xinliang Fu, Jiayin Zheng, Juanyun Mai, Yanbo Shao, Minghao Wang, Linyu Li, Zhaoqi Diao, Yulong Chen, Jianyu Xiao, Jian You, Airu Yin, Yang Yang, Xiangcheng Qiu, Jinsheng Tao, Bo Wang, Hua Ji

    Abstract: The segmentation module which precisely outlines the nodules is a crucial step in a computer-aided diagnosis(CAD) system. The most challenging part of such a module is how to achieve high accuracy of the segmentation, especially for the juxtapleural, non-solid and small nodules. In this research, we present a coarse-to-fine methodology that greatly improves the thresholding method performance with…

    Submitted 7 February, 2022; originally announced February 2022.

  49. arXiv:2201.13392   

    eess.IV cs.CV

    MHSnet: Multi-head and Spatial Attention Network with False-Positive Reduction for Pulmonary Nodules Detection

    Authors: Juanyun Mai, Minghao Wang, Jiayin Zheng, Yanbo Shao, Zhaoqi Diao, Xinliang Fu, Yulong Chen, Jianyu Xiao, Jian You, Airu Yin, Yang Yang, Xiangcheng Qiu, Jinsheng Tao, Bo Wang, Hua Ji

    Abstract: The mortality of lung cancer has ranked high among cancers for many years. Early detection of lung cancer is critical for disease prevention, cure, and mortality rate reduction. However, existing detection methods on pulmonary nodules introduce an excessive number of false positive proposals in order to achieve high sensitivity, which is not practical in clinical situations. In this paper, we prop…

    Submitted 12 May, 2022; v1 submitted 31 January, 2022; originally announced January 2022.

    Comments: We have to revise the experiment results and conclusions

  50. arXiv:2112.01766  [pdf, other]

    cs.CV eess.IV

    Unsupervised Low-Light Image Enhancement via Histogram Equalization Prior

    Authors: Feng Zhang, Yuanjie Shao, Yishi Sun, Kai Zhu, Changxin Gao, Nong Sang

    Abstract: Deep learning-based methods for low-light image enhancement typically require enormous paired training data, which are impractical to capture in real-world scenarios. Recently, unsupervised approaches have been explored to eliminate the reliance on paired training data. However, they perform erratically in diverse real-world scenarios due to the absence of priors. To address this issue, we propose…

    Submitted 3 December, 2021; originally announced December 2021.

    Comments: submitted to IEEE Transactions on Image Processing