Showing 1–50 of 121 results for author: Yang, F

Searching in archive eess.
  1. arXiv:2407.00614  [pdf, other]

    cs.RO cs.CV eess.IV

    Learning Granularity-Aware Affordances from Human-Object Interaction for Tool-Based Functional Grasping in Dexterous Robotics

    Authors: Fan Yang, Wenrui Chen, Kailun Yang, Haoran Lin, DongSheng Luo, Conghui Tang, Zhiyong Li, Yaonan Wang

    Abstract: To enable robots to use tools, the initial step is teaching robots to employ dexterous gestures for touching specific areas precisely where tasks are performed. Affordance features of objects serve as a bridge in the functional interaction between agents and objects. However, leveraging these affordance cues to help robots achieve functional tool grasping remains unresolved. To address this, we pr…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: The source code and the established dataset will be made publicly available at https://github.com/yangfan293/GAAF-DEX

  2. arXiv:2406.14067  [pdf]

    physics.optics eess.SP

    A microwave photonic prototype for concurrent radar detection and spectrum sensing over an 8 to 40 GHz bandwidth

    Authors: Taixia Shi, Dingding Liang, Lu Wang, Lin Li, Shaogang Guo, Jiawei Gao, Xiaowei Li, Chulun Lin, Lei Shi, Baogang Ding, Shiyang Liu, Fangyi Yang, Chi Jiang, Yang Chen

    Abstract: In this work, a microwave photonic prototype for concurrent radar detection and spectrum sensing is proposed, designed, built, and investigated. A direct digital synthesizer and an analog electronic circuit are integrated to generate an intermediate frequency (IF) linearly frequency-modulated (LFM) signal with a tunable center frequency from 2.5 to 9.5 GHz and an instantaneous bandwidth of 1 GHz.…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 18 pages, 12 figures, 1 table

  3. arXiv:2406.05681  [pdf, other]

    cs.SD eess.AS

    Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling

    Authors: Yuepeng Jiang, Tao Li, Fengyu Yang, Lei Xie, Meng Meng, Yujun Wang

    Abstract: Recent research in zero-shot speech synthesis has made significant progress in speaker similarity. However, current efforts focus on timbre generalization rather than prosody modeling, which results in limited naturalness and expressiveness. To address this, we introduce a novel speech synthesis model trained on large-scale datasets, including both timbre and hierarchical prosody modeling. As timb…

    Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures, accepted by Interspeech2024

  4. arXiv:2405.18844  [pdf, other]

    cs.IT eess.SP

    Optical IRS for Visible Light Communication: Modeling, Design, and Open Issues

    Authors: Shiyuan Sun, Fang Yang, Weidong Mei, Jian Song, Zhu Han, Rui Zhang

    Abstract: Optical intelligent reflecting surface (OIRS) offers a new and effective approach to resolving the line-of-sight blockage issue in visible light communication (VLC) by enabling redirection of light to bypass obstacles, thereby dramatically enhancing indoor VLC coverage and reliability. This article provides a comprehensive overview of OIRS for VLC, including channel modeling, design techniques, an…

    Submitted 29 May, 2024; originally announced May 2024.

  5. arXiv:2405.02873  [pdf, other]

    eess.SP

    Target Localization with Macro and Micro Base Stations Cooperative Sensing

    Authors: Haotian Liu, Zhiqing Wei, Furong Yang, Huici Wu, Kaifeng Han, Zhiyong Feng

    Abstract: Addressing the communication and sensing demands of sixth-generation (6G) mobile communication system, integrated sensing and communication (ISAC) has garnered traction in academia and industry. With the sensing limitation of single base station (BS), multi-BS cooperative sensing is regarded as a promising solution. The coexistence and overlapped coverage of macro BS (MBS) and micro BS (MiBS) are…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 7 pages, 6 figures, submitted to 2024 IEEE GLOBECOM

  6. arXiv:2405.02066  [pdf, other]

    cs.CV eess.IV

    WateRF: Robust Watermarks in Radiance Fields for Protection of Copyrights

    Authors: Youngdong Jang, Dong In Lee, MinHyuk Jang, Jong Wook Kim, Feng Yang, Sangpil Kim

    Abstract: The advances in the Neural Radiance Fields (NeRF) research offer extensive applications in diverse domains, but protecting their copyrights has not yet been researched in depth. Recently, NeRF watermarking has been considered one of the pivotal solutions for safely deploying NeRF-based 3D representations. However, existing methods are designed to apply only to implicit or explicit NeRF representat…

    Submitted 27 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  7. arXiv:2404.16376  [pdf, ps, other]

    cs.IT cs.MA eess.SY

    A Hypergraph Approach to Distributed Broadcast

    Authors: Qi Cao, Yulin Shao, Fan Yang

    Abstract: This paper explores the distributed broadcast problem within the context of network communications, a critical challenge in decentralized information dissemination. We put forth a novel hypergraph-based approach to address this issue, focusing on minimizing the number of broadcasts to ensure comprehensive data sharing among all network users. A key contribution of our work is the establishment of…

    Submitted 25 April, 2024; originally announced April 2024.

  8. arXiv:2404.14778  [pdf, other]

    cs.IT eess.SP

    Channel Estimation for Optical Intelligent Reflecting Surface-Assisted VLC System: A Joint Space-Time Sampling Approach

    Authors: Shiyuan Sun, Fang Yang, Weidong Mei, Jian Song, Zhu Han, Rui Zhang

    Abstract: Optical intelligent reflecting surface (OIRS) has attracted increasing attention due to its capability of overcoming signal blockages in visible light communication (VLC), an emerging technology for the next-generation advanced transceivers. However, current works on OIRS predominantly assume known channel state information (CSI), which is essential to practical OIRS configuration. To bridge such…

    Submitted 23 April, 2024; originally announced April 2024.

  9. arXiv:2404.14706  [pdf, other]

    cs.IT eess.SP

    Channel Estimation for Optical IRS-Assisted VLC System via Spatial Coherence

    Authors: Shiyuan Sun, Fang Yang, Weidong Mei, Jian Song, Zhu Han, Rui Zhang

    Abstract: Optical intelligent reflecting surface (OIRS) has been considered a promising technology for visible light communication (VLC) by constructing visual line-of-sight propagation paths to address the signal blockage issue. However, the existing works on OIRSs are mostly based on perfect channel state information (CSI), whose acquisition appears to be challenging due to the passive nature of the OIRS.…

    Submitted 22 April, 2024; originally announced April 2024.

  10. arXiv:2403.01598  [pdf, other]

    eess.IV cs.AI cs.CV

    APISR: Anime Production Inspired Real-World Anime Super-Resolution

    Authors: Boyang Wang, Fengyu Yang, Xihang Yu, Chao Zhang, Hanbin Zhao

    Abstract: While real-world anime super-resolution (SR) has gained increasing attention in the SR community, existing methods still adopt techniques from the photorealistic domain. In this paper, we analyze the anime production workflow and rethink how to use characteristics of it for the sake of the real-world anime SR. First, we argue that video networks and datasets are not necessary for anime SR due to t…

    Submitted 4 April, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

  11. arXiv:2402.08093  [pdf, other]

    cs.LG cs.CL eess.AS

    BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data

    Authors: Mateusz Łajszczak, Guillermo Cámbara, Yang Li, Fatih Beyhan, Arent van Korlaar, Fan Yang, Arnaud Joly, Álvaro Martín-Cortinas, Ammar Abbas, Adam Michalski, Alexis Moinet, Sri Karlapati, Ewa Muszyńska, Haohan Guo, Bartosz Putrycz, Soledad López Gambino, Kayeon Yoo, Elena Sokolova, Thomas Drugman

    Abstract: We introduce a text-to-speech (TTS) model called BASE TTS, which stands for $\textbf{B}$ig $\textbf{A}$daptive $\textbf{S}$treamable TTS with $\textbf{E}$mergent abilities. BASE TTS is the largest TTS model to-date, trained on 100K hours of public domain speech data, achieving a new state-of-the-art in speech naturalness. It deploys a 1-billion-parameter autoregressive Transformer that converts ra…

    Submitted 15 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: v1.1 (fixed typos)

  12. arXiv:2402.00320  [pdf]

    eess.IV

    DARCS: Memory-Efficient Deep Compressed Sensing Reconstruction for Acceleration of 3D Whole-Heart Coronary MR Angiography

    Authors: Zhihao Xue, Fan Yang, Juan Gao, Zhuo Chen, Hao Peng, Chao Zou, Hang Jin, Chenxi Hu

    Abstract: Three-dimensional coronary magnetic resonance angiography (CMRA) demands reconstruction algorithms that can significantly suppress the artifacts from a heavily undersampled acquisition. While unrolling-based deep reconstruction methods have achieved state-of-the-art performance on 2D image reconstruction, their application to 3D reconstruction is hindered by the large amount of memory needed to tr…

    Submitted 2 February, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

    Comments: 10 pages, 8 figures

  13. arXiv:2401.11961  [pdf, other]

    eess.SY

    Enhancing Safety in Nonlinear Systems: Design and Stability Analysis of Adaptive Cruise Control

    Authors: Fan Yang, Haoqi Li, Maolong Lv, Jiangping Hu, Qingrui Zhou, Bijoy K. Ghosh

    Abstract: The safety of autonomous driving systems, particularly self-driving vehicles, remains of paramount concern. These systems exhibit affine nonlinear dynamics and face the challenge of executing predefined control tasks while adhering to state and input constraints to mitigate risks. However, achieving safety control within the framework of control input constraints, such as collision avoidance and m…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: 11 pages, 9 figures

  14. arXiv:2401.09552  [pdf, other]

    physics.app-ph eess.SP

    Centralized active reconfigurable intelligent surface: Architecture, path loss analysis and experimental verification

    Authors: Changhao Liu, Fan Yang, Shenheng Xu, Yezhen Li, Maokun Li

    Abstract: Reconfigurable intelligent surfaces (RISs) are promising candidate for the 6G communication. Recently, active RIS has been proposed to compensate the multiplicative fading effect inherent in passive RISs. However, conventional distributed active RISs, with at least one amplifier per element, are costly, complex, and power-intensive. To address these challenges, this paper proposes a novel architec…

    Submitted 18 January, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

  15. arXiv:2401.02961  [pdf, other]

    cs.LG cs.CV eess.IV physics.optics

    A Surrogate-Assisted Extended Generative Adversarial Network for Parameter Optimization in Free-Form Metasurface Design

    Authors: Manna Dai, Yang Jiang, Feng Yang, Joyjit Chattoraj, Yingzhi Xia, Xinxing Xu, Weijiang Zhao, My Ha Dao, Yong Liu

    Abstract: Metasurfaces have widespread applications in fifth-generation (5G) microwave communication. Among the metasurface family, free-form metasurfaces excel in achieving intricate spectral responses compared to regular-shape counterparts. However, conventional numerical methods for free-form metasurfaces are time-consuming and demand specialized expertise. Alternatively, recent studies demonstrate that…

    Submitted 18 October, 2023; originally announced January 2024.

  16. arXiv:2312.16422  [pdf, other]

    eess.AS cs.SD

    Selective-Memory Meta-Learning with Environment Representations for Sound Event Localization and Detection

    Authors: Jinbo Hu, Yin Cao, Ming Wu, Qiuqiang Kong, Feiran Yang, Mark D. Plumbley, Jun Yang

    Abstract: Environment shifts and conflicts present significant challenges for learning-based sound event localization and detection (SELD) methods. SELD systems, when trained in particular acoustic settings, often show restricted generalization capabilities for diverse acoustic environments. Furthermore, it is notably costly to obtain annotated samples for spatial sound events. Deploying a SELD system in a…

    Submitted 27 December, 2023; originally announced December 2023.

    Comments: 13 pages, 11 figures

  17. arXiv:2312.13654  [pdf, other]

    cs.IT eess.SP math.OC

    Free Space Optical Integrated Sensing and Communication Based on DCO-OFDM: Performance Metrics and Resource Allocation

    Authors: Yunfeng Wen, Fang Yang, Jian Song, Zhu Han

    Abstract: As one of the six usage scenarios of the sixth generation (6G) mobile communication system, integrated sensing and communication (ISAC) has garnered considerable attention, and numerous studies have been conducted on radio-frequency (RF)-ISAC. Benefitting from the communication and sensing capabilities of an optical system, free space optical (FSO)-ISAC becomes a potential complement to RF-ISAC. I…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: 13 pages, 8 figures

  18. arXiv:2312.13640  [pdf, other]

    eess.SP cs.IT

    Optical Integrated Sensing and Communication: Architectures, Potentials and Challenges

    Authors: Yunfeng Wen, Fang Yang, Jian Song, Zhu Han

    Abstract: Integrated sensing and communication (ISAC) is viewed as a crucial component of future mobile networks and has gained much interest in both academia and industry. Similar to the emergence of radio-frequency (RF) ISAC, the integration of free space optical communication and optical sensing yields optical ISAC (O-ISAC), which is regarded as a powerful complement to its RF counterpart. In this articl…

    Submitted 10 March, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: 7 pages, 5 figures

  19. arXiv:2311.15556  [pdf, other]

    cs.CV eess.IV

    PKU-I2IQA: An Image-to-Image Quality Assessment Database for AI Generated Images

    Authors: Jiquan Yuan, Xinyan Cao, Changjin Li, Fanyi Yang, Jinlong Lin, Xixin Cao

    Abstract: As image generation technology advances, AI-based image generation has been applied in various fields and Artificial Intelligence Generated Content (AIGC) has garnered widespread attention. However, the development of AI-based image generative models also brings new problems and challenges. A significant challenge is that AI-generated images (AIGI) may exhibit unique distortions compared to natura…

    Submitted 29 November, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: 18 pages

  20. arXiv:2311.00996  [pdf, other]

    eess.IV cs.CV

    VCISR: Blind Single Image Super-Resolution with Video Compression Synthetic Data

    Authors: Boyang Wang, Bowen Liu, Shiyu Liu, Fengyu Yang

    Abstract: In the blind single image super-resolution (SISR) task, existing works have been successful in restoring image-level unknown degradations. However, when a single video frame becomes the input, these works usually fail to address degradations caused by video compression, such as mosquito noise, ringing, blockiness, and staircase noise. In this work, we for the first time, present a video compressio…

    Submitted 22 November, 2023; v1 submitted 2 November, 2023; originally announced November 2023.

  21. arXiv:2310.17505  [pdf, other]

    eess.SY

    Free Space Optical Communication for Inter-Satellite Link: Architecture, Potentials and Trends

    Authors: Guanhua Wang, Fang Yang, Jian Song, Zhu Han

    Abstract: The sixth-generation (6G) network is expected to achieve global coverage based on the space-air-ground integrated network, and the latest satellite network will play an important role in it. The introduction of inter-satellite links (ISLs) can significantly improve the throughput of the satellite network, and recently gets lots of attention from both academia and industry. In this paper, we illust…

    Submitted 26 October, 2023; originally announced October 2023.

  22. arXiv:2310.05368  [pdf, other]

    cs.AI cs.MA cs.SD eess.AS

    Measuring Acoustics with Collaborative Multiple Agents

    Authors: Yinfeng Yu, Changan Chen, Lele Cao, Fangkai Yang, Fuchun Sun

    Abstract: As humans, we hear sound every second of our life. The sound we hear is often affected by the acoustics of the environment surrounding us. For example, a spacious hall leads to more reverberation. Room Impulse Responses (RIR) are commonly used to characterize environment acoustics as a function of the scene geometry, materials, and source/receiver locations. Traditionally, RIRs are measured by set…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: Main paper (9 pages and 5 figures and 2 tables) and appendix (16 pages and 13 figures and 10 tables). Accepted for publication by IJCAI 2023

  23. arXiv:2309.08838  [pdf, other]

    cs.CV eess.IV

    AOSR-Net: All-in-One Sandstorm Removal Network

    Authors: Yazhong Si, Xulong Zhang, Fan Yang, Jianzong Wang, Ning Cheng, Jing Xiao

    Abstract: Most existing sandstorm image enhancement methods are based on traditional theory and prior knowledge, which often restrict their applicability in real-world scenarios. In addition, these approaches often adopt a strategy of color correction followed by dust removal, which makes the algorithm structure too complex. To solve the issue, we introduce a novel image restoration model, named all-in-one…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: Accepted by The 35th IEEE International Conference on Tools with Artificial Intelligence. (ICTAI 2023)

  24. arXiv:2308.08847  [pdf, other]

    eess.AS cs.SD

    META-SELD: Meta-Learning for Fast Adaptation to the new environment in Sound Event Localization and Detection

    Authors: Jinbo Hu, Yin Cao, Ming Wu, Feiran Yang, Ziying Yu, Wenwu Wang, Mark D. Plumbley, Jun Yang

    Abstract: For learning-based sound event localization and detection (SELD) methods, different acoustic environments in the training and test sets may result in large performance differences in the validation and evaluation stages. Different environments, such as different sizes of rooms, different reverberation times, and different background noise, may be reasons for a learning-based system to fail. On the…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: Submitted to DCASE 2023 Workshop

  25. arXiv:2307.16709  [pdf, other]

    cs.CL eess.AS

    Multilingual context-based pronunciation learning for Text-to-Speech

    Authors: Giulia Comini, Manuel Sam Ribeiro, Fan Yang, Heereen Shim, Jaime Lorenzo-Trueba

    Abstract: Phonetic information and linguistic knowledge are an essential component of a Text-to-speech (TTS) front-end. Given a language, a lexicon can be collected offline and Grapheme-to-Phoneme (G2P) relationships are usually modeled in order to predict the pronunciation for out-of-vocabulary (OOV) words. Additionally, post-lexical phonology, often defined in the form of rule-based systems, is used to co…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: 5 pages, 2 figures, 5 tables. Interspeech 2023

  26. A Message Passing Detection based Affine Frequency Division Multiplexing Communication System

    Authors: Lifan Wu, Shan Luo, Dongxiao Song, Fan Yang, Rongping Lin

    Abstract: The next generation of wireless communication technology is anticipated to address the communication reliability challenges encountered in high-speed mobile communication scenarios. An Orthogonal Time Frequency Space (OTFS) system has been introduced as a solution that effectively mitigates these issues. However, OTFS is associated with relatively high pilot overhead and multiuser multiplexing ove…

    Submitted 30 August, 2023; v1 submitted 29 July, 2023; originally announced July 2023.

    Comments: 8 pages, 7 figures

  27. arXiv:2307.13220  [pdf]

    eess.IV cs.AI physics.med-ph

    One for Multiple: Physics-informed Synthetic Data Boosts Generalizable Deep Learning for Fast MRI Reconstruction

    Authors: Zi Wang, Xiaotong Yu, Chengyan Wang, Weibo Chen, Jiazheng Wang, Ying-Hua Chu, Hongwei Sun, Rushuai Li, Peiyong Li, Fan Yang, Haiwei Han, Taishan Kang, Jianzhong Lin, Chen Yang, Shufu Chang, Zhang Shi, Sha Hua, Yan Li, Juan Hu, Liuhong Zhu, Jianjun Zhou, Meijing Lin, Jiefeng Guo, Congbo Cai, Zhong Chen, et al. (3 additional authors not shown)

    Abstract: Magnetic resonance imaging (MRI) is a widely used radiological modality renowned for its radiation-free, comprehensive insights into the human body, facilitating medical diagnoses. However, the drawback of prolonged scan times hinders its accessibility. The k-space undersampling offers a solution, yet the resultant artifacts necessitate meticulous removal during image reconstruction. Although Deep…

    Submitted 28 February, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

    Comments: 38 pages, 19 figures, 5 tables

  28. arXiv:2307.00307  [pdf, other]

    eess.IV

    Spatio-Temporal Classification of Lung Ventilation Patterns using 3D EIT Images: A General Approach for Individualized Lung Function Evaluation

    Authors: Shuzhe Chen, Li Li, Zhichao Lin, Ke Zhang, Ying Gong, Lu Wang, Xu Wu, Maokun Li, Yuanlin Song, Fan Yang, Shenheng Xu

    Abstract: The Pulmonary Function Test (PFT) is an widely utilized and rigorous classification test for lung function evaluation, serving as a comprehensive tool for lung diagnosis. Meanwhile, Electrical Impedance Tomography (EIT) is a rapidly advancing clinical technique that visualizes conductivity distribution induced by ventilation. EIT provides additional spatial and temporal information on lung ventila…

    Submitted 1 July, 2023; originally announced July 2023.

  29. arXiv:2306.06734  [pdf, ps, other]

    cs.IT eess.SP

    MLE-based Device Activity Detection under Rician Fading for Massive Grant-free Access with Perfect and Imperfect Synchronization

    Authors: Wang Liu, Ying Cui, Feng Yang, Lianghui Ding, Jun Sun

    Abstract: Most existing studies on massive grant-free access, proposed to support massive machine-type communications (mMTC) for the Internet of things (IoT), assume Rayleigh fading and perfect synchronization for simplicity. However, in practice, line-of-sight (LoS) components generally exist, and time and frequency synchronization are usually imperfect. This paper systematically investigates maximum likel…

    Submitted 11 January, 2024; v1 submitted 11 June, 2023; originally announced June 2023.

  30. arXiv:2305.09833  [pdf, other]

    eess.IV cs.CV

    Segmentation of Aortic Vessel Tree in CT Scans with Deep Fully Convolutional Networks

    Authors: Shaofeng Yuan, Feng Yang

    Abstract: Automatic and accurate segmentation of aortic vessel tree (AVT) in computed tomography (CT) scans is crucial for early detection, diagnosis and prognosis of aortic diseases, such as aneurysms, dissections and stenosis. However, this task remains challenges, due to the complexity of aortic vessel tree and amount of CT angiography data. In this technical report, we use two-stage fully convolutional…

    Submitted 16 May, 2023; originally announced May 2023.

    Comments: 7 pages, 1 figure, 1 table

  31. arXiv:2305.09798  [pdf]

    cs.CL cs.HC eess.SY stat.AP

    The Ways of Words: The Impact of Word Choice on Information Engagement and Decision Making

    Authors: Nimrod Dvir, Elaine Friedman, Suraj Commuri, Fan Yang, Jennifer Romano

    Abstract: Little research has explored how information engagement (IE), the degree to which individuals interact with and use information in a manner that manifests cognitively, behaviorally, and affectively. This study explored the impact of phrasing, specifically word choice, on IE and decision making. Synthesizing two theoretical models, User Engagement Theory UET and Information Behavior Theory IBT, a t…

    Submitted 16 May, 2023; originally announced May 2023.

    MSC Class: 28-08

    ACM Class: H.5.2; H.1.2

  32. arXiv:2305.07270  [pdf, other]

    cs.CV cs.RO eess.IV

    SSD-MonoDETR: Supervised Scale-aware Deformable Transformer for Monocular 3D Object Detection

    Authors: Xuan He, Fan Yang, Kailun Yang, Jiacheng Lin, Haolong Fu, Meng Wang, Jin Yuan, Zhiyong Li

    Abstract: Transformer-based methods have demonstrated superior performance for monocular 3D object detection recently, which aims at predicting 3D attributes from a single 2D image. Most existing transformer-based methods leverage both visual and depth representations to explore valuable query points on objects, and the quality of the learned query points has a great impact on detection accuracy. Unfortunat…

    Submitted 1 September, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

    Comments: Accepted to IEEE Transactions on Intelligent Vehicles (T-IV). Code will be made publicly available at https://github.com/mikasa3lili/SSD-MonoDETR

  33. arXiv:2301.13648  [pdf, other]

    eess.IV cs.CV

    CSDN: Combing Shallow and Deep Networks for Accurate Real-time Segmentation of High-definition Intravascular Ultrasound Images

    Authors: Shaofeng Yuan, Feng Yang

    Abstract: Intravascular ultrasound (IVUS) is the preferred modality for capturing real-time and high resolution cross-sectional images of the coronary arteries, and evaluating the stenosis. Accurate and real-time segmentation of IVUS images involves the delineation of lumen and external elastic membrane borders. In this paper, we propose a two-stream framework for efficient segmentation of 60 MHz high resol…

    Submitted 30 January, 2023; originally announced January 2023.

    Comments: 5 pages, 2 figures, 1 table, submitted to the 20th IEEE International Symposium on Biomedical Imaging (IEEE ISBI 2023)

  34. arXiv:2301.04032  [pdf]

    eess.IV cs.CV

    Does image resolution impact chest X-ray based fine-grained Tuberculosis-consistent lesion segmentation?

    Authors: Sivaramakrishnan Rajaraman, Feng Yang, Ghada Zamzmi, Zhiyun Xue, Sameer Antani

    Abstract: Deep learning (DL) models are state-of-the-art in segmenting anatomical and disease regions of interest (ROIs) in medical images. Particularly, a large number of DL-based techniques have been reported using chest X-rays (CXRs). However, these models are reportedly trained on reduced image resolutions for reasons related to the lack of computational resources. Literature is sparse in discussing the…

    Submitted 27 January, 2023; v1 submitted 10 January, 2023; originally announced January 2023.

    Comments: 17 pages, 7 figures, 5 tables

  35. arXiv:2301.00161  [pdf, other]

    cs.IT cs.AR eess.SP eess.SY

    Active RISs: Signal Modeling, Asymptotic Analysis, and Beamforming Design

    Authors: Zijian Zhang, Linglong Dai, Xibi Chen, Changhao Liu, Fan Yang, Robert Schober, H. Vincent Poor

    Abstract: Reconfigurable intelligent surfaces (RISs) have emerged as a candidate technology for future 6G networks. However, due to the "multiplicative fading" effect, the existing passive RISs only achieve a negligible capacity gain in environments with strong direct links. In this paper, the concept of active RISs is studied to overcome this fundamental limitation. Unlike the existing passive RISs that re…

    Submitted 31 December, 2022; originally announced January 2023.

    Comments: Accepted by IEEE GLOBECOM 2022. This paper includes a 64-element active RIS aided wireless communication prototype and the field test results. The journal version is at: arXiv:2103.15154. Simulation codes are provided at: http://oa.ee.tsinghua.edu.cn/dailinglong/publications/publications.html

    Journal ref: IEEE GLOBECOM 2022

  36. arXiv:2212.03435  [pdf, other]

    cs.SD cs.CL eess.AS

    Improve Bilingual TTS Using Dynamic Language and Phonology Embedding

    Authors: Fengyu Yang, Jian Luan, Yujun Wang

    Abstract: In most cases, bilingual TTS needs to handle three types of input scripts: first language only, second language only, and second language embedded in the first language. In the latter two situations, the pronunciation and intonation of the second language are usually quite different due to the influence of the first language. Therefore, it is a big challenge to accurately model the pronunciation a…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: Submitted to ICASSP2023

  37. arXiv:2211.02475  [pdf]

    eess.IV cs.CV

    Generalizability of Deep Adult Lung Segmentation Models to the Pediatric Population: A Retrospective Study

    Authors: Sivaramakrishnan Rajaraman, Feng Yang, Ghada Zamzmi, Zhiyun Xue, Sameer Antani

    Abstract: Lung segmentation in chest X-rays (CXRs) is an important prerequisite for improving the specificity of diagnoses of cardiopulmonary diseases in a clinical decision support system. Current deep learning models for lung segmentation are trained and evaluated on CXR datasets in which the radiographic projections are captured predominantly from the adult population. However, the shape of the lungs is…

    Submitted 25 May, 2023; v1 submitted 4 November, 2022; originally announced November 2022.

    Comments: 33 pages, 8 figures, and 8 tables

    ACM Class: I.4.6

  38. arXiv:2211.02256  [pdf]

    eess.IV cs.CV

    ISA-Net: Improved spatial attention network for PET-CT tumor segmentation

    Authors: Zhengyong Huang, Sijuan Zou, Guoshuai Wang, Zixiang Chen, Hao Shen, Haiyan Wang, Na Zhang, Lu Zhang, Fan Yang, Haining Wang, Dong Liang, Tianye Niu, Xiaohua Zhu, Zhanli Hu

    Abstract: Achieving accurate and automated tumor segmentation plays an important role in both clinical practice and radiomics research. Segmentation in medicine is now often performed manually by experts, which is a laborious, expensive and error-prone task. Manual annotation relies heavily on the experience and knowledge of these experts. In addition, there is much intra- and interobserver variation. There…

    Submitted 4 November, 2022; originally announced November 2022.

  39. arXiv:2210.15847  [pdf, other]

    eess.SY

    Distributed Optimal Control of Graph Symmetric Systems via Graph Filters

    Authors: Fengjun Yang, Fernando Gama, Somayeh Sojoudi, Nikolai Matni

    Abstract: Designing distributed optimal controllers subject to communication constraints is a difficult problem unless structural assumptions are imposed on the underlying dynamics and information exchange structure, e.g., sparsity, delay, or spatial invariance. In this paper, we borrow ideas from graph signal processing and define and analyze a class of Graph Symmetric Systems (GSSs), which are systems tha…

    Submitted 27 October, 2022; originally announced October 2022.

  40. arXiv:2209.01802  [pdf, other]

    eess.AS cs.SD

    Sound Event Localization and Detection for Real Spatial Sound Scenes: Event-Independent Network and Data Augmentation Chains

    Authors: Jinbo Hu, Yin Cao, Ming Wu, Qiuqiang Kong, Feiran Yang, Mark D. Plumbley, Jun Yang

    Abstract: Sound event localization and detection (SELD) is a joint task of sound event detection and direction-of-arrival estimation. In DCASE 2022 Task 3, types of data transform from computationally generated spatial recordings to recordings of real-sound scenes. Our system submitted to the DCASE 2022 Task 3 is based on our previous proposed Event-Independent Network V2 (EINV2) with a novel data augmentat…

    Submitted 9 September, 2022; v1 submitted 5 September, 2022; originally announced September 2022.

    Comments: Submitted to DCASE 2022 Workshop. Code is available at https://github.com/Jinbo-Hu/DCASE2022-TASK3

  41. arXiv:2208.04756  [pdf, other]

    cs.SD eess.AS

    DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and A Comprehensive Evaluation

    Authors: Da-Yi Wu, Wen-Yi Hsiao, Fu-Rong Yang, Oscar Friedman, Warren Jackson, Scott Bruzenak, Yi-Wen Liu, Yi-Hsuan Yang

    Abstract: A vocoder is a conditional audio generation model that converts acoustic features such as mel-spectrograms into waveforms. Taking inspiration from Differentiable Digital Signal Processing (DDSP), we propose a new vocoder named SawSing for singing voices. SawSing synthesizes the harmonic part of singing voices by filtering a sawtooth source signal with a linear time-variant finite impulse response…

    Submitted 18 August, 2022; v1 submitted 9 August, 2022; originally announced August 2022.

    Comments: Accepted at ISMIR 2022

    Journal ref: International Society for Music Information Retrieval (ISMIR) 2022

  42. arXiv:2208.00141  [pdf, other]

    eess.SY

    Distributed Scheduling at Non-Signalized Intersections with Mixed Cooperative and Non-Cooperative Vehicles

    Authors: Feihong Yang, Yuan Shen

    Abstract: Intersection management with mixed cooperative and non-cooperative vehicles is crucial in next-generation transportation systems. For fully non-cooperative systems, a minimax scheduling framework was established, but it is inefficient in mixed systems because the benefit of cooperation is not exploited. This letter focuses on efficient scheduling in mixed systems and proposes a two-stage policy t…

    Submitted 7 August, 2022; v1 submitted 30 July, 2022; originally announced August 2022.

  43. arXiv:2207.00943  [pdf, other]

    cs.CV eess.IV

    Degradation-Guided Meta-Restoration Network for Blind Super-Resolution

    Authors: Fuzhi Yang, Huan Yang, Yanhong Zeng, Jianlong Fu, Hongtao Lu

    Abstract: Blind super-resolution (SR) aims to recover high-quality visual textures from a low-resolution (LR) image, which is usually degraded by down-sampling blur kernels and additive noise. This task is extremely difficult because of the complicated image degradations found in the real world. Existing SR approaches either assume a predefined blur kernel or a fixed noise, which limits these approache…

    Submitted 2 July, 2022; originally announced July 2022.

  44. arXiv:2206.14465  [pdf, other]

    cs.IT eess.SP

    Intelligent Reflecting Surface for MIMO VLC: Joint Design of Surface Configuration and Transceiver Signal Processing

    Authors: Shiyuan Sun, Fang Yang, Jian Song, Rui Zhang

    Abstract: With the capability of reconfiguring the wireless electromagnetic environment, intelligent reflecting surface (IRS) is a new paradigm for designing future wireless communication systems. In this paper, we consider optical IRS for improving the performance of visible light communication (VLC) under a multiple-input and multiple-output (MIMO) setting. Specifically, we focus on the downlink communica…

    Submitted 29 June, 2022; originally announced June 2022.

  45. arXiv:2206.09065  [pdf]

    eess.IV cs.CV

    Free-form Lesion Synthesis Using a Partial Convolution Generative Adversarial Network for Enhanced Deep Learning Liver Tumor Segmentation

    Authors: Yingao Liu, Fei Yang, Yidong Yang

    Abstract: Automatic deep learning segmentation models have been shown to improve both segmentation efficiency and accuracy. However, training a robust segmentation model requires a considerably large number of labeled training samples, which may be impractical. This study aimed to develop a deep learning framework for generating synthetic lesions that can be used to enhance network training. The lesion synthesis…

    Submitted 25 October, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

    Comments: The paper is under review by JACMP-Journal of Applied Medical Physics

  46. arXiv:2206.07893  [pdf, other]

    cs.CV cs.MM eess.IV

    PeQuENet: Perceptual Quality Enhancement of Compressed Video with Adaptation- and Attention-based Network

    Authors: Saiping Zhang, Luis Herranz, Marta Mrak, Marc Gorriz Blanch, Shuai Wan, Fuzheng Yang

    Abstract: In this paper we propose a generative adversarial network (GAN) framework to enhance the perceptual quality of compressed videos. Our framework includes attention and adaptation to different quantization parameters (QPs) in a single model. The attention module exploits global receptive fields that can capture and align long-range correlations between consecutive frames, which can be beneficial for…

    Submitted 15 June, 2022; originally announced June 2022.

  47. arXiv:2206.07490  [pdf, other]

    eess.SP

    Demo: low-power communications based on RIS and AI for 6G

    Authors: Mingyao Cui, Zidong Wu, Yuhao Chen, Shenheng Xu, Fan Yang, Linglong Dai

    Abstract: Ultra-massive multiple-input-multiple-output (UM-MIMO) is promising to meet the high rate requirements for future 6G. However, due to the large number of antennas and high path loss, the hardware power consumption and computing power consumption of UM-MIMO will be unaffordable. To address this problem, we implement a low-power communication system based on reconfigurable intelligent surface (RIS)…

    Submitted 21 May, 2022; originally announced June 2022.

    Comments: 2 pages, 3 figures. This paper has received the IEEE ICC 2022 outstanding demo award

  48. arXiv:2206.06065  [pdf]

    eess.IV cs.CV

    Deep ensemble learning for segmenting tuberculosis-consistent manifestations in chest radiographs

    Authors: Sivaramakrishnan Rajaraman, Feng Yang, Ghada Zamzmi, Peng Guo, Zhiyun Xue, Sameer K Antani

    Abstract: Automated segmentation of tuberculosis (TB)-consistent lesions in chest X-rays (CXRs) using deep learning (DL) methods can help reduce radiologist effort, supplement clinical decision-making, and potentially result in improved patient treatment. The majority of works in the literature discuss training automatic segmentation models using coarse bounding box annotations. However, the granularity of…

    Submitted 13 June, 2022; originally announced June 2022.

    Comments: 13 pages, 6 figures

    MSC Class: 68T07

  49. arXiv:2205.06754  [pdf, other]

    eess.IV cs.CV

    Slimmable Video Codec

    Authors: Zhaocheng Liu, Luis Herranz, Fei Yang, Saiping Zhang, Shuai Wan, Marta Mrak, Marc Górriz Blanch

    Abstract: Neural video compression has emerged as a novel paradigm combining trainable multilayer neural networks and machine learning, achieving competitive rate-distortion (RD) performance, but it remains impractical due to heavy neural architectures with large memory and computational demands. In addition, models are usually optimized for a single RD tradeoff. Recent slimmable image codecs can dyn…

    Submitted 13 May, 2022; originally announced May 2022.

    Comments: Computer Vision and Pattern Recognition Workshop (CLIC2022)

  50. Limited-memory BFGS Optimisation of Phase-Only Computer-Generated Hologram for Fraunhofer Diffraction

    Authors: Jinze Sha, Andrew Kadis, Fan Yang, Timothy D. Wilkinson

    Abstract: We implement a novel limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimisation algorithm with a cross-entropy (CE) loss function to produce phase-only computer-generated holograms (CGHs) for holographic displays, with validation on a binary-phase modulation holographic projector.

    Submitted 10 May, 2022; originally announced May 2022.

    Journal ref: Digital Holography and 3-D Imaging 2022 Technical Digest Series (Optica Publishing Group, 2022), paper W3A.3