Showing 1–50 of 137 results for author: Yuan, Y

Searching in archive eess.
  1. arXiv:2406.18310  [pdf, other]

    cs.CV cs.LG eess.IV

    Spatial-temporal Hierarchical Reinforcement Learning for Interpretable Pathology Image Super-Resolution

    Authors: Wenting Chen, Jie Liu, Tommy W. S. Chow, Yixuan Yuan

    Abstract: Pathology image are essential for accurately interpreting lesion cells in cytopathology screening, but acquiring high-resolution digital slides requires specialized equipment and long scanning times. Though super-resolution (SR) techniques can alleviate this problem, existing deep learning models recover pathology image in a black-box manner, which can lead to untruthful biological details and mis…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted to IEEE TRANSACTIONS ON MEDICAL IMAGING (TMI)

  2. arXiv:2406.12270  [pdf, other]

    cs.IT eess.SP

    Sparse MIMO for ISAC: New Opportunities and Challenges

    Authors: Xinrui Li, Hongqi Min, Yong Zeng, Shi Jin, Linglong Dai, Yifei Yuan, Rui Zhang

    Abstract: Multiple-input multiple-output (MIMO) has been a key technology of wireless communications for decades. A typical MIMO system employs antenna arrays with the inter-antenna spacing being half of the signal wavelength, which we term as compact MIMO. Looking forward towards the future sixth-generation (6G) mobile communication networks, MIMO system will achieve even finer spatial resolution to not on…

    Submitted 18 June, 2024; originally announced June 2024.

  3. arXiv:2406.02918  [pdf, other]

    eess.IV cs.CV

    U-KAN Makes Strong Backbone for Medical Image Segmentation and Generation

    Authors: Chenxin Li, Xinyu Liu, Wuyang Li, Cheng Wang, Hengyu Liu, Yixuan Yuan

    Abstract: U-Net has become a cornerstone in various visual applications such as image segmentation and diffusion probability models. While numerous innovative designs and improvements have been introduced by incorporating transformers or MLPs, the networks are still limited to linearly modeling patterns as well as the deficient interpretability. To address these challenges, our intuition is inspired by the…

    Submitted 6 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  4. arXiv:2405.18356  [pdf, other]

    eess.IV cs.CV

    Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography

    Authors: Jie Liu, Yixiao Zhang, Kang Wang, Mehmet Can Yavuz, Xiaoxi Chen, Yixuan Yuan, Haoliang Li, Yang Yang, Alan Yuille, Yucheng Tang, Zongwei Zhou

    Abstract: The advancement of artificial intelligence (AI) for organ segmentation and tumor detection is propelled by the growing availability of computed tomography (CT) datasets with detailed, per-voxel annotations. However, these AI models often struggle with flexibility for partially annotated datasets and extensibility for new classes due to limitations in the one-hot encoding, architectural design, and…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted to Medical Image Analysis

  5. arXiv:2405.10825  [pdf, other]

    eess.SY cs.LG

    Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities

    Authors: Hao Zhou, Chengming Hu, Ye Yuan, Yufei Cui, Yili Jin, Can Chen, Haolun Wu, Dun Yuan, Li Jiang, Di Wu, Xue Liu, Charlie Zhang, Xianbin Wang, Jiangchuan Liu

    Abstract: Large language models (LLMs) have received considerable attention recently due to their outstanding comprehension and reasoning capabilities, leading to great progress in many fields. The advancement of LLM techniques also offers promising opportunities to automate many tasks in the telecommunication (telecom) field. After pre-training and fine-tuning, LLMs can perform diverse downstream tasks bas…

    Submitted 17 May, 2024; originally announced May 2024.

  6. arXiv:2405.00233  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS eess.SP

    SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound

    Authors: Haohe Liu, Xuenan Xu, Yi Yuan, Mengyue Wu, Wenwu Wang, Mark D. Plumbley

    Abstract: Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling the application of language modelling techniques to audio data. However, traditional codecs often operate at high bitrates or within narrow domains such as speech and lack the semantic clues required for efficient language modelling. Addressing these chal…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: Demo and code: https://haoheliu.github.io/SemantiCodec/

  7. arXiv:2404.17806  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining

    Authors: Yi Yuan, Zhuo Chen, Xubo Liu, Haohe Liu, Xuenan Xu, Dongya Jia, Yuanzhe Chen, Mark D. Plumbley, Wenwu Wang

    Abstract: Contrastive language-audio pretraining (CLAP) has been developed to align the representations of audio and language, achieving remarkable performance in retrieval and classification tasks. However, current CLAP struggles to capture temporal information within audio and text features, presenting substantial limitations for tasks such as audio retrieval and generation. To address this gap, we introd…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Preprint submitted to IEEE MLSP 2024

  8. arXiv:2404.03068  [pdf, other]

    cs.IT eess.SP

    Multiple UAV-Assisted Cooperative DF Relaying in Multi-User Massive MIMO IoT Systems

    Authors: Mobeen Mahmood, Yicheng Yuan, Tho Le-Ngoc

    Abstract: This work considers a multi-user massive multiple-input multiple-output (MU-mMIMO) Internet-of-Things (IoT) system, where multiple unmanned aerial vehicles (UAVs) operating as decode-and-forward (DF) relays connect the base station (BS) to a large number of IoT devices. To maximize the total achievable rate, we propose a novel joint optimization problem of hybrid beamforming (HBF), multiple UAV re…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted for publication in IEEE ICC 2024. arXiv admin note: text overlap with arXiv:2309.11748

  9. arXiv:2404.01611  [pdf]

    cs.LG cs.SD eess.AS

    Audio Simulation for Sound Source Localization in Virtual Evironment

    Authors: Yi Di Yuan, Swee Liang Wong, Jonathan Pan

    Abstract: Non-line-of-sight localization in signal-deprived environments is a challenging yet pertinent problem. Acoustic methods in such predominantly indoor scenarios encounter difficulty due to the reverberant nature. In this study, we aim to locate sound sources to specific locations within a virtual environment by leveraging physically grounded sound propagation simulations and machine learning methods…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: 2024 IEEE World Forum on Public Safety Technology

  10. arXiv:2403.07390  [pdf, other]

    eess.IV cs.CV

    Learning Correction Errors via Frequency-Self Attention for Blind Image Super-Resolution

    Authors: Haochen Sun, Yan Yuan, Lijuan Su, Haotian Shao

    Abstract: Previous approaches for blind image super-resolution (SR) have relied on degradation estimation to restore high-resolution (HR) images from their low-resolution (LR) counterparts. However, accurate degradation estimation poses significant challenges. The SR model's incompatibility with degradation estimation methods, particularly the Correction Filter, may significantly impair performance as a res…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 16 pages

  11. arXiv:2403.00605  [pdf, other]

    eess.SP

    Channel Measurements and Modeling for Dynamic Vehicular ISAC Scenarios at 28 GHz

    Authors: Zhengyu Zhang, Ruisi He, Bo Ai, Mi Yang, Xuejian Zhang, Ziyi Qi, Yuan Yuan

    Abstract: Integrated sensing and communication (ISAC) is a promising technology for 6G, with the goal of providing end-to-end information processing and inherent perception capabilities for future communication systems. Within ISAC emerging application scenarios, vehicular ISAC technologies have the potential to enhance traffic efficiency and safety through integration of communication and synchronized perc…

    Submitted 1 March, 2024; originally announced March 2024.

  12. arXiv:2403.00569  [pdf, other]

    eess.SP

    Characterization of Wireless Channel Semantics: A New Paradigm

    Authors: Zhengyu Zhang, Ruisi He, Mi Yang, Xuejian Zhang, Ziyi Qi, Yuan Yuan, Bo Ai

    Abstract: Recently, deep learning enabled semantic communications have been developed to understand transmission content from semantic level, which realize effective and accurate information transfer. Aiming to the vision of sixth generation (6G) networks, wireless devices are expected to have native perception and intelligent capabilities, which associate wireless channel with surrounding environments from…

    Submitted 1 March, 2024; originally announced March 2024.

  13. arXiv:2402.16663  [pdf, other]

    eess.IV cs.CV

    UN-SAM: Universal Prompt-Free Segmentation for Generalized Nuclei Images

    Authors: Zhen Chen, Qing Xu, Xinyu Liu, Yixuan Yuan

    Abstract: In digital pathology, precise nuclei segmentation is pivotal yet challenged by the diversity of tissue types, staining protocols, and imaging conditions. Recently, the segment anything model (SAM) revealed overwhelming performance in natural scenarios and impressive adaptation to medical imaging. Despite these advantages, the reliance of labor-intensive manual annotation as segmentation prompts se…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: 10 pages, 6 figures

  14. arXiv:2401.15434  [pdf]

    eess.IV cs.CV cs.LG

    Decentralized Gossip Mutual Learning (GML) for brain tumor segmentation on multi-parametric MRI

    Authors: Jingyun Chen, Yading Yuan

    Abstract: Federated Learning (FL) enables collaborative model training among medical centers without sharing private data. However, traditional FL risks on server failures and suboptimal performance on local data due to the nature of centralized model aggregation. To address these issues, we present Gossip Mutual Learning (GML), a decentralized framework that uses Gossip Protocol for direct peer-to-peer com…

    Submitted 27 January, 2024; originally announced January 2024.

    Comments: 3 pages, 1 figure, accepted to IEEE EMBS 2023. arXiv admin note: text overlap with arXiv:2401.06180

  15. arXiv:2401.07012  [pdf]

    cs.LG eess.SY stat.ML

    An ADRC-Incorporated Stochastic Gradient Descent Algorithm for Latent Factor Analysis

    Authors: Jinli Li, Ye Yuan

    Abstract: High-dimensional and incomplete (HDI) matrix contains many complex interactions between numerous nodes. A stochastic gradient descent (SGD)-based latent factor analysis (LFA) model is remarkably effective in extracting valuable information from an HDI matrix. However, such a model commonly encounters the problem of slow convergence because a standard SGD algorithm only considers the current learni…

    Submitted 13 January, 2024; originally announced January 2024.

  16. arXiv:2401.06180  [pdf]

    eess.IV cs.DC cs.LG

    Decentralized Gossip Mutual Learning (GML) for automatic head and neck tumor segmentation

    Authors: Jingyun Chen, Yading Yuan

    Abstract: Federated learning (FL) has emerged as a promising strategy for collaboratively training complicated machine learning models from different medical centers without the need of data sharing. However, the traditional FL relies on a central server to orchestrate the global model training among clients. This makes it vulnerable to the failure of the model server. Meanwhile, the model trained based on…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: 6 pages, 1 figure, accepted to SPIE Medical Imaging 2024

  17. arXiv:2312.05256  [pdf, other]

    eess.IV cs.AI

    Holistic Evaluation of GPT-4V for Biomedical Imaging

    Authors: Zhengliang Liu, Hanqi Jiang, Tianyang Zhong, Zihao Wu, Chong Ma, Yiwei Li, Xiaowei Yu, Yutong Zhang, Yi Pan, Peng Shu, Yanjun Lyu, Lu Zhang, Junjie Yao, Peixin Dong, Chao Cao, Zhenxiang Xiao, Jiaqi Wang, Huan Zhao, Shaochen Xu, Yaonai Wei, Jingyuan Chen, Haixing Dai, Peilong Wang, Hao He, Zewei Wang, et al. (25 additional authors not shown)

    Abstract: In this paper, we present a large-scale evaluation probing GPT-4V's capabilities and limitations for biomedical image analysis. GPT-4V represents a breakthrough in artificial general intelligence (AGI) for computer vision, with applications in the biomedical domain. We assess GPT-4V's performance across 16 medical imaging categories, including radiology, oncology, ophthalmology, pathology, and mor…

    Submitted 10 November, 2023; originally announced December 2023.

  18. arXiv:2312.00550  [pdf, ps, other]

    eess.SP

    Novel 3D Geometry-Based Stochastic Models for Non-Isotropic MIMO Vehicle-to-Vehicle Channels

    Authors: Yi Yuan, Cheng-Xiang Wang, Xiang Cheng, Bo Ai, David I. Laurenson

    Abstract: This paper proposes a novel three-dimensional (3D) theoretical regular-shaped geometry-based stochastic model (RS-GBSM) and the corresponding sum-of-sinusoids (SoS) simulation model for non-isotropic multiple-input multiple-output (MIMO) vehicle-to-vehicle (V2V) Ricean fading channels. The proposed RS-GBSM, combining line-of-sight (LoS) components, a two-sphere model, and an elliptic-cylinder mode…

    Submitted 1 December, 2023; originally announced December 2023.

  19. arXiv:2312.00535  [pdf, other]

    eess.SP cs.LG

    RIS-Based On-the-Air Semantic Communications -- a Diffractional Deep Neural Network Approach

    Authors: Shuyi Chen, Yingzhe Hui, Yifan Qin, Yueyi Yuan, Weixiao Meng, Xuewen Luo, Hsiao-Hwa Chen

    Abstract: Semantic communication has gained significant attention recently due to its advantages in achieving higher transmission efficiency by focusing on semantic information instead of bit-level information. However, current AI-based semantic communication methods require digital hardware for implementation. With the rapid advancement on reconfigurable intelligence surfaces (RISs), a new approach called…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: 17 pages, 5 figures, accepted by IEEE WCM

  20. arXiv:2311.07630  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS

    Cross-modal Generative Model for Visual-Guided Binaural Stereo Generation

    Authors: Zhaojian Li, Bin Zhao, Yuan Yuan

    Abstract: Binaural stereo audio is recorded by imitating the way the human ear receives sound, which provides people with an immersive listening experience. Existing approaches leverage autoencoders and directly exploit visual spatial information to synthesize binaural stereo, resulting in a limited representation of visual guidance. For the first time, we propose a visually guided generative adversarial ap…

    Submitted 13 November, 2023; originally announced November 2023.

  21. arXiv:2309.11642  [pdf]

    q-bio.TO eess.IV

    High-content stimulated Raman histology of human breast cancer

    Authors: Hongli Ni, Chinmayee Prabhu Dessai, Haonan Lin, Wei Wang, Shaoxiong Chen, Yuhao Yuan, Xiaowei Ge, Jianpeng Ao, Nolan Vild, Ji-Xin Cheng

    Abstract: Histological examination is crucial for cancer diagnosis, including hematoxylin and eosin (H&E) staining for mapping morphology and immunohistochemistry (IHC) staining for revealing chemical information. Recently developed two-color stimulated Raman histology could bypass the complex tissue processing to mimic H&E-like morphology. Yet, the underlying chemical features are not revealed, compromisin…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: 6 figures

  22. arXiv:2309.08051  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    Retrieval-Augmented Text-to-Audio Generation

    Authors: Yi Yuan, Haohe Liu, Xubo Liu, Qiushi Huang, Mark D. Plumbley, Wenwu Wang

    Abstract: Despite recent progress in text-to-audio (TTA) generation, we show that the state-of-the-art models, such as AudioLDM, trained on datasets with an imbalanced class distribution, such as AudioCaps, are biased in their generation performance. Specifically, they excel in generating common audio classes while underperforming in the rare ones, thus degrading the overall generation performance. We refer…

    Submitted 5 January, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: Accepted by ICASSP 2024

  23. arXiv:2308.14117  [pdf, other]

    eess.SY math.OC

    Cross-Entropy-Based Approach to Multi-Objective Electric Vehicle Charging Infrastructure Planning

    Authors: Jinhao Li, Yu Hui Yuan, Qiushi Cui, Hao Wang

    Abstract: Pure electric vehicles (PEVs) are increasingly adopted to decarbonize the transport sector and mitigate global warming. However, the inadequate PEV charging infrastructure may hinder the further adoption of PEVs in the large-scale traffic network, which calls for effective planning solutions for the charging station (CS) placement. The deployment of charging infrastructure inevitably increases the…

    Submitted 27 August, 2023; originally announced August 2023.

    Comments: IEEE I&CPS Asia 2023 (2023 IEEE IAS Industrial and Commercial Power System Asia Conference)

  24. arXiv:2308.05734  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS eess.SP

    AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining

    Authors: Haohe Liu, Yi Yuan, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Qiao Tian, Yuping Wang, Wenwu Wang, Yuxuan Wang, Mark D. Plumbley

    Abstract: Although audio generation shares commonalities across different types of audio, such as speech, music, and sound effects, designing models for each type requires careful consideration of specific objectives and biases that can significantly differ from those of other types. To bring us closer to a unified perspective of audio generation, this paper proposes a framework that utilizes the same learn…

    Submitted 11 May, 2024; v1 submitted 10 August, 2023; originally announced August 2023.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing. Project page is https://audioldm.github.io/audioldm2

  25. arXiv:2308.05037  [pdf, other]

    eess.AS cs.AI cs.MM cs.SD

    Separate Anything You Describe

    Authors: Xubo Liu, Qiuqiang Kong, Yan Zhao, Haohe Liu, Yi Yuan, Yuzhuo Liu, Rui Xia, Yuxuan Wang, Mark D. Plumbley, Wenwu Wang

    Abstract: Language-queried audio source separation (LASS) is a new paradigm for computational auditory scene analysis (CASA). LASS aims to separate a target sound from an audio mixture given a natural language query, which provides a natural and scalable interface for digital audio applications. Recent works on LASS, despite attaining promising separation performance on specific sources (e.g., musical instr…

    Submitted 27 October, 2023; v1 submitted 9 August, 2023; originally announced August 2023.

    Comments: Code, benchmark and pre-trained models: https://github.com/Audio-AGI/AudioSep

  26. arXiv:2307.14335  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    WavJourney: Compositional Audio Creation with Large Language Models

    Authors: Xubo Liu, Zhongkai Zhu, Haohe Liu, Yi Yuan, Meng Cui, Qiushi Huang, Jinhua Liang, Yin Cao, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang

    Abstract: Despite breakthroughs in audio generation models, their capabilities are often confined to domain-specific conditions such as speech transcriptions and audio captions. However, real-world audio creation aims to generate harmonious audio containing various elements such as speech, music, and sound effects with controllable conditions, which is challenging to address using existing audio generation…

    Submitted 26 November, 2023; v1 submitted 26 July, 2023; originally announced July 2023.

    Comments: GitHub: https://github.com/Audio-AGI/WavJourney

  27. arXiv:2306.10359  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Text-Driven Foley Sound Generation With Latent Diffusion Model

    Authors: Yi Yuan, Haohe Liu, Xubo Liu, Xiyuan Kang, Peipei Wu, Mark D. Plumbley, Wenwu Wang

    Abstract: Foley sound generation aims to synthesise the background sound for multimedia content. Previous models usually employ a large development set with labels as input (e.g., single numbers or one-hot vector). In this work, we propose a diffusion model based system for Foley sound generation with text conditions. To alleviate the data scarcity issue, our model is initially pre-trained with large-scale…

    Submitted 18 September, 2023; v1 submitted 17 June, 2023; originally announced June 2023.

    Comments: Submit to DCASE-workshop 2023, an extension and supersedes the previous technical report arXiv:2305.15905

  28. arXiv:2306.10275  [pdf, other]

    eess.SY

    Multi-Scale Simulation of Complex Systems: A Perspective of Integrating Knowledge and Data

    Authors: Huandong Wang, Huan Yan, Can Rong, Yuan Yuan, Fenyu Jiang, Zhenyu Han, Hongjie Sui, Depeng Jin, Yong Li

    Abstract: Complex system simulation has been playing an irreplaceable role in understanding, predicting, and controlling diverse complex systems. In the past few decades, the multi-scale simulation technique has drawn increasing attention for its remarkable ability to overcome the challenges of complex system simulation with unknown mechanisms and expensive computational costs. In this survey, we will syste…

    Submitted 17 June, 2023; originally announced June 2023.

  29. arXiv:2305.15905  [pdf, other]

    cs.SD cs.MM eess.AS

    Latent Diffusion Model Based Foley Sound Generation System For DCASE Challenge 2023 Task 7

    Authors: Yi Yuan, Haohe Liu, Xubo Liu, Xiyuan Kang, Mark D. Plumbley, Wenwu Wang

    Abstract: Foley sound presents the background sound for multimedia content and the generation of Foley sound involves computationally modelling sound effects with specialized techniques. In this work, we proposed a system for DCASE 2023 challenge task 7: Foley Sound Synthesis. The proposed system is based on AudioLDM, which is a diffusion-based text-to-audio generation model. To alleviate the data-hungry pr…

    Submitted 15 September, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: DCASE 2023 task 7 technical report, ranked 1st in the challenge

  30. arXiv:2303.03857  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    Leveraging Pre-trained AudioLDM for Text to Sound Generation: A Benchmark Study

    Authors: Yi Yuan, Haohe Liu, Jinhua Liang, Xubo Liu, Mark D. Plumbley, Wenwu Wang

    Abstract: Deep neural networks have recently achieved breakthroughs in sound generation with text prompts. Despite their promising performance, current text-to-sound generation models face issues on small-scale datasets (e.g., overfitting), significantly limiting their performance. In this paper, we investigate the use of pre-trained AudioLDM, the state-of-the-art model for text-to-audio generation, as the…

    Submitted 11 March, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

    Comments: EUSIPCO 2023

  31. arXiv:2303.01927  [pdf, other]

    cs.IT eess.SP

    A Generalized Nyquist-Shannon Sampling Theorem Using the Koopman Operator

    Authors: Zhexuan Zeng, Ye Yuan

    Abstract: The sampling theorem plays a fundamental role for the recovery of continuous-time signals from discrete-time samples in the field of signal processing. The sampling theorem of non-band-limited signals has evolved into one of the most challenging problems. In this work, a generalized sampling theorem -- which builds on the Koopman operator -- is proved for signals in generator-bounded space (Theore…

    Submitted 6 March, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

  32. arXiv:2301.12503  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS eess.SP

    AudioLDM: Text-to-Audio Generation with Latent Diffusion Models

    Authors: Haohe Liu, Zehua Chen, Yi Yuan, Xinhao Mei, Xubo Liu, Danilo Mandic, Wenwu Wang, Mark D. Plumbley

    Abstract: Text-to-audio (TTA) system has recently gained attention for its ability to synthesize general audio based on text descriptions. However, previous studies in TTA have limited generation quality with high computational costs. In this study, we propose AudioLDM, a TTA system that is built on a latent space to learn the continuous audio representations from contrastive language-audio pretraining (CLA…

    Submitted 9 September, 2023; v1 submitted 29 January, 2023; originally announced January 2023.

    Comments: Accepted by ICML 2023. Demo and implementation at https://audioldm.github.io. Evaluation toolbox at https://github.com/haoheliu/audioldm_eval

  33. CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection

    Authors: Jie Liu, Yixiao Zhang, Jie-Neng Chen, Junfei Xiao, Yongyi Lu, Bennett A. Landman, Yixuan Yuan, Alan Yuille, Yucheng Tang, Zongwei Zhou

    Abstract: An increasing number of public datasets have shown a marked impact on automated organ segmentation and tumor detection. However, due to the small size and partially labeled problem of each dataset, as well as a limited investigation of diverse types of tumors, the resulting models are often limited to segmenting specific organs/tumors and ignore the semantics of anatomical structures, nor can they…

    Submitted 17 August, 2023; v1 submitted 2 January, 2023; originally announced January 2023.

    Comments: ICCV-2023; Rank first in Medical Segmentation Decathlon (MSD) Competition

  34. arXiv:2212.07960  [pdf, other]

    eess.SY cs.HC eess.IV

    Beyond the Metaverse: XV (eXtended meta/uni/Verse)

    Authors: Steve Mann, Yu Yuan, Tom Furness, Joseph Paradiso, Thomas Coughlin

    Abstract: We propose the term and concept XV (eXtended meta/omni/uni/Verse) as an alternative to, and generalization of, the shared/social virtual reality widely known as "metaverse". XV is shared/social XR. We, and many others, use XR (eXtended Reality) as a broad umbrella term and concept to encompass all the other realities, where X is an "anything" variable, like in mathematics, to denote any realit…

    Submitted 15 December, 2022; originally announced December 2022.

    Comments: 9 pages, 10 figures, presented at the IEEE Standards Association panel entitled "Behind and Beyond the Metaverse", Thurs. Dec. 8th, 2022. This work is entitled "Beyond the Metaverse: XV (eXtended meta/uni/Verse)" and was presented that day from 2:15pm to 3:30pm EST (Eastern Standard Time)

  35. arXiv:2212.05808  [pdf]

    eess.IV cs.CV

    Z-SSMNet: A Zonal-aware Self-Supervised Mesh Network for Prostate Cancer Detection and Diagnosis in bpMRI

    Authors: Yuan Yuan, Euijoon Ahn, Dagan Feng, Mohamad Khadra, Jinman Kim

    Abstract: Prostate cancer (PCa) is one of the most prevalent cancers in men and many people around the world die from clinically significant PCa (csPCa). Early diagnosis of csPCa in bi-parametric MRI (bpMRI), which is non-invasive, cost-effective, and more efficient compared to multiparametric MRI (mpMRI), can contribute to precision care for PCa. The rapid rise in artificial intelligence (AI) algorithms ar…

    Submitted 12 December, 2022; originally announced December 2022.

    Comments: 8 pages, 1 figure, PI-CAI challenge

  36. arXiv:2212.03357  [pdf, other]

    cs.LG eess.SP

    Contactless Oxygen Monitoring with Gated Transformer

    Authors: Hao He, Yuan Yuan, Ying-Cong Chen, Peng Cao, Dina Katabi

    Abstract: With the increasing popularity of telehealth, it becomes critical to ensure that basic physiological signals can be monitored accurately at home, with minimal patient overhead. In this paper, we propose a contactless approach for monitoring patients' blood oxygen at home, simply by analyzing the radio signals in the room, without any wearable devices. We extract the patients' respiration from the…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: 19 pages, Workshop on Learning from Time Series for Health, NeurIPS 2022

  37. arXiv:2212.00595  [pdf, other]

    cs.CV eess.IV

    Ghost-free High Dynamic Range Imaging via Hybrid CNN-Transformer and Structure Tensor

    Authors: Yu Yuan, Jiaqi Wu, Zhongliang Jing, Henry Leung, Han Pan

    Abstract: Eliminating ghosting artifacts due to moving objects is a challenging problem in high dynamic range (HDR) imaging. In this letter, we present a hybrid model consisting of a convolutional encoder and a Transformer decoder to generate ghost-free HDR images. In the encoder, a context aggregation network and non-local attention block are adopted to optimize multi-scale features and capture both global…

    Submitted 1 December, 2022; originally announced December 2022.

  38. arXiv:2211.09206  [pdf, other]

    cs.CV eess.IV

    Learning to Kindle the Starlight

    Authors: Yu Yuan, Jiaqi Wu, Lindong Wang, Zhongliang Jing, Henry Leung, Shuyuan Zhu, Han Pan

    Abstract: Capturing highly appreciated star field images is extremely challenging due to light pollution, the requirements of specialized hardware, and the high level of photographic skills needed. Deep learning-based techniques have achieved remarkable results in low-light image enhancement (LLIE) but have not been widely applied to star field image enhancement due to the lack of training data. To address…

    Submitted 16 November, 2022; originally announced November 2022.

  39. arXiv:2211.05309  [pdf]

    eess.SY

    Generic Cryo-CMOS Device Modeling and EDA-Compatible Platform for Reliable Cryogenic IC Design

    Authors: Zhidong Tang, Zewei Wang, Yumeng Yuan, Chang He, Xin Luo, Ao Guo, Renhe Chen, Yongqi Hu, Longfei Yang, Chengwei Cao, Linlin Liu, Liujiang Yu, Ganbing Shang, Yongfeng Cao, Shoumian Chen, Yuhang Zhao, Shaojian Hu, Xufeng Kou

    Abstract: This paper outlines the establishment of a generic cryogenic CMOS database in which key electrical parameters and transfer characteristics of the MOSFETs are quantified as functions of device size, temperature/frequency responses. Meanwhile, comprehensive device statistical study is conducted to evaluate the influence of variation and mismatch effects at low temperatures. Furthermore, by incorpora…

    Submitted 9 February, 2024; v1 submitted 9 November, 2022; originally announced November 2022.

  40. arXiv:2210.04630  [pdf, other]

    physics.optics eess.IV

    Non-invasive color imaging through scattering medium under broadband illumination

    Authors: Yunong Sun, Jianbin Liu, Hui Chen, Zhuoran Xi, Yu Zhou, Yuchen He, Huaibin Zheng, Zhuo Xu, Yuan Yuan

    Abstract: Due to the complex of mixed spectral point spread function within memory effect range, it is unreliable and slow to use speckle correlation technology for non-invasive imaging through scattering medium under broadband illumination. The contrast of the speckles will drastically drop as the light source's spectrum width increases. Here, we propose a method for producing the optical transfer function…

    Submitted 10 October, 2022; originally announced October 2022.

  41. arXiv:2209.12406  [pdf, other]

    eess.IV cs.CV

    A heterogeneous group CNN for image super-resolution

    Authors: Chunwei Tian, Yanning Zhang, Wangmeng Zuo, Chia-Wen Lin, David Zhang, Yixuan Yuan

    Abstract: Convolutional neural networks (CNNs) have obtained remarkable performance via deep architectures. However, these CNNs often achieve poor robustness for image super-resolution (SR) under complex scenes. In this paper, we present a heterogeneous group SR CNN (HGSRCNN) via leveraging structure information of different types to obtain a high-quality image. Specifically, each heterogeneous group block…

    Submitted 26 September, 2022; originally announced September 2022.

  42. arXiv:2209.11451  [pdf, other]

    cs.IT eess.SY

    FIAT: Fine-grained Information Audit for Trustless Transborder Data Flow

    Authors: Shuhao Zheng, Yanxi Lin, Yang Yu, Ye Yuan, Yongzheng Jia, Xue Liu

    Abstract: Auditing the information leakage of latent sensitive features during the transborder data flow has attracted sufficient attention from global digital regulators. However, there is missing a technical approach for the audit practice due to two technical challenges. Firstly, there is a lack of theory and tools for measuring the information of sensitive latent features in a dataset. Secondly, the tra…

    Submitted 10 February, 2023; v1 submitted 23 September, 2022; originally announced September 2022.

    Comments: 10 pages, 6 figures, 1 table

  43. arXiv:2209.04093  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Learning Audio-Visual embedding for Person Verification in the Wild

    Authors: Peiwen Sun, Shanshan Zhang, Zishan Liu, Yougen Yuan, Taotao Zhang, Honggang Zhang, Pengfei Hu

    Abstract: It has already been observed that audio-visual embedding is more robust than uni-modality embedding for person verification. Here, we proposed a novel audio-visual strategy that considers aggregators from a fusion perspective. First, we introduced weight-enhanced attentive statistics pooling for the first time in face verification. We find that a strong correlation exists between modalities during…

    Submitted 26 October, 2022; v1 submitted 8 September, 2022; originally announced September 2022.

  44. arXiv:2206.14777  [pdf, other]

    cs.IT eess.SP

    System-level Simulation of Reconfigurable Intelligent Surface assisted Wireless Communications System

    Authors: Qi Gu, Dan Wu, Xin Su, Hanning Wang, Jingyuan Cui, Yifei Yuan

    Abstract: Reconfigurable intelligent surface (RIS) is an emerging technique employing metasurface to reflect the signal from the source node to the destination node. By smartly reconfiguring the electromagnetic (EM) properties of the metasurface and adjusting the EM parameters of the reflected radio waves, RIS can turn the uncontrollable propagation environment into an artificially reconfigurable space, and…

    Submitted 29 June, 2022; originally announced June 2022.

  45. arXiv:2206.01741  [pdf, other]

    eess.IV cs.CV

    Patcher: Patch Transformers with Mixture of Experts for Precise Medical Image Segmentation

    Authors: Yanglan Ou, Ye Yuan, Xiaolei Huang, Stephen T. C. Wong, John Volpi, James Z. Wang, Kelvin Wong

    Abstract: We present a new encoder-decoder Vision Transformer architecture, Patcher, for medical image segmentation. Unlike standard Vision Transformers, it employs Patcher blocks that segment an image into large patches, each of which is further divided into small patches. Transformers are applied to the small patches within a large patch, which constrains the receptive field of each pixel. We intentionall…

    Submitted 29 May, 2023; v1 submitted 3 June, 2022; originally announced June 2022.

    Comments: MICCAI 2022

  46. arXiv:2205.14548  [pdf, other]

    cs.CV eess.IV

    Image Super-resolution with An Enhanced Group Convolutional Neural Network

    Authors: Chunwei Tian, Yixuan Yuan, Shichao Zhang, Chia-Wen Lin, Wangmeng Zuo, David Zhang

    Abstract: CNNs with strong learning abilities are widely chosen to resolve super-resolution problem. However, CNNs depend on deeper network architectures to improve performance of image super-resolution, which may increase computational cost in general. In this paper, we present an enhanced super-resolution group CNN (ESRGCNN) with a shallow architecture by fully fusing deep and wide channel features to ext…

    Submitted 31 July, 2022; v1 submitted 28 May, 2022; originally announced May 2022.

  47. arXiv:2204.14021  [pdf, ps, other]

    eess.SY math.DS math.OC

    A Sampling Theorem for Exact Identification of Continuous-time Nonlinear Dynamical Systems

    Authors: Zhexuan Zeng, Zuogong Yue, Alexandre Mauroy, Jorge Goncalves, Ye Yuan

    Abstract: Low sampling frequency challenges the exact identification of the continuous-time (CT) dynamical system from sampled data, even when its model is identifiable. The necessary and sufficient condition is proposed -- which is built from Koopman operator -- to the exact identification of the CT system from sampled data. The condition gives a Nyquist-Shannon-like critical frequency for exact identifica…

    Submitted 29 April, 2022; originally announced April 2022.

  48. Federated Learning Enables Big Data for Rare Cancer Boundary Detection

    Authors: Sarthak Pati, Ujjwal Baid, Brandon Edwards, Micah Sheller, Shih-Han Wang, G Anthony Reina, Patrick Foley, Alexey Gruzdev, Deepthi Karkada, Christos Davatzikos, Chiharu Sako, Satyam Ghodasara, Michel Bilello, Suyash Mohan, Philipp Vollmuth, Gianluca Brugnara, Chandrakanth J Preetha, Felix Sahm, Klaus Maier-Hein, Maximilian Zenk, Martin Bendszus, Wolfgang Wick, Evan Calabrese, Jeffrey Rudie, Javier Villanueva-Meyer , et al. (254 additional authors not shown)

    Abstract: Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even not feasible) due to various limitations. Federated ML (FL) provides an alternative to train acc…

    Submitted 25 April, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

    Comments: federated learning, deep learning, convolutional neural network, segmentation, brain tumor, glioma, glioblastoma, FeTS, BraTS

  49. arXiv:2204.08127  [pdf]

    eess.IV cs.CV

    Parallel Network with Channel Attention and Post-Processing for Carotid Arteries Vulnerable Plaque Segmentation in Ultrasound Images

    Authors: Yanchao Yuan, Cancheng Li, Lu Xu, Ke Zhang, Yang Hua, Jicong Zhang

    Abstract: Carotid arteries vulnerable plaques are a crucial factor in the screening of atherosclerosis by ultrasound technique. However, the plaques are contaminated by various noises such as artifact, speckle noise, and manual segmentation may be time-consuming. This paper proposes an automatic convolutional neural network (CNN) method for plaque segmentation in carotid ultrasound images using a small data…

    Submitted 17 April, 2022; originally announced April 2022.

    Comments: 16 pages, 6 figures

  50. Dual-Stage Approach Toward Hyperspectral Image Super-Resolution

    Authors: Qiang Li, Yuan Yuan, Xiuping Jia, Qi Wang

    Abstract: Hyperspectral image produces high spectral resolution at the sacrifice of spatial resolution. Without reducing the spectral resolution, improving the resolution in the spatial domain is a very challenging problem. Motivated by the discovery that hyperspectral image exhibits high similarity between adjacent bands in a large spectral range, in this paper, we explore a new structure for hyperspectral…

    Submitted 9 April, 2022; originally announced April 2022.