Showing 1–50 of 114 results for author: Sun, L

Searching in archive eess.
  1. arXiv:2406.19043  [pdf]

    eess.IV cs.AI cs.CV cs.DB

    CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI

    Authors: Zi Wang, Fanwen Wang, Chen Qin, Jun Lyu, Ouyang Cheng, Shuo Wang, Yan Li, Mengyao Yu, Haoyu Zhang, Kunyuan Guo, Zhang Shi, Qirong Li, Ziqiang Xu, Ya**g Zhang, Hao Li, Sha Hua, Binghua Chen, Longyu Sun, Mengting Sun, Qin Li, Ying-Hua Chu, Wenjia Bai, **g Qin, Xiahai Zhuang, Claudia Prieto, et al. (7 additional authors not shown)

    Abstract: Cardiac magnetic resonance imaging (MRI) has emerged as a clinically gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is highly expected to achieve time-efficient and patient-friendly imaging, and then advanced image reconstruction approaches are required to recover h…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 19 pages, 3 figures, 2 tables

  2. arXiv:2406.12646  [pdf, other]

    eess.IV cs.AI cs.CV

    An Empirical Study on the Fairness of Foundation Models for Multi-Organ Image Segmentation

    Authors: Qin Li, Yizhe Zhang, Yan Li, Jun Lyu, Meng Liu, Longyu Sun, Mengting Sun, Qirong Li, Wenyue Mao, Xinran Wu, Ya**g Zhang, Yinghua Chu, Shuo Wang, Chengyan Wang

    Abstract: The segmentation foundation model, e.g., Segment Anything Model (SAM), has attracted increasing interest in the medical image community. Early pioneering studies primarily concentrated on assessing and improving SAM's performance from the perspectives of overall accuracy and efficiency, yet little attention was given to the fairness considerations. This oversight raises questions about the potenti…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted to MICCAI-2024

  3. arXiv:2406.11519  [pdf, other]

    cs.CV eess.IV

    HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

    Authors: Di Wang, Meiqi Hu, Yao **, Yuchun Miao, Jiaqi Yang, Yichu Xu, Xiaolei Qin, Jiaqi Ma, Lingyu Sun, Chenxing Li, Chuan Fu, Hongruixuan Chen, Chengxi Han, Naoto Yokoya, **g Zhang, Minqiang Xu, Lin Liu, Lefei Zhang, Chen Wu, Bo Du, Dacheng Tao, Liangpei Zhang

    Abstract: Foundation models (FMs) are revolutionizing the analysis and understanding of remote sensing (RS) scenes, including aerial RGB, multispectral, and SAR images. However, hyperspectral images (HSIs), which are rich in spectral information, have not seen much application of FMs, with existing methods often restricted to specific tasks and lacking generality. To fill this gap, we introduce HyperSIGMA,…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: The code and models will be released at https://github.com/WHU-Sigma/HyperSIGMA

  4. arXiv:2406.08177  [pdf, other]

    eess.IV cs.CV

    One-Step Effective Diffusion Network for Real-World Image Super-Resolution

    Authors: Rongyuan Wu, Lingchen Sun, Zhiyuan Ma, Lei Zhang

    Abstract: The pre-trained text-to-image diffusion models have been increasingly employed to tackle the real-world image super-resolution (Real-ISR) problem due to their powerful generative image priors. Most of the existing methods start from random noise to reconstruct the high-quality (HQ) image under the guidance of the given low-quality (LQ) image. While promising results have been achieved, such Real-…

    Submitted 14 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  5. arXiv:2406.02560  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG

    Less Peaky and More Accurate CTC Forced Alignment by Label Priors

    Authors: Ruizhe Huang, Xiaohui Zhang, Zhaoheng Ni, Li Sun, Moto Hira, Jeff Hwang, Vimal Manohar, Vineel Pratap, Matthew Wiesner, Shinji Watanabe, Daniel Povey, Sanjeev Khudanpur

    Abstract: Connectionist temporal classification (CTC) models are known to have peaky output distributions. Such behavior is not a problem for automatic speech recognition (ASR), but it can cause inaccurate forced alignments (FA), especially at finer granularity, e.g., phoneme level. This paper aims at alleviating the peaky behavior for CTC and improving its suitability for forced alignment generation, by leve…

    Submitted 15 June, 2024; v1 submitted 22 April, 2024; originally announced June 2024.

    Comments: Accepted by ICASSP 2024. GitHub repo: https://github.com/huangruizhe/audio/tree/aligner_label_priors

  6. arXiv:2405.20617  [pdf, other]

    eess.SP

    Large-scale Outdoor Cell-free mMIMO Channel Measurement in an Urban Scenario at 3.5 GHz

    Authors: Yuning Zhang, Thomas Choi, Zihang Cheng, Issei Kanno, Masaaki Ito, Jorge Gomez-Ponce, Hussein Hammoud, Bowei Wu, Ashwani Pradhan, Kelvin Arana, Pramod Krishna, Tianyi Yang, Tyler Chen, Ishita Vasishtha, Haoyu Xie, Linyu Sun, Andreas F. Molisch

    Abstract: The design of cell-free massive MIMO (CF-mMIMO) systems requires accurate, measurement-based channel models. This paper provides the first results from by far the most extensive outdoor measurement campaign for CF-mMIMO channels in an urban environment. We measured impulse responses between over 20,000 potential access point (AP) locations and 80 user equipments (UEs) at 3.5 GHz with 350 MHz bandw…

    Submitted 6 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: Submitted to: VTC 2024-Fall

  7. arXiv:2405.09923  [pdf, other]

    cs.CV eess.IV

    NTIRE 2024 Restore Any Image Model (RAIM) in the Wild Challenge

    Authors: Jie Liang, Radu Timofte, Qiaosi Yi, Shuaizheng Liu, Lingchen Sun, Rongyuan Wu, Xindong Zhang, Hui Zeng, Lei Zhang

    Abstract: In this paper, we review the NTIRE 2024 challenge on Restore Any Image Model (RAIM) in the Wild. The RAIM challenge constructed a benchmark for image restoration in the wild, including real-world images with/without reference ground truth in various scenarios from real applications. The participants were required to restore the real-captured images from complex and unknown degradation, where gener…

    Submitted 16 May, 2024; originally announced May 2024.

  8. arXiv:2404.19201  [pdf, other]

    eess.IV cs.CV cs.RO physics.optics

    Global Search Optics: Automatically Exploring Optimal Solutions to Compact Computational Imaging Systems

    Authors: Yao Gao, Qi Jiang, Shaohua Gao, Lei Sun, Kailun Yang, Kaiwei Wang

    Abstract: The popularity of mobile vision creates a demand for advanced compact computational imaging systems, which call for the development of both a lightweight optical system and an effective image reconstruction model. Recently, joint design pipelines have come to the research forefront, where the two significant components are simultaneously optimized via data-driven learning to realize the optimal system…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: The source code will be made publicly available at https://github.com/wumengshenyou/GSO

  9. arXiv:2404.18687  [pdf, other]

    cs.RO eess.SY

    Socially Adaptive Path Planning Based on Generative Adversarial Network

    Authors: Yao Wang, Yuqi Kong, Wenzheng Chi, Lining Sun

    Abstract: The natural interaction between robots and pedestrians in the process of autonomous navigation is crucial for the intelligent development of mobile robots, which requires robots to fully consider social rules and guarantee the psychological comfort of pedestrians. Among the research results in the field of robotic path planning, the learning-based socially adaptive algorithms have performed well i…

    Submitted 29 April, 2024; originally announced April 2024.

  10. arXiv:2404.16484  [pdf, other]

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi **, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  11. arXiv:2404.11313  [pdf, other]

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Ya**g Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions were submitted and evaluated on the collected dataset KVQ from a popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  12. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  13. arXiv:2404.01082  [pdf, other]

    eess.IV

    The state-of-the-art in Cardiac MRI Reconstruction: Results of the CMRxRecon Challenge in MICCAI 2023

    Authors: Jun Lyu, Chen Qin, Shuo Wang, Fanwen Wang, Yan Li, Zi Wang, Kunyuan Guo, Cheng Ouyang, Michael Tänzer, Meng Liu, Longyu Sun, Mengting Sun, Qin Li, Zhang Shi, Sha Hua, Hao Li, Zhensen Chen, Zhenlin Zhang, Bingyu Xin, Dimitris N. Metaxas, George Yiasemis, Jonas Teuwen, Li** Zhang, Weitian Chen, Yidong Zhao, et al. (25 additional authors not shown)

    Abstract: Cardiac MRI, crucial for evaluating heart structure and function, faces limitations like slow imaging and motion artifacts. Undersampling reconstruction, especially data-driven algorithms, has emerged as a promising solution to accelerate scans and enhance imaging performance using highly under-sampled data. Nevertheless, the scarcity of publicly available cardiac k-space datasets and evaluation p…

    Submitted 16 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: 25 pages, 17 figures

  14. arXiv:2403.11155  [pdf, other]

    eess.IV cs.MM

    Interactive 360° Video Streaming Using FoV-Adaptive Coding with Temporal Prediction

    Authors: Yixiang Mao, Liyang Sun, Yong Liu, Yao Wang

    Abstract: For 360° video streaming, FoV-adaptive coding that allocates more bits for the predicted user's field of view (FoV) is an effective way to maximize the rendered video quality under the limited bandwidth. We develop a low-latency FoV-adaptive coding and streaming system for interactive applications that is robust to bandwidth variations and FoV prediction errors. To minimize the end-to-end…

    Submitted 17 March, 2024; originally announced March 2024.

  15. arXiv:2403.10573  [pdf, other]

    eess.IV cs.CR cs.CV cs.LG

    Medical Unlearnable Examples: Securing Medical Data from Unauthorized Training via Sparsity-Aware Local Masking

    Authors: Weixiang Sun, Yixin Liu, Zhiling Yan, Kaidi Xu, Lichao Sun

    Abstract: With the rapid growth of artificial intelligence (AI) in healthcare, there has been a significant increase in the generation and storage of sensitive medical data. This abundance of data, in turn, has propelled the advancement of medical AI technologies. However, concerns about unauthorized data exploitation, such as training commercial AI models, often deter researchers from making their invaluab…

    Submitted 14 March, 2024; originally announced March 2024.

  16. arXiv:2403.10012  [pdf, other]

    cs.CV cs.RO eess.IV physics.optics

    Real-World Computational Aberration Correction via Quantized Domain-Mixing Representation

    Authors: Qi Jiang, Zhonghua Yi, Shaohua Gao, Yao Gao, Xiaolong Qian, Hao Shi, Lei Sun, Zhijie Xu, Kailun Yang, Kaiwei Wang

    Abstract: Relying on paired synthetic data, existing learning-based Computational Aberration Correction (CAC) methods are confronted with the intricate and multifaceted synthetic-to-real domain gap, which leads to suboptimal performance in real-world applications. In this paper, in contrast to improving the simulation pipeline, we deliver a novel insight into real-world CAC from the perspective of Unsupervi…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Codes and datasets will be made publicly available at https://github.com/zju-jiangqi/QDMR

  17. arXiv:2403.06940  [pdf, other]

    eess.IV cs.LG q-bio.QM

    Conditional Score-Based Diffusion Model for Cortical Thickness Trajectory Prediction

    Authors: Qing Xiao, Siyeop Yoon, Hui Ren, Matthew Tivnan, Lichao Sun, Quanzheng Li, Tianming Liu, Yu Zhang, Xiang Li

    Abstract: Alzheimer's Disease (AD) is a neurodegenerative condition characterized by diverse progression rates among individuals, with changes in cortical thickness (CTh) closely linked to its progression. Accurately forecasting CTh trajectories can significantly enhance early diagnosis and intervention strategies, providing timely care. However, the longitudinal data essential for these studies often suffe…

    Submitted 11 March, 2024; originally announced March 2024.

  18. arXiv:2402.19013  [pdf, other]

    eess.SY

    Ultraviolet Positioning via TDOA: Error Analysis and System Prototype

    Authors: Shihui Yu, Chubing Lv, Yueke Yang, Yuchen Pan, Lei Sun, Juliang Cao, Ruihang Yu, Chen Gong, Wenqi Wu, Zhengyuan Xu

    Abstract: This work performs the design, real-time hardware realization, and experimental evaluation of a positioning system by ultraviolet (UV) communication under photon-level signal detection. The positioning is based on the time-difference-of-arrival (TDOA) principle. Time division-based transmission of a synchronization sequence from three transmitters with known positions is applied. We investigate the pos…

    Submitted 14 April, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

  19. arXiv:2402.18007  [pdf, other]

    cs.LG cs.AI cs.SD eess.AS

    Mixer is more than just a model

    Authors: Qingfeng Ji, Yuxin Wang, Letong Sun

    Abstract: Recently, MLP structures have regained popularity, with MLP-Mixer standing out as a prominent example. In the field of computer vision, MLP-Mixer is noted for its ability to extract data information from both channel and token perspectives, effectively acting as a fusion of channel and token information. Indeed, Mixer represents a paradigm for information extraction that amalgamates channel and to…

    Submitted 1 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  20. arXiv:2402.12665  [pdf, other]

    eess.SY

    Antifragile Perimeter Control: Anticipating and Gaining from Disruptions with Reinforcement Learning

    Authors: Linghang Sun, Michail A. Makridis, Alexander Genser, Cristian Axenie, Margherita Grossi, Anastasios Kouvelas

    Abstract: The optimal operation of transportation networks is often susceptible to unexpected disruptions, such as traffic incidents and social events. Many established control strategies rely on mathematical models that struggle to cope with real-world uncertainties, leading to a significant decline in effectiveness when faced with substantial disruptions. While previous research works have dedicated effor…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: 32 pages, 13 figures

  21. arXiv:2402.00924  [pdf, other]

    eess.SY

    The Fragile Nature of Road Transportation Systems

    Authors: Linghang Sun, Yifan Zhang, Cristian Axenie, Margherita Grossi, Anastasios Kouvelas, Michail A. Makridis

    Abstract: Major cities worldwide experience problems with the performance of their road transportation systems. The continuous increase in traffic demand presents a substantial challenge to the optimal operation of urban road networks and the efficiency of traffic control strategies. Although robust and resilient transportation systems have been extensively researched over the past decades, their performanc…

    Submitted 4 March, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: 30 pages, 11 figures

  22. arXiv:2401.05698  [pdf, other]

    cs.CV cs.HC cs.MM cs.SD eess.AS

    HiCMAE: Hierarchical Contrastive Masked Autoencoder for Self-Supervised Audio-Visual Emotion Recognition

    Authors: Licai Sun, Zheng Lian, Bin Liu, Jianhua Tao

    Abstract: Audio-Visual Emotion Recognition (AVER) has garnered increasing attention in recent years for its critical role in creating emotion-aware intelligent machines. Previous efforts in this area are dominated by the supervised learning paradigm. Despite significant progress, supervised learning is meeting its bottleneck due to the longstanding data scarcity issue in AVER. Motivated by recent advances in…

    Submitted 1 April, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    Comments: Accepted by Information Fusion. The code is available at https://github.com/sunlicai/HiCMAE

    Journal ref: Information Fusion, 2024

  23. arXiv:2401.00877  [pdf, other]

    eess.IV cs.CV

    Improving the Stability of Diffusion Models for Content Consistent Super-Resolution

    Authors: Lingchen Sun, Rongyuan Wu, Zhengqiang Zhang, Hongwei Yong, Lei Zhang

    Abstract: The generative priors of pre-trained latent diffusion models have demonstrated great potential to enhance the perceptual quality of image super-resolution (SR) results. Unfortunately, the existing diffusion prior-based SR methods encounter a common problem, i.e., they tend to generate rather different outputs for the same low-resolution image with different noise samples. Such stochasticity is des…

    Submitted 30 December, 2023; originally announced January 2024.

  24. arXiv:2312.15659  [pdf, other]

    eess.IV

    Perceptual Quality Assessment for Video Frame Interpolation

    Authors: **liang Han, Xiongkuo Min, Yixuan Gao, Jun Jia, Lei Sun, Zuowei Cao, Yonglin Luo, Guangtao Zhai

    Abstract: The quality of frames is significant for both research and application of video frame interpolation (VFI). In recent VFI studies, the methods of full-reference image quality assessment have generally been used to evaluate the quality of VFI frames. However, high frame rate reference videos, necessities for the full-reference methods, are difficult to obtain in most applications of VFI. To evaluate…

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: 5 pages, 4 figures

    ACM Class: I.4.0

  25. arXiv:2312.15408  [pdf, other]

    eess.IV cs.CV

    Perception-Distortion Balanced Super-Resolution: A Multi-Objective Optimization Perspective

    Authors: Lingchen Sun, Jie Liang, Shuaizheng Liu, Hongwei Yong, Lei Zhang

    Abstract: High perceptual quality and low distortion degree are two important goals in image restoration tasks such as super-resolution (SR). Most of the existing SR methods aim to achieve these goals by minimizing the corresponding yet conflicting losses, such as the ℓ1 loss and the adversarial loss. Unfortunately, the commonly used gradient-based optimizers, such as Adam, struggle to balance these o…

    Submitted 23 December, 2023; originally announced December 2023.

  26. arXiv:2312.05256  [pdf, other]

    eess.IV cs.AI

    Holistic Evaluation of GPT-4V for Biomedical Imaging

    Authors: Zhengliang Liu, Hanqi Jiang, Tianyang Zhong, Zihao Wu, Chong Ma, Yiwei Li, Xiaowei Yu, Yutong Zhang, Yi Pan, Peng Shu, Yanjun Lyu, Lu Zhang, Junjie Yao, Peixin Dong, Chao Cao, Zhenxiang Xiao, Jiaqi Wang, Huan Zhao, Shaochen Xu, Yaonai Wei, **gyuan Chen, Haixing Dai, Peilong Wang, Hao He, Zewei Wang, et al. (25 additional authors not shown)

    Abstract: In this paper, we present a large-scale evaluation probing GPT-4V's capabilities and limitations for biomedical image analysis. GPT-4V represents a breakthrough in artificial general intelligence (AGI) for computer vision, with applications in the biomedical domain. We assess GPT-4V's performance across 16 medical imaging categories, including radiology, oncology, ophthalmology, pathology, and mor…

    Submitted 10 November, 2023; originally announced December 2023.

  27. arXiv:2310.03559  [pdf, other]

    eess.IV cs.CV

    MedSyn: Text-guided Anatomy-aware Synthesis of High-Fidelity 3D CT Images

    Authors: Yanwu Xu, Li Sun, Wei Peng, Shyam Visweswaran, Kayhan Batmanghelich

    Abstract: This paper introduces an innovative methodology for producing high-quality 3D lung CT images guided by textual information. While diffusion-based generative models are increasingly used in medical imaging, current state-of-the-art approaches are limited to low-resolution outputs and underutilize radiology reports' abundant information. The radiology reports can enhance the generation process by pr…

    Submitted 18 June, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

  28. arXiv:2309.11500  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    A Large-scale Dataset for Audio-Language Representation Learning

    Authors: Luoyi Sun, Xuenan Xu, Mengyue Wu, Weidi Xie

    Abstract: The AI community has made significant strides in developing powerful foundation models, driven by large-scale multimodal datasets. However, in the audio representation learning community, the present audio-language datasets suffer from limitations such as insufficient volume, simplistic content, and arduous collection procedures. To tackle these challenges, we present an innovative and automatic a…

    Submitted 3 October, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

  29. arXiv:2309.10738  [pdf, other]

    cs.SD cs.AI cs.CL cs.IR cs.MM eess.AS

    MelodyGLM: Multi-task Pre-training for Symbolic Melody Generation

    Authors: Xinda Wu, Zhijie Huang, Kejun Zhang, Jiaxing Yu, Xu Tan, Tieyao Zhang, Zihao Wang, Lingyun Sun

    Abstract: Pre-trained language models have achieved impressive results in various music understanding and generation tasks. However, existing pre-training methods for symbolic melody generation struggle to capture multi-scale, multi-dimensional structural information in note sequences, due to the domain knowledge discrepancy between text and music. Moreover, the lack of available large-scale symbolic melody…

    Submitted 20 September, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

  30. arXiv:2309.10263  [pdf, other]

    cs.CR cs.IT eess.IV eess.SP

    Disentangled Information Bottleneck guided Privacy-Protective JSCC for Image Transmission

    Authors: Lunan Sun, Yang Yang, Mingzhe Chen, Caili Guo

    Abstract: Joint source and channel coding (JSCC) has attracted increasing attention due to its robustness and high efficiency. However, JSCC is vulnerable to privacy leakage due to the high relevance between the source image and channel input. In this paper, we propose a disentangled information bottleneck guided privacy-protective JSCC (DIB-PPJSCC) for image transmission, which aims at protecting private i…

    Submitted 18 September, 2023; originally announced September 2023.

  31. arXiv:2309.08188  [pdf, other]

    cs.CR eess.SP

    Privacy-Aware Joint Source-Channel Coding for image transmission based on Disentangled Information Bottleneck

    Authors: Lunan Sun, Caili Guo, Mingzhe Chen, Yang Yang

    Abstract: Current privacy-aware joint source-channel coding (JSCC) works aim at avoiding private information transmission by adversarially training the JSCC encoder and decoder under specific signal-to-noise ratios (SNRs) of eavesdroppers. However, these approaches incur additional computational and storage requirements as multiple neural networks must be trained for various eavesdroppers' SNRs to determine…

    Submitted 15 September, 2023; originally announced September 2023.

  32. arXiv:2309.04946  [pdf, other]

    cs.SD cs.CV cs.GR eess.AS

    Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation

    Authors: Yuan Gan, Zongxin Yang, Xihang Yue, Lingyun Sun, Yi Yang

    Abstract: Audio-driven talking-head synthesis is a popular research topic for virtual human-related applications. However, the inflexibility and inefficiency of existing methods, which necessitate expensive end-to-end training to transfer emotions from guidance videos to talking-head predictions, are significant limitations. In this work, we propose the Emotional Adaptation for Audio-driven Talking-head (EA…

    Submitted 12 October, 2023; v1 submitted 10 September, 2023; originally announced September 2023.

    Comments: Accepted to ICCV 2023. Project page: https://yuangan.github.io/eat/

  33. arXiv:2308.14638  [pdf, other]

    eess.AS cs.SD

    The USTC-NERCSLIP Systems for the CHiME-7 DASR Challenge

    Authors: Ruoyu Wang, Maokui He, Jun Du, Hengshun Zhou, Shutong Niu, Hang Chen, Yanyan Yue, Gaobin Yang, Shilong Wu, Lei Sun, Yanhui Tu, Haitao Tang, Shuangqing Qian, Tian Gao, Mengzhi Wang, Genshun Wan, Jia Pan, Jianqing Gao, Chin-Hui Lee

    Abstract: This technical report details our submission system to the CHiME-7 DASR Challenge, which focuses on speaker diarization and speech recognition under complex multi-speaker scenarios. Additionally, it evaluates the efficiency of systems in handling diverse array devices. To address these issues, we implemented an end-to-end speaker diarization system and introduced a rectification strategy base…

    Submitted 10 October, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

    Comments: Accepted by 2023 CHiME Workshop, Oral

  34. arXiv:2306.12992  [pdf, other]

    cs.CV eess.IV physics.optics

    Minimalist and High-Quality Panoramic Imaging with PSF-aware Transformers

    Authors: Qi Jiang, Shaohua Gao, Yao Gao, Kailun Yang, Zhonghua Yi, Hao Shi, Lei Sun, Kaiwei Wang

    Abstract: High-quality panoramic images with a Field of View (FoV) of 360 degrees are essential for contemporary panoramic computer vision tasks. However, conventional imaging systems come with sophisticated lens designs and heavy optical components. This disqualifies their usage in many mobile and wearable applications where thin and portable, minimalist imaging systems are desired. In this paper, we propos…

    Submitted 22 June, 2023; originally announced June 2023.

    Comments: The dataset and code will be available at https://github.com/zju-jiangqi/PCIE-PART

  35. arXiv:2306.01458  [pdf, ps, other]

    cs.IT eess.SP eess.SY

    Extremely Large-scale Array Systems: Near-Field Codebook Design and Performance Analysis

    Authors: Feng Zheng, Hongkang Yu, Chenchen Wang, Luyang Sun, Qingqing Wu, Yijian Chen

    Abstract: Extremely Large-scale Array (ELAA) promises to deliver ultra-high data rates with increased antenna elements. However, increasing antenna elements leads to a wider realm of near-field, which challenges the traditional design of codebooks. In this paper, we propose novel near-field codebook schemes based on the fitting formula of codewords' quantization performance. First, we analyze the quantizati…

    Submitted 24 August, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

  36. arXiv:2304.10691  [pdf, other]

    eess.IV cs.CV cs.LG

    SkinGPT-4: An Interactive Dermatology Diagnostic System with Visual Large Language Model

    Authors: Juexiao Zhou, Xiaonan He, Liyuan Sun, Jiannan Xu, Xiuying Chen, Yuetan Chu, Longxi Zhou, Xingyu Liao, Bin Zhang, Xin Gao

    Abstract: Skin and subcutaneous diseases rank high among the leading contributors to the global burden of nonfatal diseases, impacting a considerable portion of the population. Nonetheless, the field of dermatology diagnosis faces three significant hurdles. Firstly, there is a shortage of dermatologists accessible to diagnose patients, particularly in rural regions. Secondly, accurately interpreting skin di…

    Submitted 8 June, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

  37. arXiv:2303.03640  [pdf, other]

    cs.LG cs.DC eess.SY

    AHPA: Adaptive Horizontal Pod Autoscaling Systems on Alibaba Cloud Container Service for Kubernetes

    Authors: Zhiqiang Zhou, Chaoli Zhang, Lingna Ma, **g Gu, Huajie Qian, Qingsong Wen, Liang Sun, Peng Li, Zhimin Tang

    Abstract: The existing resource allocation policy for application instances in Kubernetes cannot dynamically adjust according to the requirement of business, which would cause an enormous waste of resources during fluctuations. Moreover, the emergence of new cloud services imposes higher resource management requirements. This paper discusses horizontal pod resource management in Alibaba Cloud Container Servic…

    Submitted 6 March, 2023; originally announced March 2023.

  38. arXiv:2303.03553  [pdf, other]

    cs.LG eess.SP stat.AP

    Robust Dominant Periodicity Detection for Time Series with Missing Data

    Authors: Qingsong Wen, Linxiao Yang, Liang Sun

    Abstract: Periodicity detection is an important task in time series analysis, but still a challenging problem due to the diverse characteristics of time series data like abrupt trend changes, outliers, noise, and especially block missing data. In this paper, we propose a robust and effective periodicity detection algorithm for time series with block missing data. We first design a robust trend filter to remov…

    Submitted 6 March, 2023; originally announced March 2023.

    Comments: Accepted by 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023)

    Journal ref: IEEE ICASSP 2023

  39. arXiv:2303.02559  [pdf, other]

    cs.LG cs.CR cs.CV eess.IV

    Securing Biomedical Images from Unauthorized Training with Anti-Learning Perturbation

    Authors: Yixin Liu, Haohui Ye, Kai Zhang, Lichao Sun

    Abstract: The volume of open-source biomedical data has been essential to the development of various spheres of the healthcare community, since more 'free' data gives individual researchers more chances to contribute. However, institutions often hesitate to share their data with the public due to the risk of data exploitation by unauthorized third parties for other commercial uses (e.g., training AI…

    Submitted 4 March, 2023; originally announced March 2023.

    Comments: This paper is accepted as a poster for NDSS 2023

  40. arXiv:2302.13755  [pdf, ps, other]

    eess.SY cs.MA

    Neuroadaptive Distributed Event-triggered Control of Networked Uncertain Pure-feedback Systems with Polluted Feedback

    Authors: Libei Sun, Zhirong Zhang, Xinjian Huang, Xiucai Huang

    Abstract: This paper investigates the distributed event-triggered control problem for a class of uncertain pure-feedback nonlinear multi-agent systems (MASs) with polluted feedback. Under the setting of event-triggered control, substantial challenges exist in both control design and stability analysis for systems in more general non-affine pure-feedback forms wherein all state variables are not directly and…

    Submitted 27 February, 2023; originally announced February 2023.

  41. arXiv:2302.11981  [pdf, other]

    cs.SD cs.AI eess.AS

    Unsupervised Noise adaptation using Data Simulation

    Authors: Chen Chen, Yuchen Hu, Heqing Zou, Linhui Sun, Eng Siong Chng

    Abstract: Deep-neural-network-based speech enhancement approaches aim to learn a noisy-to-clean transformation using a supervised learning paradigm. However, such a well-trained transformation is vulnerable to unseen noises that are not included in the training set. In this work, we focus on the unsupervised noise adaptation problem in speech enhancement, where the ground truth of target domain data is complete…

    Submitted 23 February, 2023; originally announced February 2023.

    Comments: Accepted by ICASSP 2023

  42. arXiv:2301.04488  [pdf, other]

    cs.SD cs.HC cs.LG eess.AS

    WuYun: Exploring hierarchical skeleton-guided melody generation using knowledge-enhanced deep learning

    Authors: Kejun Zhang, Xinda Wu, Tieyao Zhang, Zhijie Huang, Xu Tan, Qihao Liang, Songruoyao Wu, Lingyun Sun

    Abstract: Although deep learning has revolutionized music generation, existing methods for structured melody generation follow an end-to-end, left-to-right, note-by-note generative paradigm and treat each note equally. Here, we present WuYun, a knowledge-enhanced deep learning architecture for improving the structure of generated melodies, which first generates the most structurally important notes to constru…

    Submitted 14 March, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

  43. arXiv:2211.15482  [pdf, other]

    cs.LG cs.AI eess.SP

    Discovering Dynamic Patterns from Spatiotemporal Data with Time-Varying Low-Rank Autoregression

    Authors: Xinyu Chen, Chengyuan Zhang, Xiaoxu Chen, Nicolas Saunier, Lijun Sun

    Abstract: This paper studies a problem of broad practical interest in spatiotemporal data analysis: discovering interpretable dynamic patterns from spatiotemporal data. Towards this end, we develop a time-varying reduced-rank vector autoregression (VAR) model whose coefficient matrices are parameterized by low-rank tensor factorization. Benefiting from the tensor factorization structure, the…

    Submitted 28 November, 2022; originally announced November 2022.

  44. arXiv:2211.11257  [pdf, other]

    cs.CV eess.IV physics.optics

    Computational Imaging for Machine Perception: Transferring Semantic Segmentation beyond Aberrations

    Authors: Qi Jiang, Hao Shi, Shaohua Gao, Jiaming Zhang, Kailun Yang, Lei Sun, Huajian Ni, Kaiwei Wang

    Abstract: Semantic scene understanding with Minimalist Optical Systems (MOS) in mobile and wearable applications remains a challenge due to the corrupted imaging quality induced by optical aberrations. However, previous works focus only on improving subjective imaging quality through Computational Imaging (CI) techniques, ignoring the feasibility of advancing semantic segmentation. In this paper, we…

    Submitted 14 March, 2024; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: Accepted to IEEE Transactions on Computational Imaging (TCI). The project page is at https://github.com/zju-jiangqi/CIADA

  45. arXiv:2211.09378  [pdf, other]

    cs.RO eess.SY

    Outracing Human Racers with Model-based Planning and Control for Time-trial Racing

    Authors: Ce Hao, Chen Tang, Eric Bergkvist, Catherine Weaver, Liting Sun, Wei Zhan, Masayoshi Tomizuka

    Abstract: Autonomous racing has become a popular sub-topic of autonomous driving in recent years. The goal of autonomous racing research is to develop software to control the vehicle at its limit of handling and achieve human-level racing performance. In this work, we investigate how to approach human expert-level racing performance with model-based planning and control methods using the high-fidelity racin…

    Submitted 25 October, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

    Comments: 16 pages, 13 figures, 3 tables

  46. arXiv:2211.05910  [pdf, other]

    eess.IV cs.CV

    Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report

    Authors: Andrey Ignatov, Radu Timofte, Maurizio Denna, Abdel Younes, Ganzorig Gankhuyag, **gang Huh, Myeong Kyun Kim, Kihwan Yoon, Hyeon-Cheol Moon, Seungho Lee, Yoonsik Choe, **woo Jeong, Sungjei Kim, Maciej Smyl, Tomasz Latkowski, Pawel Kubik, Michal Sokolski, Yujie Ma, Jiahao Chao, Zhou Zhou, Hongfan Gao, Zhengfeng Yang, Zhenbing Zeng, Zhengyang Zhuge, Chenghua Li , et al. (71 additional authors not shown)

    Abstract: Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs, which have tight computational and memory constraints. In this Mobile AI challenge, we address this problem and propose…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: text overlap with arXiv:2105.07825, arXiv:2105.08826, arXiv:2211.04470, arXiv:2211.03885, arXiv:2211.05256

  47. arXiv:2210.08256  [pdf, other]

    cs.RO eess.SY

    On Trustworthy Decision-Making Process of Human Drivers from the View of Perceptual Uncertainty Reduction

    Authors: Huanjie Wang, Haibin Liu, Wenshuo Wang, Lijun Sun

    Abstract: Humans are expert decision-makers in challenging driving tasks involving uncertainty. Many efforts have been made to model the decision-making process of human drivers at the behavior level. However, few studies explain how human drivers actively make reliable sequential decisions to complete interactive driving tasks in an uncertain environment. This paper argues that human drivers intentl…

    Submitted 15 October, 2022; originally announced October 2022.

    Comments: 12 pages, 12 figures

  48. arXiv:2210.05451  [pdf, other]

    cs.CV eess.IV

    Enabling ISP-less Low-Power Computer Vision

    Authors: Gourav Datta, Zeyu Liu, Zihan Yin, Linyu Sun, Akhilesh R. Jaiswal, Peter A. Beerel

    Abstract: In order to deploy current computer vision (CV) models on resource-constrained low-power devices, recent works have proposed in-sensor and in-pixel computing approaches that try to partly or fully bypass the image signal processor (ISP) and yield significant bandwidth reduction between the image sensor and the CV processing unit by downsampling the activation maps in the initial convolutional neural…

    Submitted 11 October, 2022; originally announced October 2022.

    Comments: Accepted to WACV 2023

  49. arXiv:2208.00356  [pdf, ps, other]

    eess.SY

    Decentralized Intermittent Feedback Adaptive Control of Non-triangular Nonlinear Time-varying Systems

    Authors: Libei Sun, Xiucai Huang, Yongduan Song

    Abstract: This paper investigates the decentralized stabilization problem for a class of interconnected systems in the presence of non-triangular structural uncertainties and time-varying parameters, where each subsystem exchanges information only with its neighbors and only intermittent (rather than continuous) states and input are to be utilized. Thus far, to the best of our knowledge, no solution exists priori t…

    Submitted 31 July, 2022; originally announced August 2022.

  50. arXiv:2207.13879  [pdf, other]

    eess.IV cs.CV cs.LG

    Real Image Restoration via Structure-preserving Complementarity Attention

    Authors: Yuanfan Zhang, Gen Li, Lei Sun

    Abstract: Since convolutional neural networks perform well in learning generalizable image priors from large-scale data, these models have been widely used in image denoising tasks. However, computational complexity also increases dramatically as models grow more complex. In this paper, we propose a novel lightweight Complementary Attention Module, which includes a density module and a sparse module, which can…

    Submitted 28 July, 2022; originally announced July 2022.