
Showing 1–50 of 88 results for author: Sun, M

Searching in archive eess.
  1. arXiv:2406.19043 [pdf]

    eess.IV cs.AI cs.CV cs.DB

    CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI

    Authors: Zi Wang, Fanwen Wang, Chen Qin, Jun Lyu, Ouyang Cheng, Shuo Wang, Yan Li, Mengyao Yu, Haoyu Zhang, Kunyuan Guo, Zhang Shi, Qirong Li, Ziqiang Xu, Yajing Zhang, Hao Li, Sha Hua, Binghua Chen, Longyu Sun, Mengting Sun, Qin Li, Ying-Hua Chu, Wenjia Bai, Jing Qin, Xiahai Zhuang, Claudia Prieto, et al. (7 additional authors not shown)

    Abstract: Cardiac magnetic resonance imaging (MRI) has emerged as a clinical gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is highly desirable for time-efficient and patient-friendly imaging, and it requires advanced image reconstruction approaches to recover h…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 19 pages, 3 figures, 2 tables

  2. arXiv:2406.12646 [pdf, other]

    eess.IV cs.AI cs.CV

    An Empirical Study on the Fairness of Foundation Models for Multi-Organ Image Segmentation

    Authors: Qin Li, Yizhe Zhang, Yan Li, Jun Lyu, Meng Liu, Longyu Sun, Mengting Sun, Qirong Li, Wenyue Mao, Xinran Wu, Yajing Zhang, Yinghua Chu, Shuo Wang, Chengyan Wang

    Abstract: The segmentation foundation model, e.g., Segment Anything Model (SAM), has attracted increasing interest in the medical image community. Early pioneering studies primarily concentrated on assessing and improving SAM's performance from the perspectives of overall accuracy and efficiency, yet little attention was given to fairness considerations. This oversight raises questions about the potenti…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted to MICCAI-2024

  3. arXiv:2404.11313 [pdf, other]

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions were submitted and evaluated on KVQ, a dataset collected from the popular short-form video platform Kuaishou/Kwai. The KVQ database is divided into three parts: 2926 videos for training, 420 videos for validation, and 854 videos for testing. The…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  4. arXiv:2404.01082 [pdf, other]

    eess.IV

    The state-of-the-art in Cardiac MRI Reconstruction: Results of the CMRxRecon Challenge in MICCAI 2023

    Authors: Jun Lyu, Chen Qin, Shuo Wang, Fanwen Wang, Yan Li, Zi Wang, Kunyuan Guo, Cheng Ouyang, Michael Tänzer, Meng Liu, Longyu Sun, Mengting Sun, Qin Li, Zhang Shi, Sha Hua, Hao Li, Zhensen Chen, Zhenlin Zhang, Bingyu Xin, Dimitris N. Metaxas, George Yiasemis, Jonas Teuwen, Li** Zhang, Weitian Chen, Yidong Zhao, et al. (25 additional authors not shown)

    Abstract: Cardiac MRI, crucial for evaluating heart structure and function, faces limitations like slow imaging and motion artifacts. Undersampling reconstruction, especially data-driven algorithms, has emerged as a promising solution to accelerate scans and enhance imaging performance using highly under-sampled data. Nevertheless, the scarcity of publicly available cardiac k-space datasets and evaluation p…

    Submitted 16 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: 25 pages, 17 figures

  5. Low-Complexity Estimation Algorithm and Decoupling Scheme for FRaC System

    Authors: Mengjiang Sun, Peng Chen, Zhenxin Cao, Fei Shen

    Abstract: With the leaping advances in autonomous vehicles and transportation infrastructure, dual function radar-communication (DFRC) systems have become attractive due to their size, cost, and resource efficiency. A frequency modulated continuous waveform (FMCW)-based radar-communication system (FRaC) utilizing both sparse multiple-input and multiple-output (MIMO) arrays and index modulation (IM) has been pr…

    Submitted 27 March, 2024; originally announced March 2024.

    Journal ref: IEEE Transactions on Intelligent Vehicles, 2024

  6. arXiv:2403.16473 [pdf, other]

    cs.CR eess.IV

    Plaintext-Free Deep Learning for Privacy-Preserving Medical Image Analysis via Frequency Information Embedding

    Authors: Mengyu Sun, Ziyuan Yang, Maosong Ran, Zhiwen Wang, Hui Yu, Yi Zhang

    Abstract: In the fast-evolving field of medical image analysis, Deep Learning (DL)-based methods have achieved tremendous success. However, these methods require plaintext data for training and inference stages, raising privacy concerns, especially in the sensitive area of medical data. To tackle these concerns, this paper proposes a novel framework that uses surrogate images for analysis, eliminating the n…

    Submitted 25 March, 2024; originally announced March 2024.

  7. arXiv:2403.14978 [pdf, other]

    cs.IT eess.SP

    Range-Angle Estimation for FDA-MIMO System With Frequency Offset

    Authors: Mengjiang Sun, Peng Chen, Zhenxin Cao

    Abstract: Frequency diverse array multiple-input multiple-output (FDA-MIMO) radar differs from the traditional phased array (PA) radar: it can form a range-angle-dependent beampattern and differentiate between closely spaced targets sharing the same angle but occupying distinct range cells. In the FDA-MIMO radar, target range estimation is achieved by employing a subtle frequency variation between adjacent a…

    Submitted 22 March, 2024; originally announced March 2024.

    Journal ref: IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS, 2024

  8. arXiv:2403.14185 [pdf, other]

    eess.SP

    A LiDAR-Aided Channel Model for Vehicular Intelligent Sensing-Communication Integration

    Authors: Ziwei Huang, Lu Bai, Mingran Sun, Xiang Cheng

    Abstract: In this paper, a novel channel modeling approach, named light detection and ranging (LiDAR)-aided geometry-based stochastic modeling (LA-GBSM), is developed. Based on the developed LA-GBSM approach, a new millimeter wave (mmWave) channel model for sixth-generation (6G) vehicular intelligent sensing-communication integration is proposed, which can support the design of intelligent transportation sy…

    Submitted 21 March, 2024; originally announced March 2024.

  9. arXiv:2403.10362 [pdf, other]

    eess.IV cs.CV

    CPGA: Coding Priors-Guided Aggregation Network for Compressed Video Quality Enhancement

    Authors: Qiang Zhu, **hua Hao, Yukang Ding, Yu Liu, Qiao Mo, Ming Sun, Chao Zhou, Shuyuan Zhu

    Abstract: Recently, numerous approaches have achieved notable success in compressed video quality enhancement (VQE). However, these methods usually ignore the utilization of valuable coding priors inherently embedded in compressed videos, such as motion vectors and residual frames, which carry abundant temporal and spatial information. To remedy this problem, we propose the Coding Priors-Guided Aggregation…

    Submitted 15 March, 2024; originally announced March 2024.

  10. arXiv:2402.19085 [pdf, other]

    cs.CL cs.AI eess.SY

    Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment

    Authors: Yiju Guo, Ganqu Cui, Lifan Yuan, Ning Ding, Jiexin Wang, Huimin Chen, Bowen Sun, Ruobing Xie, Jie Zhou, Yankai Lin, Zhiyuan Liu, Maosong Sun

    Abstract: Alignment in artificial intelligence pursues the consistency between model responses and human preferences as well as values. In practice, the multifaceted nature of human preferences inadvertently introduces what is known as the "alignment tax": a compromise where enhancements in alignment within one objective (e.g., harmlessness) can diminish performance in others (e.g., helpfulness). However, exi…

    Submitted 29 February, 2024; originally announced February 2024.

  11. arXiv:2402.16581 [pdf, other]

    eess.IV

    Rate Splitting Multiple Access-Enabled Adaptive Panoramic Video Semantic Transmission

    Authors: Haixiao Gao, Mengying Sun, Xiaodong Xu, Shujun Han, Bizhu Wang, Jingxuan Zhang, ** Zhang

    Abstract: In this paper, we propose an adaptive panoramic video semantic transmission (APVST) framework enabled by rate splitting multiple access (RSMA). The APVST framework consists of a semantic transmitter and receiver, utilizing a deep joint source-channel coding structure to adaptively extract and encode semantic features from panoramic frames. To achieve higher spectral efficiency and conserve bandwid…

    Submitted 23 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  12. arXiv:2402.07220 [pdf, other]

    eess.IV cs.CV

    KVQ: Kwai Video Quality Assessment for Short-form Videos

    Authors: Yiting Lu, Xin Li, Yajing Pei, Kun Yuan, Qizhi Xie, Yunpeng Qu, Ming Sun, Chao Zhou, Zhibo Chen

    Abstract: Short-form UGC video platforms, like Kwai and TikTok, have become an emerging and irreplaceable mainstream media form, thriving on user-friendly engagement, kaleidoscopic creation, etc. However, the advancing content-generation modes, e.g., special effects, and sophisticated processing workflows, e.g., de-artifacts, have introduced significant challenges to recent UGC video quality assessment: (i…

    Submitted 20 February, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

    Comments: 19 pages

  13. arXiv:2401.10411 [pdf, other]

    eess.AS cs.SD

    AGADIR: Towards Array-Geometry Agnostic Directional Speech Recognition

    Authors: Ju Lin, Niko Moritz, Yiteng Huang, Ruiming Xie, Ming Sun, Christian Fuegen, Frank Seide

    Abstract: Wearable devices like smart glasses are approaching the compute capability to seamlessly generate real-time closed captions for live conversations. We build on our recently introduced directional Automatic Speech Recognition (ASR) for smart glasses that have microphone arrays, which fuses multi-channel ASR with serialized output training, for wearer/conversation-partner disambiguation as well as s…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted to ICASSP 2024

  14. arXiv:2401.04283 [pdf, ps, other]

    eess.AS cs.SD

    FADI-AEC: Fast Score Based Diffusion Model Guided by Far-end Signal for Acoustic Echo Cancellation

    Authors: Yang Liu, Li Wan, Yun Li, Yiteng Huang, Ming Sun, James Luan, Yangyang Shi, Xin Lei

    Abstract: Despite the potential of diffusion models in speech enhancement, their deployment in Acoustic Echo Cancellation (AEC) has been restricted. In this paper, we propose DI-AEC, pioneering a diffusion-based stochastic regeneration approach dedicated to AEC. Further, we propose FADI-AEC, a fast score-based diffusion AEC framework that reduces computational demands, making it favorable for edge devices. It stan…

    Submitted 8 January, 2024; originally announced January 2024.

  15. arXiv:2401.03680 [pdf]

    eess.SY

    Decision-Oriented Learning for Future Power System Decision-Making under Uncertainty

    Authors: Ran Li, Haipeng Zhang, Mingyang Sun, Fei Teng, Can Wan, Salvador Pineda, Georges Kariniotakis

    Abstract: Better forecasts may not lead to better decision-making. To address this challenge, decision-oriented learning (DOL) has been proposed as a new branch of machine learning that replaces traditional statistical loss with a decision loss to form an end-to-end model. Applications of DOL in power systems have been developed in recent years. For renewable-rich power systems, uncertainties propagate thro…

    Submitted 7 April, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

  16. arXiv:2312.13501 [pdf]

    eess.SY

    Adaptive Decision-Objective Loss for Forecast-then-Optimize in Power Systems

    Authors: Haipeng Zhang, Ran Li, Mingyang Sun, Teng Fei

    Abstract: Forecast-then-optimize is a widely used framework for decision-making problems in power systems. Traditionally, statistical losses have been employed to train forecasting models, but recent research has demonstrated that improved decision utility in downstream optimization tasks can be achieved by using decision loss as an alternative. However, the implementation of decision loss in power systems face…

    Submitted 20 December, 2023; originally announced December 2023.

  17. arXiv:2310.15930 [pdf, other]

    cs.SD eess.AS

    CDSD: Chinese Dysarthria Speech Database

    Authors: Mengyi Sun, Ming Gao, Xinchen Kang, Shiru Wang, Jun Du, Dengfeng Yao, Su-Jing Wang

    Abstract: We present the Chinese Dysarthria Speech Database (CDSD) as a valuable resource for dysarthria research. This database comprises speech data from 24 participants with dysarthria. Each participant recorded one hour of speech, and one of them recorded an additional 10 hours, resulting in 34 hours of speech material. To accommodate participants with varying cognitive levels, our text poo…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: 9 pages, 3 figures

  18. arXiv:2309.10993 [pdf, other]

    cs.SD cs.HC eess.AS

    Directional Source Separation for Robust Speech Recognition on Smart Glasses

    Authors: Tiantian Feng, Ju Lin, Yiteng Huang, Weipeng He, Kaustubh Kalgaonkar, Niko Moritz, Li Wan, Xin Lei, Ming Sun, Frank Seide

    Abstract: Modern smart glasses leverage advanced audio sensing and machine learning technologies to offer real-time transcribing and captioning services, considerably enriching human experiences in daily communications. However, such systems frequently encounter challenges related to environmental noises, which degrade speech recognition and speaker change detection. To improve voice quality,…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024

  19. arXiv:2308.15736 [pdf, ps, other]

    cs.CR eess.SY

    Vulnerability of Machine Learning Approaches Applied in IoT-based Smart Grid: A Review

    Authors: Zhenyong Zhang, Mengxiang Liu, Mingyang Sun, Ruilong Deng, Peng Cheng, Dusit Niyato, Mo-Yuen Chow, Jiming Chen

    Abstract: Machine learning (ML) is increasingly used in the internet-of-things (IoT)-based smart grid. However, the trustworthiness of ML is a severe issue that must be addressed to accommodate the trend of ML-based smart grid applications (MLsgAPPs). The adversarial distortion injected into the power signal will greatly affect the system's normal control and operation. Therefore, it…

    Submitted 24 December, 2023; v1 submitted 29 August, 2023; originally announced August 2023.

  20. arXiv:2308.05862 [pdf, other]

    eess.IV cs.AI cs.CV

    Unleashing the Strengths of Unlabeled Data in Pan-cancer Abdominal Organ Quantification: the FLARE22 Challenge

    Authors: Jun Ma, Yao Zhang, Song Gu, Cheng Ge, Shihao Ma, Adamo Young, Cheng Zhu, Kangkang Meng, Xin Yang, Ziyan Huang, Fan Zhang, Wentao Liu, YuanKe Pan, Shou** Huang, Jiacheng Wang, Mingze Sun, Weixin Xu, Dengqiang Jia, Jae Won Choi, Natália Alves, Bram de Wilde, Gregor Koehler, Yajun Wu, Manuel Wiesenfarth, Qiongjie Zhu, et al. (4 additional authors not shown)

    Abstract: Quantitative organ assessment is an essential step in automated abdominal disease diagnosis and treatment planning. Artificial intelligence (AI) has shown great potential to automate this process. However, most existing AI algorithms rely on many expert annotations and lack a comprehensive evaluation of accuracy and efficiency in real-world multinational settings. To overcome these limitations,…

    Submitted 10 August, 2023; originally announced August 2023.

    Comments: MICCAI FLARE22: https://flare22.grand-challenge.org/

  21. arXiv:2307.09729 [pdf, other]

    cs.CV cs.MM eess.IV

    NTIRE 2023 Quality Assessment of Video Enhancement Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Wei Sun, Yulun Zhang, Kai Zhang, Radu Timofte, Guangtao Zhai, Yixuan Gao, Yuqin Cao, Tengchuan Kou, Yunlong Dong, Ziheng Jia, Yilin Li, Wei Wu, Shuming Hu, Sibin Deng, Pengxiang Xiao, Ying Chen, Kai Li, Kai Zhao, Kun Yuan, Ming Sun, Heng Cong, Hao Wang, Lingzhi Fu, et al. (47 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2023 Quality Assessment of Video Enhancement Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2023. The challenge addresses a major problem in the field of video processing, namely, video quality assessment (VQA) for enhanced videos. The challenge uses the VQA Dataset for Perceptual…

    Submitted 18 July, 2023; originally announced July 2023.

  22. arXiv:2307.08544 [pdf, other]

    eess.IV cs.CV

    Reconstructed Convolution Module Based Look-Up Tables for Efficient Image Super-Resolution

    Authors: Guandu Liu, Yukang Ding, Mading Li, Ming Sun, Xing Wen, Bin Wang

    Abstract: Look-up table (LUT)-based methods have shown great efficacy in the single image super-resolution (SR) task. However, previous methods ignore the essential reason for the restricted receptive field (RF) size in LUT, which is caused by the interaction of space and channel features in vanilla convolution. They can only increase the RF at the cost of linearly increasing LUT size. To enlarge RF with containe…

    Submitted 17 July, 2023; originally announced July 2023.

  23. arXiv:2306.14125 [pdf, other]

    eess.SP

    M$^3$SC: A Generic Dataset for Mixed Multi-Modal (MMM) Sensing and Communication Integration

    Authors: Xiang Cheng, Ziwei Huang, Lu Bai, Haotian Zhang, Mingran Sun, Boxun Liu, Sijiang Li, Jianan Zhang, Minson Lee

    Abstract: The sixth generation (6G) of mobile communication systems is witnessing a new paradigm shift, i.e., integrated sensing-communication system. A comprehensive dataset is a prerequisite for 6G integrated sensing-communication research. This paper develops a novel simulation dataset, named M3SC, for mixed multi-modal (MMM) sensing-communication integration, and the generation framework of the M3SC data…

    Submitted 25 June, 2023; originally announced June 2023.

    Comments: 12 pages, 12 figures

  24. arXiv:2306.06646 [pdf, ps, other]

    eess.SY

    Fractional Barrier Lyapunov Functions with Application to Learning Control

    Authors: Mingxuan Sun

    Abstract: Barrier Lyapunov functions are suitable for learning control designs, due to their feature of finite duration tracking. This paper presents fractional barrier Lyapunov functions and compares them with conventional ones in error-constraint learning control designs. Two error models are adopted and the desired compensation control approach is applied for a non-parametric design, allowin…

    Submitted 11 June, 2023; originally announced June 2023.

  25. arXiv:2305.18500 [pdf, other]

    cs.CV cs.AI cs.CL cs.LG eess.AS

    VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

    Authors: Sihan Chen, Handong Li, Qunbo Wang, Zijia Zhao, Mingzhen Sun, Xinxin Zhu, Jing Liu

    Abstract: Vision and text have been fully explored in contemporary video-text foundational models, while other modalities such as audio and subtitles in videos have not received sufficient attention. In this paper, we seek to establish connections between multi-modality video tracks, including Vision, Audio, and Subtitle, and Text by exploring an automatically generated large-scale omni-modality video cap…

    Submitted 7 October, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

    Comments: Accepted by NeurIPS 2023

  26. Enhancing Cyber-Resiliency of DER-based SmartGrid: A Survey

    Authors: Mengxiang Liu, Fei Teng, Zhenyong Zhang, Pudong Ge, Ruilong Deng, Mingyang Sun, Peng Cheng, Jiming Chen

    Abstract: The rapid development of information and communications technology has enabled the use of digitally controlled and software-driven distributed energy resources (DERs) to improve the flexibility and efficiency of power supply, and support grid operations. However, this evolution also exposes geographically dispersed DERs to cyber threats, including hardware and software vulnerabilities, communication…

    Submitted 5 March, 2024; v1 submitted 9 May, 2023; originally announced May 2023.

    Comments: Accepted by IEEE Transactions on Smart Grid

  27. arXiv:2304.11029 [pdf, other]

    cs.SD cs.IR eess.AS

    CLaMP: Contrastive Language-Music Pre-training for Cross-Modal Symbolic Music Information Retrieval

    Authors: Shangda Wu, Dingyao Yu, Xu Tan, Maosong Sun

    Abstract: We introduce CLaMP: Contrastive Language-Music Pre-training, which learns cross-modal representations between natural language and symbolic music using a music encoder and a text encoder trained jointly with a contrastive loss. To pre-train CLaMP, we collected a large dataset of 1.4 million music-text pairs. It employed text dropout as a data augmentation technique and bar patching to efficiently…

    Submitted 18 October, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

    Comments: 11 pages, 5 figures, 5 tables, accepted by ISMIR 2023

  28. arXiv:2303.11785 [pdf]

    eess.SY

    Risk-Aware Objective-Based Forecasting in Inertia Management

    Authors: Haipeng Zhang, Ran Li, Yan Chen, Zhongda Chu, Mingyang Sun, Fei Teng

    Abstract: Objective-based forecasting considers the asymmetric and non-linear impacts of forecasting errors on decision objectives, thus improving the effectiveness of its downstream decision-making process. However, existing objective-based forecasting methods are risk-neutral and not suitable for tasks like power system inertia management and unit commitment, of which decision-makers are usually biase…

    Submitted 21 March, 2023; originally announced March 2023.

  29. arXiv:2302.08950 [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Handling the Alignment for Wake Word Detection: A Comparison Between Alignment-Based, Alignment-Free and Hybrid Approaches

    Authors: Vinicius Ribeiro, Yiteng Huang, Yuan Shangguan, Zhaojun Yang, Li Wan, Ming Sun

    Abstract: Wake word detection exists in most intelligent homes and portable devices. It offers these devices the ability to "wake up" when summoned at a low cost of power and computing. This paper focuses on understanding alignment's role in developing a wake-word system that answers a generic phrase. We discuss three approaches. The first is alignment-based, where the model is trained with frame-wise cross…

    Submitted 7 June, 2023; v1 submitted 17 February, 2023; originally announced February 2023.

    Comments: Accepted to Interspeech 2023

  30. arXiv:2301.02884 [pdf, other]

    cs.SD eess.AS

    TunesFormer: Forming Irish Tunes with Control Codes by Bar Patching

    Authors: Shangda Wu, Xiaobing Li, Feng Yu, Maosong Sun

    Abstract: This paper introduces TunesFormer, an efficient Transformer-based dual-decoder model specifically designed for the generation of melodies that adhere to user-defined musical forms. Trained on 214,122 Irish tunes, TunesFormer utilizes techniques including bar patching and control codes. Bar patching reduces sequence length and generation time, while control codes guide TunesFormer in producing melo…

    Submitted 12 December, 2023; v1 submitted 7 January, 2023; originally announced January 2023.

    Comments: 6 pages, 1 figure, 1 table, accepted by HCMIR 2023

  31. Semiconductor Defect Pattern Classification by Self-Proliferation-and-Attention Neural Network

    Authors: YuanFu Yang, Min Sun

    Abstract: Semiconductor manufacturing is on the cusp of a revolution: the Internet of Things (IoT). With IoT we can connect all the equipment and feed information back to the factory so that quality issues can be detected. In this situation, more and more edge devices are used in wafer inspection equipment. These edge devices must have the ability to quickly detect defects. Therefore, how to develop a high-ef…

    Submitted 1 December, 2022; originally announced December 2022.

  32. arXiv:2211.14162 [pdf, other]

    eess.SP cs.RO

    A Gaussian Process Regression based Dynamical Models Learning Algorithm for Target Tracking

    Authors: Mengwei Sun, Mike E. Davies, Ian K. Proudler, James R. Hopgood

    Abstract: Maneuvering target tracking is a challenging problem for sensor systems because of the unpredictability of the targets' motions. This paper proposes a novel data-driven method for learning the dynamical motion model of a target. Non-parametric Gaussian process regression (GPR) is used to learn a target's naturally shift invariant motion (NSIM) behavior, which is translationally invariant and does…

    Submitted 25 November, 2022; originally announced November 2022.

    Comments: 11 pages, 10 figures

  33. arXiv:2211.13128 [pdf, other]

    eess.SP cs.LG

    A Closed-loop Sleep Modulation System with FPGA-Accelerated Deep Learning

    Authors: Mingzhe Sun, Aaron Zhou, Naize Yang, Yaqian Xu, Yuhan Hou, Xilin Liu

    Abstract: Closed-loop sleep modulation is an emerging research paradigm to treat sleep disorders and enhance sleep benefits. However, two major barriers hinder the widespread application of this research paradigm. First, subjects often need to be wire-connected to rack-mount instrumentation for data acquisition, which negatively affects sleep quality. Second, conventional real-time sleep stage classificatio…

    Submitted 18 November, 2022; originally announced November 2022.

  34. arXiv:2211.11216 [pdf, other]

    cs.SD cs.CL eess.AS

    Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task

    Authors: Shangda Wu, Maosong Sun

    Abstract: Benefiting from large-scale datasets and pre-trained models, the field of generative models has recently gained significant momentum. However, most datasets for symbolic music are very small, which potentially limits the performance of data-driven multimodal models. An intuitive solution to this problem is to leverage pre-trained models from other modalities (e.g., natural language) to improve the…

    Submitted 3 January, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: Accepted by the Creative AI Across Modalities workshop at AAAI 2023

  35. arXiv:2211.05910 [pdf, other]

    eess.IV cs.CV

    Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report

    Authors: Andrey Ignatov, Radu Timofte, Maurizio Denna, Abdel Younes, Ganzorig Gankhuyag, Jingang Huh, Myeong Kyun Kim, Kihwan Yoon, Hyeon-Cheol Moon, Seungho Lee, Yoonsik Choe, Jinwoo Jeong, Sungjei Kim, Maciej Smyl, Tomasz Latkowski, Pawel Kubik, Michal Sokolski, Yujie Ma, Jiahao Chao, Zhou Zhou, Hongfan Gao, Zhengfeng Yang, Zhenbing Zeng, Zhengyang Zhuge, Chenghua Li, et al. (71 additional authors not shown)

    Abstract: Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints. In this Mobile AI challenge, we address this problem and propose…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: text overlap with arXiv:2105.07825, arXiv:2105.08826, arXiv:2211.04470, arXiv:2211.03885, arXiv:2211.05256

  36. arXiv:2211.05256 [pdf, other]

    eess.IV cs.CV

    Power Efficient Video Super-Resolution on Mobile NPUs with Deep Learning, Mobile AI & AIM 2022 challenge: Report

    Authors: Andrey Ignatov, Radu Timofte, Cheng-Ming Chiang, Hsien-Kai Kuo, Yu-Syuan Xu, Man-Yu Lee, Allen Lu, Chia-Ming Cheng, Chih-Cheng Chen, Jia-Ying Yong, Hong-Han Shuai, Wen-Huang Cheng, Zhuang Jia, Tianyu Xu, Yijian Zhang, Long Bao, Heng Sun, Diankai Zhang, Si Gao, Shaoli Liu, Biao Wu, Xiaofeng Zhang, Chengjian Zheng, Kaidi Lu, Ning Wang, et al. (29 additional authors not shown)

    Abstract: Video super-resolution is one of the most popular tasks on mobile devices, being widely used for an automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and power efficiency on mobile devices. In this Mobile AI challenge, we address this prob…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: text overlap with arXiv:2105.08826, arXiv:2105.07809, arXiv:2211.04470, arXiv:2211.03885

  37. arXiv:2211.04635 [pdf, other]

    cs.LG cs.AI eess.AS

    LiCo-Net: Linearized Convolution Network for Hardware-efficient Keyword Spotting

    Authors: Haichuan Yang, Zhaojun Yang, Li Wan, Biqiao Zhang, Yangyang Shi, Yiteng Huang, Ivaylo Enchev, Limin Tang, Raziel Alvarez, Ming Sun, Xin Lei, Raghuraman Krishnamoorthi, Vikas Chandra

    Abstract: This paper proposes a hardware-efficient architecture, Linearized Convolution Network (LiCo-Net) for keyword spotting. It is optimized specifically for low-power processor units like microcontrollers. ML operators exhibit heterogeneous efficiency profiles on power-efficient hardware. Given the exact theoretical computation cost, int8 operators are more computation-effective than float operators, a…

    Submitted 8 November, 2022; originally announced November 2022.

  38. arXiv:2211.02297 [pdf, other]

    eess.IV

    SDRTV-to-HDRTV Conversion via Spatial-Temporal Feature Fusion

    Authors: Kepeng Xu, Li Xu, Gang He, Chang Wu, Zijia Ma, Ming Sun, Yu-Wing Tai

    Abstract: HDR (High Dynamic Range) video can reproduce scenes more realistically, with a wider gamut and broader brightness range. HDR video resources are still scarce, and most videos are still stored in SDR (Standard Dynamic Range) format. Therefore, SDRTV-to-HDRTV Conversion (SDR video to HDR video) can significantly enhance the user's video viewing experience. Since the correlation between adja…

    Submitted 4 November, 2022; originally announced November 2022.

    Comments: 8 pages

  39. arXiv:2211.00899 [pdf, other]

    eess.IV cs.CV

    LightVessel: Exploring Lightweight Coronary Artery Vessel Segmentation via Similarity Knowledge Distillation

    Authors: Hao Dang, Yuekai Zhang, Xingqun Qi, Wanting Zhou, Muyi Sun

    Abstract: In recent years, deep convolution neural networks (DCNNs) have shown great promise in coronary artery vessel segmentation. However, it is difficult to deploy complicated models in clinical scenarios since high-performance approaches have excessive parameters and high computation costs. To tackle this problem, we propose LightVessel, a Similarity Knowledge Distillation Framework, for…

    Submitted 25 February, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: 5 pages, 7 figures, conference

  40. arXiv:2210.14434  [pdf, other]

    eess.SY

    A formal process of hierarchical functional requirements development for Set-Based Design

    Authors: Minghui Sun, Zhaoyang Chen, Georgios Bakirtzis, Hassan Jafarzadeh, Cody Fleming

    Abstract: The design of complex systems is typically uncertain and ambiguous at early stages. Set-Based Design is a promising approach to complex systems design as it supports alternative exploration and gradual uncertainty reduction. When designing a complex system, functional requirements decomposition is a common and effective approach to progress the design incrementally. However, the current literature…

    Submitted 25 October, 2022; originally announced October 2022.

  41. arXiv:2209.06496  [pdf]

    cs.MM cs.SD eess.AS

    CCOM-HuQin: an Annotated Multimodal Chinese Fiddle Performance Dataset

    Authors: Yu Zhang, Ziya Zhou, Xiaobing Li, Feng Yu, Maosong Sun

    Abstract: HuQin is a family of traditional Chinese bowed string instruments. Playing techniques (PTs) embodied in various playing styles add abundant emotional coloring and aesthetic feelings to HuQin performance. The complex applied techniques make HuQin music a challenging source for fundamental MIR tasks such as pitch analysis, transcription and score-audio alignment. In this paper, we present a multimoda…

    Submitted 9 October, 2023; v1 submitted 14 September, 2022; originally announced September 2022.

    Comments: 15 pages, 11 figures

    Journal ref: Transactions of the International Society for Music Information Retrieval, 2023, 6(1), 60-74

  42. arXiv:2208.06885  [pdf, other]

    cs.CV eess.IV

    Global Priors Guided Modulation Network for Joint Super-Resolution and Inverse Tone-Mapping

    Authors: Gang He, Shaoyi Long, Li Xu, Chang Wu, **jia Zhou, Ming Sun, Xing Wen, Yurong Dai

    Abstract: Joint super-resolution and inverse tone-mapping (SR-ITM) aims to enhance the visual quality of videos that have quality deficiencies in resolution and dynamic range. This problem arises when using 4K high dynamic range (HDR) TVs to watch a low-resolution standard dynamic range (LR SDR) video. Previous methods that rely on learning local information typically cannot do well in preserving color conf…

    Submitted 10 November, 2022; v1 submitted 14 August, 2022; originally announced August 2022.

    Comments: 10 pages, 7 figures

  43. arXiv:2208.05163  [pdf, other]

    cs.CV cs.LG eess.IV

    Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization

    Authors: Zhengang Li, Mengshu Sun, Alec Lu, Haoyu Ma, Geng Yuan, Yanyue Xie, Hao Tang, Yanyu Li, Miriam Leeser, Zhangyang Wang, Xue Lin, Zhenman Fang

    Abstract: Vision transformers (ViTs) are emerging with significantly improved accuracy in computer vision tasks. However, their complex architecture and enormous computation/storage demand impose urgent needs for new hardware accelerator design methodology. This work proposes an FPGA-aware automatic ViT acceleration framework based on the proposed mixed-scheme quantization. To the best of our knowledge, thi…

    Submitted 10 August, 2022; originally announced August 2022.

    Comments: Published in FPL2022

  44. arXiv:2206.13476  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS eess.SP

    Impact of Acoustic Event Tagging on Scene Classification in a Multi-Task Learning Framework

    Authors: Rahil Parikh, Harshavardhan Sundar, Ming Sun, Chao Wang, Spyros Matsoukas

    Abstract: Acoustic events are sounds with well-defined spectro-temporal characteristics which can be associated with the physical objects generating them. Acoustic scenes are collections of such acoustic events in no specific temporal order. Given this natural linkage between events and scenes, a common belief is that the ability to classify events must help in the classification of scenes. This has led to…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: Accepted at ISCA Interspeech 2022

  45. arXiv:2205.12633  [pdf, other]

    cs.CV eess.IV

    NTIRE 2022 Challenge on High Dynamic Range Imaging: Methods and Results

    Authors: Eduardo Pérez-Pellitero, Sibi Catley-Chandar, Richard Shaw, Aleš Leonardis, Radu Timofte, Zexin Zhang, Cen Liu, Yunbo Peng, Yue Lin, Gaocheng Yu, ** Zhang, Zhe Ma, Hongbin Wang, Xiangyu Chen, Xintao Wang, Haiwei Wu, Lin Liu, Chao Dong, Jiantao Zhou, Qingsen Yan, Song Zhang, Weiye Chen, Yuhang Liu, Zhen Zhang, Yanning Zhang , et al. (68 additional authors not shown)

    Abstract: This paper reviews the challenge on constrained high dynamic range (HDR) imaging that was part of the New Trends in Image Restoration and Enhancement (NTIRE) workshop, held in conjunction with CVPR 2022. This manuscript focuses on the competition set-up, datasets, the proposed methods and their results. The challenge aims at estimating an HDR image from multiple respective low dynamic range (LDR)…

    Submitted 25 May, 2022; originally announced May 2022.

    Comments: CVPR Workshops 2022. 15 pages, 21 figures, 2 tables

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2022

  46. arXiv:2205.05448  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Symphony Generation with Permutation Invariant Language Model

    Authors: Jiafeng Liu, Yuanliang Dong, Zehua Cheng, Xinran Zhang, Xiaobing Li, Feng Yu, Maosong Sun

    Abstract: In this work, we propose a permutation invariant language model, SymphonyNet, as a solution for symbolic symphony music generation. We propose a novel Multi-track Multi-instrument Repeatable (MMR) representation for symphonic music and model the music sequence using a Transformer-based auto-regressive language model with specific 3-D positional embedding. To overcome length overflow when modeling…

    Submitted 16 September, 2022; v1 submitted 10 May, 2022; originally announced May 2022.

    Journal ref: International Society for Music Information Retrieval (ISMIR) 2022

  47. arXiv:2205.00152  [pdf, other]

    eess.SY

    A new safety-guided design methodology to complement model-based safety analysis for safety assurance

    Authors: Minghui Sun, Cody H. Fleming

    Abstract: With the rapid advancement of Formal Methods, Model-based Safety Analysis (MBSA) has been gaining tremendous attention for its ability to rigorously verify whether the safety-critical scenarios are adequately addressed by the design solution of a cyber-physical human system. However, there is a gap. If specific safety-critical scenarios are not included in the given design solution (i.e., the mode…

    Submitted 29 April, 2022; originally announced May 2022.

  48. arXiv:2204.10197  [pdf, other]

    cs.NI eess.SY

    Flexible and dependable manufacturing beyond xURLLC: A novel framework for communication-control co-design

    Authors: Bin Han, Mu-Xia Sun, Lai-Kan Muk, Yan-Fu Li, Hans D. Schotten

    Abstract: Future Industry 4.0 applications in the 6G era are calling for high dependability that goes far beyond the current ultra-reliable low latency communication (URLLC), and therewith pose critical challenges to communication technology. Instead of struggling against the physical and technical limits towards an extreme URLLC (xURLLC), communication-control co-design (CoCoCo) appears a more pro…

    Submitted 5 December, 2022; v1 submitted 18 March, 2022; originally announced April 2022.

    Comments: To appear in the 22nd IEEE International Conference on Software Quality, Reliability, and Security (QRS 2022) Workshops

  49. arXiv:2203.11997  [pdf, other]

    cs.SD cs.LG eess.AS

    Federated Self-Supervised Learning for Acoustic Event Classification

    Authors: Meng Feng, Chieh-Chi Kao, Qingming Tang, Ming Sun, Viktor Rozgic, Spyros Matsoukas, Chao Wang

    Abstract: Standard acoustic event classification (AEC) solutions require large-scale collection of data from client devices for model optimization. Federated learning (FL) is a compelling framework that decouples data collection and model training to enhance customer privacy. In this work, we investigate the feasibility of applying FL to improve AEC performance while no customer data can be directly uploade…

    Submitted 22 March, 2022; originally announced March 2022.

  50. Adaptive Kernel Kalman Filter

    Authors: Mengwei Sun, Mike E. Davies, Ian K. Proudler, James R. Hopgood

    Abstract: Sequential Bayesian filters in non-linear dynamic systems require the recursive estimation of the predictive and posterior distributions. This paper introduces a Bayesian filter called the adaptive kernel Kalman filter (AKKF). With this filter, the arbitrary predictive and posterior distributions of hidden states are approximated using the empirical kernel mean embeddings (KMEs) in reproducing ker…

    Submitted 27 February, 2023; v1 submitted 15 March, 2022; originally announced March 2022.

    Comments: The manuscript has been accepted for publication as a regular paper in the IEEE Transactions on Signal Processing