Showing 1–50 of 131 results for author: Fu, Y

  1. arXiv:2406.18201  [pdf, other]

    eess.IV cs.CV

    EFCNet: Every Feature Counts for Small Medical Object Segmentation

    Authors: Lingjie Kong, Qiaoling Wei, Chengming Xu, Han Chen, Yanwei Fu

    Abstract: This paper explores the segmentation of very small medical objects with significant clinical value. While Convolutional Neural Networks (CNNs), particularly UNet-like models, and recent Transformers have shown substantial progress in image segmentation, our empirical findings reveal their poor performance in segmenting the small medical objects and lesions concerned in this paper. This limitation…

    Submitted 26 June, 2024; originally announced June 2024.

  2. Blind Super-Resolution via Meta-learning and Markov Chain Monte Carlo Simulation

    Authors: Jingyuan Xia, Zhixiong Yang, Shengxi Li, Shuanghui Zhang, Yaowen Fu, Deniz Gündüz, Xiang Li

    Abstract: Learning-based approaches have witnessed great successes in blind single image super-resolution (SISR) tasks, however, handcrafted kernel priors and learning based kernel priors are typically required. In this paper, we propose a Meta-learning and Markov Chain Monte Carlo (MCMC) based SISR approach to learn kernel priors from organized randomness. In concrete, a lightweight network is adopted as k…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence (2024)

  3. arXiv:2405.01007  [pdf, other]

    cs.NI eess.SY

    Multi-User Multi-Application Packet Scheduling for Application-Specific QoE Enhancement Based on Knowledge-Embedded DDPG in 6G RAN

    Authors: Yongqin Fu, Xianbin Wang

    Abstract: The rapidly growing diversity of concurrent applications from both different users and same devices calls for application-specific Quality of Experience (QoE) enhancement of future wireless communications. Achieving this goal relies on application-specific packet scheduling, as it is vital for achieving tailored QoE enhancement by realizing the application-specific Quality of Service (QoS) require…

    Submitted 2 May, 2024; originally announced May 2024.

  4. arXiv:2404.15620  [pdf, other]

    eess.IV

    A Dynamic Kernel Prior Model for Unsupervised Blind Image Super-Resolution

    Authors: Zhixiong Yang, Jingyuan Xia, Shengxi Li, Xinghua Huang, Shuanghui Zhang, Zhen Liu, Yaowen Fu, Yongxiang Liu

    Abstract: Deep learning-based methods have achieved significant successes on solving the blind super-resolution (BSR) problem. However, most of them request supervised pre-training on labelled datasets. This paper proposes an unsupervised kernel estimation model, named dynamic kernel prior (DKP), to realize an unsupervised and pre-training-free learning-based algorithm for solving the BSR problem. DKP can a…

    Submitted 25 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: Accepted for publication in CVPR 2024

  5. arXiv:2404.09557  [pdf]

    cs.RO cs.AI cs.DC cs.MA eess.SY

    Characterization and Mitigation of Insufficiencies in Automated Driving Systems

    Authors: Yuting Fu, Jochen Seemann, Caspar Hanselaar, Tim Beurskens, Andrei Terechko, Emilia Silvas, Maurice Heemels

    Abstract: Automated Driving (AD) systems have the potential to increase safety, comfort and energy efficiency. Recently, major automotive companies have started testing and validating AD systems (ADS) on public roads. Nevertheless, the commercial deployment and wide adoption of ADS have been moderate, partially due to system functional insufficiencies (FI) that undermine passenger safety and lead to hazardo…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Published at the 27th International Technical Conference on the Enhanced Safety of Vehicles (ESV), Apr 2023, Yokohama, Japan. Original publication https://www-esv.nhtsa.dot.gov/Proceedings/27/27ESV-000110.pdf

    Report number: 27ESV-000110

  6. arXiv:2404.07556  [pdf, other]

    eess.IV cs.CV

    Attention-Aware Laparoscopic Image Desmoking Network with Lightness Embedding and Hybrid Guided Embedding

    Authors: Ziteng Liu, Jiahua Zhu, Bainan Liu, Hao Liu, Wenpeng Gao, Yili Fu

    Abstract: This paper presents a novel method of smoke removal from the laparoscopic images. Due to the heterogeneous nature of surgical smoke, a two-stage network is proposed to estimate the smoke distribution and reconstruct a clear, smoke-free surgical scene. The utilization of the lightness channel plays a pivotal role in providing vital information pertaining to smoke density. The reconstruction of smok…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: ISBI2024

  7. arXiv:2403.19944  [pdf, other]

    cs.CV eess.IV

    Binarized Low-light Raw Video Enhancement

    Authors: Gengchen Zhang, Yulun Zhang, Xin Yuan, Ying Fu

    Abstract: Recently, deep neural networks have achieved excellent performance on low-light raw video enhancement. However, they often come with high computational complexity and large memory costs, which hinder their applications on resource-limited devices. In this paper, we explore the feasibility of applying the extremely compact binary neural network (BNN) to low-light raw video enhancement. Nevertheless…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted at CVPR 2024

  8. arXiv:2403.18826  [pdf]

    q-bio.QM eess.IV eess.SY

    SAM-dPCR: Real-Time and High-throughput Absolute Quantification of Biological Samples Using Zero-Shot Segment Anything Model

    Authors: Yuanyuan Wei, Shanhang Luo, Changran Xu, Yingqi Fu, Qingyue Dong, Yi Zhang, Fuyang Qu, Guangyao Cheng, Yi-Ping Ho, Ho-Pui Ho, Wu Yuan

    Abstract: Digital PCR (dPCR) has revolutionized nucleic acid diagnostics by enabling absolute quantification of rare mutations and target sequences. However, current detection methodologies face challenges, as flow cytometers are costly and complex, while fluorescence imaging methods, relying on software or manual counting, are time-consuming and prone to errors. To address these limitations, we present SAM…

    Submitted 22 January, 2024; originally announced March 2024.

    Comments: 23 pages, 6 figures

  9. arXiv:2403.17770  [pdf, other]

    eess.IV cs.CV

    CT Synthesis with Conditional Diffusion Models for Abdominal Lymph Node Segmentation

    Authors: Yongrui Yu, Hanyu Chen, Zitian Zhang, Qiong Xiao, Wenhui Lei, Linrui Dai, Yu Fu, Hui Tan, Guan Wang, Peng Gao, Xiaofan Zhang

    Abstract: Despite the significant success achieved by deep learning methods in medical image segmentation, researchers still struggle in the computer-aided diagnosis of abdominal lymph nodes due to the complex abdominal environment, small and indistinguishable lesions, and limited annotated data. To address these problems, we present a pipeline that integrates the conditional diffusion model for lymph node…

    Submitted 26 March, 2024; originally announced March 2024.

  10. arXiv:2403.14250  [pdf, other]

    eess.IV cs.CR cs.CV

    Safeguarding Medical Image Segmentation Datasets against Unauthorized Training via Contour- and Texture-Aware Perturbations

    Authors: Xun Lin, Yi Yu, Song Xia, Jue Jiang, Haoran Wang, Zitong Yu, Yizhong Liu, Ying Fu, Shuai Wang, Wenzhong Tang, Alex Kot

    Abstract: The widespread availability of publicly accessible medical images has significantly propelled advancements in various research and clinical fields. Nonetheless, concerns regarding unauthorized training of AI systems for commercial purposes and the duties of patient privacy protection have led numerous institutions to hesitate to share their images. This is particularly true for medical image segme…

    Submitted 21 March, 2024; originally announced March 2024.

  11. arXiv:2403.06532  [pdf, other]

    eess.IV cs.CV q-bio.NC

    Reconstructing Visual Stimulus Images from EEG Signals Based on Deep Visual Representation Model

    Authors: Hongguang Pan, Zhuoyi Li, Yunpeng Fu, Xuebin Qin, Jianchen Hu

    Abstract: Reconstructing visual stimulus images is a significant task in neural decoding, and up to now, most studies consider the functional magnetic resonance imaging (fMRI) as the signal source. However, the fMRI-based image reconstruction methods are difficult to widely applied because of the complexity and high cost of the acquisition equipments. Considering the advantages of low cost and easy portabil…

    Submitted 11 March, 2024; originally announced March 2024.

  12. arXiv:2402.10728  [pdf, other]

    eess.IV cs.CV

    Semi-weakly-supervised neural network training for medical image registration

    Authors: Yiwen Li, Yunguan Fu, Iani J. M. B. Gayo, Qianye Yang, Zhe Min, Shaheer U. Saeed, Wen Yan, Yipei Wang, J. Alison Noble, Mark Emberton, Matthew J. Clarkson, Dean C. Barratt, Victor A. Prisacariu, Yipeng Hu

    Abstract: For training registration networks, weak supervision from segmented corresponding regions-of-interest (ROIs) have been proven effective for (a) supplementing unsupervised methods, and (b) being used independently in registration tasks in which unsupervised losses are unavailable or ineffective. This correspondence-informing supervision entails cost in annotation that requires significant specialis…

    Submitted 16 February, 2024; originally announced February 2024.

  13. arXiv:2401.02099  [pdf]

    cs.CV cs.SD eess.AS

    Oceanship: A Large-Scale Dataset for Underwater Audio Target Recognition

    Authors: Zeyu Li, Suncheng Xiang, Tong Yu, Jingsheng Gao, Jiacheng Ruan, Yanping Hu, Ting Liu, Yuzhuo Fu

    Abstract: The recognition of underwater audio plays a significant role in identifying a vessel while it is in motion. Underwater target recognition tasks have a wide range of applications in areas such as marine environmental protection, detection of ship radiated noise, underwater noise control, and coastal vessel dispatch. The traditional UATR task involves training a network to extract features from audi…

    Submitted 10 June, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

    Comments: Accepted by ICIC 2024

  14. arXiv:2311.14280  [pdf, other]

    eess.IV cs.CV

    Latent Diffusion Prior Enhanced Deep Unfolding for Spectral Image Reconstruction

    Authors: Zongliang Wu, Ruiying Lu, Ying Fu, Xin Yuan

    Abstract: Snapshot compressive spectral imaging reconstruction aims to reconstruct three-dimensional spatial-spectral images from a single-shot two-dimensional compressed measurement. Existing state-of-the-art methods are mostly based on deep unfolding structures but have intrinsic performance bottlenecks: $i$) the ill-posed problem of dealing with heavily degraded measurement, and $ii$) the regression loss…

    Submitted 23 November, 2023; originally announced November 2023.

  15. arXiv:2310.18780  [pdf, other]

    cs.LG cs.AI eess.SP

    Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions

    Authors: Stefano Massaroli, Michael Poli, Daniel Y. Fu, Hermann Kumbong, Rom N. Parnichkun, Aman Timalsina, David W. Romero, Quinn McIntyre, Beidi Chen, Atri Rudra, Ce Zhang, Christopher Re, Stefano Ermon, Yoshua Bengio

    Abstract: Recent advances in attention-free sequence models rely on convolutions as alternatives to the attention operator at the core of Transformers. In particular, long convolution sequence models have achieved state-of-the-art performance in many domains, but incur a significant cost during auto-regressive inference workloads -- naively requiring a full pass (or caching of activations) over the input se…

    Submitted 28 October, 2023; originally announced October 2023.

  16. arXiv:2310.10964  [pdf, other]

    cs.IT eess.SP

    Spectral-Efficiency and Energy-Efficiency of Variable-Length XP-HARQ

    Authors: Jiahui Feng, Zheng Shi, Yaru Fu, Hong Wang, Guanghua Yang, Shaodan Ma

    Abstract: A variable-length cross-packet hybrid automatic repeat request (VL-XP-HARQ) is proposed to boost the spectral efficiency (SE) and the energy efficiency (EE) of communications. The SE is firstly derived in terms of the outage probabilities, with which the SE is proved to be upper bounded by the ergodic capacity (EC). Moreover, to facilitate the maximization of the SE, the asymptotic outage probabil…

    Submitted 16 October, 2023; originally announced October 2023.

  17. arXiv:2310.03018  [pdf, other]

    eess.AS cs.CL cs.SD

    Zero Resource Code-switched Speech Benchmark Using Speech Utterance Pairs For Multiple Spoken Languages

    Authors: Kuan-Po Huang, Chih-Kai Yang, Yu-Kuan Fu, Ewan Dunbar, Hung-yi Lee

    Abstract: We introduce a new zero resource code-switched speech benchmark designed to directly assess the code-switching capabilities of self-supervised speech encoders. We showcase a baseline system of language modeling on discrete units to demonstrate how the code-switching abilities of speech encoders can be assessed in a zero-resource manner. Our experiments encompass a variety of well-known speech enco…

    Submitted 18 March, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted by ICASSP 2024 (v2)

  18. arXiv:2309.02020  [pdf, other]

    eess.IV cs.CV

    RawHDR: High Dynamic Range Image Reconstruction from a Single Raw Image

    Authors: Yunhao Zou, Chenggang Yan, Ying Fu

    Abstract: High dynamic range (HDR) images capture much more intensity levels than standard ones. Current methods predominantly generate HDR images from 8-bit low dynamic range (LDR) sRGB images that have been degraded by the camera processing pipeline. However, it becomes a formidable task to retrieve extremely high dynamic range scenes from such limited bit-depth data. Unlike existing methods, the core ide…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: ICCV 2023

  19. A Recycling Training Strategy for Medical Image Segmentation with Diffusion Denoising Models

    Authors: Yunguan Fu, Yiwen Li, Shaheer U Saeed, Matthew J Clarkson, Yipeng Hu

    Abstract: Denoising diffusion models have found applications in image segmentation by generating segmented masks conditioned on images. Existing studies predominantly focus on adjusting model architecture or improving inference, such as test-time sampling strategies. In this work, we focus on improving the training strategy and propose a novel recycling method. During each training step, a segmentation mask…

    Submitted 8 December, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

    Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2023:016

    Journal ref: Machine.Learning.for.Biomedical.Imaging. 2 (2023)

  20. arXiv:2308.10820  [pdf, other]

    cs.CV eess.IV

    Pixel Adaptive Deep Unfolding Transformer for Hyperspectral Image Reconstruction

    Authors: Miaoyu Li, Ying Fu, Ji Liu, Yulun Zhang

    Abstract: Hyperspectral Image (HSI) reconstruction has made gratifying progress with the deep unfolding framework by formulating the problem into a data module and a prior module. Nevertheless, existing methods still face the problem of insufficient matching with HSI data. The issues lie in three aspects: 1) fixed gradient descent step in the data module while the degradation of HSI is agnostic in the pixel…

    Submitted 22 September, 2023; v1 submitted 21 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  21. arXiv:2308.06285  [pdf, other]

    cs.HC eess.IV

    An Integrated Visual Analytics System for Studying Clinical Carotid Artery Plaques

    Authors: Chaoqing Xu, Zhentao Zheng, Yiting Fu, Baofeng Chang, Legao Chen, Minghui Wu, Mingli Song, Jinsong Jiang

    Abstract: Carotid artery plaques can cause arterial vascular diseases such as stroke and myocardial infarction, posing a severe threat to human life. However, the current clinical examination mainly relies on a direct assessment by physicians of patients' clinical indicators and medical images, lacking an integrated visualization tool for analyzing the influencing factors and composition of carotid artery p…

    Submitted 8 August, 2023; originally announced August 2023.

  22. arXiv:2308.02131  [pdf, other]

    cs.IT eess.SP

    Graph Convolutional Network Enabled Power-Constrained HARQ Strategy for URLLC

    Authors: Yi Chen, Zheng Shi, Hong Wang, Yaru Fu, Guanghua Yang, Shaodan Ma, Haichuan Ding

    Abstract: In this paper, a power-constrained hybrid automatic repeat request (HARQ) transmission strategy is developed to support ultra-reliable low-latency communications (URLLC). In particular, we aim to minimize the delivery latency of HARQ schemes over time-correlated fading channels, meanwhile ensuring the high reliability and limited power consumption. To ease the optimization, the simple asymptotic o…

    Submitted 4 August, 2023; originally announced August 2023.

  23. arXiv:2307.08473  [pdf, other]

    eess.IV cs.CV

    EGE-UNet: an Efficient Group Enhanced UNet for skin lesion segmentation

    Authors: Jiacheng Ruan, Mingye Xie, Jingsheng Gao, Ting Liu, Yuzhuo Fu

    Abstract: Transformer and its variants have been widely used for medical image segmentation. However, the large number of parameter and computational load of these models make them unsuitable for mobile health applications. To address this issue, we propose a more efficient approach, the Efficient Group Enhanced UNet (EGE-UNet). We incorporate a Group multi-axis Hadamard Product Attention module (GHPA) and…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: 10 pages, 4 figures, 2 tables. This paper has been early accepted by MICCAI 2023 and has received the MICCAI Student-Author Registration (STAR) Award

  24. arXiv:2306.15686  [pdf, other]

    eess.AS cs.CL

    Master-ASR: Achieving Multilingual Scalability and Low-Resource Adaptation in ASR with Modular Learning

    Authors: Zhongzhi Yu, Yang Zhang, Kaizhi Qian, Yonggan Fu, Yingyan Lin

    Abstract: Despite the impressive performance recently achieved by automatic speech recognition (ASR), we observe two primary challenges that hinder its broader applications: (1) The difficulty of introducing scalability into the model to support more languages with limited training, inference, and storage overhead; (2) The low-resource adaptation ability that enables effective low-resource adaptation while…

    Submitted 23 June, 2023; originally announced June 2023.

  25. arXiv:2305.18771  [pdf, other]

    eess.IV cs.CV cs.LG stat.ML

    SFCNeXt: a simple fully convolutional network for effective brain age estimation with small sample size

    Authors: Yu Fu, Yanyan Huang, Shunjie Dong, Yalin Wang, Tianbai Yu, Meng Niu, Cheng Zhuo

    Abstract: Deep neural networks (DNN) have been designed to predict the chronological age of a healthy brain from T1-weighted magnetic resonance images (T1 MRIs), and the predicted brain age could serve as a valuable biomarker for the early detection of development-related or aging-related disorders. Recent DNN models for brain age estimations usually rely too much on large sample sizes and complex network s…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: This paper has been accepted by IEEE ISBI 2023

  26. arXiv:2305.10821  [pdf, other]

    eess.AS

    Locate and Beamform: Two-dimensional Locating All-neural Beamformer for Multi-channel Speech Separation

    Authors: Yanjie Fu, Meng Ge, Honglong Wang, Nan Li, Haoran Yin, Longbiao Wang, Gaoyan Zhang, Jianwu Dang, Chengyun Deng, Fei Wang

    Abstract: Recently, stunning improvements on multi-channel speech separation have been achieved by neural beamformers when direction information is available. However, most of them neglect to utilize speaker's 2-dimensional (2D) location cues contained in mixture signal, which limits the performance when two sources come from close directions. In this paper, we propose an end-to-end beamforming network for…

    Submitted 2 June, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted by Interspeech 2023. arXiv admin note: substantial text overlap with arXiv:2212.03401

  27. arXiv:2305.07455  [pdf, other]

    cs.CL cs.SD eess.AS

    Improving Cascaded Unsupervised Speech Translation with Denoising Back-translation

    Authors: Yu-Kuan Fu, Liang-Hsuan Tseng, Jiatong Shi, Chen-An Li, Tsu-Yuan Hsu, Shinji Watanabe, Hung-yi Lee

    Abstract: Most of the speech translation models heavily rely on parallel data, which is hard to collect especially for low-resource languages. To tackle this issue, we propose to build a cascaded speech translation system without leveraging any kind of paired data. We use fully unpaired data to train our unsupervised systems and evaluate our results on CoVoST 2 and CVSS. The results show that our work is co…

    Submitted 12 May, 2023; originally announced May 2023.

  28. arXiv:2305.01183  [pdf, other]

    cs.CV eess.IV

    Faster OreFSDet : A Lightweight and Effective Few-shot Object Detector for Ore Images

    Authors: Yang Zhang, Le Cheng, Yuting Peng, Chengming Xu, Yanwei Fu, Bo Wu, Guodong Sun

    Abstract: For the ore particle size detection, obtaining a sizable amount of high-quality ore labeled data is time-consuming and expensive. General object detection methods often suffer from severe over-fitting with scarce labeled data. Despite their ability to eliminate over-fitting, existing few-shot object detectors encounter drawbacks such as slow detection speed and high memory requirements, making the…

    Submitted 1 May, 2023; originally announced May 2023.

    Comments: 18 pages, 11 figures

  29. arXiv:2304.00844  [pdf, other]

    cs.CV eess.IV

    Spectral Enhanced Rectangle Transformer for Hyperspectral Image Denoising

    Authors: Miaoyu Li, Ji Liu, Ying Fu, Yulun Zhang, Dejing Dou

    Abstract: Denoising is a crucial step for hyperspectral image (HSI) applications. Though witnessing the great power of deep learning, existing HSI denoising methods suffer from limitations in capturing the non-local self-similarity. Transformers have shown potential in capturing long-range dependencies, but few attempts have been made with specifically designed Transformer to model the spatial and spectral…

    Submitted 3 April, 2023; originally announced April 2023.

  30. arXiv:2303.09278  [pdf, other]

    eess.AS cs.SD

    DistillW2V2: A Small and Streaming Wav2vec 2.0 Based ASR Model

    Authors: Yanzhe Fu, Yueteng Kang, Songjun Cao, Long Ma

    Abstract: Wav2vec 2.0 (W2V2) has shown impressive performance in automatic speech recognition (ASR). However, the large model size and the non-streaming architecture make it hard to be used under low-resource or streaming scenarios. In this work, we propose a two-stage knowledge distillation method to solve these two problems: the first step is to make the big and non-streaming teacher model smaller, and th…

    Submitted 16 March, 2023; originally announced March 2023.

  31. arXiv:2303.09040  [pdf, other]

    cs.CV eess.IV

    Hybrid Spectral Denoising Transformer with Guided Attention

    Authors: Zeqiang Lai, Chenggang Yan, Ying Fu

    Abstract: In this paper, we present a Hybrid Spectral Denoising Transformer (HSDT) for hyperspectral image denoising. Challenges in adapting transformer for HSI arise from the capabilities to tackle existing limitations of CNN-based methods in capturing the global and local spatial-spectral correlations while maintaining efficiency and flexibility. To address these issues, we introduce a hybrid approach tha…

    Submitted 8 August, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: ICCV 2023

  32. arXiv:2303.06550  [pdf, other]

    eess.IV cs.CV

    Spatial Correspondence between Graph Neural Network-Segmented Images

    Authors: Qian Li, Yunguan Fu, Qianye Yang, Zhijiang Du, Hongjian Yu, Yipeng Hu

    Abstract: Graph neural networks (GNNs) have been proposed for medical image segmentation, by predicting anatomical structures represented by graphs of vertices and edges. One such type of graph is predefined with fixed size and connectivity to represent a reference of anatomical regions of interest, thus known as templates. This work explores the potentials in these GNNs with common topology for establishin…

    Submitted 16 March, 2023; v1 submitted 11 March, 2023; originally announced March 2023.

    Comments: Accepted at MIDL 2023 (The Medical Imaging with Deep Learning conference, 2023)

  33. arXiv:2303.06040  [pdf, other]

    eess.IV cs.CV

    Importance of Aligning Training Strategy with Evaluation for Diffusion Models in 3D Multiclass Segmentation

    Authors: Yunguan Fu, Yiwen Li, Shaheer U. Saeed, Matthew J. Clarkson, Yipeng Hu

    Abstract: Recently, denoising diffusion probabilistic models (DDPM) have been applied to image segmentation by generating segmentation masks conditioned on images, while the applications were mainly limited to 2D networks without exploiting potential benefits from the 3D formulation. In this work, we studied the DDPM-based segmentation model for 3D multiclass segmentation on two large multiclass data sets (…

    Submitted 18 August, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

    Comments: Accepted at Deep Generative Models workshop at MICCAI 2023

  34. arXiv:2302.13523  [pdf, other]

    cs.SD eess.AS

    VE-KWS: Visual Modality Enhanced End-to-End Keyword Spotting

    Authors: Ao Zhang, He Wang, Pengcheng Guo, Yihui Fu, Lei Xie, Yingying Gao, Shilei Zhang, Junlan Feng

    Abstract: The performance of the keyword spotting (KWS) system based on audio modality, commonly measured in false alarms and false rejects, degrades significantly under the far field and noisy conditions. Therefore, audio-visual keyword spotting, which leverages complementary relationships over multiple modalities, has recently gained much attention. However, current studies mainly focus on combining the e…

    Submitted 14 March, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

    Comments: 5 pages. Accepted at ICASSP2023

  35. arXiv:2302.12757  [pdf, other]

    eess.AS cs.CL cs.SD

    Ensemble knowledge distillation of self-supervised speech models

    Authors: Kuan-Po Huang, Tzu-hsun Feng, Yu-Kuan Fu, Tsu-Yuan Hsu, Po-Chieh Yen, Wei-Cheng Tseng, Kai-Wei Chang, Hung-yi Lee

    Abstract: Distilled self-supervised models have shown competitive performance and efficiency in recent years. However, there is a lack of experience in jointly distilling multiple self-supervised speech models. In our work, we performed Ensemble Knowledge Distillation (EKD) on various self-supervised speech models such as HuBERT, RobustHuBERT, and WavLM. We tried two different aggregation techniques, layerw…

    Submitted 24 February, 2023; originally announced February 2023.

    Comments: Accepted by ICASSP 2023

  36. arXiv:2302.06818  [pdf, other]

    cs.LG cs.AI eess.SP

    Masked Multi-Step Probabilistic Forecasting for Short-to-Mid-Term Electricity Demand

    Authors: Yiwei Fu, Nurali Virani, Honggang Wang

    Abstract: Predicting the demand for electricity with uncertainty helps in planning and operation of the grid to provide reliable supply of power to the consumers. Machine learning (ML)-based demand forecasting approaches can be categorized into (1) sample-based approaches, where each forecast is made independently, and (2) time series regression approaches, where some historical load and other feature infor…

    Submitted 13 February, 2023; originally announced February 2023.

    Comments: Accepted by the 2023 IEEE Power & Energy Society General Meeting (PESGM). arXiv admin note: substantial text overlap with arXiv:2209.14413

  37. arXiv:2302.06298  [pdf, other]

    cs.CV eess.IV

    Hyperspectral Image Super Resolution with Real Unaligned RGB Guidance

    Authors: Zeqiang Lai, Ying Fu, Jun Zhang

    Abstract: Fusion-based hyperspectral image (HSI) super-resolution has become increasingly prevalent for its capability to integrate high-frequency spatial information from the paired high-resolution (HR) RGB reference image. However, most of the existing methods either heavily rely on the accurate alignment between low-resolution (LR) HSIs and RGB images, or can only deal with simulated unaligned RGB images…

    Submitted 13 February, 2023; originally announced February 2023.

    Comments: The code and dataset are publicly available at https://zeqiang-lai.github.io/HSI-RefSR/

  38. arXiv:2301.13412  [pdf]

    eess.SY

    Development of a Hardware-in-the-loop Testbed for Laboratory Performance Verification of Flexible Building Equipment in Typical Commercial Buildings

    Authors: Zhelun Chen, Jin Wen, Steven T. Bushby, L. James Lo, Zheng O'Neill, W. Vance Payne, Amanda Pertzborn, Caleb Calfa, Yangyang Fu, Gabriel Grajewski, Yicheng Li, Zhiyao Yang

    Abstract: The goals of reducing energy costs, shifting electricity peaks, increasing the use of renewable energy, and enhancing the stability of the electric grid can be met in part by fully exploiting the energy flexibility potential of buildings and building equipment. The development of strategies that exploit these flexibilities could be facilitated by publicly available high-resolution datasets illustr…

    Submitted 5 February, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

    Comments: presented at the ASHRAE 2022 Annual Conference

  39. arXiv:2301.12048  [pdf, other]

    cs.CV eess.IV

    Making Reconstruction-based Method Great Again for Video Anomaly Detection

    Authors: Yizhou Wang, Can Qin, Yue Bai, Yi Xu, Xu Ma, Yun Fu

    Abstract: Anomaly detection in videos is a significant yet challenging problem. Previous approaches based on deep neural networks employ either reconstruction-based or prediction-based approaches. Nevertheless, existing reconstruction-based methods 1) rely on old-fashioned convolutional autoencoders and are poor at modeling temporal dependency; 2) are prone to overfit the training samples, leading to indist…

    Submitted 27 January, 2023; originally announced January 2023.

    Comments: Accepted by ICDM 2022

  40. arXiv:2301.11525  [pdf, other]

    cs.CV cs.LG eess.IV

    Mixed Attention Network for Hyperspectral Image Denoising

    Authors: Zeqiang Lai, Ying Fu

    Abstract: Hyperspectral image denoising is unique for the highly similar and correlated spectral information that should be properly considered. However, existing methods show limitations in exploring the spectral correlations across different bands and feature interactions within each band. Besides, the low- and high-level features usually exhibit different importance for different spatial-spectral regions…

    Submitted 26 January, 2023; originally announced January 2023.

    Comments: Code is available at https://github.com/Zeqiang-Lai/MAN. arXiv admin note: text overlap with arXiv:2211.14811

  41. arXiv:2212.03401  [pdf, other]

    eess.AS cs.LG cs.SD

    MIMO-DBnet: Multi-channel Input and Multiple Outputs DOA-aware Beamforming Network for Speech Separation

    Authors: Yanjie Fu, Haoran Yin, Meng Ge, Longbiao Wang, Gaoyan Zhang, Jianwu Dang, Chengyun Deng, Fei Wang

    Abstract: Recently, many deep learning based beamformers have been proposed for multi-channel speech separation. Nevertheless, most of them rely on extra cues known in advance, such as speaker feature, face image or directional information. In this paper, we propose an end-to-end beamforming network for direction guided speech separation given merely the mixture signal, namely MIMO-DBnet. Specifically, we d…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: Submitted to ICASSP 2023

  42. arXiv:2211.14811   

    eess.IV cs.CV

    Improved Quasi-Recurrent Neural Network for Hyperspectral Image Denoising

    Authors: Zeqiang Lai, Ying Fu

    Abstract: Hyperspectral image is unique and useful for its abundant spectral bands, but it subsequently requires extra elaborated treatments of the spatial-spectral correlation as well as the global correlation along the spectrum for building a robust and powerful HSI restoration algorithm. By considering such HSI characteristics, 3D Quasi-Recurrent Neural Network (QRNN3D) is one of the HSI denoising networ…

    Submitted 2 April, 2023; v1 submitted 27 November, 2022; originally announced November 2022.

    Comments: The updated version of this paper is accidentally submitted as a new submission at arXiv:2301.11525

  43. arXiv:2211.14090  [pdf, other]

    cs.CV eess.IV

    Spatial-Spectral Transformer for Hyperspectral Image Denoising

    Authors: Miaoyu Li, Ying Fu, Yulun Zhang

    Abstract: Hyperspectral image (HSI) denoising is a crucial preprocessing procedure for the subsequent HSI applications. Unfortunately, though witnessing the development of deep learning in HSI denoising area, existing convolution-based methods face the trade-off between computational efficiency and capability to model non-local characteristics of HSI. In this paper, we propose a Spatial-Spectral Transformer…

    Submitted 25 November, 2022; originally announced November 2022.

  44. arXiv:2211.09332  [pdf]

    eess.SY cs.RO

    iNavFIter-M: Matrix Formulation of Functional Iteration for Inertial Navigation Computation

    Authors: Hongyan Jiang, Maoran Zhu, Yanyan Fu, Yuanxin Wu

    Abstract: The acquisition of attitude, velocity, and position is an essential task in the field of inertial navigation, achieved by integrating the measurements from inertial sensors. Recently, the ultra-precision inertial navigation computation has been tackled by the functional iteration approach (iNavFIter) that drives the non-commutativity errors almost to the computer truncation error level. This paper…

    Submitted 16 November, 2022; originally announced November 2022.

    Comments: 30 pages, 7 figures

  45. arXiv:2211.01784  [pdf, other]

    eess.IV cs.CV

    MALUNet: A Multi-Attention and Light-weight UNet for Skin Lesion Segmentation

    Authors: Jiacheng Ruan, Suncheng Xiang, Mingye Xie, Ting Liu, Yuzhuo Fu

    Abstract: Recently, some pioneering works have preferred applying more complex modules to improve segmentation performances. However, it is not friendly for actual clinical environments due to limited computing resources. To address this challenge, we propose a light-weight model to achieve competitive performances for skin lesion segmentation at the lowest cost of parameters and computational complexity so…

    Submitted 3 November, 2022; originally announced November 2022.

    Comments: 7 pages, 7 figures, 5 tables; This work has been accepted as a regular paper in IEEE BIBM 2022

  46. arXiv:2211.01522  [pdf, other]

    cs.LG cs.SD eess.AS

    Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing

    Authors: Yonggan Fu, Yang Zhang, Kaizhi Qian, Zhifan Ye, Zhongzhi Yu, Cheng-I Lai, Yingyan Lin

    Abstract: Self-supervised learning (SSL) for rich speech representations has achieved empirical success in low-resource Automatic Speech Recognition (ASR) and other speech processing tasks, which can mitigate the necessity of a large amount of transcribed speech and thus has driven a growing demand for on-device ASR and other speech processing. However, advanced speech SSL models have become increasingly la…

    Submitted 2 November, 2022; originally announced November 2022.

    Comments: Accepted at NeurIPS 2022

  47. arXiv:2210.14007  [pdf, other]

    eess.IV cs.CV

    MEW-UNet: Multi-axis representation learning in frequency domain for medical image segmentation

    Authors: Jiacheng Ruan, Mingye Xie, Suncheng Xiang, Ting Liu, Yuzhuo Fu

    Abstract: Recently, Visual Transformer (ViT) has been widely used in various fields of computer vision due to applying self-attention mechanism in the spatial domain to modeling global knowledge. Especially in medical image segmentation (MIS), many works are devoted to combining ViT and CNN, and even some works directly utilize pure ViT-based models. However, recent works improved models in the aspect of sp…

    Submitted 25 October, 2022; originally announced October 2022.

    Comments: 5 pages, 3 figures, 4 tables

  48. arXiv:2210.08802  [pdf, other]

    eess.AS cs.SD

    Spatial-DCCRN: DCCRN Equipped with Frame-Level Angle Feature and Hybrid Filtering for Multi-Channel Speech Enhancement

    Authors: Shubo Lv, Yihui Fu, Yukai Jv, Lei Xie, Weixin Zhu, Wei Rao, Yannan Wang

    Abstract: Recently, multi-channel speech enhancement has drawn much interest due to the use of spatial information to distinguish target speech from interfering signal. To make full use of spatial information and neural network based masking estimation, we propose a multi-channel denoising neural network -- Spatial DCCRN. Firstly, we extend S-DCCRN to multi-channel scenario, aiming at performing cascaded su…

    Submitted 17 October, 2022; originally announced October 2022.

  49. arXiv:2210.07978  [pdf, other]

    cs.SD cs.CL eess.AS

    Improving generalizability of distilled self-supervised speech processing models under distorted settings

    Authors: Kuan-Po Huang, Yu-Kuan Fu, Tsu-Yuan Hsu, Fabian Ritter Gutierrez, Fan-Lin Wang, Liang-Hsuan Tseng, Yu Zhang, Hung-yi Lee

    Abstract: Self-supervised learned (SSL) speech pre-trained models perform well across various speech processing tasks. Distilled versions of SSL models have been developed to match the needs of on-device speech applications. Though having similar performance as original SSL models, distilled counterparts suffer from performance degradation even more than their original versions in distorted environments. Th…

    Submitted 20 October, 2022; v1 submitted 14 October, 2022; originally announced October 2022.

    Comments: Accepted by IEEE SLT2022

  50. arXiv:2209.11382  [pdf, ps, other]

    cs.IT eess.SP

    Zero-Forcing Based Downlink Virtual MIMO-NOMA Communications in IoT Networks

    Authors: Zheng Shi, Hong Wang, Yaru Fu, Guanghua Yang, Shaodan Ma, Fen Hou, Theodoros A. Tsiftsis

    Abstract: To support massive connectivity and boost spectral efficiency for internet of things (IoT), a downlink scheme combining virtual multiple-input multiple-output (MIMO) and nonorthogonal multiple access (NOMA) is proposed. All the single-antenna IoT devices in each cluster cooperate with each other to establish a virtual MIMO entity, and multiple independent data streams are requested by each cluster…

    Submitted 22 September, 2022; originally announced September 2022.