-
Generative Iris Prior Embedded Transformer for Iris Restoration
Authors:
Yubo Huang,
Jia Wang,
Peipei Li,
Liuyu Xiang,
Peigang Li,
Zhaofeng He
Abstract:
Iris restoration from complexly degraded iris images, aiming to improve iris recognition performance, is a challenging problem. Due to the complex degradation, directly training a convolutional neural network (CNN) without prior cannot yield satisfactory results. In this work, we propose a generative iris prior embedded Transformer model (Gformer), in which we build a hierarchical encoder-decoder…
▽ More
Iris restoration from complexly degraded iris images, aiming to improve iris recognition performance, is a challenging problem. Due to the complex degradation, directly training a convolutional neural network (CNN) without prior cannot yield satisfactory results. In this work, we propose a generative iris prior embedded Transformer model (Gformer), in which we build a hierarchical encoder-decoder network employing Transformer block and generative iris prior. First, we tame Transformer blocks to model long-range dependencies in target images. Second, we pretrain an iris generative adversarial network (GAN) to obtain the rich iris prior, and incorporate it into the iris restoration process with our iris feature modulator. Our experiments demonstrate that the proposed Gformer outperforms state-of-the-art methods. Besides, iris recognition performance has been significantly improved after applying Gformer.
△ Less
Submitted 28 June, 2024;
originally announced July 2024.
-
LUT-boosted CDR and Equalization for Burst-mode 50/100 Gbit/s Bandwidth-limited Flexible PON
Authors:
Yanlu Huang,
Liyan Wu,
Shangya Han,
Kai **,
Kun Xu,
Yanni Ou
Abstract:
We proposed and experimentally demonstrated a look-up table boosted fast CDR and equalization scheme for the burst-mode 50/100 Gbps bandwidth-limited flexible PON, requiring no preamble for convergence and achieved the same bit error rate performance as in the case of long preambles.
We proposed and experimentally demonstrated a look-up table boosted fast CDR and equalization scheme for the burst-mode 50/100 Gbps bandwidth-limited flexible PON, requiring no preamble for convergence and achieved the same bit error rate performance as in the case of long preambles.
△ Less
Submitted 28 June, 2024;
originally announced June 2024.
-
Energy efficiency analysis of ammonia-fueled power systems for vehicles considering residual heat recovery
Authors:
Zexin Nie,
Yi Huang,
Guangyu Tian
Abstract:
Ammonia, known as a good hydrogen carrier, shows great potential for use as a zero-carbon fuel for vehicles. However, both the internal combustion engine (ICE) and the proton exchange membrane fuel cell (PEMFC), the currently available engines used by the vehicle, require hydrogen decomposed from ammonia. On-board hydrogen production is an energy-intensive process that significantly reduces system…
▽ More
Ammonia, known as a good hydrogen carrier, shows great potential for use as a zero-carbon fuel for vehicles. However, both the internal combustion engine (ICE) and the proton exchange membrane fuel cell (PEMFC), the currently available engines used by the vehicle, require hydrogen decomposed from ammonia. On-board hydrogen production is an energy-intensive process that significantly reduces system efficiency. Therefore, energy recovery from the system's residual heat is essential to promote system efficiency. ICEs and FCs require different amounts of hydrogen, and they produce residual heat of different quality and quantity, so the system efficiency is not only determined by the engine operating point, but also by the measures and ratios of residual heat recovery. To thoroughly understand the relationships between system energy efficiency and system configuration as well as system parameters, this paper takes three typical power systems with different configurations as our objects. Models of three systems are set up for system energy efficiency analysis, and carry out simulations under different conditions to conduct system output power and energy efficiency. By analyzing the simulation results, the factors that most significantly impact the system efficiency are identified, the guidelines for system design and parameter optimization are proposed.
△ Less
Submitted 17 June, 2024;
originally announced June 2024.
-
Reachability Analysis for Linear Systems with Uncertain Parameters using Polynomial Zonotopes
Authors:
Yushen Huang,
Ertai Luo,
Stanley Bak,
Yifan Sun
Abstract:
In real world applications, uncertain parameters are the rule rather than the exception. We present a reachability algorithm for linear systems with uncertain parameters and inputs using set propagation of polynomial zonotopes. In contrast to previous methods, our approach is able to tightly capture the non-convexity of the reachable set. Building up on our main result, we show how our reachabilit…
▽ More
In real world applications, uncertain parameters are the rule rather than the exception. We present a reachability algorithm for linear systems with uncertain parameters and inputs using set propagation of polynomial zonotopes. In contrast to previous methods, our approach is able to tightly capture the non-convexity of the reachable set. Building up on our main result, we show how our reachability algorithm can be extended to handle linear time-varying systems as well as linear systems with time-varying parameters. Moreover, our approach opens up new possibilities for reachability analysis of linear time-invariant systems, nonlinear systems, and hybrid systems. We compare our approach to other state of the art methods, with superior tightness on two benchmarks including a 9-dimensional vehicle platooning system. Moreover, as part of the journal extension, we investigate through a polynomial zonotope with special structure named multi-affine zonotopes and its optimization problem. We provide the corresponding optimization algorithm and experiment over the examples obatined from two benchmark systems, showing the efficiency and scalability comparing to the state of the art method for handling such type of set representation.
△ Less
Submitted 16 June, 2024;
originally announced June 2024.
-
Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases
Authors:
Meng Wang,
Tian Lin,
Aidi Lin,
Kai Yu,
Yuanyuan Peng,
Lianyu Wang,
Cheng Chen,
Ke Zou,
Huiyu Liang,
Man Chen,
Xue Yao,
Meiqin Zhang,
Binwei Huang,
Chaoxin Zheng,
Peixin Zhang,
Wei Chen,
Yilong Luo,
Yifan Chen,
Honghe Xia,
Tingkun Shi,
Qi Zhang,
**ming Guo,
Xiaolin Chen,
**gcheng Wang,
Yih Chung Tham
, et al. (24 additional authors not shown)
Abstract:
Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. To RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources…
▽ More
Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. To RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources, encompassing a diverse range of diseases across multiple ethnicities and countries. RetiZero exhibits superior performance in several downstream tasks, including zero-shot disease recognition, image-to-image retrieval, and internal- and cross-domain disease identification. In zero-shot scenarios, RetiZero achieves Top5 accuracy scores of 0.8430 for 15 fundus diseases and 0.7561 for 52 fundus diseases. For image retrieval, it achieves Top5 scores of 0.9500 and 0.8860 for the same disease sets, respectively. Clinical evaluations show that RetiZero's Top3 zero-shot performance surpasses the average of 19 ophthalmologists from Singapore, China and the United States. Furthermore, RetiZero significantly enhances clinicians' accuracy in diagnosing fundus disease. These findings underscore the value of integrating the RetiZero foundation model into clinical settings, where a variety of fundus diseases are encountered.
△ Less
Submitted 30 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
MOSA: Music Motion with Semantic Annotation Dataset for Cross-Modal Music Processing
Authors:
Yu-Fen Huang,
Nikki Moran,
Simon Coleman,
Jon Kelly,
Shun-Hwa Wei,
Po-Yin Chen,
Yun-Hsin Huang,
Tsung-** Chen,
Yu-Chia Kuo,
Yu-Chi Wei,
Chih-Hsuan Li,
Da-Yu Huang,
Hsuan-Kai Kao,
Ting-Wei Lin,
Li Su
Abstract:
In cross-modal music processing, translation between visual, auditory, and semantic content opens up new possibilities as well as challenges. The construction of such a transformative scheme depends upon a benchmark corpus with a comprehensive data infrastructure. In particular, the assembly of a large-scale cross-modal dataset presents major challenges. In this paper, we present the MOSA (Music m…
▽ More
In cross-modal music processing, translation between visual, auditory, and semantic content opens up new possibilities as well as challenges. The construction of such a transformative scheme depends upon a benchmark corpus with a comprehensive data infrastructure. In particular, the assembly of a large-scale cross-modal dataset presents major challenges. In this paper, we present the MOSA (Music mOtion with Semantic Annotation) dataset, which contains high quality 3-D motion capture data, aligned audio recordings, and note-by-note semantic annotations of pitch, beat, phrase, dynamic, articulation, and harmony for 742 professional music performances by 23 professional musicians, comprising more than 30 hours and 570 K notes of data. To our knowledge, this is the largest cross-modal music dataset with note-level annotations to date. To demonstrate the usage of the MOSA dataset, we present several innovative cross-modal music information retrieval (MIR) and musical content generation tasks, including the detection of beats, downbeats, phrase, and expressive contents from audio, video and motion data, and the generation of musicians' body motion from given music audio. The dataset and codes are available alongside this publication (https://github.com/yufenhuang/MOSA-Music-mOtion-and-Semantic-Annotation-dataset).
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
MTS-Net: Dual-Enhanced Positional Multi-Head Self-Attention for 3D CT Diagnosis of May-Thurner Syndrome
Authors:
Yixin Huang,
Yiqi **,
Ke Tao,
Kaijian Xia,
Jianfeng Gu,
Lei Yu,
Lan Du,
Cunjian Chen
Abstract:
May-Thurner Syndrome (MTS), also known as iliac vein compression syndrome or Cockett's syndrome, is a condition potentially impacting over 20 percent of the population, leading to an increased risk of iliofemoral deep venous thrombosis. In this paper, we present a 3D-based deep learning approach called MTS-Net for diagnosing May-Thurner Syndrome using CT scans. To effectively capture the spatial-t…
▽ More
May-Thurner Syndrome (MTS), also known as iliac vein compression syndrome or Cockett's syndrome, is a condition potentially impacting over 20 percent of the population, leading to an increased risk of iliofemoral deep venous thrombosis. In this paper, we present a 3D-based deep learning approach called MTS-Net for diagnosing May-Thurner Syndrome using CT scans. To effectively capture the spatial-temporal relationship among CT scans and emulate the clinical process of diagnosing MTS, we propose a novel attention module called the dual-enhanced positional multi-head self-attention (DEP-MHSA). The proposed DEP-MHSA reconsiders the role of positional embedding and incorporates a dual-enhanced positional embedding in both attention weights and residual connections. Further, we establish a new dataset, termed MTS-CT, consisting of 747 subjects. Experimental results demonstrate that our proposed approach achieves state-of-the-art MTS diagnosis results, and our self-attention design facilitates the spatial-temporal modeling. We believe that our DEP-MHSA is more suitable to handle CT image sequence modeling and the proposed dataset enables future research on MTS diagnosis. We make our code and dataset publicly available at: https://github.com/Nutingnon/MTS_dep_mhsa.
△ Less
Submitted 7 June, 2024;
originally announced June 2024.
-
Navigating Autonomous Vehicle on Unmarked Roads with Diffusion-Based Motion Prediction and Active Inference
Authors:
Yufei Huang,
Yulin Li,
Andrea Matta,
Mohsen Jafari
Abstract:
This paper presents a novel approach to improving autonomous vehicle control in environments lacking clear road markings by integrating a diffusion-based motion predictor within an Active Inference Framework (AIF). Using a simulated parking lot environment as a parallel to unmarked roads, we develop and test our model to predict and guide vehicle movements effectively. The diffusion-based motion p…
▽ More
This paper presents a novel approach to improving autonomous vehicle control in environments lacking clear road markings by integrating a diffusion-based motion predictor within an Active Inference Framework (AIF). Using a simulated parking lot environment as a parallel to unmarked roads, we develop and test our model to predict and guide vehicle movements effectively. The diffusion-based motion predictor forecasts vehicle actions by leveraging probabilistic dynamics, while AIF aids in decision-making under uncertainty. Unlike traditional methods such as Model Predictive Control (MPC) and Reinforcement Learning (RL), our approach reduces computational demands and requires less extensive training, enhancing navigation safety and efficiency. Our results demonstrate the model's capability to navigate complex scenarios, marking significant progress in autonomous driving technology.
△ Less
Submitted 31 May, 2024;
originally announced June 2024.
-
Multiscale Spatio-Temporal Enhanced Short-term Load Forecasting of Electric Vehicle Charging Stations
Authors:
Zongbao Zhang,
Jiao Hao,
Wenmeng Zhao,
Yan Liu,
Yaohui Huang,
Xinhang Luo
Abstract:
The rapid expansion of electric vehicles (EVs) has rendered the load forecasting of electric vehicle charging stations (EVCS) increasingly critical. The primary challenge in achieving precise load forecasting for EVCS lies in accounting for the nonlinear of charging behaviors, the spatial interactions among different stations, and the intricate temporal variations in usage patterns. To address the…
▽ More
The rapid expansion of electric vehicles (EVs) has rendered the load forecasting of electric vehicle charging stations (EVCS) increasingly critical. The primary challenge in achieving precise load forecasting for EVCS lies in accounting for the nonlinear of charging behaviors, the spatial interactions among different stations, and the intricate temporal variations in usage patterns. To address these challenges, we propose a Multiscale Spatio-Temporal Enhanced Model (MSTEM) for effective load forecasting at EVCS. MSTEM incorporates a multiscale graph neural network to discern hierarchical nonlinear temporal dependencies across various time scales. Besides, it also integrates a recurrent learning component and a residual fusion mechanism, enhancing its capability to accurately capture spatial and temporal variations in charging patterns. The effectiveness of the proposed MSTEM has been validated through comparative analysis with six baseline models using three evaluation metrics. The case studies utilize real-world datasets for both fast and slow charging loads at EVCS in Perth, UK. The experimental results demonstrate the superiority of MSTEM in short-term continuous load forecasting for EVCS.
△ Less
Submitted 29 May, 2024;
originally announced May 2024.
-
A 7K Parameter Model for Underwater Image Enhancement based on Transmission Map Prior
Authors:
Fuheng Zhou,
Dikai Wei,
Ye Fan,
Yulong Huang,
Yonggang Zhang
Abstract:
Although deep learning based models for underwater image enhancement have achieved good performance, they face limitations in both lightweight and effectiveness, which prevents their deployment and application on resource-constrained platforms. Moreover, most existing deep learning based models use data compression to get high-level semantic information in latent space instead of using the origina…
▽ More
Although deep learning based models for underwater image enhancement have achieved good performance, they face limitations in both lightweight and effectiveness, which prevents their deployment and application on resource-constrained platforms. Moreover, most existing deep learning based models use data compression to get high-level semantic information in latent space instead of using the original information. Therefore, they require decoder blocks to generate the details of the output. This requires additional computational cost. In this paper, a lightweight network named lightweight selective attention network (LSNet) based on the top-k selective attention and transmission maps mechanism is proposed. The proposed model achieves a PSNR of 97\% with only 7K parameters compared to a similar attention-based model. Extensive experiments show that the proposed LSNet achieves excellent performance in state-of-the-art models with significantly fewer parameters and computational resources. The code is available at https://github.com/FuhengZhou/LSNet}{https://github.com/FuhengZhou/LSNet.
△ Less
Submitted 25 May, 2024;
originally announced May 2024.
-
Automatic diagnosis of cardiac magnetic resonance images based on semi-supervised learning
Authors:
Hejun Huang,
Zuguo Chen,
Yi Huang,
Guangqiang Luo,
Chaoyang Chen,
Youzhi Song
Abstract:
Cardiac magnetic resonance imaging (MRI) is a pivotal tool for assessing cardiac function. Precise segmentation of cardiac structures is imperative for accurate cardiac functional evaluation. This paper introduces a semi-supervised model for automatic segmentation of cardiac images and auxiliary diagnosis. By harnessing cardiac MRI images and necessitating only a small portion of annotated image d…
▽ More
Cardiac magnetic resonance imaging (MRI) is a pivotal tool for assessing cardiac function. Precise segmentation of cardiac structures is imperative for accurate cardiac functional evaluation. This paper introduces a semi-supervised model for automatic segmentation of cardiac images and auxiliary diagnosis. By harnessing cardiac MRI images and necessitating only a small portion of annotated image data, the model achieves fully automated, high-precision segmentation of cardiac images, extraction of features, calculation of clinical indices, and prediction of diseases. The provided segmentation results, clinical indices, and prediction outcomes can aid physicians in diagnosis, thereby serving as auxiliary diagnostic tools. Experimental results showcase that this semi-supervised model for automatic segmentation of cardiac images and auxiliary diagnosis attains high accuracy in segmentation and correctness in prediction, demonstrating substantial practical guidance and application value.
△ Less
Submitted 23 May, 2024;
originally announced May 2024.
-
Comprehensive Multimodal Deep Learning Survival Prediction Enabled by a Transformer Architecture: A Multicenter Study in Glioblastoma
Authors:
Ahmed Gomaa,
Yixing Huang,
Amr Hagag,
Charlotte Schmitter,
Daniel Höfler,
Thomas Weissmann,
Katharina Breininger,
Manuel Schmidt,
Jenny Stritzelberger,
Daniel Delev,
Roland Coras,
Arnd Dörfler,
Oliver Schnell,
Benjamin Frey,
Udo S. Gaipl,
Sabine Semrau,
Christoph Bert,
Rainer Fietkau,
Florian Putz
Abstract:
Background: This research aims to improve glioblastoma survival prediction by integrating MR images, clinical and molecular-pathologic data in a transformer-based deep learning model, addressing data heterogeneity and performance generalizability. Method: We propose and evaluate a transformer-based non-linear and non-proportional survival prediction model. The model employs self-supervised learnin…
▽ More
Background: This research aims to improve glioblastoma survival prediction by integrating MR images, clinical and molecular-pathologic data in a transformer-based deep learning model, addressing data heterogeneity and performance generalizability. Method: We propose and evaluate a transformer-based non-linear and non-proportional survival prediction model. The model employs self-supervised learning techniques to effectively encode the high-dimensional MRI input for integration with non-imaging data using cross-attention. To demonstrate model generalizability, the model is assessed with the time-dependent concordance index (Cdt) in two training setups using three independent public test sets: UPenn-GBM, UCSF-PDGM, and RHUH-GBM, each comprising 378, 366, and 36 cases, respectively. Results: The proposed transformer model achieved promising performance for imaging as well as non-imaging data, effectively integrating both modalities for enhanced performance (UPenn-GBM test-set, imaging Cdt 0.645, multimodal Cdt 0.707) while outperforming state-of-the-art late-fusion 3D-CNN-based models. Consistent performance was observed across the three independent multicenter test sets with Cdt values of 0.707 (UPenn-GBM, internal test set), 0.672 (UCSF-PDGM, first external test set) and 0.618 (RHUH-GBM, second external test set). The model achieved significant discrimination between patients with favorable and unfavorable survival for all three datasets (logrank p 1.9\times{10}^{-8}, 9.7\times{10}^{-3}, and 1.2\times{10}^{-2}). Conclusions: The proposed transformer-based survival prediction model integrates complementary information from diverse input modalities, contributing to improved glioblastoma survival prediction compared to state-of-the-art methods. Consistent performance was observed across institutions supporting model generalizability.
△ Less
Submitted 21 May, 2024;
originally announced May 2024.
-
Predictive Energy Management for Battery Electric Vehicles with Hybrid Models
Authors:
Yu-Wen Huang,
Christian Prehofer,
William Lindskog,
Ron Puts,
Pietro Mosca,
Göran Kauermann
Abstract:
This paper addresses the problem of predicting the energy consumption for the drivers of Battery electric vehicles (BEVs). Several external factors (e.g., weather) are shown to have huge impacts on the energy consumption of a vehicle besides the vehicle or powertrain dynamics. Thus, it is challenging to take all of those influencing variables into consideration. The proposed approach is based on a…
▽ More
This paper addresses the problem of predicting the energy consumption for the drivers of Battery electric vehicles (BEVs). Several external factors (e.g., weather) are shown to have huge impacts on the energy consumption of a vehicle besides the vehicle or powertrain dynamics. Thus, it is challenging to take all of those influencing variables into consideration. The proposed approach is based on a hybrid model which improves the prediction accuracy of energy consumption of BEVs. The novelty of this approach is to combine a physics-based simulation model, which captures the basic vehicle and powertrain dynamics, with a data-driven model. The latter accounts for other external influencing factors neglected by the physical simulation model, using machine learning techniques, such as generalized additive mixed models, random forests and boosting. The hybrid modeling method is evaluated with a real data set from TUM and the hybrid models were shown that decrease the average prediction error from 40% of the pure physics model to 10%.
△ Less
Submitted 15 May, 2024;
originally announced May 2024.
-
Multicenter Privacy-Preserving Model Training for Deep Learning Brain Metastases Autosegmentation
Authors:
Yixing Huang,
Zahra Khodabakhshi,
Ahmed Gomaa,
Manuel Schmidt,
Rainer Fietkau,
Matthias Guckenberger,
Nicolaus Andratschke,
Christoph Bert,
Stephanie Tanadini-Lang,
Florian Putz
Abstract:
Objectives: This work aims to explore the impact of multicenter data heterogeneity on deep learning brain metastases (BM) autosegmentation performance, and assess the efficacy of an incremental transfer learning technique, namely learning without forgetting (LWF), to improve model generalizability without sharing raw data.
Materials and methods: A total of six BM datasets from University Hospita…
▽ More
Objectives: This work aims to explore the impact of multicenter data heterogeneity on deep learning brain metastases (BM) autosegmentation performance, and assess the efficacy of an incremental transfer learning technique, namely learning without forgetting (LWF), to improve model generalizability without sharing raw data.
Materials and methods: A total of six BM datasets from University Hospital Erlangen (UKER), University Hospital Zurich (USZ), Stanford, UCSF, NYU and BraTS Challenge 2023 on BM segmentation were used for this evaluation. First, the multicenter performance of a convolutional neural network (DeepMedic) for BM autosegmentation was established for exclusive single-center training and for training on pooled data, respectively. Subsequently bilateral collaboration was evaluated, where a UKER pretrained model is shared to another center for further training using transfer learning (TL) either with or without LWF.
Results: For single-center training, average F1 scores of BM detection range from 0.625 (NYU) to 0.876 (UKER) on respective single-center test data. Mixed multicenter training notably improves F1 scores at Stanford and NYU, with negligible improvement at other centers. When the UKER pretrained model is applied to USZ, LWF achieves a higher average F1 score (0.839) than naive TL (0.570) and single-center training (0.688) on combined UKER and USZ test data. Naive TL improves sensitivity and contouring accuracy, but compromises precision. Conversely, LWF demonstrates commendable sensitivity, precision and contouring accuracy. When applied to Stanford, similar performance was observed.
Conclusion: Data heterogeneity results in varying performance in BM autosegmentation, posing challenges to model generalizability. LWF is a promising approach to peer-to-peer privacy-preserving model training.
△ Less
Submitted 17 May, 2024;
originally announced May 2024.
-
IRSRMamba: Infrared Image Super-Resolution via Mamba-based Wavelet Transform Feature Modulation Model
Authors:
Yongsong Huang,
Tomo Miyazaki,
Xiaofeng Liu,
Shinichiro Omachi
Abstract:
Infrared (IR) image super-resolution faces challenges from homogeneous background pixel distributions and sparse target regions, requiring models that effectively handle long-range dependencies and capture detailed local-global information. Recent advancements in Mamba-based (Selective Structured State Space Model) models, employing state space models, have shown significant potential in visual ta…
▽ More
Infrared (IR) image super-resolution faces challenges from homogeneous background pixel distributions and sparse target regions, requiring models that effectively handle long-range dependencies and capture detailed local-global information. Recent advancements in Mamba-based (Selective Structured State Space Model) models, employing state space models, have shown significant potential in visual tasks, suggesting their applicability for IR enhancement. In this work, we introduce IRSRMamba: Infrared Image Super-Resolution via Mamba-based Wavelet Transform Feature Modulation Model, a novel Mamba-based model designed specifically for IR image super-resolution. This model enhances the restoration of context-sparse target details through its advanced dependency modeling capabilities. Additionally, a new wavelet transform feature modulation block improves multi-scale receptive field representation, capturing both global and local information efficiently. Comprehensive evaluations confirm that IRSRMamba outperforms existing models on multiple benchmarks. This research advances IR super-resolution and demonstrates the potential of Mamba-based models in IR image processing. Code are available at \url{https://github.com/yongsongH/IRSRMamba}.
△ Less
Submitted 16 May, 2024;
originally announced May 2024.
-
Underdetermined DOA Estimation of Off-Grid Sources Based on the Generalized Double Pareto Prior
Authors:
Yongfeng Huang,
Zhendong Chen,
Kun Ye,
Lang Zhou,
Haixin Sun
Abstract:
In this letter, we investigate a new generalized double Pareto based on off-grid sparse Bayesian learning (GDPOGSBL) approach to improve the performance of direction of arrival (DOA) estimation in underdetermined scenarios. The method aims to enhance the sparsity of source signal by utilizing the generalized double Pareto (GDP) prior. Firstly, we employ a first-order linear Taylor expansion to mod…
▽ More
In this letter, we investigate a new generalized double Pareto based on off-grid sparse Bayesian learning (GDPOGSBL) approach to improve the performance of direction of arrival (DOA) estimation in underdetermined scenarios. The method aims to enhance the sparsity of source signal by utilizing the generalized double Pareto (GDP) prior. Firstly, we employ a first-order linear Taylor expansion to model the real array manifold matrix, and Bayesian inference is utilized to calculate the off-grid error, which mitigates the grid dictionary mismatch problem in underdetermined scenarios. Secondly, an innovative grid refinement method is introduced, treating grid points as iterative parameters to minimize the modeling error between the source and grid points. The numerical simulation results verify the superiority of the proposed strategy, especially when dealing with a coarse grid and few snapshots.
△ Less
Submitted 17 May, 2024; v1 submitted 18 April, 2024;
originally announced May 2024.
-
Low-Complexity Joint Azimuth-Range-Velocity Estimation for Integrated Sensing and Communication with OFDM Waveform
Authors:
Jun Zhang,
Gang Yang,
Qibin Ye,
Yixuan Huang,
Su Hu
Abstract:
Integrated sensing and communication (ISAC) is a main application scenario of the sixth-generation mobile communication systems. Due to the fast-growing number of antennas and subcarriers in cellular systems, the computational complexity of joint azimuth-range-velocity estimation (JARVE) in ISAC systems is extremely high. This paper studies the JARVE problem for a monostatic ISAC system with ortho…
▽ More
Integrated sensing and communication (ISAC) is a main application scenario of the sixth-generation mobile communication systems. Due to the fast-growing number of antennas and subcarriers in cellular systems, the computational complexity of joint azimuth-range-velocity estimation (JARVE) in ISAC systems is extremely high. This paper studies the JARVE problem for a monostatic ISAC system with orthogonal frequency division multiplexing (OFDM) waveform, in which a base station receives the echos of its transmitted cellular OFDM signals to sense multiple targets. The Cramer-Rao bounds are first derived for JARVE. A low-complexity algorithm is further designed for super-resolution JARVE, which utilizes the proposed iterative subspace update scheme and Levenberg-Marquardt optimization method to replace the exhaustive search of spatial spectrum in multiple-signal-classification (MUSIC) algorithm. Finally, with the practical parameters of 5G New Radio, simulation results verify that the proposed algorithm can reduce the computational complexity by three orders of magnitude and two orders of magnitude compared to the existing three-dimensional MUSIC algorithm and estimation-of-signal-parameters-using-rotational-invariance-techniques (ESPRIT) algorithm, respectively, and also improve the estimation performance.
△ Less
Submitted 15 May, 2024;
originally announced May 2024.
-
MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results
Authors:
Yaqi Wu,
Zhihao Fan,
Xiaofeng Chu,
Jimmy S. Ren,
Xiaoming Li,
Zongsheng Yue,
Chongyi Li,
Shangcheng Zhou,
Ruicheng Feng,
Yuekun Dai,
Peiqing Yang,
Chen Change Loy,
Senyan Xu,
Zhi**g Sun,
Jiaying Zhu,
Yurui Zhu,
Xueyang Fu,
Zheng-Jun Zha,
Jun Cao,
Cheng Li,
Shu Chen,
Liang Ma,
Shiyang Zhou,
Hai** Zeng,
Kai Feng
, et al. (24 additional authors not shown)
Abstract:
The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…
▽ More
The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2024. In total, 170 participants were successfully registered, and 14 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2024/.
△ Less
Submitted 8 May, 2024;
originally announced May 2024.
-
POPDG: Popular 3D Dance Generation with PopDanceSet
Authors:
Zhenye Luo,
Min Ren,
Xuecai Hu,
Yongzhen Huang,
Li Yao
Abstract:
Generating dances that are both lifelike and well-aligned with music continues to be a challenging task in the cross-modal domain. This paper introduces PopDanceSet, the first dataset tailored to the preferences of young audiences, enabling the generation of aesthetically oriented dances. And it surpasses the AIST++ dataset in music genre diversity and the intricacy and depth of dance movements. M…
▽ More
Generating dances that are both lifelike and well-aligned with music continues to be a challenging task in the cross-modal domain. This paper introduces PopDanceSet, the first dataset tailored to the preferences of young audiences, enabling the generation of aesthetically oriented dances. And it surpasses the AIST++ dataset in music genre diversity and the intricacy and depth of dance movements. Moreover, the proposed POPDG model within the iDDPM framework enhances dance diversity and, through the Space Augmentation Algorithm, strengthens spatial physical connections between human body joints, ensuring that increased diversity does not compromise generation quality. A streamlined Alignment Module is also designed to improve the temporal alignment between dance and music. Extensive experiments show that POPDG achieves SOTA results on two datasets. Furthermore, the paper also expands on current evaluation metrics. The dataset and code are available at https://github.com/Luke-Luo1/POPDG.
△ Less
Submitted 6 May, 2024;
originally announced May 2024.
-
ComposerX: Multi-Agent Symbolic Music Composition with LLMs
Authors:
Qixin Deng,
Qikai Yang,
Ruibin Yuan,
Yipeng Huang,
Yi Wang,
Xubo Liu,
Zeyue Tian,
Jiahao Pan,
Ge Zhang,
Hanfeng Lin,
Yizhi Li,
Yinghao Ma,
Jie Fu,
Chenghua Lin,
Emmanouil Benetos,
Wenwu Wang,
Guangyu Xia,
Wei Xue,
Yike Guo
Abstract:
Music composition represents the creative side of humanity, and itself is a complex task that requires abilities to understand and generate information with long dependency and harmony constraints. While demonstrating impressive capabilities in STEM subjects, current LLMs easily fail in this task, generating ill-written music even when equipped with modern techniques like In-Context-Learning and C…
▽ More
Music composition represents the creative side of humanity, and itself is a complex task that requires abilities to understand and generate information with long dependency and harmony constraints. While demonstrating impressive capabilities in STEM subjects, current LLMs easily fail in this task, generating ill-written music even when equipped with modern techniques like In-Context-Learning and Chain-of-Thoughts. To further explore and enhance LLMs' potential in music composition by leveraging their reasoning ability and the large knowledge base in music history and theory, we propose ComposerX, an agent-based symbolic music generation framework. We find that applying a multi-agent approach significantly improves the music composition quality of GPT-4. The results demonstrate that ComposerX is capable of producing coherent polyphonic music compositions with captivating melodies, while adhering to user instructions.
△ Less
Submitted 30 April, 2024; v1 submitted 28 April, 2024;
originally announced April 2024.
-
Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey
Authors:
Marcos V. Conde,
Zhijun Lei,
Wen Li,
Cosmin Stejerean,
Ioannis Katsavounidis,
Radu Timofte,
Kihwan Yoon,
Ganzorig Gankhuyag,
Jiangtao Lv,
Long Sun,
**shan Pan,
Jiangxin Dong,
**hui Tang,
Zhiyuan Li,
Hao Wei,
Chenyang Ge,
Dongyang Zhang,
Tianle Liu,
Huaian Chen,
Yi **,
Menghan Zhou,
Yiqiang Yan,
Si Gao,
Biao Wu,
Shaoli Liu
, et al. (50 additional authors not shown)
Abstract:
This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod…
▽ More
This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF codec, instead of JPEG. All the proposed methods improve PSNR fidelity over Lanczos interpolation, and process images under 10ms. Out of the 160 participants, 25 teams submitted their code and models. The solutions present novel designs tailored for memory-efficiency and runtime on edge devices. This survey describes the best solutions for real-time SR of compressed high-resolution images.
△ Less
Submitted 25 April, 2024;
originally announced April 2024.
-
An Alternative Method to Identify the Susceptibility Threshold Level of Device under Test in a Reverberation Chamber
Authors:
Qian Xu,
Kai Chen,
Xueqi Shen,
Lei Xing,
Yi Huang,
Tian Hong Loh
Abstract:
By counting the number of pass/fail occurrences of a DUT (Device under Test) in the stirring process in a reverberation chamber (RC), the threshold electric field (E-field) level can be well estimated without tuning the input power and repeating the whole testing many times. The Monte-Carlo method is used to verify the results. Estimated values and uncertainties are given for Rayleigh distributed…
▽ More
By counting the number of pass/fail occurrences of a DUT (Device under Test) in the stirring process in a reverberation chamber (RC), the threshold electric field (E-field) level can be well estimated without tuning the input power and repeating the whole testing many times. The Monte-Carlo method is used to verify the results. Estimated values and uncertainties are given for Rayleigh distributed fields and for Rice distributed fields with different K-factors.
△ Less
Submitted 23 April, 2024;
originally announced April 2024.
-
A Massive MIMO Sampling Detection Strategy Based on Denoising Diffusion Model
Authors:
Lanxin He,
Zheng Wang,
Yongming Huang
Abstract:
The Langevin sampling method relies on an accurate score matching while the existing massive multiple-input multiple output (MIMO) Langevin detection involves an inevitable singular value decomposition (SVD) to calculate the posterior score. In this work, a massive MIMO sampling detection strategy that leverages the denoising diffusion model is proposed to narrow the gap between the given iterativ…
▽ More
The Langevin sampling method relies on an accurate score matching while the existing massive multiple-input multiple output (MIMO) Langevin detection involves an inevitable singular value decomposition (SVD) to calculate the posterior score. In this work, a massive MIMO sampling detection strategy that leverages the denoising diffusion model is proposed to narrow the gap between the given iterative detector and the maximum likelihood (ML) detection in an SVD-free manner. Specifically, the proposed score-based sampling detection strategy, denoted as approximate diffusion detection (ADD), is applicable to a wide range of iterative detection methods, and therefore entails a considerable potential in their performance improvement by multiple sampling attempts. On the other hand, the ADD scheme manages to bypass the channel SVD by introducing a reliable iterative detector to produce a sample from the approximate posterior, so that further Langevin sampling is tractable. Customized by the conjugated gradient descent algorithm as an instance, the proposed sampling scheme outperforms the existing score-based detector in terms of a better complexity-performance trade-off.
△ Less
Submitted 20 April, 2024;
originally announced April 2024.
-
Learning Wireless Data Knowledge Graph for Green Intelligent Communications: Methodology and Experiments
Authors:
Yongming Huang,
Xiaohu You,
Hang Zhan,
Shiwen He,
Ningning Fu,
Wei Xu
Abstract:
Intelligent communications have played a pivotal role in sha** the evolution of 6G networks. Native artificial intelligence (AI) within green communication systems must meet stringent real-time requirements. To achieve this, deploying lightweight and resource-efficient AI models is necessary. However, as wireless networks generate a multitude of data fields and indicators during operation, only…
▽ More
Intelligent communications have played a pivotal role in sha** the evolution of 6G networks. Native artificial intelligence (AI) within green communication systems must meet stringent real-time requirements. To achieve this, deploying lightweight and resource-efficient AI models is necessary. However, as wireless networks generate a multitude of data fields and indicators during operation, only a fraction of them imposes significant impact on the network AI models. Therefore, real-time intelligence of communication systems heavily relies on a small but critical set of the data that profoundly influences the performance of network AI models. These challenges underscore the need for innovative architectures and solutions. In this paper, we propose a solution, termed the pervasive multi-level (PML) native AI architecture, which integrates the concept of knowledge graph (KG) into the intelligent operational manipulations of mobile networks, resulting in the establishment of a wireless data KG. Leveraging the wireless data KG, we characterize the massive and complex data collected from wireless communication networks and analyze the relationships among various data fields. The obtained graph of data field relations enables the on-demand generation of minimal and effective datasets, referred to as feature datasets, tailored to specific application requirements. Consequently, this architecture not only enhances AI training, inference, and validation processes but also significantly reduces resource wastage and overhead for communication networks. To implement this architecture, we have developed a specific solution comprising a spatio-temporal heterogeneous graph attention neural network model (STREAM) as well as a feature dataset generation algorithm. Experiments are conducted to validate the effectiveness of the proposed architecture.
△ Less
Submitted 16 April, 2024;
originally announced April 2024.
-
HCL-MTSAD: Hierarchical Contrastive Consistency Learning for Accurate Detection of Industrial Multivariate Time Series Anomalies
Authors:
Haili Sun,
Yan Huang,
Lansheng Han,
Cai Fu,
Chunjie Zhou
Abstract:
Multivariate Time Series (MTS) anomaly detection focuses on pinpointing samples that diverge from standard operational patterns, which is crucial for ensuring the safety and security of industrial applications. The primary challenge in this domain is to develop representations capable of discerning anomalies effectively. The prevalent methods for anomaly detection in the literature are predominant…
▽ More
Multivariate Time Series (MTS) anomaly detection focuses on pinpointing samples that diverge from standard operational patterns, which is crucial for ensuring the safety and security of industrial applications. The primary challenge in this domain is to develop representations capable of discerning anomalies effectively. The prevalent methods for anomaly detection in the literature are predominantly reconstruction-based and predictive in nature. However, they typically concentrate on a single-dimensional instance level, thereby not fully harnessing the complex associations inherent in industrial MTS. To address this issue, we propose a novel self-supervised hierarchical contrastive consistency learning method for detecting anomalies in MTS, named HCL-MTSAD. It innovatively leverages data consistency at multiple levels inherent in industrial MTS, systematically capturing consistent associations across four latent levels-measurement, sample, channel, and process. By develo** a multi-layer contrastive loss, HCL-MTSAD can extensively mine data consistency and spatio-temporal association, resulting in more informative representations. Subsequently, an anomaly discrimination module, grounded in self-supervised hierarchical contrastive learning, is designed to detect timestamp-level anomalies by calculating multi-scale data consistency. Extensive experiments conducted on six diverse MTS datasets retrieved from real cyber-physical systems and server machines, in comparison with 20 baselines, indicate that HCL-MTSAD's anomaly detection capability outperforms the state-of-the-art benchmark models by an average of 1.8\% in terms of F1 score.
△ Less
Submitted 18 April, 2024; v1 submitted 11 April, 2024;
originally announced April 2024.
-
Net 835-Gb/s/λ Carrier- and LO-Free 100-km Transmission Using Channel-Aware Phase Retrieval Reception
Authors:
Hanzi Huang,
Haoshuo Chen,
Qian Hu,
Di Che,
Yetian Huang,
Brian Stern,
Nicolas K. Fontaine,
Mikael Mazur,
Lauren Dallachiesa,
Roland Ryf,
Zhengxuan Li,
Yingxiong Song
Abstract:
We experimentally demonstrate the first carrier- and LO-free 800G/λ receiver enabling direct compatibility with standard coherent transmitters via phase retrieval, achieving net 835-Gb/s transmission over 100-km SMF and record 8.27-b/s/Hz net optical spectral efficiency.
We experimentally demonstrate the first carrier- and LO-free 800G/λ receiver enabling direct compatibility with standard coherent transmitters via phase retrieval, achieving net 835-Gb/s transmission over 100-km SMF and record 8.27-b/s/Hz net optical spectral efficiency.
△ Less
Submitted 10 April, 2024;
originally announced April 2024.
-
Joint Active and Passive Beamforming for IRS-Aided Wireless Energy Transfer Network Exploiting One-Bit Feedback
Authors:
Taotao Ji,
Meng Hua,
Chunguo Li,
Yongming Huang,
Luxi Yang
Abstract:
To reap the active and passive beamforming gain in an intelligent reflecting surface (IRS)-aided wireless network, a typical way is to first acquire the channel state information (CSI) relying on the pilot signal, and then perform the joint beamforming design. However, it is a great challenge when the receiver can neither send pilot signals nor have complex signal processing capabilities due to it…
▽ More
To reap the active and passive beamforming gain in an intelligent reflecting surface (IRS)-aided wireless network, a typical way is to first acquire the channel state information (CSI) relying on the pilot signal, and then perform the joint beamforming design. However, it is a great challenge when the receiver can neither send pilot signals nor have complex signal processing capabilities due to its hardware limitation. To tackle this problem, we study in this paper an IRS-aided wireless energy transfer (WET) network and propose two joint beamforming design methods, namely, the channel-estimationbased method and the distributed-beamforming-based method, that require only one-bit feedback from the energy receiver (ER) to the energy transmitter (ET). Specifically, for the channelestimation-based method, according to the feedback information, the ET is able to infer the cascaded ET-IRS-ER channel by continually adjusting its transmit beamformer while applying the analytic center cutting plane method (ACCPM). Then, based on the estimated cascaded CSI, the joint beamforming design can be performed by using the existing optimization techniques. While for the distributed-beamforming-based method, we first apply the distributed beamforming algorithm to optimize the IRS reflection coefficients, which is theoretically proven to converge to a local optimum almost surely. Then, the optimal ET's transmit covariance matrix is obtained based on the effective ET-ER channel learned by applying the ACCPM only once. Numerical results demonstrate the effectiveness of our proposed one-bitfeedback-based joint beamforming design schemes while greatly reducing the requirement on the hardware complexity of the ER. In particular, the high accuracy of our IRS-involved cascaded channel estimation method exploiting one-bit feedback is also validated.
△ Less
Submitted 8 April, 2024;
originally announced April 2024.
-
Intelligent Reflecting Surface Aided Target Localization With Unknown Transceiver-IRS Channel State Information
Authors:
Taotao Ji,
Meng Hua,
Xuanhong Yan,
Chunguo Li,
Yongming Huang,
Luxi Yang
Abstract:
Integrating wireless sensing capabilities into base stations (BSs) has become a widespread trend in the future beyond fifth-generation (B5G)/sixth-generation (6G) wireless networks. In this paper, we investigate intelligent reflecting surface (IRS) enabled wireless localization, in which an IRS is deployed to assist a BS in locating a target in its non-line-of-sight (NLoS) region. In particular, w…
▽ More
Integrating wireless sensing capabilities into base stations (BSs) has become a widespread trend in the future beyond fifth-generation (B5G)/sixth-generation (6G) wireless networks. In this paper, we investigate intelligent reflecting surface (IRS) enabled wireless localization, in which an IRS is deployed to assist a BS in locating a target in its non-line-of-sight (NLoS) region. In particular, we consider the case where the BS-IRS channel state information (CSI) is unknown. Specifically, we first propose a separate BS-IRS channel estimation scheme in which the BS operates in full-duplex mode (FDM), i.e., a portion of the BS antennas send downlink pilot signals to the IRS, while the remaining BS antennas receive the uplink pilot signals reflected by the IRS. However, we can only obtain an incomplete BS-IRS channel matrix based on our developed iterative coordinate descent-based channel estimation algorithm due to the "sign ambiguity issue". Then, we employ the multiple hypotheses testing framework to perform target localization based on the incomplete estimated channel, in which the probability of each hypothesis is updated using Bayesian inference at each cycle. Moreover, we formulate a joint BS transmit waveform and IRS phase shifts optimization problem to improve the target localization performance by maximizing the weighted sum distance between each two hypotheses. However, the objective function is essentially a quartic function of the IRS phase shift vector, thus motivating us to resort to the penalty-based method to tackle this challenge. Simulation results validate the effectiveness of our proposed target localization scheme and show that the scheme's performance can be further improved by finely designing the BS transmit waveform and IRS phase shifts intending to maximize the weighted sum distance between different hypotheses.
△ Less
Submitted 7 April, 2024;
originally announced April 2024.
-
Fourier Transform-based Wavenumber Domain 3D Imaging in RIS-aided Communication Systems
Authors:
Yixuan Huang,
Jie Yang,
Wankai Tang,
Chao-Kai Wen,
Shi **
Abstract:
Radio imaging is rapidly gaining prominence in the design of future communication systems, with the potential to utilize reconfigurable intelligent surfaces (RISs) as imaging apertures. Although the sparsity of targets in three-dimensional (3D) space has led most research to adopt compressed sensing (CS)-based imaging algorithms, these often require substantial computational and memory burdens. Dr…
▽ More
Radio imaging is rapidly gaining prominence in the design of future communication systems, with the potential to utilize reconfigurable intelligent surfaces (RISs) as imaging apertures. Although the sparsity of targets in three-dimensional (3D) space has led most research to adopt compressed sensing (CS)-based imaging algorithms, these often require substantial computational and memory burdens. Drawing inspiration from conventional Fourier transform (FT)-based imaging methods, our research seeks to accelerate radio imaging in RIS-aided communication systems. To begin, we introduce a two-stage wavenumber domain 3D imaging technique: first, we modify RIS phase shifts to recover the equivalent channel response from the user equipment to the RIS array, subsequently employing traditional FT-based wavenumber domain methods to produce target images. We also determine the diffraction resolution limits of the system through k-space analysis, taking into account factors including system bandwidth, transmission direction, operating frequency, and the angle subtended by the RIS. Addressing the challenge of limited pilots in communication systems, we unveil an innovative algorithm that merges the strengths of both FT- and CS-based techniques by substituting the expansive sensing matrix with FT-based operators. Our simulation outcomes confirm that our proposed FT-based methods achieve high-quality images while demanding few time, memory, and communication resources.
△ Less
Submitted 6 April, 2024;
originally announced April 2024.
-
Computationally Efficient Unsupervised Deep Learning for Robust Joint AP Clustering and Beamforming Design in Cell-Free Systems
Authors:
Guanghui Chen,
Zheng Wang,
Hongxin Lin,
Yongming Huang,
Luxi Yang
Abstract:
In this paper, we consider robust joint access point (AP) clustering and beamforming design with imperfect channel state information (CSI) in cell-free systems. Specifically, we jointly optimize AP clustering and beamforming with imperfect CSI to simultaneously maximize the worst-case sum rate and minimize the number of AP clustering under power constraint and the sparsity constraint of AP cluster…
▽ More
In this paper, we consider robust joint access point (AP) clustering and beamforming design with imperfect channel state information (CSI) in cell-free systems. Specifically, we jointly optimize AP clustering and beamforming with imperfect CSI to simultaneously maximize the worst-case sum rate and minimize the number of AP clustering under power constraint and the sparsity constraint of AP clustering. By transformations, the semi-infinite constraints caused by the imperfect CSI are converted into more tractable forms for facilitating a computationally efficient unsupervised deep learning algorithm. In addition, to further reduce the computational complexity, a computationally effective unsupervised deep learning algorithm is proposed to implement robust joint AP clustering and beamforming design with imperfect CSI in cell-free systems. Numerical results demonstrate that the proposed unsupervised deep learning algorithm achieves a higher worst-case sum rate under a smaller number of AP clustering with computational efficiency.
△ Less
Submitted 3 April, 2024;
originally announced April 2024.
-
Flexible Variable-Rate Image Feature Compression for Edge-Cloud Systems
Authors:
Md Adnan Faisal Hossain,
Zhihao Duan,
Yuning Huang,
Fengqing Zhu
Abstract:
Feature compression is a promising direction for coding for machines. Existing methods have made substantial progress, but they require designing and training separate neural network models to meet different specifications of compression rate, performance accuracy and computational complexity. In this paper, a flexible variable-rate feature compression method is presented that can operate on a ran…
▽ More
Feature compression is a promising direction for coding for machines. Existing methods have made substantial progress, but they require designing and training separate neural network models to meet different specifications of compression rate, performance accuracy and computational complexity. In this paper, a flexible variable-rate feature compression method is presented that can operate on a range of rates by introducing a rate control parameter as an input to the neural network model. By compressing different intermediate features of a pre-trained vision task model, the proposed method can scale the encoding complexity without changing the overall size of the model. The proposed method is more flexible than existing baselines, at the same time outperforming them in terms of the three-way trade-off between feature compression rate, vision task accuracy, and encoding complexity. We have made the source code available at https://github.com/adnan-hossain/var_feat_comp.git.
△ Less
Submitted 30 March, 2024;
originally announced April 2024.
-
Theoretical Bound-Guided Hierarchical VAE for Neural Image Codecs
Authors:
Yichi Zhang,
Zhihao Duan,
Yuning Huang,
Fengqing Zhu
Abstract:
Recent studies reveal a significant theoretical link between variational autoencoders (VAEs) and rate-distortion theory, notably in utilizing VAEs to estimate the theoretical upper bound of the information rate-distortion function of images. Such estimated theoretical bounds substantially exceed the performance of existing neural image codecs (NICs). To narrow this gap, we propose a theoretical bo…
▽ More
Recent studies reveal a significant theoretical link between variational autoencoders (VAEs) and rate-distortion theory, notably in utilizing VAEs to estimate the theoretical upper bound of the information rate-distortion function of images. Such estimated theoretical bounds substantially exceed the performance of existing neural image codecs (NICs). To narrow this gap, we propose a theoretical bound-guided hierarchical VAE (BG-VAE) for NIC. The proposed BG-VAE leverages the theoretical bound to guide the NIC model towards enhanced performance. We implement the BG-VAE using Hierarchical VAEs and demonstrate its effectiveness through extensive experiments. Along with advanced neural network blocks, we provide a versatile, variable-rate NIC that outperforms existing methods when considering both rate-distortion performance and computational complexity. The code is available at BG-VAE.
△ Less
Submitted 27 March, 2024;
originally announced March 2024.
-
Semi-Automatic Line-System Provisioning with Integrated Physical-Parameter-Aware Methodology: Field Verification and Operational Feasibility
Authors:
Hideki Nishizawa,
Giacomo Borraccini,
Takeo Sasai,
Yue-Kai Huang,
Toru Mano,
Kazuya Anazawa,
Masatoshi Namiki,
Soichiroh Usui,
Tatsuya Matsumura,
Yoshiaki Sone,
Zehao Wang,
Seiji Okamoto,
Takeru Inoue,
Ezra Ip,
Andrea D'Amico,
Tingjun Chen,
Vittorio Curri,
Ting Wang,
Koji Asahi,
Koichi Takasugi
Abstract:
We propose methods and an architecture to conduct measurements and optimize newly installed optical fiber line systems semi-automatically using integrated physics-aware technologies in a data center interconnection (DCI) transmission scenario. We demonstrate, for the first time, digital longitudinal monitoring (DLM) and optical line system (OLS) physical parameter calibration working together in r…
▽ More
We propose methods and an architecture to conduct measurements and optimize newly installed optical fiber line systems semi-automatically using integrated physics-aware technologies in a data center interconnection (DCI) transmission scenario. We demonstrate, for the first time, digital longitudinal monitoring (DLM) and optical line system (OLS) physical parameter calibration working together in real-time to extract physical link parameters for transmission performance optimization. Our methodology has the following advantages over traditional design: a minimized footprint at user sites, accurate estimation of the necessary optical network characteristics via complementary telemetry technologies, and the capability to conduct all operation work remotely. The last feature is crucial, as it enables remote operation to implement network design settings for immediate response to quality of transmission (QoT) degradation and reversion in the case of unforeseen problems. We successfully performed semi-automatic line system provisioning over field fiber networks facilities at Duke University, Durham, NC. The tasks of parameter retrieval, equipment setting optimization, and system setup/provisioning were completed within 1 hour. The field operation was supervised by on-duty personnel who could access the system remotely from different time zones. By comparing Q-factor estimates calculated from the extracted link parameters with measured results from 400G transceivers, we confirmed that our methodology has a reduction in the QoT prediction errors (+-0.3 dB) over existing design (+-10.6 dB).
△ Less
Submitted 24 March, 2024;
originally announced March 2024.
-
RIS-aided Single-frequency 3D Imaging by Exploiting Multi-view Image Correlations
Authors:
Yixuan Huang,
Jie Yang,
Chao-Kai Wen,
Shi **
Abstract:
Retrieving range information in three-dimensional (3D) radio imaging is particularly challenging due to the limited communication bandwidth and pilot resources. To address this issue, we consider a reconfigurable intelligent surface (RIS)-aided uplink communication scenario, generating multiple measurements through RIS phase adjustment. This study successfully realizes 3D single-frequency imaging…
▽ More
Retrieving range information in three-dimensional (3D) radio imaging is particularly challenging due to the limited communication bandwidth and pilot resources. To address this issue, we consider a reconfigurable intelligent surface (RIS)-aided uplink communication scenario, generating multiple measurements through RIS phase adjustment. This study successfully realizes 3D single-frequency imaging by exploiting the near-field multi-view image correlations deduced from user mobility. We first highlight the significance of considering anisotropy in multi-view image formation by investigating radar cross-section properties and diffraction resolution limits. We then propose a novel model for joint multi-view 3D imaging that incorporates occlusion effects and anisotropic scattering. These factors lead to slow image support variation and smooth coefficient evolution, which are mathematically modeled as Markov processes. Based on this model, we employ the Expectation Maximization-Turbo-Generalized Approximate Message Passing algorithm for joint multi-view single-frequency 3D imaging with limited measurements. Simulation results reveal the superiority of joint multi-view imaging in terms of enhanced imaging ranges, accuracies, and anisotropy characterization compared to single-view imaging. Combining adjacent observations for joint multi-view imaging enables a reduction in the measurement overhead by 80%.
△ Less
Submitted 18 March, 2024;
originally announced March 2024.
-
EAGLE: An Edge-Aware Gradient Localization Enhanced Loss for CT Image Reconstruction
Authors:
Yipeng Sun,
Yixing Huang,
Linda-Sophie Schneider,
Mareike Thies,
Mingxuan Gu,
Siyuan Mei,
Siming Bayer,
Andreas Maier
Abstract:
Computed Tomography (CT) image reconstruction is crucial for accurate diagnosis and deep learning approaches have demonstrated significant potential in improving reconstruction quality. However, the choice of loss function profoundly affects the reconstructed images. Traditional mean squared error loss often produces blurry images lacking fine details, while alternatives designed to improve may in…
▽ More
Computed Tomography (CT) image reconstruction is crucial for accurate diagnosis and deep learning approaches have demonstrated significant potential in improving reconstruction quality. However, the choice of loss function profoundly affects the reconstructed images. Traditional mean squared error loss often produces blurry images lacking fine details, while alternatives designed to improve may introduce structural artifacts or other undesirable effects. To address these limitations, we propose Eagle-Loss, a novel loss function designed to enhance the visual quality of CT image reconstructions. Eagle-Loss applies spectral analysis of localized features within gradient changes to enhance sharpness and well-defined edges. We evaluated Eagle-Loss on two public datasets across low-dose CT reconstruction and CT field-of-view extension tasks. Our results show that Eagle-Loss consistently improves the visual quality of reconstructed images, surpassing state-of-the-art methods across various network architectures. Code and data are available at \url{https://github.com/sypsyp97/Eagle_Loss}.
△ Less
Submitted 15 March, 2024;
originally announced March 2024.
-
Unsupervised Spatio-Temporal State Estimation for Fine-grained Adaptive Anomaly Diagnosis of Industrial Cyber-physical Systems
Authors:
Haili Sun,
Yan Huang,
Lansheng Han,
Cai Fu,
Chunjie Zhou
Abstract:
Accurate detection and diagnosis of abnormal behaviors such as network attacks from multivariate time series (MTS) are crucial for ensuring the stable and effective operation of industrial cyber-physical systems (CPS). However, existing researches pay little attention to the logical dependencies among system working states, and have difficulties in explaining the evolution mechanisms of abnormal s…
▽ More
Accurate detection and diagnosis of abnormal behaviors such as network attacks from multivariate time series (MTS) are crucial for ensuring the stable and effective operation of industrial cyber-physical systems (CPS). However, existing researches pay little attention to the logical dependencies among system working states, and have difficulties in explaining the evolution mechanisms of abnormal signals. To reveal the spatio-temporal association relationships and evolution mechanisms of the working states of industrial CPS, this paper proposes a fine-grained adaptive anomaly diagnosis method (i.e. MAD-Transformer) to identify and diagnose anomalies in MTS. MAD-Transformer first constructs a temporal state matrix to characterize and estimate the change patterns of the system states in the temporal dimension. Then, to better locate the anomalies, a spatial state matrix is also constructed to capture the inter-sensor state correlation relationships within the system. Subsequently, based on these two types of state matrices, a three-branch structure of series-temporal-spatial attention module is designed to simultaneously capture the series, temporal, and space dependencies among MTS. Afterwards, three associated alignment loss functions and a reconstruction loss are constructed to jointly optimize the model. Finally, anomalies are determined and diagnosed by comparing the residual matrices with the original matrices. We conducted comparative experiments on five publicly datasets spanning three application domains (service monitoring, spatial and earth exploration, and water treatment), along with a petroleum refining simulation dataset collected by ourselves. The results demonstrate that MAD-Transformer can adaptively detect fine-grained anomalies with short duration, and outperforms the state-of-the-art baselines in terms of noise robustness and localization performance.
△ Less
Submitted 4 March, 2024;
originally announced March 2024.
-
Secure and Scalable Network Slicing with Plug-and-Play Support for Power Distribution System Communication Networks
Authors:
Jian Zhong,
Chen Chen,
Yuqi Qian,
Yiheng Bian,
Yuxiong Huang,
Zhaohong Bie
Abstract:
With the rapid development of power distribution systems (PDSs), the number of terminal devices and the types of delivered services involved are constantly growing. These trends make the operations of PDSs highly dependent on the support of advanced communication networks, which face two related challenges. The first is to provide sufficient flexibility, resilience, and security to meet varying de…
▽ More
With the rapid development of power distribution systems (PDSs), the number of terminal devices and the types of delivered services involved are constantly growing. These trends make the operations of PDSs highly dependent on the support of advanced communication networks, which face two related challenges. The first is to provide sufficient flexibility, resilience, and security to meet varying demands and ensure the proper operation of gradually diversifying network services. The second is to realize the automatic identification of terminal devices, thus reducing the network maintenance burden. To solve these problems, this paper presents a novel multiservice network integration and device authentication slice-based network slicing scheme. In this scheme, the integration of PDS communication networks enables network resource sharing, and recovery from communication interruption is achieved through network slicing in the integrated network. Authentication servers periodically poll terminal devices, adjusting network slice ranges based on authentication results, thereby facilitating dynamic network slicing. Additionally, secure plug-and-play support for PDS terminal devices and network protection are achieved through device identification and dynamic adjustment of network slices. On this basis, a network optimization and upgrading methodology for load balancing and robustness enhancement is further proposed. This approach is designed to improve the performance of PDS communication networks, adapting to ongoing PDS development and the evolution of PDS services. The simulation results show that the proposed schemes endow a PDS communication network with favorable resource utilization, fault recovery, terminal device plug-and-play support, load balancing, and improved network robustness.
△ Less
Submitted 2 March, 2024;
originally announced March 2024.
-
Resilient Microgrid Formation Considering Communication Interruptions
Authors:
Jian Zhong,
Chen Chen,
Young-** Kim,
Yuxiong Huang,
Mengjie Teng,
Yiheng Bian,
Zhaohong Bie
Abstract:
Distribution system (DS) communication failures following extreme events often degrade monitoring and control functions, thus preventing the acquisition of complete global DS component state information, on which existing post-disaster DS restoration methods are based. This letter proposes methods of inferring the states of DS components in the case of incomplete component state information. By us…
▽ More
Distribution system (DS) communication failures following extreme events often degrade monitoring and control functions, thus preventing the acquisition of complete global DS component state information, on which existing post-disaster DS restoration methods are based. This letter proposes methods of inferring the states of DS components in the case of incomplete component state information. By using the known DS information, the operating states of unobservable DS branches and buses can be inferred, providing complete information for DS performance restoration before full communication recovery
△ Less
Submitted 2 March, 2024;
originally announced March 2024.
-
RISAR: RIS-assisted Human Activity Recognition with Commercial Wi-Fi Devices
Authors:
Junshuo Liu,
Yunlong Huang,
Wei Yang,
Zhe Li,
Ru**g Xiong,
Tiebin Mi,
Xin Shi,
Robert C. Qiu
Abstract:
Human activity recognition (HAR) holds significant importance in smart homes, security, and healthcare. Existing systems face limitations because of the insufficient spatial diversity provided by a limited number of antennas. Furthermore, inefficiencies in noise reduction and feature extraction from sensing data pose challenges to recognition performance. This study presents a reconfigurable intel…
▽ More
Human activity recognition (HAR) holds significant importance in smart homes, security, and healthcare. Existing systems face limitations because of the insufficient spatial diversity provided by a limited number of antennas. Furthermore, inefficiencies in noise reduction and feature extraction from sensing data pose challenges to recognition performance. This study presents a reconfigurable intelligent surface (RIS)-assisted passive human activity recognition (RISAR) method, compatible with commercial Wi-Fi devices. RISAR leverages a RIS to enhance the spatial diversity of Wi-Fi signals, effectively capturing a wider range of information distributed across the spatial domain. A novel high-dimensional factor model based on random matrix theory is proposed to address noise reduction and feature extraction in the temporal domain. A dual-stream spatial-temporal attention network model is developed to assign variable weights to different characteristics and sequences, mimicking human cognitive processes in prioritizing essential information. Experimental analysis shows that RISAR significantly outperforms existing HAR methods in accuracy and efficiency, achieving an average accuracy of 97.26%. These findings underscore RISAR's adaptability and potential as a robust activity recognition solution in real environments.
△ Less
Submitted 20 March, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
Offline Learning of Decision Functions in Multiplayer Games with Expectation Constraints
Authors:
Yuanhanqing Huang,
Jianghai Hu
Abstract:
We explore a class of stochastic multiplayer games where each player in the game aims to optimize its objective under uncertainty and adheres to some expectation constraints. The study employs an offline learning paradigm, leveraging a pre-existing dataset containing auxiliary features. While prior research in deterministic and stochastic multiplayer games primarily explored vector-valued decision…
▽ More
We explore a class of stochastic multiplayer games where each player in the game aims to optimize its objective under uncertainty and adheres to some expectation constraints. The study employs an offline learning paradigm, leveraging a pre-existing dataset containing auxiliary features. While prior research in deterministic and stochastic multiplayer games primarily explored vector-valued decisions, this work departs by considering function-valued decisions that incorporate auxiliary features as input. We leverage the law of large deviations and degree theory to establish the almost sure convergence of the offline learning solution to the true solution as the number of data samples increases. Finally, we demonstrate the validity of our method via a multi-account portfolio optimization problem.
△ Less
Submitted 24 February, 2024;
originally announced February 2024.
-
Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion
Authors:
Yujia Huang,
Adishree Ghatare,
Yuanzhe Liu,
Ziniu Hu,
Qinsheng Zhang,
Chandramouli S Sastry,
Siddharth Gururani,
Sageev Oore,
Yisong Yue
Abstract:
We study the problem of symbolic music generation (e.g., generating piano rolls), with a technical focus on non-differentiable rule guidance. Musical rules are often expressed in symbolic form on note characteristics, such as note density or chord progression, many of which are non-differentiable which pose a challenge when using them for guided diffusion. We propose \oursfull (\ours), a novel gui…
▽ More
We study the problem of symbolic music generation (e.g., generating piano rolls), with a technical focus on non-differentiable rule guidance. Musical rules are often expressed in symbolic form on note characteristics, such as note density or chord progression, many of which are non-differentiable which pose a challenge when using them for guided diffusion. We propose \oursfull (\ours), a novel guidance method that only requires forward evaluation of rule functions that can work with pre-trained diffusion models in a plug-and-play way, thus achieving training-free guidance for non-differentiable rules for the first time. Additionally, we introduce a latent diffusion architecture for symbolic music generation with high time resolution, which can be composed with SCG in a plug-and-play fashion. Compared to standard strong baselines in symbolic music generation, this framework demonstrates marked advancements in music quality and rule-based controllability, outperforming current state-of-the-art generators in a variety of settings. For detailed demonstrations, code and model checkpoints, please visit our project website: https://scg-rule-guided-music.github.io/.
△ Less
Submitted 2 June, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
wmh_seg: Transformer based U-Net for Robust and Automatic White Matter Hyperintensity Segmentation across 1.5T, 3T and 7T
Authors:
**ghang Li,
Tales Santini,
Yuanzhe Huang,
Joseph M. Mettenburg,
Tamer S. Ibrahim,
Howard J. Aizenstein,
Minjie Wu
Abstract:
White matter hyperintensity (WMH) remains the top imaging biomarker for neurodegenerative diseases. Robust and accurate segmentation of WMH holds paramount significance for neuroimaging studies. The growing shift from 3T to 7T MRI necessitates robust tools for harmonized segmentation across field strengths and artifacts. Recent deep learning models exhibit promise in WMH segmentation but still fac…
▽ More
White matter hyperintensity (WMH) remains the top imaging biomarker for neurodegenerative diseases. Robust and accurate segmentation of WMH holds paramount significance for neuroimaging studies. The growing shift from 3T to 7T MRI necessitates robust tools for harmonized segmentation across field strengths and artifacts. Recent deep learning models exhibit promise in WMH segmentation but still face challenges, including diverse training data representation and limited analysis of MRI artifacts' impact. To address these, we introduce wmh_seg, a novel deep learning model leveraging a transformer-based encoder from SegFormer. wmh_seg is trained on an unmatched dataset, including 1.5T, 3T, and 7T FLAIR images from various sources, alongside with artificially added MR artifacts. Our approach bridges gaps in training diversity and artifact analysis. Our model demonstrated stable performance across magnetic field strengths, scanner manufacturers, and common MR imaging artifacts. Despite the unique inhomogeneity artifacts on ultra-high field MR images, our model still offers robust and stable segmentation on 7T FLAIR images. Our model, to date, is the first that offers quality white matter lesion segmentation on 7T FLAIR images.
△ Less
Submitted 19 February, 2024;
originally announced February 2024.
-
TC-DiffRecon: Texture coordination MRI reconstruction method based on diffusion model and modified MF-UNet method
Authors:
Chenyan Zhang,
Yifei Chen,
Zhenxiong Fan,
Yiyu Huang,
Wenchao Weng,
Ruiquan Ge,
Dong Zeng,
Changmiao Wang
Abstract:
Recently, diffusion models have gained significant attention as a novel set of deep learning-based generative methods. These models attempt to sample data from a Gaussian distribution that adheres to a target distribution, and have been successfully adapted to the reconstruction of MRI data. However, as an unconditional generative model, the diffusion model typically disrupts image coordination be…
▽ More
Recently, diffusion models have gained significant attention as a novel set of deep learning-based generative methods. These models attempt to sample data from a Gaussian distribution that adheres to a target distribution, and have been successfully adapted to the reconstruction of MRI data. However, as an unconditional generative model, the diffusion model typically disrupts image coordination because of the consistent projection of data introduced by conditional bootstrap. This often results in image fragmentation and incoherence. Furthermore, the inherent limitations of the diffusion model often lead to excessive smoothing of the generated images. In the same vein, some deep learning-based models often suffer from poor generalization performance, meaning their effectiveness is greatly affected by different acceleration factors. To address these challenges, we propose a novel diffusion model-based MRI reconstruction method, named TC-DiffRecon, which does not rely on a specific acceleration factor for training. We also suggest the incorporation of the MF-UNet module, designed to enhance the quality of MRI images generated by the model while mitigating the over-smoothing issue to a certain extent. During the image generation sampling process, we employ a novel TCKG module and a Coarse-to-Fine sampling scheme. These additions aim to harmonize image texture, expedite the sampling process, while achieving data consistency. Our source code is available at https://github.com/JustlfC03/TC-DiffRecon.
△ Less
Submitted 17 February, 2024;
originally announced February 2024.
-
Comparative Analysis of ImageNet Pre-Trained Deep Learning Models and DINOv2 in Medical Imaging Classification
Authors:
Yuning Huang,
**gchen Zou,
Lanxi Meng,
Xin Yue,
Qing Zhao,
Jianqiang Li,
Changwei Song,
Gabriel Jimenez,
Shaowu Li,
Guanghui Fu
Abstract:
Medical image analysis frequently encounters data scarcity challenges. Transfer learning has been effective in addressing this issue while conserving computational resources. The recent advent of foundational models like the DINOv2, which uses the vision transformer architecture, has opened new opportunities in the field and gathered significant interest. However, DINOv2's performance on clinical…
▽ More
Medical image analysis frequently encounters data scarcity challenges. Transfer learning has been effective in addressing this issue while conserving computational resources. The recent advent of foundational models like the DINOv2, which uses the vision transformer architecture, has opened new opportunities in the field and gathered significant interest. However, DINOv2's performance on clinical data still needs to be verified. In this paper, we performed a glioma grading task using three clinical modalities of brain MRI data. We compared the performance of various pre-trained deep learning models, including those based on ImageNet and DINOv2, in a transfer learning context. Our focus was on understanding the impact of the freezing mechanism on performance. We also validated our findings on three other types of public datasets: chest radiography, fundus radiography, and dermoscopy. Our findings indicate that in our clinical dataset, DINOv2's performance was not as strong as ImageNet-based pre-trained models, whereas in public datasets, DINOv2 generally outperformed other models, especially when using the frozen mechanism. Similar performance was observed with various sizes of DINOv2 models across different tasks. In summary, DINOv2 is viable for medical image classification tasks, particularly with data resembling natural images. However, its effectiveness may vary with data that significantly differs from natural images such as MRI. In addition, employing smaller versions of the model can be adequate for medical task, offering resource-saving benefits. Our codes are available at https://github.com/GuanghuiFU/medical_DINOv2_eval.
△ Less
Submitted 13 February, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
-
Dual-modal Tactile E-skin: Enabling Bidirectional Human-Robot Interaction via Integrated Tactile Perception and Feedback
Authors:
Shilong Mu,
Runze Zhao,
Zenan Lin,
Yan Huang,
Shoujie Li,
Chenchang Li,
Xiao-** Zhang,
Wenbo Ding
Abstract:
To foster an immersive and natural human-robot interaction, the implementation of tactile perception and feedback becomes imperative, effectively bridging the conventional sensory gap. In this paper, we propose a dual-modal electronic skin (e-skin) that integrates magnetic tactile sensing and vibration feedback for enhanced human-robot interaction. The dual-modal tactile e-skin offers multi-functi…
▽ More
To foster an immersive and natural human-robot interaction, the implementation of tactile perception and feedback becomes imperative, effectively bridging the conventional sensory gap. In this paper, we propose a dual-modal electronic skin (e-skin) that integrates magnetic tactile sensing and vibration feedback for enhanced human-robot interaction. The dual-modal tactile e-skin offers multi-functional tactile sensing and programmable haptic feedback, underpinned by a layered structure comprised of flexible magnetic films, soft silicone, a Hall sensor and actuator array, and a microcontroller unit. The e-skin captures the magnetic field changes caused by subtle deformations through Hall sensors, employing deep learning for accurate tactile perception. Simultaneously, the actuator array generates mechanical vibrations to facilitate haptic feedback, delivering diverse mechanical stimuli. Notably, the dual-modal e-skin is capable of transmitting tactile information bidirectionally, enabling object recognition and fine-weighing operations. This bidirectional tactile interaction framework will enhance the immersion and efficiency of interactions between humans and robots.
△ Less
Submitted 8 February, 2024;
originally announced February 2024.
-
A Collaborative Model-driven Network for MRI Reconstruction
Authors:
Xiaoyu Qiao,
Weisheng Li,
Guofen Wang,
Yu** Huang
Abstract:
Deep learning (DL)-based methods offer a promising solution to reduce the prolonged scanning time in magnetic resonance imaging (MRI). While model-driven DL methods have demonstrated convincing results by incorporating prior knowledge into deep networks, further exploration is needed to optimize the integration of diverse priors.. Existing model-driven networks typically utilize linearly stacked u…
▽ More
Deep learning (DL)-based methods offer a promising solution to reduce the prolonged scanning time in magnetic resonance imaging (MRI). While model-driven DL methods have demonstrated convincing results by incorporating prior knowledge into deep networks, further exploration is needed to optimize the integration of diverse priors.. Existing model-driven networks typically utilize linearly stacked unrolled cascades to mimic iterative solution steps in optimization algorithms. However, this approach needs to find a balance between different prior-based regularizers during training, resulting in slower convergence and suboptimal reconstructions. To overcome the limitations, we propose a collaborative model-driven network to maximally exploit the complementarity of different regularizers. We design attention modules to learn both the relative confidence (RC) and overall confidence (OC) for the intermediate reconstructions (IRs) generated by different prior-based subnetworks. RC assigns more weight to the areas of expertise of the subnetworks, enabling precise element-wise collaboration. We design correction modules to tackle bottleneck scenarios where both subnetworks exhibit low accuracy, and they further optimize the IRs based on OC maps. IRs across various stages are concatenated and fed to the attention modules to build robust and accurate confidence maps. Experimental results on multiple datasets showed significant improvements in the final results without additional computational costs. Moreover, the proposed model-driven network design strategy can be conveniently applied to various model-driven methods to improve their performance.
△ Less
Submitted 5 May, 2024; v1 submitted 4 February, 2024;
originally announced February 2024.
-
A compact and cost-effective laser-powered speckle visibility spectroscopy (SVS) device for measuring cerebral blood flow
Authors:
Yu Xi Huang,
Simon Mahler,
Maya Dickson,
Aidin Abedi,
Julian M. Tyszka,
Jack Lo Yu Tung,
Jonathan Russin,
Charles Liu,
Changhuei Yang
Abstract:
In the realm of cerebrovascular monitoring, primary metrics typically include blood pressure, which influences cerebral blood flow (CBF) and is contingent upon vessel radius. Measuring CBF non-invasively poses a persistent challenge, primarily attributed to the difficulty of accessing and obtaining signal from the brain. This study aims to introduce a compact speckle visibility spectroscopy (SVS)…
▽ More
In the realm of cerebrovascular monitoring, primary metrics typically include blood pressure, which influences cerebral blood flow (CBF) and is contingent upon vessel radius. Measuring CBF non-invasively poses a persistent challenge, primarily attributed to the difficulty of accessing and obtaining signal from the brain. This study aims to introduce a compact speckle visibility spectroscopy (SVS) device designed for non-invasive CBF measurements, offering cost-effectiveness and scalability while tracking CBF with remarkable sensitivity and temporal resolution. The wearable hardware has a modular design approach consisting solely of a laser diode as the source and a meticulously selected board camera as the detector. They both can be easily placed on the head of a subject to measure CBF with no additional optical elements. The SVS device can achieve a sampling rate of 80 Hz with minimal susceptibility to external disturbances. The device also achieves better SNR compared with traditional fiber-based SVS devices, capturing about 70 times more signal and showing superior stability and reproducibility. It is designed to be paired and distributed in multiple configurations around the head, and measure signals that exceed the quality of prior optical CBF measurement techniques. Given its cost-effectiveness, scalability, and simplicity, this laser-centric tool offers significant potential in advancing non-invasive cerebral monitoring technologies.
△ Less
Submitted 8 February, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
Two-View Topogram-Based Anatomy-Guided CT Reconstruction for Prospective Risk Minimization
Authors:
Chang Liu,
Laura Klein,
Yixing Huang,
Edith Baader,
Michael Lell,
Marc Kachelrieß,
Andreas Maier
Abstract:
To facilitate a prospective estimation of CT effective dose and risk minimization process, a prospective spatial dose estimation and the known anatomical structures are expected. To this end, a CT reconstruction method is required to reconstruct CT volumes from as few projections as possible, i.e. by using the topograms, with anatomical structures as correct as possible. In this work, an optimized…
▽ More
To facilitate a prospective estimation of CT effective dose and risk minimization process, a prospective spatial dose estimation and the known anatomical structures are expected. To this end, a CT reconstruction method is required to reconstruct CT volumes from as few projections as possible, i.e. by using the topograms, with anatomical structures as correct as possible. In this work, an optimized CT reconstruction model based on a generative adversarial network (GAN) is proposed. The GAN is trained to reconstruct 3D volumes from an anterior-posterior and a lateral CT projection. To enhance anatomical structures, a pre-trained organ segmentation network and the 3D perceptual loss are applied during the training phase, so that the model can then generate both organ-enhanced CT volume and the organ segmentation mask. The proposed method can reconstruct CT volumes with PSNR of 26.49, RMSE of 196.17, and SSIM of 0.64, compared to 26.21, 201.55 and 0.63 using the baseline method. In terms of the anatomical structure, the proposed method effectively enhances the organ shape and boundary and allows for a straight-forward identification of the relevant anatomical structures. We note that conventional reconstruction metrics fail to indicate the enhancement of anatomical structures. In addition to such metrics, the evaluation is expanded with assessing the organ segmentation performance. The average organ dice of the proposed method is 0.71 compared with 0.63 in baseline model, indicating the enhancement of anatomical structures.
△ Less
Submitted 23 January, 2024;
originally announced January 2024.
-
AGADIR: Towards Array-Geometry Agnostic Directional Speech Recognition
Authors:
Ju Lin,
Niko Moritz,
Yiteng Huang,
Ruiming Xie,
Ming Sun,
Christian Fuegen,
Frank Seide
Abstract:
Wearable devices like smart glasses are approaching the compute capability to seamlessly generate real-time closed captions for live conversations. We build on our recently introduced directional Automatic Speech Recognition (ASR) for smart glasses that have microphone arrays, which fuses multi-channel ASR with serialized output training, for wearer/conversation-partner disambiguation as well as s…
▽ More
Wearable devices like smart glasses are approaching the compute capability to seamlessly generate real-time closed captions for live conversations. We build on our recently introduced directional Automatic Speech Recognition (ASR) for smart glasses that have microphone arrays, which fuses multi-channel ASR with serialized output training, for wearer/conversation-partner disambiguation as well as suppression of cross-talk speech from non-target directions and noise.
When ASR work is part of a broader system-development process, one may be faced with changes to microphone geometries as system development progresses.
This paper aims to make multi-channel ASR insensitive to limited variations of microphone-array geometry. We show that a model trained on multiple similar geometries is largely agnostic and generalizes well to new geometries, as long as they are not too different. Furthermore, training the model this way improves accuracy for seen geometries by 15 to 28\% relative. Lastly, we refine the beamforming by a novel Non-Linearly Constrained Minimum Variance criterion.
△ Less
Submitted 18 January, 2024;
originally announced January 2024.
-
Dynamic Indoor Fingerprinting Localization based on Few-Shot Meta-Learning with CSI Images
Authors:
Jiyu Jiao,
Xiaojun Wang,
Chenpei Han,
Yuhua Huang,
Yizhuo Zhang
Abstract:
While fingerprinting localization is favored for its effectiveness, it is hindered by high data acquisition costs and the inaccuracy of static database-based estimates. Addressing these issues, this letter presents an innovative indoor localization method using a data-efficient meta-learning algorithm. This approach, grounded in the ``Learning to Learn'' paradigm of meta-learning, utilizes histori…
▽ More
While fingerprinting localization is favored for its effectiveness, it is hindered by high data acquisition costs and the inaccuracy of static database-based estimates. Addressing these issues, this letter presents an innovative indoor localization method using a data-efficient meta-learning algorithm. This approach, grounded in the ``Learning to Learn'' paradigm of meta-learning, utilizes historical localization tasks to improve adaptability and learning efficiency in dynamic indoor environments. We introduce a task-weighted loss to enhance knowledge transfer within this framework. Our comprehensive experiments confirm the method's robustness and superiority over current benchmarks, achieving a notable 23.13\% average gain in Mean Euclidean Distance, particularly effective in scenarios with limited CSI data.
△ Less
Submitted 11 January, 2024;
originally announced January 2024.