-
WavRx: a Disease-Agnostic, Generalizable, and Privacy-Preserving Speech Health Diagnostic Model
Authors:
Yi Zhu,
Tiago Falk
Abstract:
Speech is known to carry health-related attributes, which has emerged as a novel venue for remote and long-term health monitoring. However, existing models are usually tailored for a specific type of disease, and have been shown to lack generalizability across datasets. Furthermore, concerns have been raised recently towards the leakage of speaker identity from health embeddings. To mitigate these limitations, we propose WavRx, a speech health diagnostics model that captures the respiration and articulation related dynamics from a universal speech representation. Our in-domain and cross-domain experiments on six pathological speech datasets demonstrate WavRx as a new state-of-the-art health diagnostic model. Furthermore, we show that the amount of speaker identity entailed in the WavRx health embeddings is significantly reduced without extra guidance during training. An in-depth analysis of the model was performed, thus providing physiological interpretation of its improved generalizability and privacy-preserving ability.
Submitted 26 June, 2024;
originally announced June 2024.
-
Enhancing Wearable based Real-Time Glucose Monitoring via Phasic Image Representation Learning based Deep Learning
Authors:
Yidong Zhu,
Nadia B Aimandi,
Mohammad Arif Ul Alam
Abstract:
In the U.S., over a third of adults are pre-diabetic, with 80% unaware of their status. This underlines the need for better glucose monitoring to prevent type 2 diabetes and related heart diseases. Existing wearable glucose monitors are limited by the lack of models trained on small datasets, as collecting extensive glucose data is often costly and impractical. Our study introduces a novel machine learning method using modified recurrence plots in the frequency domain to improve glucose level prediction accuracy from wearable device data, even with limited datasets. This technique combines advanced signal processing with machine learning to extract more meaningful features. We tested our method against existing models using historical data, showing that our approach surpasses the current 87% accuracy benchmark in predicting real-time interstitial glucose levels.
Submitted 12 June, 2024;
originally announced June 2024.
-
Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction
Authors:
Haoqiu Yan,
Yongxin Zhu,
Kai Zheng,
Bing Liu,
Haoyu Cao,
Deqiang Jiang,
Linli Xu
Abstract:
Large Language Model (LLM)-enhanced agents are becoming increasingly prevalent in Human-AI communication, offering vast potential from entertainment to professional domains. However, current multi-modal dialogue systems overlook the acoustic information present in speech, which is crucial for understanding human communication nuances. This oversight can lead to misinterpretations of speakers' intentions, resulting in inconsistent or even contradictory responses within dialogues. To bridge this gap, in this paper, we propose PerceptiveAgent, an empathetic multi-modal dialogue system designed to discern deeper or more subtle meanings beyond the literal interpretations of words through the integration of speech modality perception. Employing LLMs as a cognitive core, PerceptiveAgent perceives acoustic information from input speech and generates empathetic responses based on speaking styles described in natural language. Experimental results indicate that PerceptiveAgent excels in contextual understanding by accurately discerning the speakers' true intentions in scenarios where the linguistic meaning is either contrary to or inconsistent with the speaker's true feelings, producing more nuanced and expressive spoken dialogues. Code is publicly available at: https://github.com/Haoqiu-Yan/PerceptiveAgent.
Submitted 18 June, 2024;
originally announced June 2024.
-
SCKansformer: Fine-Grained Classification of Bone Marrow Cells via Kansformer Backbone and Hierarchical Attention Mechanisms
Authors:
Yifei Chen,
Zhu Zhu,
Shenghao Zhu,
Linwei Qiu,
Binfeng Zou,
Fan Jia,
Yunpeng Zhu,
Chenyan Zhang,
Zhaojie Fang,
Feiwei Qin,
** Fan,
Changmiao Wang,
Yu Gao,
Gang Yu
Abstract:
The incidence and mortality rates of malignant tumors, such as acute leukemia, have risen significantly. Clinically, hospitals rely on cytological examination of peripheral blood and bone marrow smears to diagnose malignant tumors, with accurate blood cell counting being crucial. Existing automated methods face challenges such as low feature expression capability, poor interpretability, and redundant feature extraction when processing high-dimensional microimage data. We propose a novel fine-grained classification model, SCKansformer, for bone marrow blood cells, which addresses these challenges and enhances classification accuracy and efficiency. The model integrates the Kansformer Encoder, SCConv Encoder, and Global-Local Attention Encoder. The Kansformer Encoder replaces the traditional MLP layer with the KAN, improving nonlinear feature representation and interpretability. The SCConv Encoder, with its Spatial and Channel Reconstruction Units, enhances feature representation and reduces redundancy. The Global-Local Attention Encoder combines Multi-head Self-Attention with a Local Part module to capture both global and local features. We validated our model using the Bone Marrow Blood Cell Fine-Grained Classification Dataset (BMCD-FGCD), comprising over 10,000 samples and nearly 40 classifications, developed with a partner hospital. Comparative experiments on our private dataset, as well as the publicly available PBC and ALL-IDB datasets, demonstrate that SCKansformer outperforms both typical and advanced microcell classification methods across all datasets. Our source code and private BMCD-FGCD dataset are available at https://github.com/JustlfC03/SCKansformer.
Submitted 14 June, 2024;
originally announced June 2024.
-
Joint Channel Estimation and Prediction for Massive MIMO with Frequency Hopping Sounding
Authors:
Yiming Zhu,
Jiawei Zhuang,
Gangle Sun,
Hongwei Hou,
Li You,
Wen** Wang
Abstract:
In massive multiple-input multiple-output (MIMO) systems, the downlink transmission performance heavily relies on accurate channel state information (CSI). Constrained by the transmitted power, user equipment always transmits sounding reference signals (SRSs) to the base station through frequency hopping, which will be leveraged to estimate uplink CSI and subsequently predict downlink CSI. This paper aims to investigate joint channel estimation and prediction (JCEP) for massive MIMO with frequency hopping sounding (FHS). Specifically, we present a multiple-subband (MS) delay-angle-Doppler (DAD) domain channel model with off-grid basis to tackle the energy leakage problem. Furthermore, we formulate the JCEP problem with FHS as a multiple measurement vector (MMV) problem, facilitating the sharing of common CSI across different subbands. To solve this problem, we propose an efficient Off-Grid-MS hybrid message passing (HMP) algorithm under the constrained Bethe free energy (BFE) framework. Aiming to address the lack of prior CSI in practical scenarios, the proposed algorithm can adaptively learn the hyper-parameters of the channel by minimizing the corresponding terms in the BFE expression. To alleviate the complexity of channel hyper-parameter learning, we leverage the approximations of the off-grid matrices to simplify the off-grid hyper-parameter estimation. Numerical results illustrate that the proposed algorithm can effectively mitigate the energy leakage issue and exploit the common CSI across different subbands, acquiring more accurate CSI compared to state-of-the-art counterparts.
Submitted 13 June, 2024;
originally announced June 2024.
-
Flexible Music-Conditioned Dance Generation with Style Description Prompts
Authors:
Hongsong Wang,
Yin Zhu,
Xin Geng
Abstract:
Dance plays an important role as an artistic form and expression in human culture, yet the creation of dance remains a challenging task. Most dance generation methods rely solely on music, seldom taking into consideration intrinsic attributes such as music style or genre. In this work, we introduce Flexible Dance Generation with Style Description Prompts (DGSDP), a diffusion-based framework suitable for diversified tasks of dance generation by fully leveraging the semantics of music style. The core component of this framework is Music-Conditioned Style-Aware Diffusion (MCSAD), which comprises a Transformer-based network and a music Style Modulation module. The MCSAD seamlessly integrates music conditions and style description prompts into the dance generation framework, ensuring that generated dances are consistent with the music content and style. To facilitate flexible dance generation and accommodate different tasks, a spatial-temporal masking strategy is effectively applied in the backward diffusion process. The proposed framework successfully generates realistic dance sequences that are accurately aligned with music for a variety of tasks such as long-term generation, dance in-betweening, and dance inpainting. We hope that this work has the potential to inspire dance generation and creation, with promising applications in entertainment, art, and education.
Submitted 12 June, 2024;
originally announced June 2024.
-
Inter-slice Super-resolution of Magnetic Resonance Images by Pre-training and Self-supervised Fine-tuning
Authors:
Xin Wang,
Zhiyun Song,
Yitao Zhu,
Sheng Wang,
Lichi Zhang,
Dinggang Shen,
Qian Wang
Abstract:
In clinical practice, 2D magnetic resonance (MR) sequences are widely adopted. While individual 2D slices can be stacked to form a 3D volume, the relatively large slice spacing can pose challenges for both image visualization and subsequent analysis tasks, which often require isotropic voxel spacing. To reduce slice spacing, deep-learning-based super-resolution techniques are widely investigated. However, most current solutions require a substantial number of paired high-resolution and low-resolution images for supervised training, which are typically unavailable in real-world scenarios. In this work, we propose a self-supervised super-resolution framework for inter-slice super-resolution of MR images. Our framework first pre-trains on a video dataset, as the temporal correlation of videos is found beneficial for modeling the spatial relation among MR slices. Then, we use a public high-quality MR dataset to fine-tune our pre-trained model, enhancing its awareness of medical data. Finally, given a target dataset at hand, we utilize self-supervised fine-tuning to further ensure our model works well with user-specific super-resolution tasks. The proposed method demonstrates superior performance compared to other self-supervised methods and also holds the potential to benefit various downstream applications.
Submitted 9 June, 2024;
originally announced June 2024.
-
UrBAN: Urban Beehive Acoustics and PheNotyping Dataset
Authors:
Mahsa Abdollahi,
Yi Zhu,
Heitor R. Guimarães,
Nico Coallier,
Ségolène Maucourt,
Pierre Giovenazzo,
Tiago H. Falk
Abstract:
In this paper, we present a multimodal dataset obtained from a honey bee colony in Montréal, Quebec, Canada, spanning the years of 2021 to 2022. This apiary comprised 10 beehives, with microphones recording more than 2000 hours of high quality raw audio, and also sensors capturing temperature and humidity. Periodic hive inspections involved monitoring colony honey bee population changes, assessing queen-related conditions, and documenting overall hive health. Additionally, health metrics, such as Varroa mite infestation rates and winter mortality assessments, were recorded, offering valuable insights into factors affecting hive health status and resilience. In this study, we first outline the data collection process, sensor data description, and dataset structure. Furthermore, we demonstrate a practical application of this dataset by extracting various features from the raw audio to predict colony population using the number of frames of bees as a proxy.
Submitted 20 June, 2024; v1 submitted 5 June, 2024;
originally announced June 2024.
-
Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer
Authors:
Yongxin Zhu,
Dan Su,
Liqiang He,
Linli Xu,
Dong Yu
Abstract:
While recent advancements in speech language models have achieved significant progress, they face remarkable challenges in modeling the long acoustic sequences of neural audio codecs. In this paper, we introduce Generative Pre-trained Speech Transformer (GPST), a hierarchical transformer designed for efficient speech language modeling. GPST quantizes audio waveforms into two distinct types of discrete speech representations and integrates them within a hierarchical transformer architecture, allowing for a unified one-stage generation process and enhancing Hi-Res audio generation capabilities. By training on large corpora of speeches in an end-to-end unsupervised manner, GPST can generate syntactically consistent speech with diverse speaker identities. Given a brief 3-second prompt, GPST can produce natural and coherent personalized speech, demonstrating in-context learning abilities. Moreover, our approach can be easily extended to spoken cross-lingual speech generation by incorporating multi-lingual semantic tokens and universal acoustic tokens. Experimental results indicate that GPST significantly outperforms the existing speech language models in terms of word error rate, speech quality, and speaker similarity. See https://youngsheen.github.io/GPST/demo for demo samples.
Submitted 3 June, 2024;
originally announced June 2024.
-
Diff-DTI: Fast Diffusion Tensor Imaging Using A Feature-Enhanced Joint Diffusion Model
Authors:
Lang Zhang,
**ling He,
Dong Liang,
Hairong Zheng,
Yanjie Zhu
Abstract:
Magnetic resonance diffusion tensor imaging (DTI) is a critical tool for neural disease diagnosis. However, long scan time greatly hinders the widespread clinical use of DTI. To accelerate image acquisition, a feature-enhanced joint diffusion model (Diff-DTI) is proposed to obtain accurate DTI parameter maps from a limited number of diffusion-weighted images (DWIs). Diff-DTI introduces a joint diffusion model that directly learns the joint probability distribution of DWIs with DTI parametric maps for conditional generation. Additionally, a feature enhancement fusion mechanism (FEFM) is designed and incorporated into the generative process of Diff-DTI to preserve fine structures in the generated DTI maps. A comprehensive evaluation of the performance of Diff-DTI was conducted on the Human Connectome Project dataset. The results demonstrate that Diff-DTI outperforms existing state-of-the-art fast DTI imaging methods in terms of visual quality and quantitative metrics. Furthermore, Diff-DTI has shown the ability to produce high-fidelity DTI maps with only three DWIs, thus overcoming the requirement of a minimum of six DWIs for DTI.
Submitted 24 May, 2024;
originally announced May 2024.
-
Efficient Navigation of a Robotic Fish Swimming Across the Vortical Flow Field
Authors:
Haodong Feng,
Dehan Yuan,
Jiale Miao,
Jie You,
Yue Wang,
Yi Zhu,
Dixia Fan
Abstract:
Navigating efficiently across vortical flow fields presents a significant challenge in various robotic applications. The dynamic and unsteady nature of vortical flows often disturbs the control of underwater robots, complicating their operation in hydrodynamic environments. Conventional control methods, which depend on accurate modeling, fail in these settings due to the complexity of fluid-structure interactions (FSI) caused by unsteady hydrodynamics. This study proposes a deep reinforcement learning (DRL) algorithm, trained in a data-driven manner, to enable efficient navigation of a robotic fish swimming across vortical flows. Our proposed algorithm incorporates the LSTM architecture and uses several recent consecutive observations as the state to address the issue of partial observation, often due to sensor limitations. We present a numerical study of navigation within a Karman vortex street, created by placing a stationary cylinder in a uniform flow, utilizing the immersed boundary-lattice Boltzmann method (IB-LBM). The aim is to train the robotic fish to discover efficient navigation policies, enabling it to reach a designated target point across the Karman vortex street from various initial positions. After training, the fish demonstrates the ability to rapidly reach the target from different initial positions, showcasing the effectiveness and robustness of our proposed algorithm. Analysis of the results reveals that the robotic fish can leverage velocity gains and pressure differences induced by the vortices to reach the target, underscoring the potential of our proposed algorithm in enhancing navigation in complex hydrodynamic environments.
Submitted 23 May, 2024;
originally announced May 2024.
-
PhiBE: A PDE-based Bellman Equation for Continuous Time Policy Evaluation
Authors:
Yuhua Zhu
Abstract:
In this paper, we address the problem of continuous-time reinforcement learning in scenarios where the dynamics follow a stochastic differential equation. When the underlying dynamics remain unknown and we have access only to discrete-time information, how can we effectively conduct policy evaluation? We first highlight that the commonly used Bellman equation (BE) is not always a reliable approximation to the true value function. We then introduce a new Bellman equation, PhiBE, which integrates the discrete-time information into a PDE formulation. The new Bellman equation offers a more accurate approximation to the true value function, especially in scenarios where the underlying dynamics change slowly. Moreover, we extend PhiBE to higher orders, providing increasingly accurate approximations. We conduct the error analysis for both BE and PhiBE with explicit dependence on the discounted coefficient, the reward and the dynamics. Additionally, we present a model-free algorithm to solve PhiBE when only discrete-time trajectory data is available. Numerical experiments are provided to validate the theoretical guarantees we propose.
Submitted 21 May, 2024;
originally announced May 2024.
-
MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results
Authors:
Yaqi Wu,
Zhihao Fan,
Xiaofeng Chu,
Jimmy S. Ren,
Xiaoming Li,
Zongsheng Yue,
Chongyi Li,
Shangcheng Zhou,
Ruicheng Feng,
Yuekun Dai,
Peiqing Yang,
Chen Change Loy,
Senyan Xu,
Zhi**g Sun,
Jiaying Zhu,
Yurui Zhu,
Xueyang Fu,
Zheng-Jun Zha,
Jun Cao,
Cheng Li,
Shu Chen,
Liang Ma,
Shiyang Zhou,
Hai** Zeng,
Kai Feng
, et al. (24 additional authors not shown)
Abstract:
The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2024. In total, 170 participants were successfully registered, and 14 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2024/.
Submitted 8 May, 2024;
originally announced May 2024.
-
Data-Driven Dynamics Modeling of Miniature Robotic Blimps Using Neural ODEs With Parameter Auto-Tuning
Authors:
Yongjian Zhu,
Hao Cheng,
Feitian Zhang
Abstract:
Miniature robotic blimps, as one type of lighter-than-air aerial vehicle, have attracted increasing attention in the science and engineering community for their enhanced safety, extended endurance, and quieter operation compared to quadrotors. Accurately modeling the dynamics of these robotic blimps poses a significant challenge due to the complex aerodynamics stemming from their large lifting bodies. Traditional first-principle models have difficulty obtaining accurate aerodynamic parameters and often overlook high-order nonlinearities, thus reaching their limits in modeling the motion dynamics of miniature robotic blimps. To tackle this challenge, this letter proposes the Auto-tuning Blimp-oriented Neural Ordinary Differential Equation method (ABNODE), a data-driven approach that integrates first-principle and neural network modeling. Spiraling motion experiments of robotic blimps are conducted, comparing the ABNODE with first-principle and other data-driven benchmark models, the results of which demonstrate the effectiveness of the proposed method.
Submitted 29 April, 2024;
originally announced April 2024.
-
Multi-agent Reinforcement Learning-based Joint Precoding and Phase Shift Optimization for RIS-aided Cell-Free Massive MIMO Systems
Authors:
Yiyang Zhu,
Enyu Shi,
Ziheng Liu,
Jiayi Zhang,
Bo Ai
Abstract:
Cell-free (CF) massive multiple-input multiple-output (mMIMO) is a promising technique for achieving high spectral efficiency (SE) using multiple distributed access points (APs). However, harsh propagation environments often lead to significant communication performance degradation due to high penetration loss. To overcome this issue, we introduce the reconfigurable intelligent surface (RIS) into the CF mMIMO system as a low-cost and power-efficient solution. In this paper, we focus on optimizing the joint precoding design of the RIS-aided CF mMIMO system to maximize the sum SE. This involves optimizing the precoding matrix at the APs and the reflection coefficients at the RIS. To tackle this problem, we propose a fully distributed multi-agent reinforcement learning (MARL) algorithm that incorporates fuzzy logic (FL). Unlike conventional approaches that rely on alternating optimization techniques, our FL-based MARL algorithm only requires local channel state information, which reduces the need for high backhaul capacity. Simulation results demonstrate that our proposed FL-MARL algorithm effectively reduces computational complexity while achieving similar performance as conventional MARL methods.
Submitted 22 April, 2024;
originally announced April 2024.
-
Fairness-aware Age-of-Information Minimization in WPT-Assisted Short-Packet THz Communications for mURLLC
Authors:
Yao Zhu,
Xiaopeng Yuan,
Yulin Hu,
Bo Ai,
Ruikang Wang,
Bin Han,
Anke Schmeink
Abstract:
The technological landscape is swiftly advancing towards large-scale systems, creating significant opportunities, particularly in the domain of Terahertz (THz) communications. Networks designed for massive connectivity, comprising numerous Internet of Things (IoT) devices, are at the forefront of this advancement. In this paper, we consider Wireless Power Transfer (WPT)-enabled networks that support these IoT devices with massive Ultra-Reliable and Low-Latency Communication (mURLLC) services. The focus of such networks is information freshness, with the Age-of-Information (AoI) serving as the pivotal performance metric. In particular, we aim to minimize the maximum AoI among IoT devices by optimizing the scheduling policy. Our analytical findings establish the convexity property of the problem, which can be solved efficiently. Furthermore, we introduce the concept of AoI-oriented cluster capacity, examining the relationship between the number of supported devices and the AoI performance in the network. Numerical simulations validate the advantage of our proposed approach in enhancing AoI performance, indicating its potential to guide the design of future THz communication systems for IoT applications requiring mURLLC services.
Submitted 15 February, 2024;
originally announced April 2024.
-
iMD4GC: Incomplete Multimodal Data Integration to Advance Precise Treatment Response Prediction and Survival Analysis for Gastric Cancer
Authors:
Fengtao Zhou,
Yingxue Xu,
Yanfen Cui,
Shenyan Zhang,
Yun Zhu,
Weiyang He,
Jiguang Wang,
Xin Wang,
Ronald Chan,
Louis Ho Shing Lau,
Chu Han,
Dafu Zhang,
Zhenhui Li,
Hao Chen
Abstract:
Gastric cancer (GC) is a prevalent malignancy worldwide, ranking as the fifth most common cancer with over 1 million new cases and 700 thousand deaths in 2020. Locally advanced gastric cancer (LAGC) accounts for approximately two-thirds of GC diagnoses, and neoadjuvant chemotherapy (NACT) has emerged as the standard treatment for LAGC. However, the effectiveness of NACT varies significantly among patients, with a considerable subset displaying treatment resistance. Ineffective NACT not only leads to adverse effects but also misses the optimal therapeutic window, resulting in lower survival rate. However, existing multimodal learning methods assume the availability of all modalities for each patient, which does not align with the reality of clinical practice. The limited availability of modalities for each patient would cause information loss, adversely affecting predictive accuracy. In this study, we propose an incomplete multimodal data integration framework for GC (iMD4GC) to address the challenges posed by incomplete multimodal data, enabling precise response prediction and survival analysis. Specifically, iMD4GC incorporates unimodal attention layers for each modality to capture intra-modal information. Subsequently, the cross-modal interaction layers explore potential inter-modal interactions and capture complementary information across modalities, thereby enabling information compensation for missing modalities. To evaluate iMD4GC, we collected three multimodal datasets for GC study: GastricRes (698 cases) for response prediction, GastricSur (801 cases) for survival analysis, and TCGA-STAD (400 cases) for survival analysis. The scale of our datasets is significantly larger than previous studies. The iMD4GC achieved impressive performance with an 80.2% AUC on GastricRes, 71.4% C-index on GastricSur, and 66.1% C-index on TCGA-STAD, significantly surpassing other compared methods.
Submitted 1 April, 2024;
originally announced April 2024.
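The abstract above reports survival-analysis results as a C-index. As a rough illustration of what that metric measures, here is a minimal sketch of Harrell's concordance index. The abstract does not say which C-index variant the authors used, so the choice of Harrell's definition, the function name `concordance_index`, and the handling of ties are all assumptions made for this example.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Simplified Harrell's C-index (illustrative sketch, not the authors' code).

    times  -- observed follow-up time per patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    risks  -- predicted risk score; higher means earlier expected event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this simplified version
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier subject was censored, so the true order is unknown
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0  # model ranks the earlier event as higher risk
        elif risks[first] == risks[second]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a reported 71.4% would mean roughly 71% of comparable patient pairs are ordered correctly under this definition.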
-
AIGCOIQA2024: Perceptual Quality Assessment of AI Generated Omnidirectional Images
Authors:
Liu Yang,
Huiyu Duan,
Long Teng,
Yucheng Zhu,
Xiaohong Liu,
Menghan Hu,
Xiongkuo Min,
Guangtao Zhai,
Patrick Le Callet
Abstract:
In recent years, the rapid advancement of Artificial Intelligence Generated Content (AIGC) has attracted widespread attention. Among AIGC, AI-generated omnidirectional images hold significant potential for Virtual Reality (VR) and Augmented Reality (AR) applications, hence omnidirectional AIGC techniques have also been widely studied. AI-generated omnidirectional images exhibit unique distortions compared to natural omnidirectional images; however, there are no dedicated Image Quality Assessment (IQA) criteria for assessing them. This study addresses this gap by establishing a large-scale AI-generated omnidirectional image IQA database named AIGCOIQA2024 and constructing a comprehensive benchmark. We first generate 300 omnidirectional images based on 5 AIGC models utilizing 25 text prompts. A subjective IQA experiment is subsequently conducted to assess human visual preferences from three perspectives: quality, comfortability, and correspondence. Finally, we conduct a benchmark experiment to evaluate the performance of state-of-the-art IQA models on our database. The database will be released to facilitate future research.
Submitted 1 April, 2024;
originally announced April 2024.
-
Time-Quantitatively Nonblocking Supervisory Control of Timed Discrete-Event Systems
Authors:
Renyuan Zhang,
Jiale Wu,
Junhua Gou,
Yabo Zhu,
Kai Cai
Abstract:
Recently we proposed an automaton property of quantitative nonblockingness in supervisory control of discrete-event systems, which quantifies the standard nonblocking property by capturing the practical requirement that all tasks be completed within a bounded number of steps. However, in practice tasks may further be required to be completed within a specific time; this requirement cannot be fulfilled by the quantitatively nonblocking supervisor. To meet this new requirement, in this paper we introduce the concept of time-quantitative nonblockingness, which extends the concept of quantitative nonblockingness from untimed discrete-event systems (DES) to timed DES (TDES). This property requires that each task be completed within a bounded time. Accordingly, we formulate a new time-quantitatively nonblocking supervisory control problem of TDES, and characterize its solvability in terms of a new concept of time-quantitative language completability. It is proved that there exists a unique supremal time-quantitatively completable sublanguage of a given language, and we develop an automaton-based algorithm to compute the supremal sublanguage. Finally, we present an approach to compute a maximally permissive supervisory control solution to the new time-quantitatively nonblocking supervisory control problem.
Submitted 27 January, 2024;
originally announced March 2024.
-
Integrated Communications and Localization for Massive MIMO LEO Satellite Systems
Authors:
Li You,
Xiaoyu Qiang,
Yongxiang Zhu,
Fan Jiang,
Christos G. Tsinos,
Wenjin Wang,
Henk Wymeersch,
Xiqi Gao,
Björn Ottersten
Abstract:
Integrated communications and localization (ICAL) will play an important part in future sixth generation (6G) networks for the realization of Internet of Everything (IoE) to support both global communications and seamless localization. Massive multiple-input multiple-output (MIMO) low earth orbit (LEO) satellite systems have great potential in providing wide coverage with enhanced gains, and thus are strong candidates for realizing ubiquitous ICAL. In this paper, we develop a wideband massive MIMO LEO satellite system to simultaneously support wireless communications and localization operations in the downlink. In particular, we first characterize the signal propagation properties and derive a localization performance bound. Based on these analyses, we focus on the hybrid analog/digital precoding design to achieve high communication capability and localization precision. Numerical results demonstrate that the proposed ICAL scheme supports both the wireless communication and localization operations for typical system setups.
Submitted 12 March, 2024;
originally announced March 2024.
-
A Segmentation Foundation Model for Diverse-type Tumors
Authors:
Jianhao Xie,
Ziang Zhang,
Guibo Luo,
Yuesheng Zhu
Abstract:
Large pre-trained models with their numerous model parameters and extensive training datasets have shown excellent performance in various tasks. Many publicly available medical image datasets do not contain enough data, so there are few large-scale models in medical imaging. We propose a large-scale Tumor Segmentation Foundation Model (TSFM) with 1.6 billion parameters using Resblock-backbone and Transformer-bottleneck, which has good transfer ability for downstream tasks. To make TSFM perform well in tumor segmentation, we make full use of the strong spatial correlation between tumors and organs in medical images and fuse 7 tumor datasets and 3 multi-organ datasets to build a 3D medical dataset pool comprising 2779 cases with a total of 300k medical images, whose size currently exceeds that of many single publicly available datasets. TSFM is a pre-trained model for medical image segmentation that can also be transferred to multiple downstream tasks for fine-tuning. The average performance of our pre-trained model is 2% higher than that of nnU-Net across various tumor types. In the transfer learning task, TSFM needs only 5% of the training epochs of nnU-Net to achieve similar performance and can surpass nnU-Net by 2% on average with 10% of the training epochs. Pre-trained TSFM and its code will be released soon.
Submitted 10 March, 2024;
originally announced March 2024.
-
Stochastic Geometry Analysis for Distributed RISs-Assisted mmWave Communications
Authors:
Yuan Xu,
Li Wei,
Chongwen Huang,
Yongxu Zhu,
Zhaohui Yang,
Jun Yang,
Jiguang He,
Zhaoyang Zhang,
Mérouane Debbah
Abstract:
Millimeter wave (mmWave) has attracted considerable attention due to its wide bandwidth and high frequency. However, it is highly susceptible to blockages, resulting in significant degradation of the coverage and the sum rate. A promising approach is deploying distributed reconfigurable intelligent surfaces (RISs), which can establish extra communication links. In this paper, we investigate the impact of distributed RISs on the coverage probability and the sum rate in mmWave wireless communication systems. Specifically, we first introduce the system model, which includes the blockage, the RIS and the user distribution models, leveraging the Poisson point process. Then, we define the association criterion and derive the conditional coverage probabilities for the two cases of direct association and reflective association through RISs. Finally, we combine the two cases using Campbell's theorem and the total probability theorem to obtain the closed-form expressions for the ergodic coverage probability and the sum rate. Simulation results validate the effectiveness of the proposed analytical approach, demonstrating that the deployment of distributed RISs significantly improves the ergodic coverage probability by 45.4% and the sum rate by over 1.5 times.
Submitted 9 April, 2024; v1 submitted 9 March, 2024;
originally announced March 2024.
-
HAM-TTS: Hierarchical Acoustic Modeling for Token-Based Zero-Shot Text-to-Speech with Model and Data Scaling
Authors:
Chunhui Wang,
Chang Zeng,
Bowen Zhang,
Ziyang Ma,
Yefan Zhu,
Zifeng Cai,
Jian Zhao,
Zhonglin Jiang,
Yong Chen
Abstract:
Token-based text-to-speech (TTS) models have emerged as a promising avenue for generating natural and realistic speech, yet they grapple with low pronunciation accuracy, speaking style and timbre inconsistency, and a substantial need for diverse training data. In response, we introduce a novel hierarchical acoustic modeling approach complemented by a tailored data augmentation strategy and train it on the combination of real and synthetic data, scaling the data size up to 650k hours, leading to the zero-shot TTS model with 0.8B parameters. Specifically, our method incorporates a latent variable sequence containing supplementary acoustic information based on refined self-supervised learning (SSL) discrete units into the TTS model by a predictor. This significantly mitigates pronunciation errors and style mutations in synthesized speech. During training, we strategically replace and duplicate segments of the data to enhance timbre uniformity. Moreover, a pretrained few-shot voice conversion model is utilized to generate a plethora of voices with identical content yet varied timbres. This facilitates the explicit learning of utterance-level one-to-many mappings, enriching speech diversity and also ensuring consistency in timbre. Comparative experiments (Demo page: https://anonymous.4open.science/w/ham-tts/) demonstrate our model's superiority over VALL-E in pronunciation precision and maintaining speaking style, as well as timbre continuity.
Submitted 9 March, 2024;
originally announced March 2024.
-
FedFMS: Exploring Federated Foundation Models for Medical Image Segmentation
Authors:
Yuxi Liu,
Guibo Luo,
Yuesheng Zhu
Abstract:
Medical image segmentation is crucial for clinical diagnosis. The Segment Anything Model (SAM) serves as a powerful foundation model for visual segmentation and can be adapted for medical image segmentation. However, medical imaging data typically contain privacy-sensitive information, making it challenging to train foundation models with centralized storage and sharing. To date, there are few foundation models tailored for medical image deployment within the federated learning framework, and the segmentation performance, as well as the efficiency of communication and training, remain unexplored. In response to these issues, we developed Federated Foundation models for Medical image Segmentation (FedFMS), which includes the Federated SAM (FedSAM) and a communication and training-efficient Federated SAM with Medical SAM Adapter (FedMSA). Comprehensive experiments on diverse datasets are conducted to investigate the performance disparities between centralized training and federated learning across various configurations of FedFMS. The experiments revealed that FedFMS could achieve performance comparable to models trained via centralized training methods while maintaining privacy. Furthermore, FedMSA demonstrated the potential to enhance communication and training efficiency. Our model implementation codes are available at https://github.com/LIU-YUXI/FedFMS.
Submitted 8 March, 2024;
originally announced March 2024.
-
LightM-UNet: Mamba Assists in Lightweight UNet for Medical Image Segmentation
Authors:
Weibin Liao,
Yinghao Zhu,
Xinyuan Wang,
Chengwei Pan,
Yasha Wang,
Liantao Ma
Abstract:
UNet and its variants have been widely used in medical image segmentation. However, these models, especially those based on Transformer architectures, pose challenges due to their large number of parameters and computational loads, making them unsuitable for mobile health applications. Recently, State Space Models (SSMs), exemplified by Mamba, have emerged as competitive alternatives to CNN and Transformer architectures. Building upon this, we employ Mamba as a lightweight substitute for CNN and Transformer within UNet, aiming at tackling challenges stemming from computational resource limitations in real medical settings. To this end, we introduce the Lightweight Mamba UNet (LightM-UNet) that integrates Mamba and UNet in a lightweight framework. Specifically, LightM-UNet leverages the Residual Vision Mamba Layer in a pure Mamba fashion to extract deep semantic features and model long-range spatial dependencies, with linear computational complexity. Extensive experiments conducted on two real-world 2D/3D datasets demonstrate that LightM-UNet surpasses existing state-of-the-art methods. Notably, when compared to the renowned nnU-Net, LightM-UNet achieves superior segmentation performance while drastically reducing parameter and computation costs by 116x and 21x, respectively. This highlights the potential of Mamba in facilitating model lightweighting. Our code implementation is publicly available at https://github.com/MrBlankness/LightM-UNet.
Submitted 11 March, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
A Fisher Information based Receding Horizon Control Method for Signal Strength Model Estimation
Authors:
Yancheng Zhu,
Sean B. Andersson
Abstract:
This paper considers the problem of localizing a set of nodes in a wireless sensor network when both their positions and the parameters of the communication model are unknown. We assume that a single agent moves through the environment, taking measurements of the Received Signal Strength (RSS), and seek a controller that optimizes a performance metric based on the Fisher Information Matrix (FIM). We develop a receding horizon (RH) approach that alternates between estimating the parameter values (using a maximum likelihood estimator) and determining where to move so as to maximally inform the estimation problem. The receding horizon controller solves a multi-stage look ahead problem to determine the next control to be applied, executes the move, collects the next measurement, and then re-estimates the parameters before repeating the sequence. We consider both a Dynamic Programming (DP) approach to solving the optimal control problem at each step, and a simplified heuristic based on a pruning algorithm that significantly reduces the computational complexity. We also consider a modified cost function that seeks to balance the information acquired about each of the parameters to ensure the controller does not focus on a single value in its optimization. These approaches are compared against two baselines, one based on a purely random trajectory and one on a greedy control solution. The simulations indicate our RH schemes outperform the baselines, while the pruning algorithm produces significant reductions in computation time with little effect on overall performance.
Submitted 18 February, 2024;
originally announced February 2024.
-
Coverage and Rate Analysis for Distributed RISs-Assisted mmWave Communications
Authors:
Yuan Xu,
Chongwen Huang,
Wei Li,
Yongxu Zhu,
Zhaohui Yang,
Jiguang He,
Jun Yang,
Zhaoyang Zhang,
Chau Yuen,
Merouane Debbah
Abstract:
The millimeter wave (mmWave) has received considerable interest due to its expansive bandwidth and high frequency. However, a noteworthy challenge arises from its vulnerability to blockages, leading to reduced coverage and achievable rates. To address these limitations, a potential solution is to deploy distributed reconfigurable intelligent surfaces (RISs), which comprise many low-cost and passively reflected elements, and can facilitate the establishment of extra communication links. In this paper, we leverage stochastic geometry to investigate the ergodic coverage probability and the achievable rate in both distributed RISs-assisted single-cell and multi-cell mmWave wireless communication systems. Specifically, we first establish the system model considering the stochastically distributed blockages, RISs and users by the Poisson point process. Then we give the association criterion and derive the association probabilities, the distance distributions, and the conditional coverage probabilities for two cases of associations between base stations and users without or with RISs. Finally, we use Campbell's theorem and the total probability theorem to obtain the closed-form expressions of the ergodic coverage probability and the achievable rate. Simulation results verify the effectiveness of our analysis method, and demonstrate that by deploying distributed RISs, the ergodic coverage probability is significantly improved by approximately 50%, and the achievable rate is increased by more than 1.5 times.
Submitted 8 February, 2024;
originally announced February 2024.
-
Perceptual Video Quality Assessment: A Survey
Authors:
Xiongkuo Min,
Huiyu Duan,
Wei Sun,
Yucheng Zhu,
Guangtao Zhai
Abstract:
Perceptual video quality assessment plays a vital role in the field of video processing due to the existence of quality degradations introduced in various stages of video signal acquisition, compression, transmission and display. With the advancement of internet communication and cloud service technology, video content and traffic are growing exponentially, which further emphasizes the requirement for accurate and rapid assessment of video quality. Therefore, numerous subjective and objective video quality assessment studies have been conducted over the past two decades for both generic videos and specific videos such as streaming, user-generated content (UGC), 3D, virtual and augmented reality (VR and AR), high frame rate (HFR), audio-visual, etc. This survey provides an up-to-date and comprehensive review of these video quality assessment studies. Specifically, we first review the subjective video quality assessment methodologies and databases, which are necessary for validating the performance of video quality metrics. Second, the objective video quality assessment algorithms for general purposes are surveyed and summarized according to the methodologies utilized in the quality measures. Third, we give an overview of the objective video quality assessment measures for specific applications and emerging topics. Finally, the performances of the state-of-the-art video quality assessment measures are compared and analyzed. This survey provides a systematic overview of both classical works and recent progress in the realm of video quality assessment, which can help other researchers quickly access the field and conduct relevant research.
Submitted 5 February, 2024;
originally announced February 2024.
-
A Comprehensive Approach to Diagnosing Temporomandibular Joint Diseases: AI-driven TMD Diagnostic System
Authors:
Y. Gua,
C. T. Kong,
D. D Zhangc,
Y. J Baid,
J. K. H. Tsoia,
Hua Huangc,
Y. Q. Dengc,
Y. M Zhue
Abstract:
The AI-driven TMD diagnostic system uses an AI segmentation method to diagnose Temporomandibular Joint Disorders (TMD). Segmentation identifies three important parts: the temporal bone, the temporomandibular joint (TMJ) disc, and the condyle. The location and size of each segment are used as the basic information to determine whether the patient has a high chance of having TMD.
Submitted 4 February, 2024;
originally announced February 2024.
-
Hybrid Message Passing-Based Detectors for Uplink Grant-Free NOMA Systems
Authors:
Yi Song,
Yiwen Zhu,
Kun Chen-Hu,
Xinhua Lu,
Peng Sun,
Zhongyong Wang
Abstract:
This paper studies how to improve detector performance by considering the temporal correlation of the activity state (AS) of the user equipments (UEs) in the uplink grant-free non-orthogonal multiple access (GF-NOMA) system. The Bernoulli Gaussian-Markov chain (BG-MC) probability model is used to exploit both the sparsity and the slow-change characteristic of the AS of the UE. The GAMP Bernoulli Gaussian-Markov chain (GAMP-BG-MC) algorithm is proposed to improve the detector performance; it utilizes bidirectional message passing between neighboring time slots to fully exploit the temporally correlated AS of the UE. Furthermore, the parameters of the BG-MC model can be updated adaptively during the estimation procedure with unknown system statistics. Simulation results show that the proposed algorithm improves the detection accuracy compared with existing methods while keeping the same order of complexity.
Submitted 25 January, 2024;
originally announced January 2024.
-
Window Stacking Meta-Models for Clinical EEG Classification
Authors:
Yixuan Zhu,
Rohan Kandasamy,
Luke J. W. Canham,
David Western
Abstract:
Windowing is a common technique in EEG machine learning classification and other time series tasks. However, a challenge arises when employing this technique: computational expense inhibits learning global relationships across an entire recording or set of recordings. Furthermore, the labels inherited by windows from their parent recordings may not accurately reflect the content of that window in isolation. To resolve these issues, we introduce a multi-stage model architecture, incorporating meta-learning principles tailored to time-windowed data aggregation. We further tested two distinct strategies to alleviate these issues: lengthening the window and utilizing overlapping to augment data. Our methods, when tested on the Temple University Hospital Abnormal EEG Corpus (TUAB), dramatically boosted the benchmark accuracy from 89.8 percent to 99.0 percent. This breakthrough performance surpasses prior performance projections for this dataset and paves the way for clinical applications of machine learning solutions to EEG interpretation challenges. On a broader and more varied dataset from the Temple University Hospital EEG Corpus (TUEG), we attained an accuracy of 86.7%, nearing the assumed performance ceiling set by variable inter-rater agreement on such datasets.
Submitted 14 January, 2024;
originally announced January 2024.
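The abstract above mentions overlapping windows and aggregating window-level information per recording. A minimal sketch of both ideas follows. The function names `sliding_windows` and `recording_features`, and the choice of mean/min/max as aggregated features, are hypothetical illustrations and are not taken from the paper, which does not describe its aggregation in the abstract.

```python
def sliding_windows(signal, window_len, overlap):
    """Split a 1-D sequence into fixed-length windows that share `overlap` samples.

    Trailing samples that do not fill a whole window are dropped.
    """
    if not 0 <= overlap < window_len:
        raise ValueError("overlap must satisfy 0 <= overlap < window_len")
    step = window_len - overlap
    last_start = len(signal) - window_len
    return [signal[s:s + window_len] for s in range(0, last_start + 1, step)]


def recording_features(window_scores):
    """Summarize per-window classifier scores into recording-level features.

    A second-stage (meta) model could be trained on such summaries; the
    specific statistics here are an assumption made for illustration.
    """
    if not window_scores:
        raise ValueError("need at least one window score")
    return {
        "mean": sum(window_scores) / len(window_scores),
        "min": min(window_scores),
        "max": max(window_scores),
    }
```

For example, a 10-sample signal cut into windows of length 4 yields two windows without overlap but four windows with an overlap of 2, which is the data-augmentation effect the abstract refers to.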
-
Privacy Protected Contactless Cardio-respiratory Monitoring using Defocused Cameras during Sleep
Authors:
Yingen Zhu,
Jia Huang,
Hongzhou Lu,
Wenjin Wang
Abstract:
The monitoring of vital signs such as heart rate (HR) and respiratory rate (RR) during sleep is important for the assessment of sleep quality and detection of sleep disorders. Camera-based HR and RR monitoring has gained popularity in sleep monitoring in recent years. However, such methods all face serious privacy issues when a video camera is used in the sleeping scenario. In this paper, we propose to use a defocused camera to measure vital signs from optically blurred images, which can fundamentally eliminate the privacy invasion, as faces are difficult to identify in the obtained blurry images. A spatial-redundant framework involving living-skin detection is used to extract HR and RR from the defocused camera in NIR, and a motion metric is designed to exclude outliers caused by body motions. In the benchmark, the overall Mean Absolute Error (MAE) is 4.4 bpm for HR measurement and 5.9 bpm for RR measurement. Both show quality drops compared to measurement using a focused camera, but the degradation in HR is much smaller, i.e. HR measurement has a strong correlation with the reference ($R \geq 0.90$). Preliminary experiments suggest that it is feasible to use a defocused camera for cardio-respiratory monitoring while protecting privacy. Further improvement is needed for robust RR measurement, such as PPG-modulation based RR extraction.
Submitted 16 January, 2024;
originally announced January 2024.
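The abstract above evaluates HR and RR estimates with Mean Absolute Error and a correlation coefficient $R$. The sketch below shows how these two figures are typically computed from paired estimates and reference values. It assumes $R$ is the Pearson correlation, which the abstract does not state explicitly, and the function names are illustrative only.

```python
import math


def mean_absolute_error(estimates, references):
    """Average absolute difference between estimates and reference values."""
    if len(estimates) != len(references) or not estimates:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(e - r) for e, r in zip(estimates, references)) / len(estimates)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant (zero variance).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

Note that the two metrics answer different questions: MAE reports the typical error in the measurement unit, while a high $R$ only indicates that the estimates track the reference trend.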
-
UAV Trajectory Tracking via RNN-enhanced IMM-KF with ADS-B Data
Authors:
Yian Zhu,
Ziye Jia,
Qihui Wu,
Chao Dong,
Zirui Zhuang,
Huiling Hu,
Qi Cai
Abstract:
With the increasing use of autonomous unmanned aerial vehicles (UAVs), it is critical to ensure that they are continuously tracked and controlled, especially when UAVs operate beyond the communication range of ground stations (GSs). Conventional surveillance methods for UAVs, such as satellite communications, ground mobile networks and radars are subject to high costs and latency. The automatic dependent surveillance-broadcast (ADS-B) emerges as a promising method to monitor UAVs, due to the advantages of real-time capabilities, easy deployment and affordable cost. Therefore, we employ the ADS-B for UAV trajectory tracking in this work. However, the inherent noise in the transmitted data poses an obstacle for precisely tracking UAVs. Hence, we propose the algorithm of recurrent neural network-enhanced interacting multiple model-Kalman filter (RNN-enhanced IMM-KF) for UAV trajectory filtering. Specifically, the algorithm utilizes the RNN to capture the maneuvering behavior of UAVs and the noise level in the ADS-B data. Moreover, accurate UAV tracking is achieved by adaptively adjusting the process noise matrix and observation noise matrix of IMM-KF with the assistance of the RNN. The proposed algorithm can facilitate GSs to make timely decisions during trajectory deviations of UAVs and improve the airspace safety. Finally, via comprehensive simulations, the total root mean square error of the proposed algorithm decreases by 28.56%, compared to the traditional IMM-KF.
Submitted 25 December, 2023;
originally announced December 2023.
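The abstract above describes adapting the process-noise and observation-noise matrices of a Kalman filter. To make the role of those two quantities concrete, here is a minimal scalar Kalman filter with a random-walk state model, where the noise variances `q` and `r` are supplied per step. This is a deliberately simplified sketch: it is a single-model filter rather than an interacting multiple model (IMM) filter, and the per-step `noise_schedule` merely stands in for whatever the RNN would output in the paper's method.

```python
def kalman_step(x, p, z, q, r):
    """One predict/update cycle of a scalar Kalman filter (random-walk model).

    x, p -- previous state estimate and its variance
    z    -- new measurement
    q, r -- process-noise and measurement-noise variances for this step
    """
    p_pred = p + q                   # predict: state unchanged, uncertainty grows by q
    gain = p_pred / (p_pred + r)     # Kalman gain: trust in the measurement
    x_new = x + gain * (z - x)       # update the estimate with the innovation
    p_new = (1.0 - gain) * p_pred    # updated uncertainty
    return x_new, p_new


def filter_track(measurements, noise_schedule, x0=0.0, p0=1.0):
    """Filter a sequence, using a (q, r) pair per measurement."""
    x, p = x0, p0
    estimates = []
    for z, (q, r) in zip(measurements, noise_schedule):
        x, p = kalman_step(x, p, z, q, r)
        estimates.append(x)
    return estimates
```

Raising `q` makes the filter follow maneuvers more quickly, while raising `r` makes it smooth noisy measurements more heavily, which is why adjusting both over time can improve tracking.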
-
Material decomposition for dual-energy propagation-based phase-contrast CT
Authors:
Suyu Liao,
Huitao Zhang,
Peng Zhang,
Yining Zhu
Abstract:
Material decomposition refers to using the energy dependence of material physical properties to differentiate materials in a sample, which is a very important application in computed tomography (CT). In propagation-based X-ray phase-contrast CT, phase retrieval and reconstruction are always performed independently. Moreover, as in conventional CT, the material decomposition methods in this technique can be classified into two types, pre-reconstruction and post-reconstruction (two-step). The CT images in those methods often suffer from noise and artifacts because there is no feedback or correction from the intensity data. This work investigates an iterative method to obtain material decomposition directly from the intensity data at different energies, which means that we perform phase retrieval, reconstruction and material decomposition in one step. Fresnel diffraction is applied to forward propagation, and CT images interact with this intensity data throughout the iterative process. Experimental results demonstrate that, compared with two-step methods, the proposed method is superior in accurate material decomposition and noise reduction.
Submitted 29 November, 2023;
originally announced November 2023.
-
Adapting to climate change: Long-term impact of wind resource changes on China's power system resilience
Authors:
Jiaqi Ruan,
Xiangrui Meng,
Yifan Zhu,
Gaoqi Liang,
Xianzhuo Sun,
Huayi Wu,
Huijuan Xiao,
Mengqian Lu,
Pin Gao,
Jiapeng Li,
Wai-Kin Wong,
Zhao Xu,
Junhua Zhao
Abstract:
Modern society's reliance on power systems is at risk from the escalating effects of wind-related climate change. Yet, failure to identify the intricate relationship between wind-related climate risks and power systems could lead to serious short- and long-term issues, including partial or complete blackouts. Here, we develop a comprehensive framework to assess China's power system resilience across various climate change scenarios, enabling a holistic evaluation of the repercussions induced by wind-related climate change. Our findings indicate that China's current wind projects and planning strategies could be jeopardized by wind-related climate change, with up to a 12% decline in regional wind power availability. Moreover, our results underscore a pronounced vulnerability of power system resilience amidst the rigors of hastened climate change, unveiling a potential amplification of resilience deterioration, even approaching fourfold by 2060 under the most severe scenario, relative to the 2020 benchmark. This work advocates for strategic financial deployment within the power sector aimed at climate adaptation, enhancing power system resilience to avert profound losses from long-term, wind-influenced climatic fluctuations.
Submitted 24 January, 2024; v1 submitted 28 November, 2023;
originally announced November 2023.
-
DiffusionTalker: Personalization and Acceleration for Speech-Driven 3D Face Diffuser
Authors:
Peng Chen,
Xiaobao Wei,
Ming Lu,
Yitong Zhu,
Naiming Yao,
Xingyu Xiao,
Hui Chen
Abstract:
Speech-driven 3D facial animation has been an attractive task in both academia and industry. Traditional methods mostly focus on learning a deterministic mapping from speech to animation. Recent approaches start to consider the non-deterministic fact of speech-driven 3D face animation and employ the diffusion model for the task. However, personalizing facial animation and accelerating animation generation are still two major limitations of existing diffusion-based methods. To address the above limitations, we propose DiffusionTalker, a diffusion-based method that utilizes contrastive learning to personalize 3D facial animation and knowledge distillation to accelerate 3D animation generation. Specifically, to enable personalization, we introduce a learnable talking identity to aggregate knowledge in audio sequences. The proposed identity embeddings extract customized facial cues across different people in a contrastive learning manner. During inference, users can obtain personalized facial animation based on input audio, reflecting a specific talking style. With a trained diffusion model with hundreds of steps, we distill it into a lightweight model with 8 steps for acceleration. Extensive experiments are conducted to demonstrate that our method outperforms state-of-the-art methods. The code will be released.
Submitted 2 December, 2023; v1 submitted 28 November, 2023;
originally announced November 2023.
-
On RIS-Aided SIMO Gaussian Channels: Towards A Single-RF MIMO Transceiver Architecture
Authors:
Ru-Han Chen,
Jing Zhou,
Yonggang Zhu,
Kai Zhang
Abstract:
In this paper, for a single-input multiple-output (SIMO) system aided by a passive reconfigurable intelligent surface (RIS), the joint transmission accomplished by the single transmit antenna and the RIS with multiple controllable reflective elements is considered. Relying on a general capacity upper bound derived by a maximum-trace argument, we respectively characterize the capacity of such a channel in the low-SNR or the rank-one regimes, in which the optimal configuration of the RIS is proved to be beamforming with carefully-chosen phase shifts. To exploit the potential of modulating extra information on the RIS, based on the QR decomposition, successive interference cancellation, and a strategy named "partially beamforming and partially information-carrying", we propose a novel transceiver architecture with only a single RF front end at the transmitter, by which the considered channel can be regarded as a concatenation of a vector Gaussian channel and several phase-modulated channels. Especially, we investigate a class of vector Gaussian channels with a hypersphere input support constraint, and not only generalize the existing result to arbitrary-dimensional real spaces but also present its high-order capacity asymptotics, by which both capacities of hypersphere-constrained channels and achievable rates of the proposed transceiver with two different signaling schemes can be well-approximated. Information-theoretic analyses show that the transceiver architecture designed for the SIMO channel has a boosted multiplexing gain, rather than one for the conventionally-used optimized beamforming scheme. Numerical results verify our derived asymptotics and show notable superiority of the proposed transceiver.
Submitted 24 November, 2023;
originally announced November 2023.
-
Joint Diffusion: Mutual Consistency-Driven Diffusion Model for PET-MRI Co-Reconstruction
Authors:
Taofeng Xie,
Zhuo-Xu Cui,
Chen Luo,
Huayu Wang,
Congcong Liu,
Yuanzhi Zhang,
Xuemei Wang,
Yanjie Zhu,
Qiyu Jin,
Guoqing Chen,
Yihang Zhou,
Dong Liang,
Haifeng Wang
Abstract:
Positron Emission Tomography and Magnetic Resonance Imaging (PET-MRI) systems can obtain functional and anatomical scans. PET suffers from a low signal-to-noise ratio. Meanwhile, the k-space data acquisition process in MRI is time-consuming. The study aims to accelerate MRI and enhance PET image quality. Conventional approaches involve the separate reconstruction of each modality within PET-MRI systems. However, there exists complementary information among multi-modal images, and this information can contribute to image reconstruction. In this study, we propose a novel PET-MRI joint reconstruction model employing a mutual consistency-driven diffusion model, namely MC-Diffusion. MC-Diffusion learns the joint probability distribution of PET and MRI to utilize complementary information. We conducted a series of comparison experiments among LPLS, Joint ISAT-net and MC-Diffusion on the ADNI dataset. The results underscore the qualitative and quantitative improvements achieved by MC-Diffusion, surpassing the state-of-the-art method.
Submitted 24 November, 2023;
originally announced November 2023.
-
MSPB: a longitudinal multi-sensor dataset with phenotypic trait measurements from honey bees
Authors:
Yi Zhu,
Mahsa Abdollahi,
Ségolène Maucourt,
Nico Coallier,
Heitor R. Guimarães,
Pierre Giovenazzo,
Tiago H. Falk
Abstract:
We present a longitudinal multi-sensor dataset collected from honey bee colonies (Apis mellifera) with rich phenotypic measurements. Data were continuously collected between May-2020 and April-2021 from 53 hives located at two apiaries in Québec, Canada. The sensor data included audio features, temperature, and relative humidity. The phenotypic measurements contained beehive population, number of brood cells (eggs, larva and pupa), Varroa destructor infestation levels, defensive and hygienic behaviors, honey yield, and winter mortality. Our study is amongst the first to provide a wide variety of phenotypic trait measurements annotated by apicultural science experts, which facilitate a broader scope of analysis. We first summarize the data collection procedure, sensor data pre-processing steps, and data composition. We then provide an overview of the phenotypic data distribution as well as a visualization of the sensor data patterns. Lastly, we showcase several hive monitoring applications based on sensor data analysis and machine learning, such as winter mortality prediction, hive population estimation, and the presence of an active and laying queen.
Submitted 17 November, 2023;
originally announced November 2023.
-
Time-Optimal Control for High-Order Chain-of-Integrators Systems with Full State Constraints and Arbitrary Terminal States (Extended Version)
Authors:
Yunan Wang,
Chuxiong Hu,
Zeyang Li,
Shize Lin,
Suqin He,
Yu Zhu
Abstract:
Time-optimal control for high-order chain-of-integrators systems with full state constraints and arbitrarily given terminal states remains a challenging problem in the optimal control theory domain, yet to be resolved. To enhance further comprehension of the problem, this paper establishes a novel notation system and theoretical framework, providing the switching manifold for high-order problems in the form of switching laws. Through deriving properties of switching laws regarding signs and dimension, this paper proposes a definite condition for time-optimal control. Guided by the developed theory, a trajectory planning method named the manifold-intercept method (MIM) is developed. The proposed MIM can plan time-optimal jerk-limited trajectories with full state constraints, and can also plan near-optimal non-chattering higher-order trajectories with negligible extra motion time compared to optimal profiles. Numerical results indicate that the proposed MIM outperforms all baselines in computational time, computational accuracy, and trajectory quality by a large gap.
Submitted 28 March, 2024; v1 submitted 12 November, 2023;
originally announced November 2023.
-
A Two-Stage Generative Model with CycleGAN and Joint Diffusion for MRI-based Brain Tumor Detection
Authors:
Wenxin Wang,
Zhuo-Xu Cui,
Guanxun Cheng,
Chentao Cao,
Xi Xu,
Ziwei Liu,
Haifeng Wang,
Yulong Qi,
Dong Liang,
Yanjie Zhu
Abstract:
Accurate detection and segmentation of brain tumors is critical for medical diagnosis. However, current supervised learning methods require extensively annotated images and the state-of-the-art generative models used in unsupervised methods often have limitations in covering the whole data distribution. In this paper, we propose a novel framework, the Two-Stage Generative Model (TSGM), that combines Cycle Generative Adversarial Network (CycleGAN) and Variance Exploding stochastic differential equation using joint probability (VE-JP) to improve brain tumor detection and segmentation. The CycleGAN is trained on unpaired data to generate abnormal images from healthy images as data prior. Then VE-JP is implemented to reconstruct healthy images using synthetic paired abnormal images as a guide, which alters only pathological regions and leaves healthy regions unchanged. Notably, our method directly learns the joint probability distribution for conditional generation. The residual between input and reconstructed images indicates the abnormalities, and a thresholding method is subsequently applied to obtain segmentation results. Furthermore, the multimodal results are combined with different weights to further improve the segmentation accuracy. We validated our method on three datasets and compared it with other unsupervised methods for anomaly detection and segmentation. The DSC scores of 0.8590 on the BraTs2020 dataset, 0.6226 on the ITCS dataset and 0.7403 on the In-house dataset show that our method achieves better segmentation performance and has better generalization.
Submitted 6 November, 2023;
originally announced November 2023.
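The last steps in the abstract above (thresholding the residual between the input and the reconstructed healthy image, then scoring with DSC) can be sketched in a few lines. This is a minimal illustration on flat lists of pixel intensities, not the authors' pipeline: the function names and the fixed threshold are assumptions, and the abstract does not say how the threshold is chosen.

```python
def residual_mask(image, reconstruction, threshold):
    """Flag pixels whose absolute residual exceeds a threshold.

    image and reconstruction are equal-length sequences of intensities.
    The fixed scalar threshold is an illustrative assumption.
    """
    if len(image) != len(reconstruction):
        raise ValueError("image and reconstruction must have the same length")
    return [abs(a - b) > threshold for a, b in zip(image, reconstruction)]


def dice_coefficient(pred, target):
    """Dice similarity coefficient: 2 * |A and B| / (|A| + |B|)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    overlap = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Two empty masks agree perfectly; this convention avoids dividing by zero.
    return 1.0 if total == 0 else 2.0 * overlap / total
```

A DSC of 1.0 means the predicted and reference masks coincide, and 0.0 means they share no pixels, which is the scale on which the reported scores should be read.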
-
Towards High-quality HDR Deghosting with Conditional Diffusion Models
Authors:
Qingsen Yan,
Tao Hu,
Yuan Sun,
Hao Tang,
Yu Zhu,
Wei Dong,
Luc Van Gool,
Yanning Zhang
Abstract:
High Dynamic Range (HDR) images can be recovered from several Low Dynamic Range (LDR) images by existing Deep Neural Network (DNN) techniques. Despite the remarkable progress, DNN-based methods still generate ghosting artifacts when LDR images have saturation and large motion, which hinders potential applications in real-world scenarios. To address this challenge, we formulate the HDR deghosting problem as an image generation task that leverages LDR features as the diffusion model's condition, consisting of the feature condition generator and the noise predictor. The feature condition generator employs attention and a Domain Feature Alignment (DFA) layer to transform the intermediate features to avoid ghosting artifacts. With the learned features as conditions, the noise predictor leverages a stochastic iterative denoising process for diffusion models to generate an HDR image by steering the sampling process. Furthermore, to mitigate semantic confusion caused by the saturation problem of LDR images, we design a sliding window noise estimator to sample smooth noise in a patch-based manner. In addition, an image space loss is proposed to avoid the color distortion of the estimated HDR results. We empirically evaluate our model on benchmark datasets for HDR imaging. The results demonstrate that our approach achieves state-of-the-art performance and generalizes well to real-world images.
Submitted 1 November, 2023;
originally announced November 2023.
-
Progressive Dual Priori Network for Generalized Breast Tumor Segmentation
Authors:
Li Wang,
Lihui Wang,
Zixiang Kuai,
Lei Tang,
Yingfeng Ou,
Chen Ye,
Yuemin Zhu
Abstract:
To promote the generalization ability of breast tumor segmentation models, as well as to improve the segmentation performance for breast tumors with smaller size, low-contrast and irregular shape, we propose a progressive dual priori network (PDPNet) to segment breast tumors from dynamic enhanced magnetic resonance images (DCE-MRI) acquired at different centers. The PDPNet first cropped tumor regions with a coarse-segmentation based localization module, then the breast tumor mask was progressively refined by using the weak semantic priori and cross-scale correlation prior knowledge. To validate the effectiveness of PDPNet, we compared it with several state-of-the-art methods on multi-center datasets. The results showed that, compared against the suboptimal method, the DSC and HD95 of PDPNet were improved by at least 5.13% and 7.58% respectively on multi-center test sets. In addition, through ablations, we demonstrated that the proposed localization module can decrease the influence of normal tissues and therefore improve the generalization ability of the model. The weak semantic priors allow focusing on tumor regions to avoid missing small tumors and low-contrast tumors. The cross-scale correlation priors are beneficial for promoting the shape-aware ability for irregular tumors. Thus integrating them in a unified framework improved the multi-center breast tumor segmentation performance. The source code and open data can be accessed at https://github.com/wangli100209/PDPNet.
Submitted 16 June, 2024; v1 submitted 20 October, 2023;
originally announced October 2023.
-
Towards Intelligent Network Management: Leveraging AI for Network Service Detection
Authors:
Khuong N. Nguyen,
Abhishek Sehgal,
Yuming Zhu,
Junsu Choi,
Guanbo Chen,
Hao Chen,
Boon Loong Ng,
Charlie Zhang
Abstract:
As the complexity and scale of modern computer networks continue to increase, there has emerged an urgent need for precise traffic analysis, which plays a pivotal role in cutting-edge wireless connectivity technologies. This study focuses on leveraging Machine Learning methodologies to create an advanced network traffic classification system. We introduce a novel data-driven approach that excels in identifying various network service types in real-time, by analyzing patterns within the network traffic. Our method organizes similar kinds of network traffic into distinct categories, referred to as network services, based on latency requirement. Furthermore, it decomposes the network traffic stream into multiple, smaller traffic flows, with each flow uniquely carrying a specific service. Our ML models are trained on a dataset comprised of labeled examples representing different network service types collected on various Wi-Fi network conditions. Upon evaluation, our system demonstrates a remarkable accuracy in distinguishing the network services. These results emphasize the substantial promise of integrating Artificial Intelligence in wireless technologies. Such an approach encourages more efficient energy consumption, enhances Quality of Service assurance, and optimizes the allocation of network resources, thus laying a solid groundwork for the development of advanced intelligent networks.
Submitted 14 October, 2023;
originally announced October 2023.
-
Ultrasound Image Segmentation of Thyroid Nodule via Latent Semantic Feature Co-Registration
Authors:
Xuewei Li,
Yaqiao Zhu,
Jie Gao,
Xi Wei,
Ruixuan Zhang,
Yuan Tian,
ZhiQiang Liu
Abstract:
Segmentation of nodules in thyroid ultrasound imaging plays a crucial role in the detection and treatment of thyroid cancer. However, owing to the diversity of scanner vendors and imaging protocols in different hospitals, the automatic segmentation model, which has already demonstrated expert-level accuracy in the field of medical image segmentation, finds its accuracy reduced as a result of its weak generalization performance when applied in clinically realistic environments. To address this issue, the present paper proposes ASTN, a framework for thyroid nodule segmentation achieved through a new type of co-registration network. By extracting latent semantic information from the atlas and target images and utilizing in-depth features to accomplish the co-registration of nodules in thyroid ultrasound images, this framework can ensure the integrity of anatomical structure and reduce the impact on segmentation of overall image differences caused by different devices. In addition, this paper also provides an atlas selection algorithm to mitigate the difficulty of co-registration. As shown by the evaluation results collected from the datasets of different devices, the proposed method greatly improves model generalization while maintaining a high level of segmentation accuracy.
Submitted 21 January, 2024; v1 submitted 13 October, 2023;
originally announced October 2023.
-
Score-based Diffusion Models With Self-supervised Learning For Accelerated 3D Multi-contrast Cardiac Magnetic Resonance Imaging
Authors:
Yuanyuan Liu,
Zhuo-Xu Cui,
Congcong Liu,
Hairong Zheng,
Haifeng Wang,
Yihang Zhou,
Yanjie Zhu
Abstract:
Long scan time significantly hinders the widespread applications of three-dimensional multi-contrast cardiac magnetic resonance (3D-MC-CMR) imaging. This study aims to accelerate 3D-MC-CMR acquisition by a novel method based on score-based diffusion models with self-supervised learning. Specifically, we first establish a mapping between the undersampled k-space measurements and the MR images, utilizing a self-supervised Bayesian reconstruction network. Secondly, we develop a joint score-based diffusion model on 3D-MC-CMR images to capture their inherent distribution. The 3D-MC-CMR images are finally reconstructed using the conditioned Langevin Markov chain Monte Carlo sampling. This approach enables accurate reconstruction without fully sampled training data. Its performance was tested on the dataset acquired by a 3D joint myocardial T1 and T1rho mapping sequence. The T1 and T1rho maps were estimated via a dictionary matching method from the reconstructed images. Experimental results show that the proposed method outperforms traditional compressed sensing and existing self-supervised deep learning MRI reconstruction methods. It also achieves high quality T1 and T1rho parametric maps close to the reference maps obtained by traditional mapping sequences, even at a high acceleration rate of 14.
Submitted 6 October, 2023;
originally announced October 2023.
-
Characterizing the temporal dynamics of universal speech representations for generalizable deepfake detection
Authors:
Yi Zhu,
Saurabh Powar,
Tiago H. Falk
Abstract:
Existing deepfake speech detection systems lack generalizability to unseen attacks (i.e., samples generated by generative algorithms not seen during training). Recent studies have explored the use of universal speech representations to tackle this issue and have obtained inspiring results. These works, however, have focused on innovating downstream classifiers while leaving the representation itself untouched. In this study, we argue that characterizing the long-term temporal dynamics of these representations is crucial for generalizability and propose a new method to assess representation dynamics. Indeed, we show that different generative models generate similar representation dynamics patterns with our proposed method. Experiments on the ASVspoof 2019 and 2021 datasets validate the benefits of the proposed method to detect deepfakes from methods unseen during training, significantly improving on several benchmark methods.
Submitted 14 September, 2023;
originally announced September 2023.
-
Generalized Minimum Error with Fiducial Points Criterion for Robust Learning
Authors:
Haiquan Zhao,
Yuan Gao,
Yingying Zhu
Abstract:
The conventional Minimum Error Entropy criterion (MEE) has its limitations, showing reduced sensitivity to error mean values and uncertainty regarding error probability density function locations. To overcome this, a MEE with fiducial points criterion (MEEF) was presented. However, the efficacy of the MEEF is not consistent due to its reliance on a fixed Gaussian kernel. In this paper, a generalized minimum error with fiducial points criterion (GMEEF) is presented by adopting the Generalized Gaussian Density (GGD) function as kernel. The GGD extends the Gaussian distribution by introducing a shape parameter that provides more control over the tail behavior and peakedness. In addition, due to the high computational complexity of the GMEEF criterion, the quantization idea is introduced to notably lower the computational load of the GMEEF-type algorithm. Finally, the proposed criteria are applied to the domains of adaptive filtering, kernel recursive algorithms, and multilayer perceptrons. Several numerical simulations, covering system identification, acoustic echo cancellation, time series prediction, and supervised classification, indicate that the novel algorithms perform excellently.
Submitted 8 September, 2023;
originally announced September 2023.
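To make the shape parameter mentioned in the abstract above concrete, the sketch below evaluates the standard generalized Gaussian density, p(e) = beta / (2 * alpha * Gamma(1/beta)) * exp(-|e/alpha|^beta). The parameter names `alpha` (scale) and `beta` (shape) are the usual textbook ones and are assumed here; the abstract does not give the paper's exact parameterization or how the kernel enters the GMEEF cost.

```python
import math


def ggd_kernel(e, alpha, beta):
    """Generalized Gaussian density evaluated at error e.

    alpha > 0 is the scale and beta > 0 the shape parameter
    (textbook parameterization, assumed for illustration).
    beta = 2 recovers a Gaussian; beta = 1 gives a Laplacian.
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-abs(e / alpha) ** beta)
```

With `beta = 2` and `alpha = sqrt(2) * sigma` this coincides with a zero-mean Gaussian of standard deviation `sigma`, so a fixed Gaussian kernel is one special case; smaller `beta` gives heavier tails and a sharper peak.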
-
Expert Uncertainty and Severity Aware Chest X-Ray Classification by Multi-Relationship Graph Learning
Authors:
Mengliang Zhang,
Xinyue Hu,
Lin Gu,
Liangchen Liu,
Kazuma Kobayashi,
Tatsuya Harada,
Ronald M. Summers,
Yingying Zhu
Abstract:
Patients undergoing chest X-rays (CXR) often endure multiple lung diseases. When evaluating a patient's condition, due to the complex pathologies, subtle texture changes of different lung lesions in images, and patient condition differences, radiologists may be uncertain even when they have experienced long-term clinical training and professional guidance, which introduces considerable noise when extracting disease labels from CXR reports. In this paper, we re-extract disease labels from CXR reports to make them more realistic by considering disease severity and uncertainty in classification. Our contributions are as follows: 1. We re-extracted the disease labels with severity and uncertainty by a rule-based approach with keywords discussed with clinical experts. 2. To further improve the explainability of chest X-ray diagnosis, we designed a multi-relationship graph learning method with an expert uncertainty-aware loss function. 3. Our multi-relationship graph learning method can also interpret the disease classification results. Our experimental results show that models considering disease severity and uncertainty outperform previous state-of-the-art methods.
Submitted 6 September, 2023;
originally announced September 2023.
-
Dynamic Dual-Graph Fusion Convolutional Network For Alzheimer's Disease Diagnosis
Authors:
Fanshi Li,
Zhihui Wang,
Yifan Guo,
Congcong Liu,
Yanjie Zhu,
Yihang Zhou,
Jun Li,
Dong Liang,
Haifeng Wang
Abstract:
In this paper, a dynamic dual-graph fusion convolutional network is proposed to improve Alzheimer's disease (AD) diagnosis performance. The following are the paper's main contributions: (a) propose a novel dynamic GCN architecture, which is an end-to-end pipeline for diagnosis of the AD task; (b) the proposed architecture can dynamically adjust the graph structure for GCN to produce better diagnosis outcomes by learning the optimal underlying latent graph; (c) incorporate feature graph learning and dynamic graph learning, giving those useful features of subjects more weight while decreasing the weights of other noise features. Experiments indicate that our model provides flexibility and stability while achieving excellent classification results in AD diagnosis.
Submitted 4 August, 2023;
originally announced August 2023.