-
EmT: A Novel Transformer for Generalized Cross-subject EEG Emotion Recognition
Authors:
Yi Ding,
Chengxuan Tong,
Shuailei Zhang,
Muyun Jiang,
Yong Li,
Kevin Lim Jun Liang,
Cuntai Guan
Abstract:
Integrating prior knowledge of neurophysiology into neural network architecture enhances the performance of emotion decoding. While numerous techniques emphasize learning spatial and short-term temporal patterns, there has been limited emphasis on capturing the vital long-term contextual information associated with emotional cognitive processes. In order to address this discrepancy, we introduce a novel transformer model called emotion transformer (EmT). EmT is designed to excel in both generalized cross-subject EEG emotion classification and regression tasks. In EmT, EEG signals are transformed into a temporal graph format, creating a sequence of EEG feature graphs using a temporal graph construction module (TGC). A novel residual multi-view pyramid GCN module (RMPG) is then proposed to learn dynamic graph representations for each EEG feature graph within the series, and the learned representations of each graph are fused into one token. Furthermore, we design a temporal contextual transformer module (TCT) with two types of token mixers to learn the temporal contextual information. Finally, the task-specific output module (TSO) generates the desired outputs. Experiments on four publicly available datasets show that EmT achieves higher results than the baseline methods for both EEG emotion classification and regression tasks. The code is available at https://github.com/yi-ding-cs/EmT.
Submitted 26 June, 2024;
originally announced June 2024.
-
Federated Transfer Learning Aided Interference Classification in GNSS Signals
Authors:
Min Jiang,
Ziqiang Ye,
Yue Xiao,
Xiaogang Gou
Abstract:
This study delves into the classification of interference signals to global navigation satellite systems (GNSS) stemming from mobile jammers such as unmanned aerial vehicles (UAVs) across diverse wireless communication zones, employing federated learning (FL) and transfer learning (TL). Specifically, we employ a neural network classifier, enhanced with FL to decentralize data processing and TL to hasten the training process, aiming to improve interference classification accuracy while preserving data privacy. Our evaluations span multiple data scenarios, incorporating both independent and identically distributed (IID) and non-identically distributed (non-IID) data, to gauge the performance of our approach under different interference conditions. Our results indicate an improvement of approximately $8\%$ in classification accuracy compared to a basic convolutional neural network (CNN) model, accompanied by expedited convergence in networks utilizing pre-trained models. Additionally, the implementation of FL not only improved privacy but also matched the robustness of centralized learning methods, particularly under IID scenarios. Moreover, the federated averaging (FedAvg) algorithm effectively manages regional interference variability, thereby enhancing the regional communication performance indicator, $C/N_0$, by roughly $5\text{dB}\cdot \text{Hz}$ compared to isolated setups.
Submitted 23 June, 2024;
originally announced June 2024.
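As a rough illustration of the federated averaging (FedAvg) step named in the abstract above, the sketch below combines client parameter vectors in proportion to their local sample counts. It is not the authors' implementation: the function name `fed_avg`, the flat-list parameter layout, and the toy values are all assumptions made for this example.

```python
from typing import List, Sequence


def fed_avg(client_weights: Sequence[Sequence[float]],
            client_sizes: Sequence[int]) -> List[float]:
    """Sample-count-weighted average of client parameter vectors."""
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share one parameter shape")
        # Each client contributes in proportion to its share of the data.
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


# Two hypothetical regions; the second holds three times as much data.
global_weights = fed_avg([[0.0, 4.0], [4.0, 0.0]], [1, 3])
```

In a full training loop the server would broadcast `global_weights` back to the clients and repeat the round; only the parameter vectors, not the raw signals, leave each region.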
-
Uncertainty-Aware Adapter: Adapting Segment Anything Model (SAM) for Ambiguous Medical Image Segmentation
Authors:
Mingzhou Jiang,
Jiaying Zhou,
Junde Wu,
Tianyang Wang,
Yueming Jin,
Min Xu
Abstract:
The Segment Anything Model (SAM) gained significant success in natural image segmentation, and many methods have tried to fine-tune it to medical image segmentation. An efficient way to do so is by using Adapters, specialized modules that learn just a few parameters to tailor SAM specifically for medical images. However, unlike natural images, many tissues and lesions in medical images have blurry boundaries and may be ambiguous. Previous efforts to adapt SAM ignore this challenge and can only predict a distinct segmentation. This may mislead clinicians or cause misdiagnosis, especially when encountering rare variants or situations with low model confidence. In this work, we propose a novel module called the Uncertainty-aware Adapter, which efficiently fine-tunes SAM for uncertainty-aware medical image segmentation. Utilizing a conditional variational autoencoder, we encoded stochastic samples to effectively represent the inherent uncertainty in medical imaging. We designed a new module on a standard adapter that utilizes a condition-based strategy to interact with samples to help SAM integrate uncertainty. We evaluated our method on two multi-annotated datasets with different modalities: LIDC-IDRI (lung abnormalities segmentation) and REFUGE2 (optic-cup segmentation). The experimental results show that the proposed model outperforms all the previous methods and achieves the new state-of-the-art (SOTA) on both benchmarks. We also demonstrated that our method can generate diverse segmentation hypotheses that are more realistic as well as heterogeneous.
Submitted 18 March, 2024; v1 submitted 16 March, 2024;
originally announced March 2024.
-
GAN Based Near-Field Channel Estimation for Extremely Large-Scale MIMO Systems
Authors:
Ming Ye,
Xiao Liang,
Cunhua Pan,
Yinfei Xu,
Ming Jiang,
Chunguo Li
Abstract:
Extremely large-scale multiple-input-multiple-output (XL-MIMO) is a promising technique to achieve ultra-high spectral efficiency for future 6G communications. The mixed line-of-sight (LoS) and non-line-of-sight (NLoS) XL-MIMO near-field channel model is adopted to describe the XL-MIMO near-field channel accurately. In this paper, a generative adversarial network (GAN) variant based channel estimation method is proposed for XL-MIMO systems. Specifically, the GAN variant is developed to simultaneously estimate the LoS and NLoS path components of the XL-MIMO channel. The initially estimated channels instead of the received signals are input into the GAN variant as the conditional input to generate the XL-MIMO channels more efficiently. The GAN variant not only learns the mapping from the initially estimated channels to the XL-MIMO channels but also learns an adversarial loss. Moreover, we combine the adversarial loss with a conventional loss function to ensure the correct direction of training the generator. To further enhance the estimation performance, we investigate the impact of the hyper-parameter of the loss function on the performance of our method. Simulation results show that the proposed method outperforms the existing channel estimation approaches in the adopted channel model. In addition, the proposed method surpasses the Cramér-Rao lower bound (CRLB) under low pilot overhead.
Submitted 17 June, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
Distance Guided Generative Adversarial Network for Explainable Binary Classifications
Authors:
Xiangyu Xiong,
Yue Sun,
Xiaohong Liu,
Wei Ke,
Chan-Tong Lam,
Jiangang Chen,
Mingfeng Jiang,
Mingwei Wang,
Hui Xie,
Tong Tong,
Qinquan Gao,
Hao Chen,
Tao Tan
Abstract:
Despite the potential benefits of data augmentation for mitigating data insufficiency, traditional augmentation methods primarily rely on prior intra-domain knowledge. On the other hand, advanced generative adversarial networks (GANs) generate inter-domain samples with limited variety. These previous methods make limited contributions to describing the decision boundaries for binary classification. In this paper, we propose a distance guided GAN (DisGAN) which controls the variation degrees of generated samples in the hyperplane space. Specifically, we instantiate the idea of DisGAN by combining two approaches. The first is vertical distance GAN (VerDisGAN), where the inter-domain generation is conditioned on the vertical distances. The second is horizontal distance GAN (HorDisGAN), where the intra-domain generation is conditioned on the horizontal distances. Furthermore, VerDisGAN can produce the class-specific regions by mapping the source images to the hyperplane. Experimental results show that DisGAN consistently outperforms the GAN-based augmentation methods with explainable binary classification. The proposed method can be applied to different classification architectures and has the potential to extend to multi-class classification.
Submitted 29 December, 2023;
originally announced December 2023.
-
Realizing In-Memory Baseband Processing for Ultra-Fast and Energy-Efficient 6G
Authors:
Qunsong Zeng,
Jiawei Liu,
Mingrui Jiang,
Jun Lan,
Yi Gong,
Zhongrui Wang,
Yida Li,
Can Li,
Jim Ignowski,
Kaibin Huang
Abstract:
To support emerging applications ranging from holographic communications to extended reality, next-generation mobile wireless communication systems require ultra-fast and energy-efficient baseband processors. Traditional complementary metal-oxide-semiconductor (CMOS)-based baseband processors face two challenges: transistor scaling and the von Neumann bottleneck. To address these challenges, in-memory computing-based baseband processors using resistive random-access memory (RRAM) present an attractive solution. In this paper, we propose and demonstrate RRAM-implemented in-memory baseband processing for the widely adopted multiple-input-multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) air interface. Its key feature is to execute the key operations, including discrete Fourier transform (DFT) and MIMO detection using linear minimum mean square error (L-MMSE) and zero forcing (ZF), in one step. In addition, an RRAM-based channel estimation module is proposed and discussed. By prototyping and simulations, we demonstrate the feasibility of an RRAM-based full-fledged communication system in hardware, and reveal through large-scale simulations that it can outperform state-of-the-art baseband processors with a gain of 91.2$\times$ in latency and 671$\times$ in energy efficiency. Our results pave a potential pathway for RRAM-based in-memory computing to be implemented in the era of the sixth generation (6G) mobile communications.
Submitted 19 August, 2023;
originally announced August 2023.
-
Mpox-AISM: AI-Mediated Super Monitoring for Mpox and Like-Mpox
Authors:
Yubiao Yue,
Minghua Jiang,
Xinyue Zhang,
Jialong Xu,
Huacong Ye,
Fan Zhang,
Zhenzhang Li,
Yang Li
Abstract:
Swift and accurate diagnosis for earlier-stage monkeypox (mpox) patients is crucial to avoiding its spread. However, the similarities between common skin disorders and mpox and the need for professional diagnosis unavoidably impaired the diagnosis of earlier-stage mpox patients and contributed to the mpox outbreak. To address the challenge, we proposed "Super Monitoring", a real-time visualization technique employing artificial intelligence (AI) and Internet technology to diagnose earlier-stage mpox cheaply, conveniently, and quickly. Concretely, AI-mediated "Super Monitoring" (mpox-AISM) integrates deep learning models, data augmentation, self-supervised learning, and cloud services. On publicly accessible datasets, mpox-AISM's Precision, Recall, Specificity, and F1-score in diagnosing mpox reach 99.3%, 94.1%, 99.9%, and 96.6%, respectively, and it achieves 94.51% accuracy in diagnosing mpox, six like-mpox skin disorders, and normal skin. With the Internet and a communication terminal, mpox-AISM has the potential to perform real-time and accurate diagnosis for earlier-stage mpox in real-world scenarios, thereby preventing mpox outbreaks.
Submitted 15 June, 2024; v1 submitted 17 March, 2023;
originally announced March 2023.
-
Day-Ahead PV Power Forecasting Based on MSTL-TFT
Authors:
Xuetao Jiang,
Meiyu Jiang,
Qingguo Zhou
Abstract:
In recent years, renewable energy resources have accounted for an increasing share of electrical energy. Among them, photovoltaic (PV) power generation has received broad attention due to its economic and environmental benefits. Accurate PV generation forecasts can reduce power dispatch from the grid, thus increasing the supplier's profit in the day-ahead electricity market. The power system of a PV site is affected by solar radiation, PV plant properties and meteorological factors, resulting in uncertainty in its power output. This study used multiple seasonal-trend decomposition using LOESS (MSTL) and temporal fusion transformer (TFT) to perform day-ahead PV prediction on the Desert Knowledge Australia Solar Centre (DKASC) dataset. We compare the decomposition algorithms (VMD, EEMD and VMD-EEMD) and prediction models (BP, LSTM, XGBoost, etc.) that are currently in common use for PV prediction. The results show that the MSTL-TFT method is more accurate than the aforementioned methods and shows a noticeable improvement over other recent day-ahead PV predictions on the DKASC dataset.
Submitted 31 January, 2023; v1 submitted 14 January, 2023;
originally announced January 2023.
-
FECAM: Frequency Enhanced Channel Attention Mechanism for Time Series Forecasting
Authors:
Maowei Jiang,
Pengyu Zeng,
Kai Wang,
Huan Liu,
Wenbo Chen,
Haoran Liu
Abstract:
Time series forecasting is a long-standing challenge because real-world information arises in varied scenarios (e.g., energy, weather, traffic, economics, earthquake warning). However, the results of some mainstream forecasting models deviate dramatically from the ground truth. We believe the reason is that these models lack the ability to capture the frequency information that is abundant in real-world datasets. At present, the mainstream frequency information extraction methods are based on the Fourier transform (FT). However, the use of FT is problematic due to the Gibbs phenomenon: if the values at the two ends of a sequence differ significantly, oscillatory approximations are observed around both ends and high-frequency noise is introduced. Therefore, we propose a novel frequency enhanced channel attention mechanism that adaptively models frequency interdependencies between channels based on the Discrete Cosine Transform, which intrinsically avoids the high-frequency noise caused by problematic periodicity during the Fourier transform, i.e., the Gibbs phenomenon. We show that this network generalizes extremely effectively across six real-world datasets and achieves state-of-the-art performance. We further demonstrate that the frequency enhanced channel attention mechanism module can be flexibly applied to different networks. This module can improve the prediction ability of existing mainstream networks, reducing MSE by 35.99% on LSTM, 10.01% on Reformer, 8.71% on Informer, 8.29% on Autoformer, 8.06% on Transformer, etc., at a slight computational cost, with just a few lines of code. Our code and data are available at https://github.com/Zero-coder/FECAM.
Submitted 2 December, 2022;
originally announced December 2022.
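To make the transform named in the abstract above concrete, here is a minimal sketch of an unnormalized type-II Discrete Cosine Transform applied channel by channel. It is not the FECAM module itself (the attention weighting is omitted), and the names `dct2`, `series`, and `channel_spectra` are assumptions for this example. The DCT-II behaves like a Fourier transform of an even-symmetric extension of the sequence, so the mirrored signal has no jump where its ends meet.

```python
import math
from typing import List, Sequence


def dct2(x: Sequence[float]) -> List[float]:
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    n_samples = len(x)
    return [
        sum(x[n] * math.cos(math.pi / n_samples * (n + 0.5) * k)
            for n in range(n_samples))
        for k in range(n_samples)
    ]


# Hypothetical two-channel series: a constant channel and a ramp whose
# first and last values differ strongly.
series = [
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 1.0, 2.0, 3.0],
]

# One spectrum per channel; an attention module could then weight channels
# based on these coefficients.
channel_spectra = [dct2(channel) for channel in series]
```

For the constant channel all energy lands in the zero-frequency coefficient, which is a quick sanity check on the implementation.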
-
Proportionate Recursive Maximum Correntropy Criterion Adaptive Filtering Algorithms and their Performance Analysis
Authors:
Zhen Qin,
Jun Tao,
Le Yang,
Ming Jiang
Abstract:
The maximum correntropy criterion (MCC) has been employed to design outlier-robust adaptive filtering algorithms, among which the recursive MCC (RMCC) algorithm is a typical one. Motivated by the success of our recently proposed proportionate recursive least squares (PRLS) algorithm for sparse system identification, we propose to introduce the proportionate updating (PU) mechanism into the RMCC, leading to two sparsity-aware RMCC algorithms: the proportionate recursive MCC (PRMCC) algorithm and the combinational PRMCC (CPRMCC) algorithm. The CPRMCC is implemented as an adaptive convex combination of two PRMCC filters. For PRMCC, its stability condition and mean-square performance were analyzed. Based on the analysis, optimal parameter selection in nonstationary environments was obtained. Performance study of CPRMCC was also provided and showed that the CPRMCC performs at least as well as the better component PRMCC filter in steady state. Numerical simulations of sparse system identification corroborate the advantage of proposed algorithms as well as the validity of theoretical analysis.
Submitted 7 October, 2023; v1 submitted 21 October, 2022;
originally announced October 2022.
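The outlier robustness attributed to the maximum correntropy criterion in the abstract above can be sketched with a Gaussian kernel that shrinks the influence of large errors. The example below uses a simple stochastic-gradient MCC update, which is a swapped-in simplification and not the recursive (RMCC) or proportionate (PRMCC, CPRMCC) algorithms from the paper. The names `correntropy_weight` and `mcc_lms_step`, the step size `mu`, and the kernel width `sigma` are assumptions for this illustration.

```python
import math
from typing import List, Sequence, Tuple


def correntropy_weight(error: float, sigma: float) -> float:
    """Gaussian kernel exp(-e^2 / (2 sigma^2)); near 0 for large errors."""
    return math.exp(-error * error / (2.0 * sigma * sigma))


def mcc_lms_step(w: Sequence[float], x: Sequence[float], d: float,
                 mu: float, sigma: float) -> Tuple[List[float], float]:
    """One gradient-ascent step on the correntropy of the error d - w.x."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    e = d - y
    g = correntropy_weight(e, sigma)
    # The kernel weight g scales the usual LMS correction, so an outlier
    # with a huge error barely moves the filter taps.
    new_w = [wi + mu * g * e * xi for wi, xi in zip(w, x)]
    return new_w, e


# Hypothetical two-tap filter starting from zero.
taps, err = mcc_lms_step([0.0, 0.0], [1.0, 2.0], d=1.0, mu=0.1, sigma=1.0)
```

With `sigma` very large the kernel weight approaches 1 and the step reduces to ordinary LMS, which is one way to see the criterion as a robustified least-squares update.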
-
Parallel faceted imaging in radio interferometry via proximal splitting (Faceted HyperSARA): II. Code and real data proof of concept
Authors:
Pierre-Antoine Thouvenin,
Arwa Dabbech,
Ming Jiang,
Abdullah Abdulaziz,
Jean-Philippe Thiran,
Adrian Jackson,
Yves Wiaux
Abstract:
In a companion paper, a faceted wideband imaging technique for radio interferometry, dubbed Faceted HyperSARA, has been introduced and validated on synthetic data. Building on the recent HyperSARA approach, Faceted HyperSARA leverages the splitting functionality inherent to the underlying primal-dual forward-backward algorithm to decompose the image reconstruction over multiple spatio-spectral facets. The approach allows complex regularization to be injected into the imaging process while providing additional parallelization flexibility compared to HyperSARA. The present paper introduces new algorithm functionalities to address real datasets, implemented as part of a fully fledged MATLAB imaging library made available on GitHub. A large-scale proof of concept is proposed to validate Faceted HyperSARA in a new data and parameter scale regime, compared to the state-of-the-art. The reconstruction of a 15 GB wideband image of Cyg A from 7.4 GB of VLA data is considered, utilizing 1440 CPU cores on an HPC system for about 9 hours. The conducted experiments illustrate the reconstruction performance of the proposed approach on real data, exploiting new functionalities to leverage known direction-dependent effects (DDEs), for an accurate model of the measurement operator, and an effective noise level accounting for imperfect calibration. They also demonstrate that, when combined with a further dimensionality reduction functionality, Faceted HyperSARA enables the recovery of a 3.6 GB image of Cyg A from the same data using only 91 CPU cores for 39 hours. In this setting, the proposed approach is shown to provide a superior reconstruction quality compared to the state-of-the-art wideband CLEAN-based algorithm of the WSClean software.
Submitted 21 August, 2023; v1 submitted 15 September, 2022;
originally announced September 2022.
-
Cross-speaker Emotion Transfer Based On Prosody Compensation for End-to-End Speech Synthesis
Authors:
Tao Li,
Xinsheng Wang,
Qicong Xie,
Zhichao Wang,
Mingqi Jiang,
Lei Xie
Abstract:
Cross-speaker emotion transfer speech synthesis aims to synthesize emotional speech for a target speaker by transferring the emotion from reference speech recorded by another (source) speaker. In this task, extracting a speaker-independent emotion embedding from reference speech plays an important role. However, the emotional information conveyed by such an emotion embedding tends to be weakened in the process of squeezing out the source speaker's timbre information. In response to this problem, a prosody compensation module (PCM) is proposed in this paper to compensate for the emotional information loss. Specifically, the PCM tries to obtain speaker-independent emotional information from the intermediate feature of a pre-trained ASR model. To this end, a prosody compensation encoder with global context (GC) blocks is introduced to obtain global emotional information from the ASR model's intermediate feature. Experiments demonstrate that the proposed PCM can effectively compensate the emotion embedding for the emotional information loss while maintaining the timbre of the target speaker. Comparisons with state-of-the-art models show that our proposed method is clearly superior on the cross-speaker emotion transfer task.
Submitted 4 July, 2022;
originally announced July 2022.
-
Federated Learning Enables Big Data for Rare Cancer Boundary Detection
Authors:
Sarthak Pati,
Ujjwal Baid,
Brandon Edwards,
Micah Sheller,
Shih-Han Wang,
G Anthony Reina,
Patrick Foley,
Alexey Gruzdev,
Deepthi Karkada,
Christos Davatzikos,
Chiharu Sako,
Satyam Ghodasara,
Michel Bilello,
Suyash Mohan,
Philipp Vollmuth,
Gianluca Brugnara,
Chandrakanth J Preetha,
Felix Sahm,
Klaus Maier-Hein,
Maximilian Zenk,
Martin Bendszus,
Wolfgang Wick,
Evan Calabrese,
Jeffrey Rudie,
Javier Villanueva-Meyer
, et al. (254 additional authors not shown)
Abstract:
Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even not feasible) due to various limitations. Federated ML (FL) provides an alternative to train accurate and generalizable ML models, by only sharing numerical model updates. Here we present findings from the largest FL study to-date, involving data from 71 healthcare institutions across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, utilizing the largest dataset of such patients ever used in the literature (25,256 MRI scans from 6,314 patients). We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and 23% improvement over the tumor's entire extent. We anticipate our study to: 1) enable more studies in healthcare informed by large and diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further quantitative analyses for glioblastoma via performance optimization of our consensus model for eventual public release, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.
Submitted 25 April, 2022; v1 submitted 22 April, 2022;
originally announced April 2022.
-
IOP-FL: Inside-Outside Personalization for Federated Medical Image Segmentation
Authors:
Meirui Jiang,
Hongzheng Yang,
Chen Cheng,
Qi Dou
Abstract:
Federated learning (FL) allows multiple medical institutions to collaboratively learn a global model without centralizing client data. It is difficult, if possible at all, for such a global model to commonly achieve optimal performance for each individual client, due to the heterogeneity of medical images from various scanners and patient demographics. This problem becomes even more significant when deploying the global model to unseen clients outside the FL with unseen distributions not presented during federated training. To optimize the prediction accuracy of each individual client for medical imaging tasks, we propose a novel unified framework for both Inside and Outside model Personalization in FL (IOP-FL). Our inside personalization uses a lightweight gradient-based approach that exploits the local adapted model for each client, by accumulating both the global gradients for common knowledge and the local gradients for client-specific optimization. Moreover, and importantly, the obtained local personalized models and the global model can form a diverse and informative routing space to personalize an adapted model for outside FL clients. Hence, we design a new test-time routing scheme using the consistency loss with a shape constraint to dynamically incorporate the models, given the distribution information conveyed by the test data. Our extensive experimental results on two medical image segmentation tasks present significant improvements over SOTA methods on both inside and outside personalization, demonstrating the potential of our IOP-FL scheme for clinical practice.
Submitted 29 March, 2023; v1 submitted 16 April, 2022;
originally announced April 2022.
-
HarmoFL: Harmonizing Local and Global Drifts in Federated Learning on Heterogeneous Medical Images
Authors:
Meirui Jiang,
Zirui Wang,
Qi Dou
Abstract:
Multiple medical institutions collaboratively training a model using federated learning (FL) has become a promising solution for maximizing the potential of data-driven models, yet the non-independent and identically distributed (non-iid) data in medical images is still an outstanding challenge in real-world practice. The feature heterogeneity caused by diverse scanners or protocols introduces a drift in the learning process, in both local (client) and global (server) optimizations, which harms the convergence as well as model performance. Many previous works have attempted to address the non-iid issue by tackling the drift locally or globally, but how to jointly solve the two essentially coupled drifts is still unclear. In this work, we concentrate on handling both local and global drifts and introduce a new harmonizing framework called HarmoFL. First, we propose to mitigate the local update drift by normalizing amplitudes of images transformed into the frequency domain to mimic a unified imaging setting, in order to generate a harmonized feature space across local clients. Second, based on harmonized features, we design a client weight perturbation guiding each local model to reach a flat optimum, where a neighborhood area of the local optimal solution has a uniformly low loss. Without any extra communication cost, the perturbation assists the global model to optimize towards a converged optimal solution by aggregating several local flat optima. We have theoretically analyzed the proposed method and empirically conducted extensive experiments on three medical image classification and segmentation tasks, showing that HarmoFL outperforms a set of recent state-of-the-art methods with promising convergence behavior. Code is available at https://github.com/med-air/HarmoFL.
Submitted 24 April, 2022; v1 submitted 20 December, 2021;
originally announced December 2021.
-
SDWNet: A Straight Dilated Network with Wavelet Transformation for Image Deblurring
Authors:
Wenbin Zou,
Mingchao Jiang,
Yunchen Zhang,
Liang Chen,
Zhiyong Lu,
Yi Wu
Abstract:
Image deblurring is a classical computer vision problem that aims to recover a sharp image from a blurred image. To solve this problem, existing methods apply the Encode-Decode architecture to design the complex networks to make a good performance. However, most of these methods use repeated up-sampling and down-sampling structures to expand the receptive field, which results in texture information loss during the sampling process and some of them design the multiple stages that lead to difficulties with convergence. Therefore, our model uses dilated convolution to enable the obtainment of the large receptive field with high spatial resolution. Through making full use of the different receptive fields, our method can achieve better performance. On this basis, we reduce the number of up-sampling and down-sampling and design a simple network structure. Besides, we propose a novel module using the wavelet transform, which effectively helps the network to recover clear high-frequency texture details. Qualitative and quantitative evaluations of real and synthetic datasets show that our deblurring method is comparable to existing algorithms in terms of performance with much lower training requirements. The source code and pre-trained models are available at https://github.com/FlyEgle/SDWNet.
Submitted 12 October, 2021;
originally announced October 2021.
-
Comparative Validation of Machine Learning Algorithms for Surgical Workflow and Skill Analysis with the HeiChole Benchmark
Authors:
Martin Wagner,
Beat-Peter Müller-Stich,
Anna Kisilenko,
Duc Tran,
Patrick Heger,
Lars Mündermann,
David M Lubotsky,
Benjamin Müller,
Tornike Davitashvili,
Manuela Capek,
Annika Reinke,
Tong Yu,
Armine Vardazaryan,
Chinedu Innocent Nwoye,
Nicolas Padoy,
Xinyang Liu,
Eung-Joo Lee,
Constantin Disch,
Hans Meine,
Tong Xia,
Fucang Jia,
Satoshi Kondo,
Wolfgang Reiter,
Yueming Jin,
Yonghao Long
, et al. (16 additional authors not shown)
Abstract:
PURPOSE: Surgical workflow and skill analysis are key technologies for the next generation of cognitive surgical assistance systems. These systems could increase the safety of the operation through context-sensitive warnings and semi-autonomous robotic assistance or improve training of surgeons via data-driven feedback. In surgical workflow analysis up to 91% average precision has been reported for phase recognition on an open data single-center dataset. In this work we investigated the generalizability of phase recognition algorithms in a multi-center setting including more difficult recognition tasks such as surgical action and surgical skill. METHODS: To achieve this goal, a dataset with 33 laparoscopic cholecystectomy videos from three surgical centers with a total operation time of 22 hours was created. Labels included annotation of seven surgical phases with 250 phase transitions, 5514 occurrences of four surgical actions, 6980 occurrences of 21 surgical instruments from seven instrument categories and 495 skill classifications in five skill dimensions. The dataset was used in the 2019 Endoscopic Vision challenge, sub-challenge for surgical workflow and skill analysis. Here, 12 teams submitted their machine learning algorithms for recognition of phase, action, instrument and/or skill assessment. RESULTS: F1-scores were achieved for phase recognition between 23.9% and 67.7% (n=9 teams), for instrument presence detection between 38.5% and 63.8% (n=8 teams), but for action recognition only between 21.8% and 23.3% (n=5 teams). The average absolute error for skill assessment was 0.78 (n=1 team). CONCLUSION: Surgical workflow and skill analysis are promising technologies to support the surgical team, but are not solved yet, as shown by our comparison of algorithms. This novel benchmark can be used for comparable evaluation and validation of future work.
Submitted 30 September, 2021;
originally announced September 2021.
-
An Integrated Framework for the Heterogeneous Spatio-Spectral-Temporal Fusion of Remote Sensing Images
Authors:
Menghui Jiang,
Huanfeng Shen,
Jie Li,
Liangpei Zhang
Abstract:
Image fusion technology is widely used to fuse the complementary information between multi-source remote sensing images. Inspired by the frontier of deep learning, this paper first proposes a heterogeneous-integrated framework based on a novel deep residual cycle GAN. The proposed network consists of a forward fusion part and a backward degeneration feedback part. The forward part generates the desired fusion result from the various observations; the backward degeneration feedback part considers the imaging degradation process and regenerates the observations inversely from the fusion result. The proposed network can effectively fuse not only the homogeneous but also the heterogeneous information. In addition, for the first time, a heterogeneous-integrated fusion framework is proposed to simultaneously merge the complementary heterogeneous spatial, spectral and temporal information of multi-source heterogeneous observations. The proposed heterogeneous-integrated framework also provides a uniform mode that can complete various fusion tasks, including heterogeneous spatio-spectral fusion, spatio-temporal fusion, and heterogeneous spatio-spectral-temporal fusion. Experiments are conducted for two challenging scenarios of land cover changes and thick cloud coverage. Images from many remote sensing satellites, including MODIS, Landsat-8, Sentinel-1, and Sentinel-2, are utilized in the experiments. Both qualitative and quantitative evaluations confirm the effectiveness of the proposed method.
Submitted 1 September, 2021;
originally announced September 2021.
-
Coupling Model-Driven and Data-Driven Methods for Remote Sensing Image Restoration and Fusion
Authors:
Huanfeng Shen,
Menghui Jiang,
Jie Li,
Chenxia Zhou,
Qiangqiang Yuan,
Liangpei Zhang
Abstract:
In the fields of image restoration and image fusion, model-driven methods and data-driven methods are the two representative frameworks. However, both approaches have their respective advantages and disadvantages. The model-driven methods consider the imaging mechanism, which is deterministic and theoretically reasonable; however, they cannot easily model complicated nonlinear problems. The data-driven methods have a stronger prior knowledge learning capability for huge data, especially for nonlinear statistical features; however, the interpretability of the networks is poor, and they are over-dependent on training data. In this paper, we systematically investigate the coupling of model-driven and data-driven methods, which has rarely been considered in the remote sensing image restoration and fusion communities. We are the first to summarize the coupling approaches into the following three categories: 1) data-driven and model-driven cascading methods; 2) variational models with embedded learning; and 3) model-constrained network learning methods. The typical existing and potential coupling methods for remote sensing image restoration and fusion are introduced with application examples. This paper also gives some new insights into the potential future directions, in terms of both methods and applications.
Submitted 13 August, 2021;
originally announced August 2021.
-
FA-GAN: Fused Attentive Generative Adversarial Networks for MRI Image Super-Resolution
Authors:
Mingfeng Jiang,
Minghao Zhi,
Liying Wei,
Xiaocheng Yang,
Jucheng Zhang,
Yongming Li,
Pin Wang,
Jiahao Huang,
Guang Yang
Abstract:
High-resolution magnetic resonance images can provide fine-grained anatomical information, but acquiring such data requires a long scanning time. In this paper, a framework called the Fused Attentive Generative Adversarial Networks(FA-GAN) is proposed to generate the super-resolution MR image from low-resolution magnetic resonance images, which can reduce the scanning time effectively but with high resolution MR images. In the framework of the FA-GAN, the local fusion feature block, consisting of different three-pass networks by using different convolution kernels, is proposed to extract image features at different scales. And the global feature fusion module, including the channel attention module, the self-attention module, and the fusion operation, is designed to enhance the important features of the MR image. Moreover, the spectral normalization process is introduced to make the discriminator network stable. 40 sets of 3D magnetic resonance images (each set of images contains 256 slices) are used to train the network, and 10 sets of images are used to test the proposed method. The experimental results show that the PSNR and SSIM values of the super-resolution magnetic resonance image generated by the proposed FA-GAN method are higher than the state-of-the-art reconstruction methods.
Submitted 9 August, 2021;
originally announced August 2021.
-
Segmentation of common and internal carotid arteries from 3D ultrasound images using adaptive triple U-Net
Authors:
Mingjie Jiang,
Yuan Zhao,
Bernard Chiu
Abstract:
Objective: Vessel-wall-volume (VWV) and localized vessel-wall-thickness (VWT) measured from 3D ultrasound (US) carotid images are sensitive to anti-atherosclerotic effects of medical/dietary treatments. VWV and VWT measurements require the lumen-intima (LIB) and media-adventitia boundaries (MAB) at the common and internal carotid arteries (CCA and ICA). However, most existing segmentation techniques were capable of automating only CCA segmentation. An approach capable of segmenting the MAB and LIB from the CCA and ICA was required to accelerate VWV and VWT quantification. Methods: Segmentation for CCA and ICA were performed independently using the proposed two-channel U-Net, which was driven by a novel loss function known as the adaptive triple Dice loss (ADTL). A test-time augmentation (TTA) approach is used, in which segmentation was performed three times based on axial images and its flipped versions; the final segmentation was generated by pixel-wise majority voting. Results: Experiments involving 224 3DUS volumes produce a Dice-similarity-coefficient (DSC) of 95.1%$\pm$4.1% and 91.6%$\pm$6.6% for the MAB and LIB, in the CCA, respectively, and 94.2%$\pm$3.3% and 89.0%$\pm$8.1% for the MAB and LIB, in the ICA, respectively. TTA and ATDL independently contributed to a statistically significant improvement to all boundaries except the LIB in ICA. The total time required to segment the entire 3DUS volume (CCA+ICA) is 1.4s. Conclusion: The proposed two-channel U-Net with ADTL and TTA can segment the CCA and ICA accurately and efficiently from the 3DUS volume. Significance: Our approach has the potential to accelerate the transition of 3DUS measurements of carotid atherosclerosis to clinical research.
Submitted 9 February, 2021; v1 submitted 27 January, 2021;
originally announced January 2021.
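The test-time augmentation step described in the abstract above (segmenting an image and its flipped versions, then fusing by pixel-wise majority voting) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which flips are used, so the horizontal and vertical flips here are a guess, and `segment` is a hypothetical stand-in for the trained network.

```python
from typing import Callable, List

Image = List[List[float]]   # 2D grid of intensities
Mask = List[List[int]]      # 2D grid of 0/1 labels


def flip_horizontal(grid: list) -> list:
    """Mirror each row (left-right flip). Applying it twice restores the input."""
    return [row[::-1] for row in grid]


def flip_vertical(grid: list) -> list:
    """Reverse the row order (up-down flip). Applying it twice restores the input."""
    return grid[::-1]


def tta_majority_vote(image: Image, segment: Callable[[Image], Mask]) -> Mask:
    """Segment the image and two flipped copies, undo each flip, then vote per pixel.

    Assumption: one horizontal and one vertical flip; the paper may use others.
    """
    # Each flip is its own inverse, so the same function maps the mask back.
    transforms = [lambda g: g, flip_horizontal, flip_vertical]
    masks = [t(segment(t(image))) for t in transforms]

    rows, cols = len(image), len(image[0])
    # A pixel is foreground when at least 2 of the 3 aligned masks agree.
    return [
        [1 if sum(m[r][c] for m in masks) >= 2 else 0 for c in range(cols)]
        for r in range(rows)
    ]
```

With a segmenter that makes a position-dependent mistake (for example, always marking the top-left pixel), the mistake lands on a different pixel in each realigned mask and is outvoted, which is the intuition behind the voting step.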
-
A Multi-intersection Vehicular Cooperative Control based on End-Edge-Cloud Computing
Authors:
Mingzhi Jiang,
Tianhao Wu,
Zhe Wang,
Yi Gong,
Lin Zhang,
Ren Ping Liu
Abstract:
Cooperative Intelligent Transportation Systems (C-ITS) will change the modes of road safety and traffic management, especially at intersections without traffic lights, namely unsignalized intersections. Existing researches focus on vehicle control within a small area around an unsignalized intersection. In this paper, we expand the control domain to a large area with multiple intersections. In particular, we propose a Multi-intersection Vehicular Cooperative Control (MiVeCC) to enable cooperation among vehicles in a large area with multiple unsignalized intersections. Firstly, a vehicular end-edge-cloud computing framework is proposed to facilitate end-edge-cloud vertical cooperation and horizontal cooperation among vehicles. Then, the vehicular cooperative control problems in the cloud and edge layers are formulated as Markov Decision Process (MDP) and solved by two-stage reinforcement learning. Furthermore, to deal with high-density traffic, vehicle selection methods are proposed to reduce the state space and accelerate algorithm convergence without performance degradation. A multi-intersection simulation platform is developed to evaluate the proposed scheme. Simulation results show that the proposed MiVeCC can improve travel efficiency at multiple intersections by up to 4.59 times without collision compared with existing methods.
Submitted 1 December, 2020;
originally announced December 2020.
-
Unified Quality Assessment of In-the-Wild Videos with Mixed Datasets Training
Authors:
Dingquan Li,
Tingting Jiang,
Ming Jiang
Abstract:
Video quality assessment (VQA) is an important problem in computer vision. The videos in computer vision applications are usually captured in the wild. We focus on automatically assessing the quality of in-the-wild videos, which is a challenging problem due to the absence of reference videos, the complexity of distortions, and the diversity of video contents. Moreover, the video contents and distortions among existing datasets are quite different, which leads to poor performance of data-driven methods in the cross-dataset evaluation setting. To improve the performance of quality assessment models, we borrow intuitions from human perception, specifically, content dependency and temporal-memory effects of human visual system. To face the cross-dataset evaluation challenge, we explore a mixed datasets training strategy for training a single VQA model with multiple datasets. The proposed unified framework explicitly includes three stages: relative quality assessor, nonlinear mapping, and dataset-specific perceptual scale alignment, to jointly predict relative quality, perceptual quality, and subjective quality. Experiments are conducted on four publicly available datasets for VQA in the wild, i.e., LIVE-VQC, LIVE-Qualcomm, KoNViD-1k, and CVD2014. The experimental results verify the effectiveness of the mixed datasets training strategy and prove the superior performance of the unified model in comparison with the state-of-the-art models. For reproducible research, we make the PyTorch implementation of our method available at https://github.com/lidq92/MDTVSFA.
Submitted 15 November, 2020; v1 submitted 9 November, 2020;
originally announced November 2020.
-
Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching
Authors:
Di Hu,
Rui Qian,
Minyue Jiang,
Xiao Tan,
Shilei Wen,
Errui Ding,
Weiyao Lin,
Dejing Dou
Abstract:
Discriminatively localizing sounding objects in cocktail-party, i.e., mixed sound scenes, is commonplace for humans, but still challenging for machines. In this paper, we propose a two-stage learning framework to perform self-supervised class-aware sounding object localization. First, we propose to learn robust object representations by aggregating the candidate sound localization results in the single source scenes. Then, class-aware object localization maps are generated in the cocktail-party scenarios by referring the pre-learned object knowledge, and the sounding objects are accordingly selected by matching audio and visual object category distributions, where the audiovisual consistency is viewed as the self-supervised signal. Experimental results in both realistic and synthesized cocktail-party videos demonstrate that our model is superior in filtering out silent objects and pointing out the location of sounding objects of different classes. Code is available at https://github.com/DTaoo/Discriminative-Sounding-Objects-Localization.
Submitted 12 October, 2020;
originally announced October 2020.
-
Intelligent Hotel ROS-based Service Robot
Authors:
Yanyu Zhang,
Xiu Wang,
Xuan Wu,
Wenjing Zhang,
Meiqian Jiang,
Mahmood Al-Khassaweneh
Abstract:
With the advances of artificial intelligence (AI) technology, many studies have been carried out on how robots could replace human labor. In this paper, we present a ROS-based intelligent hotel robot, which simplifies the check-in process. We use a Pioneer 3-DX robot and consider different environment settings. The robot, combined with a Hokuyo lidar and a Kinect Xbox camera, can plan routes accurately and reach rooms on different floors. In addition, we add an intelligent voice system that provides an assistant for customers.
Submitted 1 September, 2020;
originally announced September 2020.
-
Norm-in-Norm Loss with Faster Convergence and Better Performance for Image Quality Assessment
Authors:
Dingquan Li,
Tingting Jiang,
Ming Jiang
Abstract:
Currently, most image quality assessment (IQA) models are supervised by the MAE or MSE loss with empirically slow convergence. It is well-known that normalization can facilitate fast convergence. Therefore, we explore normalization in the design of loss functions for IQA. Specifically, we first normalize the predicted quality scores and the corresponding subjective quality scores. Then, the loss is defined based on the norm of the differences between these normalized values. The resulting "Norm-in-Norm" loss encourages the IQA model to make linear predictions with respect to subjective quality scores. After training, the least squares regression is applied to determine the linear mapping from the predicted quality to the subjective quality. It is shown that the new loss is closely connected with two common IQA performance criteria (PLCC and RMSE). Through theoretical analysis, it is proved that the embedded normalization makes the gradients of the loss function more stable and more predictable, which is conducive to the faster convergence of the IQA model. Furthermore, to experimentally verify the effectiveness of the proposed loss, it is applied to solve a challenging problem: quality assessment of in-the-wild images. Experiments on two relevant datasets (KonIQ-10k and CLIVE) show that, compared to MAE or MSE loss, the new loss enables the IQA model to converge about 10 times faster and the final model achieves better performance. The proposed model also achieves state-of-the-art prediction performance on this challenging problem. For reproducible scientific research, our code is publicly available at https://github.com/lidq92/LinearityIQA.
Submitted 10 August, 2020;
originally announced August 2020.
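The two steps named in the abstract above (normalize both score vectors, then take a norm of their difference) can be illustrated with a small sketch. It is not the authors' implementation: the choice of mean-centering, the L2 norm for normalization, and the exponent `p` for the outer norm are assumptions made here for illustration, since the abstract does not specify them.

```python
import math
from typing import List, Sequence


def normalize(scores: Sequence[float]) -> List[float]:
    """Center the scores at zero mean and scale them to unit L2 norm (assumed choice)."""
    mean = sum(scores) / len(scores)
    centered = [s - mean for s in scores]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        raise ValueError("scores must not all be identical")
    return [c / norm for c in centered]


def norm_in_norm_loss(
    predicted: Sequence[float], subjective: Sequence[float], p: float = 1.0
) -> float:
    """p-norm of the difference between the normalized score vectors.

    The value of p is a free parameter in this sketch, not taken from the paper.
    """
    if len(predicted) != len(subjective):
        raise ValueError("score vectors must have the same length")
    pred_n = normalize(predicted)
    subj_n = normalize(subjective)
    return sum(abs(a - b) ** p for a, b in zip(pred_n, subj_n)) ** (1.0 / p)
```

Because both vectors are centered and rescaled, any prediction that is a positive linear function of the subjective scores gives a loss of (numerically) zero, which matches the abstract's remark that the loss encourages linear predictions and that a linear mapping is fitted afterwards.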
-
Specification mining and automated task planning for autonomous robots based on a graph-based spatial temporal logic
Authors:
Zhiyu Liu,
Meng Jiang,
Hai Lin
Abstract:
We aim to enable an autonomous robot to learn new skills from demo videos and use these newly learned skills to accomplish non-trivial high-level tasks. The goal of developing such an autonomous robot involves knowledge representation, specification mining, and automated task planning. For knowledge representation, we use a graph-based spatial temporal logic (GSTL) to capture spatial and temporal information of related skills demonstrated by demo videos. We design a specification mining algorithm to generate a set of parametric GSTL formulas from demo videos by inductively constructing spatial terms and temporal formulas. The resulting parametric GSTL formulas from specification mining serve as a domain theory, which is used in automated task planning for autonomous robots. We propose an automatic task planning based on GSTL where a proposer is used to generate ordered actions, and a verifier is used to generate executable task plans. A table setting example is used throughout the paper to illustrate the main ideas.
Submitted 16 July, 2020;
originally announced July 2020.
-
Joint Device Scheduling and Resource Allocation for Latency Constrained Wireless Federated Learning
Authors:
Wenqi Shi,
Sheng Zhou,
Zhisheng Niu,
Miao Jiang,
Lu Geng
Abstract:
In federated learning (FL), devices contribute to the global training by uploading their local model updates via wireless channels. Due to limited computation and communication resources, device scheduling is crucial to the convergence rate of FL. In this paper, we propose a joint device scheduling and resource allocation policy to maximize the model accuracy within a given total training time budget for latency constrained wireless FL. A lower bound on the reciprocal of the training performance loss, in terms of the number of training rounds and the number of scheduled devices per round, is derived. Based on the bound, the accuracy maximization problem is solved by decoupling it into two sub-problems. First, given the scheduled devices, the optimal bandwidth allocation suggests allocating more bandwidth to the devices with worse channel conditions or weaker computation capabilities. Then, a greedy device scheduling algorithm is introduced, which in each step selects the device consuming the least updating time obtained by the optimal bandwidth allocation, until the lower bound begins to increase, meaning that scheduling more devices will degrade the model accuracy. Experiments show that the proposed policy outperforms state-of-the-art scheduling policies under extensive settings of data distributions and cell radius.
Submitted 14 July, 2020;
originally announced July 2020.
-
MeshfreeFlowNet: A Physics-Constrained Deep Continuous Space-Time Super-Resolution Framework
Authors:
Chiyu Max Jiang,
Soheil Esmaeilzadeh,
Kamyar Azizzadenesheli,
Karthik Kashinath,
Mustafa Mustafa,
Hamdi A. Tchelepi,
Philip Marcus,
Prabhat,
Anima Anandkumar
Abstract:
We propose MeshfreeFlowNet, a novel deep learning-based super-resolution framework to generate continuous (grid-free) spatio-temporal solutions from the low-resolution inputs. While being computationally efficient, MeshfreeFlowNet accurately recovers the fine-scale quantities of interest. MeshfreeFlowNet allows for: (i) the output to be sampled at all spatio-temporal resolutions, (ii) a set of Partial Differential Equation (PDE) constraints to be imposed, and (iii) training on fixed-size inputs on arbitrarily sized spatio-temporal domains owing to its fully convolutional encoder. We empirically study the performance of MeshfreeFlowNet on the task of super-resolution of turbulent flows in the Rayleigh-Benard convection problem. Across a diverse set of evaluation metrics, we show that MeshfreeFlowNet significantly outperforms existing baselines. Furthermore, we provide a large scale implementation of MeshfreeFlowNet and show that it efficiently scales across large clusters, achieving 96.80% scaling efficiency on up to 128 GPUs and a training time of less than 4 minutes.
Submitted 21 August, 2020; v1 submitted 1 May, 2020;
originally announced May 2020.
-
Joint Shortening and Puncturing Optimization for Structured LDPC Codes
Authors:
Yuejun Wei,
Yuhang Yang,
Ming Jiang,
Wen Chen,
Lili Wei
Abstract:
The demand for flexible broadband wireless services makes the pruning technique, including both shortening and puncturing, an indispensable component of error correcting codes. The analysis of the pruning process for structured low-density parity-check (LDPC) codes can be considerably simplified with their equivalent representations through base-matrices or protographs. In this letter, we evaluate the thresholds of the pruned base-matrices by using protograph based on extrinsic information transfer (PEXIT). We also provide an efficient method to optimize the pruning patterns, which can significantly improve the thresholds of both the full-length patterns and the sub-patterns. Numerical results show that the structured LDPC codes pruned by the improved patterns outperform those with the existing patterns.
Submitted 25 March, 2020;
originally announced March 2020.
-
A CRC-aided Hybrid Decoding for Turbo Codes
Authors:
Yuejun Wei,
Ming Jiang,
Wen Chen,
Yuhang Yang
Abstract:
Turbo codes and CRC codes are usually decoded separately according to the serially concatenated inner codes and outer codes respectively. In this letter, we propose a hybrid decoding algorithm of turbo-CRC codes, where the outer codes, CRC codes, are not used for error detection but as an assistance to improve the error correction performance. Two independent iterative decoding and reliability based decoding are carried out in a hybrid schedule, which can efficiently decode the two different codes as an entire codeword. By introducing an efficient error detecting method based on normalized Euclidean distance without CRC check, significant gain can be obtained by using the hybrid decoding method without loss of the error detection ability.
Submitted 25 March, 2020;
originally announced March 2020.
-
An inverse-system method for identification of damping rate functions in non-Markovian quantum systems
Authors:
Shibei Xue,
Lingyu Tan,
Rebing Wu,
Min Jiang,
Ian R. Petersen
Abstract:
Identification of complicated quantum environments lies in the core of quantum engineering, which systematically constructs an environment model with the aim of accurate control of quantum systems. In this paper, we present an inverse-system method to identify damping rate functions which describe non-Markovian environments in time-convolution-less master equations. To access information on the environment, we couple a finite-level quantum system to the environment and measure time traces of local observables of the system. By using sufficient measurement results, an algorithm is designed, which can simultaneously estimate multiple damping rate functions for different dissipative channels. Further, we show that identifiability for the damping rate functions corresponds to the invertibility of the system and a necessary condition for identifiability is also given. The effectiveness of our method is shown in examples of an atom and three-spin-chain non-Markovian systems.
Submitted 19 March, 2020;
originally announced March 2020.
-
Segmentation of carotid vessel wall using U-Net and segmentation average network
Authors:
Mingjie Jiang,
J. David Spence,
Bernard Chiu
Abstract:
Segmentation of the carotid vessel wall is required for quantifying the vessel wall volume (VWV) and the local vessel-wall-plus-plaque thickness (VWT) of the carotid artery. Manual segmentation of the vessel wall is time-consuming and prone to interobserver variability. In this paper, we propose a convolutional neural network (CNN) to segment the common carotid artery (CCA) from 3D carotid ultrasound images. The proposed CNN involves three U-Nets that segment the 3D ultrasound (3DUS) images in the axial, lateral and frontal orientations. The segmentation maps generated by the three U-Nets are consolidated by a novel segmentation average network (SAN) proposed in this paper. The experimental results show that the proposed CNN improved the Dice similarity coefficient (DSC) for vessel wall segmentation from 64.8% to 67.5%, the sensitivity from 63.8% to 70.5%, and the area under the receiver operating characteristic curve (AUC) from 0.89 to 0.94.
Submitted 26 February, 2020;
originally announced February 2020.
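The Dice similarity coefficient and sensitivity quoted in the abstract above are standard overlap metrics for binary segmentation masks. A minimal sketch of how they are commonly computed follows. The function names and the flat 0/1 mask representation are illustrative assumptions, not the authors' code.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient, 2|A n B| / (|A| + |B|), for flat 0/1 masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumption: two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def sensitivity(pred, truth):
    """Fraction of true foreground voxels that the prediction recovers."""
    true_positives = sum(1 for p, t in zip(pred, truth) if p and t)
    positives = sum(1 for t in truth if t)
    if positives == 0:
        raise ValueError("sensitivity is undefined without foreground voxels")
    return true_positives / positives


if __name__ == "__main__":
    predicted = [1, 1, 0, 0, 1, 0]  # hypothetical flattened prediction
    manual = [1, 0, 0, 1, 1, 0]     # hypothetical flattened manual mask
    print(dice_coefficient(predicted, manual), sensitivity(predicted, manual))
```

Both masks here have three foreground voxels and share two of them, so the DSC and the sensitivity both come out to 2/3.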
-
Joint Beamforming Design in Multi-Cluster MISO NOMA Intelligent Reflecting Surface-Aided Downlink Communication Networks
Authors:
Yiqing Li,
Miao Jiang,
Qi Zhang,
Jiayin Qin
Abstract:
Considering intelligent reflecting surface (IRS), we study a multi-cluster multiple-input-single-output (MISO) non-orthogonal multiple access (NOMA) downlink communication network. In the network, an IRS assists the communication from the base station (BS) to all users by passive beamforming. Our goal is to minimize the total transmit power by jointly optimizing the transmit beamforming vectors at the BS and the reflection coefficient vector at the IRS. Because of the restrictions on the IRS reflection amplitudes and phase shifts, the formulated quadratically constrained quadratic problem is highly non-convex. For the aforementioned problem, the conventional semidefinite programming (SDP) based algorithm has prohibitively high computational complexity and deteriorating performance. Here, we propose an effective second-order cone programming (SOCP)-alternating direction method of multipliers (ADMM) based algorithm to obtain the locally optimal solution. To reduce the computational complexity, we also propose a low-complexity zero-forcing (ZF) based suboptimal algorithm. It is shown through simulation results that our proposed SOCP-ADMM based algorithm achieves significant performance gain over the conventional SDP based algorithm. Furthermore, when the number of passive reflection elements is relatively high, our proposed ZF-based suboptimal algorithm also outperforms the SDP based algorithm.
Submitted 15 September, 2019;
originally announced September 2019.
-
MSU-Net: Multiscale Statistical U-Net for Real-time 3D Cardiac MRI Video Segmentation
Authors:
Tianchen Wang,
Jinjun Xiong,
Xiaowei Xu,
Meng Jiang,
Yiyu Shi,
Haiyun Yuan,
Meiping Huang,
Jian Zhuang
Abstract:
Cardiac magnetic resonance imaging (MRI) is an essential tool for MRI-guided surgery and real-time intervention. In practice, the MRI videos are expected to be segmented on the fly. However, existing segmentation methods suffer drastic accuracy loss when modified for speedup. In this work, we propose Multiscale Statistical U-Net (MSU-Net) for real-time 3D MRI video segmentation in cardiac surgical guidance. Our idea is to model the input samples as multiscale canonical form distributions for speedup, while the spatio-temporal correlation is still fully utilized. A parallel statistical U-Net is then designed to efficiently process these distributions. The fast data sampling and efficient parallel structure of MSU-Net enable fast and accurate inference. Compared with vanilla U-Net and a modified state-of-the-art method GridNet, our method achieves up to 268% and 237% speedup with 1.6% and 3.6% increased Dice scores.
Submitted 14 September, 2019;
originally announced September 2019.
-
Fiber-optic joint time and frequency transfer with the same wavelength
Authors:
Jialiang Wang,
Chaolei Yue,
Yueli Xi,
Yanguang Sun,
Nan Cheng,
Fei Yang,
Mingyu Jiang,
Jianfeng Sun,
Youzhen Gui,
Haiwen Cai
Abstract:
Optical fiber links have demonstrated their ability to transfer ultra-stable clock signals. In this paper, we propose and demonstrate a new scheme that transfers both time and radio frequency on the same wavelength, based on a coherent demodulation technique. The time signal is encoded onto the optical carrier as binary phase-shift keying (BPSK) by phase modulation with an electro-optic modulator (EOM), which keeps the frequency signal free from interference by the single pulse. The phase changes caused by fluctuations of the transfer link are actively cancelled at the local site by optical delay lines. A 1 GHz radio frequency and a one-pulse-per-second (1PPS) time signal are transmitted over 110 km of fiber spools. The experimental results demonstrate frequency instabilities of 1.7E-14 at 1 s and 5.9E-17 at 10^4 s. Moreover, time interval transfer of the 1PPS signal reaches sub-ps stability after 1000 s. This scheme reduces the number of channels needed in the fiber network and keeps the time and frequency signals independent of each other.
Submitted 7 September, 2019;
originally announced September 2019.
-
Quality Assessment of In-the-Wild Videos
Authors:
Dingquan Li,
Tingting Jiang,
Ming Jiang
Abstract:
Quality assessment of in-the-wild videos is a challenging problem because of the absence of reference videos and shooting distortions. Knowledge of the human visual system can help establish methods for objective quality assessment of in-the-wild videos. In this work, we show that two eminent effects of the human visual system, namely content-dependency and temporal-memory effects, can be used for this purpose. We propose an objective no-reference video quality assessment method by integrating both effects into a deep neural network. For content-dependency, we extract features from a pre-trained image classification neural network for its inherent content-aware property. For temporal-memory effects, long-term dependencies, especially the temporal hysteresis, are integrated into the network with a gated recurrent unit and a subjectively-inspired temporal pooling layer. To validate the performance of our method, experiments are conducted on three publicly available in-the-wild video quality assessment databases: KoNViD-1k, CVD2014, and LIVE-Qualcomm. Experimental results demonstrate that our proposed method outperforms five state-of-the-art methods by a large margin, specifically, 12.39%, 15.71%, 15.45%, and 18.09% overall performance improvements over the second-best method VBLIINDS, in terms of SROCC, KROCC, PLCC and RMSE, respectively. Moreover, the ablation study verifies the crucial role of both the content-aware features and the modeling of temporal-memory effects. The PyTorch implementation of our method is released at https://github.com/lidq92/VSFA.
Submitted 5 October, 2019; v1 submitted 1 August, 2019;
originally announced August 2019.
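SROCC and PLCC, two of the criteria named in the abstract above, measure rank agreement and linear agreement between predicted and subjective quality scores. The sketch below shows one common way to compute them with the Python standard library only. The function names and sample scores are hypothetical, and any nonlinear mapping applied before PLCC in a real evaluation protocol is deliberately left out.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC) of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation (SROCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    predicted = [1.0, 2.0, 3.0, 4.0]   # hypothetical model outputs
    subjective = [1.0, 4.0, 9.0, 16.0]  # hypothetical mean opinion scores
    print(spearman(predicted, subjective), pearson(predicted, subjective))
```

Because the two sample lists are monotonically related but not linearly related, SROCC is 1 while PLCC stays below 1, which is why the two criteria are usually reported together.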
-
A gradient algorithm for Hamiltonian identification of open quantum systems
Authors:
Shibei Xue,
Rebing Wu,
Dewei Li,
Min Jiang
Abstract:
In this paper, we present a gradient algorithm for identifying unknown parameters in an open quantum system from the measurements of time traces of local observables. The open system dynamics is described by a general Markovian master equation, based on which the Hamiltonian identification problem can be formulated as minimizing the distance between the real time traces of the observables and those predicted by the master equation. The unknown parameters can then be learned with a gradient descent algorithm from the measurement data. We verify the effectiveness of our algorithm in a circuit QED system described by a Jaynes-Cummings model, whose Hamiltonian identification has rarely been considered. We also show that our gradient algorithm can learn the spectrum of a non-Markovian environment based on an augmented system model.
Submitted 23 May, 2019;
originally announced May 2019.
-
Cascade-Net: a New Deep Learning Architecture for OFDM Detection
Authors:
Qisheng Huang,
Chunming Zhao,
Ming Jiang,
Xiaoming Li,
Jing Liang
Abstract:
In this paper, we consider using a deep neural network for OFDM symbol detection and demonstrate its performance advantages in combating large Doppler shift. In particular, a new architecture named Cascade-Net is proposed for detection, in which a deep neural network is cascaded with a zero-forcing preprocessor to prevent the network from getting stuck at a saddle point or a local minimum. In addition, we propose a sliding detection approach in order to detect OFDM symbols with a large number of subcarriers. We evaluate this new architecture, as well as the sliding algorithm, using the Rayleigh channel with large Doppler spread, which can degrade detection performance in an OFDM system and is especially severe for high frequency bands and mmWave communications. The numerical results of OFDM detection in a SISO scenario show that Cascade-Net can achieve better performance than the zero-forcing method while providing robustness against ill-conditioned channels. We also show through numerical simulation that the sliding cascade network (SCN) performs better than the sliding zero-forcing detector.
Submitted 30 November, 2018;
originally announced December 2018.
-
Exploiting High-Level Semantics for No-Reference Image Quality Assessment of Realistic Blur Images
Authors:
Dingquan Li,
Tingting Jiang,
Ming Jiang
Abstract:
To guarantee a satisfying Quality of Experience (QoE) for consumers, it is required to measure image quality efficiently and reliably. The neglect of the high-level semantic information may result in predicting a clear blue sky as bad quality, which is inconsistent with human perception. Therefore, in this paper, we tackle this problem by exploiting the high-level semantics and propose a novel no-reference image quality assessment method for realistic blur images. Firstly, the whole image is divided into multiple overlapping patches. Secondly, each patch is represented by the high-level feature extracted from the pre-trained deep convolutional neural network model. Thirdly, three different kinds of statistical structures are adopted to aggregate the information from different patches, which mainly contain some common statistics (i.e., the mean and standard deviation, quantiles and moments). Finally, the aggregated features are fed into a linear regression model to predict the image quality. Experiments show that, compared with low-level features, high-level features indeed play a more critical role in resolving the aforementioned challenging problem for quality estimation. Besides, the proposed method significantly outperforms the state-of-the-art methods on two realistic blur image databases and achieves comparable performance on two synthetic blur image databases.
Submitted 18 October, 2018;
originally announced October 2018.
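The aggregation step described in the abstract above pools per-patch feature vectors into one image-level descriptor using simple statistics. A minimal sketch of that idea follows, assuming each patch has already been reduced to a short list of floats. The function name, the choice of quartiles, and the sample values are hypothetical. Higher-order moments and the regression stage are omitted, so this is not the authors' pipeline.

```python
import statistics


def aggregate_patch_features(patch_features):
    """Pool per-patch feature vectors into one image-level vector.

    For every feature dimension, this emits five statistics across patches:
    mean, population standard deviation, and the three quartiles.
    """
    if not patch_features:
        raise ValueError("at least one patch is required")
    aggregated = []
    # zip(*...) regroups the data so each tuple holds one dimension across patches.
    for column in zip(*patch_features):
        mean = statistics.fmean(column)
        std = statistics.pstdev(column)
        # quantiles() needs at least two data points per dimension.
        q1, median, q3 = statistics.quantiles(column, n=4, method="inclusive")
        aggregated.extend([mean, std, q1, median, q3])
    return aggregated


if __name__ == "__main__":
    # Five hypothetical patches, each described by a 2-dimensional feature.
    patches = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0], [4.0, 10.0], [5.0, 10.0]]
    print(aggregate_patch_features(patches))
```

The output length is five times the feature dimension regardless of how many patches the image yields, which is what lets a fixed-size regression model consume images of different sizes.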
-
Joint bi-modal image reconstruction of DOT and XCT with an extended Mumford-Shah functional
Authors:
Di He,
Ming Jiang,
Alfred K. Louis,
Peter Maass,
Thomas Page
Abstract:
Feature similarity measures are indispensable for joint image reconstruction in multi-modality medical imaging, which enable joint multi-modal image reconstruction (JmmIR) by communication of feature information from one modality to another, and vice versa. In this work, we establish an image similarity measure in terms of image edges from Tversky's theory of feature similarity in psychology. For joint bi-modal image reconstruction (JbmIR), it is found that this image similarity measure is an extended Mumford-Shah functional with a-priori edge information proposed previously from the perspective of regularization approach. This image similarity measure consists of Hausdorff measures of the common and different parts of image edges from both modalities. By construction, it posits that two images are more similar if they have more common edges and fewer unique/distinctive features, and will not force the nonexistent structures to be reconstructed when applied to JbmIR. With the Gamma-approximation of the JbmIR functional, an alternating minimization method is proposed for the JbmIR of diffuse optical tomography and x-ray computed tomography. The performance of the proposed method is evaluated by three numerical phantoms. It is found that the proposed method improves the reconstructed image quality by more than 10% compared to single modality image reconstruction (SmIR) in terms of the structural similarity index measure (SSIM).
Submitted 15 October, 2018;
originally announced October 2018.
-
Multi-Sensor Control for Multi-Target Tracking Using Cauchy-Schwarz Divergence
Authors:
Meng Jiang,
Wei Yi,
Lingjiang Kong
Abstract:
The paper addresses the problem of multi-sensor control for multi-target tracking via labelled random finite sets (RFS) in sensor network systems. Based on an information theoretic divergence measure, namely the Cauchy-Schwarz (CS) divergence, which admits a closed-form solution for generalized labeled multi-Bernoulli (GLMB) densities, we propose two novel multi-sensor control approaches in the framework of generalized Covariance Intersection (GCI). The first, the joint decision making (JDM) method, is optimal and achieves good overall performance, while the second, the independent decision making (IDM) method, is suboptimal but offers a fast realization with less computation. A simulation of a challenging scenario is presented to verify the effectiveness of the two proposed approaches.
Submitted 28 March, 2016;
originally announced March 2016.
-
Distributed Multi-Sensor Fusion Using Generalized Multi-Bernoulli Densities
Authors:
Meng Jiang,
Wei Yi,
Reza Hoseinnezhad,
Lingjiang Kong
Abstract:
The paper addresses distributed multi-target tracking in the framework of generalized Covariance Intersection (GCI) over a multistatic radar system. The proposed method is based on the unlabeled version of the generalized labeled multi-Bernoulli (GLMB) family obtained by discarding the labels, referred to as the generalized multi-Bernoulli (GMB) family. However, GCI fusion with the GMB family does not admit a closed-form solution. To solve this challenging problem, we first propose an efficient approximation to the GMB family that preserves both the probability hypothesis density (PHD) and the cardinality distribution, named the second-order approximation of GMB (SO-GMB) density. Then, we derive an explicit expression for the GCI fusion with the SO-GMB density. Finally, we compare the first-order approximation of GMB (FO-GMB) density with the SO-GMB density in two scenarios and analyze the advantages of the second-order approximation. Simulation results are presented to verify the proposed approach.
Submitted 21 March, 2016;
originally announced March 2016.