Search | arXiv e-print repository

Multimodal Learning and Cognitive Processes in Radiology: MedGaze for Chest X-ray Scanpath Prediction

Authors: Akash Awasthi, Ngan Le, Zhigang Deng, Rishi Agrawal, Carol C. Wu, Hien Van Nguyen

Abstract: Predicting human gaze behavior within computer vision is integral for develo** interactive systems that can anticipate user attention, address fundamental questions in cognitive science, and hold implications for fields like human-computer interaction (HCI) and augmented/virtual reality (AR/VR) systems. Despite methodologies introduced for modeling human eye gaze behavior, applying these models… ▽ More Predicting human gaze behavior within computer vision is integral for develo** interactive systems that can anticipate user attention, address fundamental questions in cognitive science, and hold implications for fields like human-computer interaction (HCI) and augmented/virtual reality (AR/VR) systems. Despite methodologies introduced for modeling human eye gaze behavior, applying these models to medical imaging for scanpath prediction remains unexplored. Our proposed system aims to predict eye gaze sequences from radiology reports and CXR images, potentially streamlining data collection and enhancing AI systems using larger datasets. However, predicting human scanpaths on medical images presents unique challenges due to the diverse nature of abnormal regions. Our model predicts fixation coordinates and durations critical for medical scanpath prediction, outperforming existing models in the computer vision community. Utilizing a two-stage training process and large publicly available datasets, our approach generates static heatmaps and eye gaze videos aligned with radiology reports, facilitating comprehensive analysis. We validate our approach by comparing its performance with state-of-the-art methods and assessing its generalizability among different radiologists, introducing novel strategies to model radiologists' search patterns during CXR image diagnosis. Based on the radiologist's evaluation, MedGaze can generate human-like gaze sequences with a high focus on relevant regions over the CXR images. It sometimes also outperforms humans in terms of redundancy and randomness in the scanpaths. △ Less

Submitted 28 June, 2024; originally announced July 2024.

Comments: Submitted to the Journal

arXiv:2406.19686 [pdf]

Enhancing Radiological Diagnosis: A Collaborative Approach Integrating AI and Human Expertise for Visual Miss Correction

Authors: Akash Awasthi, Ngan Le, Zhigang Deng, Carol C. Wu, Hien Van Nguyen

Abstract: Human-AI collaboration to identify and correct perceptual errors in chest radiographs has not been previously explored. This study aimed to develop a collaborative AI system, CoRaX, which integrates eye gaze data and radiology reports to enhance diagnostic accuracy in chest radiology by pinpointing perceptual errors and refining the decision-making process. Using public datasets REFLACX and EGD-CX… ▽ More Human-AI collaboration to identify and correct perceptual errors in chest radiographs has not been previously explored. This study aimed to develop a collaborative AI system, CoRaX, which integrates eye gaze data and radiology reports to enhance diagnostic accuracy in chest radiology by pinpointing perceptual errors and refining the decision-making process. Using public datasets REFLACX and EGD-CXR, the study retrospectively developed CoRaX, employing a large multimodal model to analyze image embeddings, eye gaze data, and radiology reports. The system's effectiveness was evaluated based on its referral-making process, the quality of referrals, and performance in collaborative diagnostic settings. CoRaX was tested on a simulated error dataset of 271 samples with 28% (93 of 332) missed abnormalities. The system corrected 21% (71 of 332) of these errors, leaving 7% (22 of 312) unresolved. The Referral-Usefulness score, indicating the accuracy of predicted regions for all true referrals, was 0.63 (95% CI 0.59, 0.68). The Total-Usefulness score, reflecting the diagnostic accuracy of CoRaX's interactions with radiologists, showed that 84% (237 of 280) of these interactions had a score above 0.40. In conclusion, CoRaX efficiently collaborates with radiologists to address perceptual errors across various abnormalities, with potential applications in the education and training of novice radiologists. △ Less

Submitted 28 June, 2024; originally announced June 2024.

Comments: Under Review in Journal

arXiv:2405.07994 [pdf]

BubbleID: A Deep Learning Framework for Bubble Interface Dynamics Analysis

Authors: Christy Dunlap, Changgen Li, Hari Pandey, Ngan Le, Han Hu

Abstract: This paper presents BubbleID, a sophisticated deep learning architecture designed to comprehensively identify both static and dynamic attributes of bubbles within sequences of boiling images. By amalgamating segmentation powered by Mask R-CNN with SORT-based tracking techniques, the framework is capable of analyzing each bubble's location, dimensions, interface shape, and velocity over its lifetim… ▽ More This paper presents BubbleID, a sophisticated deep learning architecture designed to comprehensively identify both static and dynamic attributes of bubbles within sequences of boiling images. By amalgamating segmentation powered by Mask R-CNN with SORT-based tracking techniques, the framework is capable of analyzing each bubble's location, dimensions, interface shape, and velocity over its lifetime, and capturing dynamic events such as bubble departure. BubbleID is trained and tested on boiling images across diverse heater surfaces and operational settings. This paper also offers a comparative analysis of bubble interface dynamics prior to and post-critical heat flux (CHF) conditions. △ Less

Submitted 20 March, 2024; originally announced May 2024.

Comments: 16 pages, 4 figures

arXiv:2403.10735 [pdf, other]

Time-Robust Path Planning with Piece-Wise Linear Trajectory for Signal Temporal Logic Specifications

Authors: Nhan-Khanh Le, Erfaun Noorani, Sandra Hirche, John Baras

Abstract: Real-world scenarios are characterized by timing uncertainties, e.g., delays, and disturbances. Algorithms with temporal robustness are crucial in guaranteeing the successful execution of tasks and missions in such scenarios. We study time-robust path planning for synthesizing robots' trajectories that adhere to spatial-temporal specifications expressed in Signal Temporal Logic (STL). In contrast… ▽ More Real-world scenarios are characterized by timing uncertainties, e.g., delays, and disturbances. Algorithms with temporal robustness are crucial in guaranteeing the successful execution of tasks and missions in such scenarios. We study time-robust path planning for synthesizing robots' trajectories that adhere to spatial-temporal specifications expressed in Signal Temporal Logic (STL). In contrast to prior approaches that rely on {discretize}d trajectories with fixed time steps, we leverage Piece-Wise Linear (PWL) signals for the synthesis. PWL signals represent a trajectory through a sequence of time-stamped waypoints. This allows us to encode the STL formula into a Mixed-Integer Linear Program (MILP) with fewer variables. This reduction is more pronounced for specifications with a long planning horizon. To that end, we define time-robustness for PWL signals. Subsequently, we propose quantitative semantics for PWL signals according to the recursive syntax of STL and prove their soundness. We then propose an encoding strategy to transform our semantics into a MILP. Our simulations showcase the soundness and the performance of our algorithm. △ Less

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2401.10761 [pdf, other]

NN-VVC: Versatile Video Coding boosted by self-supervisedly learned image coding for machines

Authors: Jukka I. Ahonen, Nam Le, Honglei Zhang, Antti Hallapuro, Francesco Cricri, Hamed Rezazadegan Tavakoli, Miska M. Hannuksela, Esa Rahtu

Abstract: The recent progress in artificial intelligence has led to an ever-increasing usage of images and videos by machine analysis algorithms, mainly neural networks. Nonetheless, compression, storage and transmission of media have traditionally been designed considering human beings as the viewers of the content. Recent research on image and video coding for machine analysis has progressed mainly in two… ▽ More The recent progress in artificial intelligence has led to an ever-increasing usage of images and videos by machine analysis algorithms, mainly neural networks. Nonetheless, compression, storage and transmission of media have traditionally been designed considering human beings as the viewers of the content. Recent research on image and video coding for machine analysis has progressed mainly in two almost orthogonal directions. The first is represented by end-to-end (E2E) learned codecs which, while offering high performance on image coding, are not yet on par with state-of-the-art conventional video codecs and lack interoperability. The second direction considers using the Versatile Video Coding (VVC) standard or any other conventional video codec (CVC) together with pre- and post-processing operations targeting machine analysis. While the CVC-based methods benefit from interoperability and broad hardware and software support, the machine task performance is often lower than the desired level, particularly in low bitrates. This paper proposes a hybrid codec for machines called NN-VVC, which combines the advantages of an E2E-learned image codec and a CVC to achieve high performance in both image and video coding for machines. Our experiments show that the proposed system achieved up to -43.20% and -26.8% Bjøntegaard Delta rate reduction over VVC for image and video data, respectively, when evaluated on multiple different datasets and machine vision tasks. To the best of our knowledge, this is the first research paper showing a hybrid video codec that outperforms VVC on multiple datasets and multiple machine vision tasks. △ Less

Submitted 19 January, 2024; originally announced January 2024.

Comments: ISM 2023 Best paper award winner version

arXiv:2401.10732 [pdf, other]

doi 10.1109/ICIP46576.2022.9897916

Bridging the gap between image coding for machines and humans

Authors: Nam Le, Honglei Zhang, Francesco Cricri, Ramin G. Youvalari, Hamed Rezazadegan Tavakoli, Emre Aksu, Miska M. Hannuksela, Esa Rahtu

Abstract: Image coding for machines (ICM) aims at reducing the bitrate required to represent an image while minimizing the drop in machine vision analysis accuracy. In many use cases, such as surveillance, it is also important that the visual quality is not drastically deteriorated by the compression process. Recent works on using neural network (NN) based ICM codecs have shown significant coding gains agai… ▽ More Image coding for machines (ICM) aims at reducing the bitrate required to represent an image while minimizing the drop in machine vision analysis accuracy. In many use cases, such as surveillance, it is also important that the visual quality is not drastically deteriorated by the compression process. Recent works on using neural network (NN) based ICM codecs have shown significant coding gains against traditional methods; however, the decompressed images, especially at low bitrates, often contain checkerboard artifacts. We propose an effective decoder finetuning scheme based on adversarial training to significantly enhance the visual quality of ICM codecs, while preserving the machine analysis accuracy, without adding extra bitcost or parameters at the inference phase. The results show complete removal of the checkerboard artifacts at the negligible cost of -1.6% relative change in task performance score. In the cases where some amount of artifacts is tolerable, such as when machine consumption is the primary target, this technique can enhance both pixel-fidelity and feature-fidelity scores without losing task performance. △ Less

Submitted 19 January, 2024; originally announced January 2024.

Journal ref: IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 2022, pp. 3411-3415

arXiv:2312.10187 [pdf, other]

TSRNet: Simple Framework for Real-time ECG Anomaly Detection with Multimodal Time and Spectrogram Restoration Network

Authors: Nhat-Tan Bui, Dinh-Hieu Hoang, Thinh Phan, Minh-Triet Tran, Brijesh Patel, Donald Adjeroh, Ngan Le

Abstract: The electrocardiogram (ECG) is a valuable signal used to assess various aspects of heart health, such as heart rate and rhythm. It plays a crucial role in identifying cardiac conditions and detecting anomalies in ECG data. However, distinguishing between normal and abnormal ECG signals can be a challenging task. In this paper, we propose an approach that leverages anomaly detection to identify unh… ▽ More The electrocardiogram (ECG) is a valuable signal used to assess various aspects of heart health, such as heart rate and rhythm. It plays a crucial role in identifying cardiac conditions and detecting anomalies in ECG data. However, distinguishing between normal and abnormal ECG signals can be a challenging task. In this paper, we propose an approach that leverages anomaly detection to identify unhealthy conditions using solely normal ECG data for training. Furthermore, to enhance the information available and build a robust system, we suggest considering both the time series and time-frequency domain aspects of the ECG signal. As a result, we introduce a specialized network called the Multimodal Time and Spectrogram Restoration Network (TSRNet) designed specifically for detecting anomalies in ECG signals. TSRNet falls into the category of restoration-based anomaly detection and draws inspiration from both the time series and spectrogram domains. By extracting representations from both domains, TSRNet effectively captures the comprehensive characteristics of the ECG signal. This approach enables the network to learn robust representations with superior discrimination abilities, allowing it to distinguish between normal and abnormal ECG patterns more effectively. Furthermore, we introduce a novel inference method, termed Peak-based Error, that specifically focuses on ECG peaks, a critical component in detecting abnormalities. The experimental result on the large-scale dataset PTB-XL has demonstrated the effectiveness of our approach in ECG anomaly detection, while also prioritizing efficiency by minimizing the number of trainable parameters. Our code is available at https://github.com/UARK-AICV/TSRNet. △ Less

Submitted 5 March, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: Accepted at ISBI 2024

arXiv:2311.11096 [pdf, other]

On the Out of Distribution Robustness of Foundation Models in Medical Image Segmentation

Authors: Duy Minh Ho Nguyen, Tan Ngoc Pham, Nghiem Tuong Diep, Nghi Quoc Phan, Quang Pham, Vinh Tong, Binh T. Nguyen, Ngan Hoang Le, Nhat Ho, Pengtao Xie, Daniel Sonntag, Mathias Niepert

Abstract: Constructing a robust model that can effectively generalize to test samples under distribution shifts remains a significant challenge in the field of medical imaging. The foundational models for vision and language, pre-trained on extensive sets of natural image and text data, have emerged as a promising approach. It showcases impressive learning abilities across different tasks with the need for… ▽ More Constructing a robust model that can effectively generalize to test samples under distribution shifts remains a significant challenge in the field of medical imaging. The foundational models for vision and language, pre-trained on extensive sets of natural image and text data, have emerged as a promising approach. It showcases impressive learning abilities across different tasks with the need for only a limited amount of annotated samples. While numerous techniques have focused on develo** better fine-tuning strategies to adapt these models for specific domains, we instead examine their robustness to domain shifts in the medical image segmentation task. To this end, we compare the generalization performance to unseen domains of various pre-trained models after being fine-tuned on the same in-distribution dataset and show that foundation-based models enjoy better robustness than other architectures. From here, we further developed a new Bayesian uncertainty estimation for frozen models and used them as an indicator to characterize the model's performance on out-of-distribution (OOD) data, proving particularly beneficial for real-world applications. Our experiments not only reveal the limitations of current indicators like accuracy on the line or agreement on the line commonly used in natural image applications but also emphasize the promise of the introduced Bayesian uncertainty. Specifically, lower uncertainty predictions usually tend to higher out-of-distribution (OOD) performance. △ Less

Submitted 18 November, 2023; originally announced November 2023.

Comments: Advances in Neural Information Processing Systems (NeurIPS) 2023, Workshop on robustness of zero/few-shot learning in foundation models

arXiv:2309.03493 [pdf, other]

SAM3D: Segment Anything Model in Volumetric Medical Images

Authors: Nhat-Tan Bui, Dinh-Hieu Hoang, Minh-Triet Tran, Gianfranco Doretto, Donald Adjeroh, Brijesh Patel, Arabinda Choudhary, Ngan Le

Abstract: Image segmentation remains a pivotal component in medical image analysis, aiding in the extraction of critical information for precise diagnostic practices. With the advent of deep learning, automated image segmentation methods have risen to prominence, showcasing exceptional proficiency in processing medical imagery. Motivated by the Segment Anything Model (SAM)-a foundational model renowned for… ▽ More Image segmentation remains a pivotal component in medical image analysis, aiding in the extraction of critical information for precise diagnostic practices. With the advent of deep learning, automated image segmentation methods have risen to prominence, showcasing exceptional proficiency in processing medical imagery. Motivated by the Segment Anything Model (SAM)-a foundational model renowned for its remarkable precision and robust generalization capabilities in segmenting 2D natural images-we introduce SAM3D, an innovative adaptation tailored for 3D volumetric medical image analysis. Unlike current SAM-based methods that segment volumetric data by converting the volume into separate 2D slices for individual analysis, our SAM3D model processes the entire 3D volume image in a unified approach. Extensive experiments are conducted on multiple medical image datasets to demonstrate that our network attains competitive results compared with other state-of-the-art methods in 3D medical segmentation tasks while being significantly efficient in terms of parameters. Code and checkpoints are available at https://github.com/UARK-AICV/SAM3D. △ Less

Submitted 5 March, 2024; v1 submitted 7 September, 2023; originally announced September 2023.

Comments: Accepted at ISBI 2024

arXiv:2304.07693 [pdf, other]

Translating Simulation Images to X-ray Images via Multi-Scale Semantic Matching

Authors: **gxuan Kang, Tudor Jianu, Baoru Huang, Binod Bhattarai, Ngan Le, Frans Coenen, Anh Nguyen

Abstract: Endovascular intervention training is increasingly being conducted in virtual simulators. However, transferring the experience from endovascular simulators to the real world remains an open problem. The key challenge is the virtual environments are usually not realistically simulated, especially the simulation images. In this paper, we propose a new method to translate simulation images from an en… ▽ More Endovascular intervention training is increasingly being conducted in virtual simulators. However, transferring the experience from endovascular simulators to the real world remains an open problem. The key challenge is the virtual environments are usually not realistically simulated, especially the simulation images. In this paper, we propose a new method to translate simulation images from an endovascular simulator to X-ray images. Previous image-to-image translation methods often focus on visual effects and neglect structure information, which is critical for medical images. To address this gap, we propose a new method that utilizes multi-scale semantic matching. We apply self-domain semantic matching to ensure that the input image and the generated image have the same positional semantic relationships. We further apply cross-domain matching to eliminate the effects of different styles. The intensive experiment shows that our method generates realistic X-ray images and outperforms other state-of-the-art approaches by a large margin. We also collect a new large-scale dataset to serve as the new benchmark for this task. Our source code and dataset will be made publicly available. △ Less

Submitted 16 April, 2023; originally announced April 2023.

Comments: 11 pages

arXiv:2304.05723 [pdf, ps, other]

Distributed Coverage Control of Constrained Constant-Speed Unicycle Multi-Agent Systems

Authors: Qingchen Liu, Zengjie Zhang, Nhan Khanh Le, Jiahu Qin, Fangzhou Liu, Sandra Hirche

Abstract: This paper proposes a novel distributed coverage controller for a multi-agent system with constant-speed unicycle robots (CSUR). The work is motivated by the limitation of the conventional method that does not ensure the satisfaction of hard state- and input-dependent constraints and leads to feasibility issues for multi-CSUR systems. In this paper, we solve these problems by designing a novel cov… ▽ More This paper proposes a novel distributed coverage controller for a multi-agent system with constant-speed unicycle robots (CSUR). The work is motivated by the limitation of the conventional method that does not ensure the satisfaction of hard state- and input-dependent constraints and leads to feasibility issues for multi-CSUR systems. In this paper, we solve these problems by designing a novel coverage cost function and a saturated gradient-search-based control law. Invariant set theory and Lyapunov-based techniques are used to prove the state-dependent confinement and the convergence of the system state to the optimal coverage configuration, respectively. The controller is implemented in a distributed manner based on a novel communication standard among the agents. A series of simulation case studies are conducted to validate the effectiveness of the proposed coverage controller in different initial conditions and with control parameters. A comparison study in simulation reveals the advantage of the proposed method in terms of avoiding infeasibility. The experiment study verifies the applicability of the method to real robots with uncertainties. The development procedure of the method from theoretical analysis to experimental validation provides a novel framework for multi-agent system coordinate control with complex agent dynamics. △ Less

Submitted 14 March, 2024; v1 submitted 12 April, 2023; originally announced April 2023.

arXiv:2303.12337 [pdf, other]

Music-Driven Group Choreography

Authors: Nhat Le, Thang Pham, Tuong Do, Erman Tjiputra, Quang D. Tran, Anh Nguyen

Abstract: Music-driven choreography is a challenging problem with a wide variety of industrial applications. Recently, many methods have been proposed to synthesize dance motions from music for a single dancer. However, generating dance motion for a group remains an open problem. In this paper, we present $\rm AIOZ-GDANCE$, a new large-scale dataset for music-driven group dance generation. Unlike existing d… ▽ More Music-driven choreography is a challenging problem with a wide variety of industrial applications. Recently, many methods have been proposed to synthesize dance motions from music for a single dancer. However, generating dance motion for a group remains an open problem. In this paper, we present $\rm AIOZ-GDANCE$, a new large-scale dataset for music-driven group dance generation. Unlike existing datasets that only support single dance, our new dataset contains group dance videos, hence supporting the study of group choreography. We propose a semi-autonomous labeling method with humans in the loop to obtain the 3D ground truth for our dataset. The proposed dataset consists of 16.7 hours of paired music and 3D motion from in-the-wild videos, covering 7 dance styles and 16 music genres. We show that naively applying single dance generation technique to creating group dance motion may lead to unsatisfactory results, such as inconsistent movements and collisions between dancers. Based on our new dataset, we propose a new method that takes an input music sequence and a set of 3D positions of dancers to efficiently produce multiple group-coherent choreographies. We propose new evaluation metrics for measuring group dance quality and perform intensive experiments to demonstrate the effectiveness of our method. Our project facilitates future research on group dance generation and is available at: https://aioz-ai.github.io/AIOZ-GDANCE/ △ Less

Submitted 26 March, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

Comments: accepted in CVPR 2023

arXiv:2302.02547 [pdf, other]

A Quantum Neural Network Regression for Modeling Lithium-ion Battery Capacity Degradation

Authors: Anh Phuong Ngo, Nhat Le, Hieu T. Nguyen, Abdullah Eroglu, Duong T. Nguyen

Abstract: Given the high power density low discharge rate and decreasing cost rechargeable lithium-ion batteries LiBs have found a wide range of applications such as power grid level storage systems electric vehicles and mobile devices. Develo** a framework to accurately model the nonlinear degradation process of LiBs which is indeed a supervised learning problem becomes an important research topic. This… ▽ More Given the high power density low discharge rate and decreasing cost rechargeable lithium-ion batteries LiBs have found a wide range of applications such as power grid level storage systems electric vehicles and mobile devices. Develo** a framework to accurately model the nonlinear degradation process of LiBs which is indeed a supervised learning problem becomes an important research topic. This paper presents a classical-quantum hybrid machine learning approach to capture the LiB degradation model that assesses battery cell life loss from operating profiles. Our work is motivated by recent advances in quantum computers as well as the similarity between neural networks and quantum circuits. Similar to adjusting weight parameters in conventional neural networks the parameters of the quantum circuit namely the qubits degree of freedom can be tuned to learn a nonlinear function in a supervised learning fashion. As a proof of concept paper our obtained numerical results with the battery dataset provided by NASA demonstrate the ability of the quantum neural networks in modeling the nonlinear relationship between the degraded capacity and the operating cycles. We also discuss the potential advantage of the quantum approach compared to conventional neural networks in classical computers in dealing with massive data especially in the context of future penetration of EVs and energy storage. △ Less

Submitted 5 February, 2023; originally announced February 2023.

Comments: Accepted for 2023 IEEE Green Technology Conference, Denver, Colorado, USA

arXiv:2301.06268 [pdf, other]

Analyze the Effects of COVID-19 on Energy Storage Systems: A Techno-Economic Approach

Authors: Nhat Le, Alexis Plasencia Leos, Juan Henriquez, Anh Phuong Ngo, Hieu T. Nguyen

Abstract: During the COVID-19 pandemic, the U.S. power sector witnessed remarkable electricity demand changes in many geographical regions. these changes were evident in population-dense cities. This paper incorporates a techno-economic analysis of energy storage systems to investigate the pandemic's influence on ESS development, In particular, we employ a linear program-based revenue maximization model to… ▽ More During the COVID-19 pandemic, the U.S. power sector witnessed remarkable electricity demand changes in many geographical regions. these changes were evident in population-dense cities. This paper incorporates a techno-economic analysis of energy storage systems to investigate the pandemic's influence on ESS development, In particular, we employ a linear program-based revenue maximization model to capture the revenues of ESS from participating in the electricity market, by performing arbitrage on energy trading, and regulation market, by providing regulation services to stabilize the grid's frequency. We consider five dominant energy storage technologies in the U.S., namely, Lithium-ion, Advanced Lead Acid, Flywheel, Vanadium Redox Flow, and Lithium-Iron Phosphate storage technologies. Extensive numerical results conducted on the case of New York City allow us to highlight the negative impact that COVID-19 had on the NYC power sector. △ Less

Submitted 16 January, 2023; originally announced January 2023.

arXiv:2210.06297 [pdf, other]

Multimodality Multi-Lead ECG Arrhythmia Classification using Self-Supervised Learning

Authors: Thinh Phan, Duc Le, Patel Brijesh, Donald Adjeroh, **gxian Wu, Morten Olgaard Jensen, Ngan Le

Abstract: Electrocardiogram (ECG) signal is one of the most effective sources of information mainly employed for the diagnosis and prediction of cardiovascular diseases (CVDs) connected with the abnormalities in heart rhythm. Clearly, single modality ECG (i.e. time series) cannot convey its complete characteristics, thus, exploiting both time and time-frequency modalities in the form of time-series data and… ▽ More Electrocardiogram (ECG) signal is one of the most effective sources of information mainly employed for the diagnosis and prediction of cardiovascular diseases (CVDs) connected with the abnormalities in heart rhythm. Clearly, single modality ECG (i.e. time series) cannot convey its complete characteristics, thus, exploiting both time and time-frequency modalities in the form of time-series data and spectrogram is needed. Leveraging the cutting-edge self-supervised learning (SSL) technique on unlabeled data, we propose SSL-based multimodality ECG classification. Our proposed network follows SSL learning paradigm and consists of two modules corresponding to pre-stream task, and down-stream task, respectively. In the SSL-pre-stream task, we utilize self-knowledge distillation (KD) techniques with no labeled data, on various transformations and in both time and frequency domains. In the down-stream task, which is trained on labeled data, we propose a gate fusion mechanism to fuse information from multimodality.To evaluate the effectiveness of our approach, ten-fold cross validation on the 12-lead PhysioNet 2020 dataset has been conducted. △ Less

Submitted 30 September, 2022; originally announced October 2022.

arXiv:2203.08965 [pdf, other]

3D-UCaps: 3D Capsules Unet for Volumetric Image Segmentation

Authors: Tan Nguyen, Binh-Son Hua, Ngan Le

Abstract: Medical image segmentation has been so far achieving promising results with Convolutional Neural Networks (CNNs). However, it is arguable that in traditional CNNs, its pooling layer tends to discard important information such as positions. Moreover, CNNs are sensitive to rotation and affine transformation. Capsule network is a data-efficient network design proposed to overcome such limitations by… ▽ More Medical image segmentation has been so far achieving promising results with Convolutional Neural Networks (CNNs). However, it is arguable that in traditional CNNs, its pooling layer tends to discard important information such as positions. Moreover, CNNs are sensitive to rotation and affine transformation. Capsule network is a data-efficient network design proposed to overcome such limitations by replacing pooling layers with dynamic routing and convolutional strides, which aims to preserve the part-whole relationships. Capsule network has shown a great performance in image recognition and natural language processing, but applications for medical image segmentation, particularly volumetric image segmentation, has been limited. In this work, we propose 3D-UCaps, a 3D voxel-based Capsule network for medical volumetric image segmentation. We build the concept of capsules into a CNN by designing a network with two pathways: the first pathway is encoded by 3D Capsule blocks, whereas the second pathway is decoded by 3D CNNs blocks. 3D-UCaps, therefore inherits the merits from both Capsule network to preserve the spatial relationship and CNNs to learn visual representation. We conducted experiments on various datasets to demonstrate the robustness of 3D-UCaps including iSeg-2017, LUNA16, Hippocampus, and Cardiac, where our method outperforms previous Capsule networks and 3D-Unets. △ Less

Submitted 16 March, 2022; originally announced March 2022.

Comments: Accepted in MICCAI 2021

arXiv:2203.08951 [pdf, other]

Meta-Learning of NAS for Few-shot Learning in Medical Image Applications

Authors: Viet-Khoa Vo-Ho, Kashu Yamazaki, Hieu Hoang, Minh-Triet Tran, Ngan Le

Abstract: Deep learning methods have been successful in solving tasks in machine learning and have made breakthroughs in many sectors owing to their ability to automatically extract features from unstructured data. However, their performance relies on manual trial-and-error processes for selecting an appropriate network architecture, hyperparameters for training, and pre-/post-procedures. Even though it has… ▽ More Deep learning methods have been successful in solving tasks in machine learning and have made breakthroughs in many sectors owing to their ability to automatically extract features from unstructured data. However, their performance relies on manual trial-and-error processes for selecting an appropriate network architecture, hyperparameters for training, and pre-/post-procedures. Even though it has been shown that network architecture plays a critical role in learning feature representation feature from data and the final performance, searching for the best network architecture is computationally intensive and heavily relies on researchers' experience. Automated machine learning (AutoML) and its advanced techniques i.e. Neural Architecture Search (NAS) have been promoted to address those limitations. Not only in general computer vision tasks, but NAS has also motivated various applications in multiple areas including medical imaging. In medical imaging, NAS has significant progress in improving the accuracy of image classification, segmentation, reconstruction, and more. However, NAS requires the availability of large annotated data, considerable computation resources, and pre-defined tasks. To address such limitations, meta-learning has been adopted in the scenarios of few-shot learning and multiple tasks. In this book chapter, we first present a brief review of NAS by discussing well-known approaches in search space, search strategy, and evaluation strategy. We then introduce various NAS approaches in medical imaging with different applications such as classification, segmentation, detection, reconstruction, etc. Meta-learning in NAS for few-shot learning and multiple tasks is then explained. Finally, we describe several open problems in NAS. △ Less

Submitted 16 March, 2022; originally announced March 2022.

Comments: book chapter, in Meta-Learning with Medical Imaging and Health Informatics Applications

arXiv:2203.08948 [pdf, other]

CapsNet for Medical Image Segmentation

Authors: Minh Tran, Viet-Khoa Vo-Ho, Kyle Quinn, Hien Nguyen, Khoa Luu, Ngan Le

Abstract: Convolutional Neural Networks (CNNs) have been successful in solving tasks in computer vision including medical image segmentation due to their ability to automatically extract features from unstructured data. However, CNNs are sensitive to rotation and affine transformation and their success relies on huge-scale labeled datasets capturing various input variations. This network paradigm has posed… ▽ More Convolutional Neural Networks (CNNs) have been successful in solving tasks in computer vision including medical image segmentation due to their ability to automatically extract features from unstructured data. However, CNNs are sensitive to rotation and affine transformation and their success relies on huge-scale labeled datasets capturing various input variations. This network paradigm has posed challenges at scale because acquiring annotated data for medical segmentation is expensive, and strict privacy regulations. Furthermore, visual representation learning with CNNs has its own flaws, e.g., it is arguable that the pooling layer in traditional CNNs tends to discard positional information and CNNs tend to fail on input images that differ in orientations and sizes. Capsule network (CapsNet) is a recent new architecture that has achieved better robustness in representation learning by replacing pooling layers with dynamic routing and convolutional strides, which has shown potential results on popular tasks such as classification, recognition, segmentation, and natural language processing. Different from CNNs, which result in scalar outputs, CapsNet returns vector outputs, which aim to preserve the part-whole relationships. In this work, we first introduce the limitations of CNNs and fundamentals of CapsNet. We then provide recent developments of CapsNet for the task of medical image segmentation. We finally discuss various effective network architectures to implement a CapsNet for both 2D images and 3D volumetric medical image segmentation. △ Less

Submitted 16 March, 2022; originally announced March 2022.

Comments: Deep Learning for Medical Image Analysis, Elsevier/Academic Press (accepted)

arXiv:2201.05905 [pdf, other]

SS-3DCapsNet: Self-supervised 3D Capsule Networks for Medical Segmentation on Less Labeled Data

Authors: Minh Tran, Loi Ly, Binh-Son Hua, Ngan Le

Abstract: Capsule network is a recent new deep network architecture that has been applied successfully for medical image segmentation tasks. This work extends capsule networks for volumetric medical image segmentation with self-supervised learning. To improve on the problem of weight initialization compared to previous capsule networks, we leverage self-supervised learning for capsule networks pre-training,… ▽ More Capsule network is a recent new deep network architecture that has been applied successfully for medical image segmentation tasks. This work extends capsule networks for volumetric medical image segmentation with self-supervised learning. To improve on the problem of weight initialization compared to previous capsule networks, we leverage self-supervised learning for capsule networks pre-training, where our pretext-task is optimized by self-reconstruction. Our capsule network, SS-3DCapsNet, has a UNet-based architecture with a 3D Capsule encoder and 3D CNNs decoder. Our experiments on multiple datasets including iSeg-2017, Hippocampus, and Cardiac demonstrate that our 3D capsule network with self-supervised pre-training considerably outperforms previous capsule networks and 3D-UNets. △ Less

Submitted 28 March, 2022; v1 submitted 15 January, 2022; originally announced January 2022.

Comments: Accepted to ISBI 2022

arXiv:2112.13559 [pdf, other]

doi 10.1145/3477314.3507112

DAM-AL: Dilated Attention Mechanism with Attention Loss for 3D Infant Brain Image Segmentation

Authors: Dinh-Hieu Hoang, Gia-Han Diep, Minh-Triet Tran, Ngan T. H Le

Abstract: While Magnetic Resonance Imaging (MRI) has played an essential role in infant brain analysis, segmenting MRI into a number of tissues such as gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF) is crucial and complex due to the extremely low intensity contrast between tissues at around 6-9 months of age as well as amplified noise, myelination, and incomplete volume. In this paper, w… ▽ More While Magnetic Resonance Imaging (MRI) has played an essential role in infant brain analysis, segmenting MRI into a number of tissues such as gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF) is crucial and complex due to the extremely low intensity contrast between tissues at around 6-9 months of age as well as amplified noise, myelination, and incomplete volume. In this paper, we tackle those limitations by develo** a new deep learning model, named DAM-AL, which contains two main contributions, i.e., dilated attention mechanism and hard-case attention loss. Our DAM-AL network is designed with skip block layers and atrous block convolution. It contains both channel-wise attention at high-level context features and spatial attention at low-level spatial structural features. Our attention loss consists of two terms corresponding to region information and hard samples attention. Our proposed DAM-AL has been evaluated on the infant brain iSeg 2017 dataset and the experiments have been conducted on both validation and testing sets. We have benchmarked DAM-AL on Dice coefficient and ASD metrics and compared it with state-of-the-art methods. △ Less

Submitted 27 December, 2021; originally announced December 2021.

arXiv:2108.09993 [pdf, other]

doi 10.1109/ICASSP39728.2021.9414465

Image coding for machines: an end-to-end learned approach

Authors: Nam Le, Honglei Zhang, Francesco Cricri, Ramin Ghaznavi-Youvalari, Esa Rahtu

Abstract: Over recent years, deep learning-based computer vision systems have been applied to images at an ever-increasing pace, oftentimes representing the only type of consumption for those images. Given the dramatic explosion in the number of images generated per day, a question arises: how much better would an image codec targeting machine-consumption perform against state-of-the-art codecs targeting hu… ▽ More Over recent years, deep learning-based computer vision systems have been applied to images at an ever-increasing pace, oftentimes representing the only type of consumption for those images. Given the dramatic explosion in the number of images generated per day, a question arises: how much better would an image codec targeting machine-consumption perform against state-of-the-art codecs targeting human-consumption? In this paper, we propose an image codec for machines which is neural network (NN) based and end-to-end learned. In particular, we propose a set of training strategies that address the delicate problem of balancing competing loss functions, such as computer vision task losses, image distortion losses, and rate loss. Our experimental results show that our NN-based codec outperforms the state-of-the-art Versa-tile Video Coding (VVC) standard on the object detection and instance segmentation tasks, achieving -37.87% and -32.90% of BD-rate gain, respectively, while being fast thanks to its compact size. To the best of our knowledge, this is the first end-to-end learned machine-targeted image codec. △ Less

Submitted 30 August, 2021; v1 submitted 23 August, 2021; originally announced August 2021.

Comments: Fixed a couple of mistakes since the version accepted in IEEE ICASSP2021

Journal ref: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2021), 2021, pp. 1590-1594

arXiv:2108.09992 [pdf, other]

doi 10.1109/ICME51207.2021.9428224

Learned Image Coding for Machines: A Content-Adaptive Approach

Authors: Nam Le, Honglei Zhang, Francesco Cricri, Ramin Ghaznavi-Youvalari, Hamed Rezazadegan Tavakoli, Esa Rahtu

Abstract: Today, according to the Cisco Annual Internet Report (2018-2023), the fastest-growing category of Internet traffic is machine-to-machine communication. In particular, machine-to-machine communication of images and videos represents a new challenge and opens up new perspectives in the context of data compression. One possible solution approach consists of adapting current human-targeted image and v… ▽ More Today, according to the Cisco Annual Internet Report (2018-2023), the fastest-growing category of Internet traffic is machine-to-machine communication. In particular, machine-to-machine communication of images and videos represents a new challenge and opens up new perspectives in the context of data compression. One possible solution approach consists of adapting current human-targeted image and video coding standards to the use case of machine consumption. Another approach consists of develo** completely new compression paradigms and architectures for machine-to-machine communications. In this paper, we focus on image compression and present an inference-time content-adaptive finetuning scheme that optimizes the latent representation of an end-to-end learned image codec, aimed at improving the compression efficiency for machine-consumption. The conducted experiments show that our online finetuning brings an average bitrate saving (BD-rate) of -3.66% with respect to our pretrained image codec. In particular, at low bitrate points, our proposed method results in a significant bitrate saving of -9.85%. Overall, our pretrained-and-then-finetuned system achieves -30.54% BD-rate over the state-of-the-art image/video codec Versatile Video Coding (VVC). △ Less

Submitted 13 October, 2021; v1 submitted 23 August, 2021; originally announced August 2021.

Comments: Fig 4 correction

Journal ref: 2021 IEEE International Conference on Multimedia and Expo (ICME), 2021, pp. 1-6

arXiv:2108.03256 [pdf, other]

The Right to Talk: An Audio-Visual Transformer Approach

Authors: Thanh-Dat Truong, Chi Nhan Duong, The De Vu, Hoang Anh Pham, Bhiksha Raj, Ngan Le, Khoa Luu

Abstract: Turn-taking has played an essential role in structuring the regulation of a conversation. The task of identifying the main speaker (who is properly taking his/her turn of speaking) and the interrupters (who are interrupting or reacting to the main speaker's utterances) remains a challenging task. Although some prior methods have partially addressed this task, there still remain some limitations. F… ▽ More Turn-taking has played an essential role in structuring the regulation of a conversation. The task of identifying the main speaker (who is properly taking his/her turn of speaking) and the interrupters (who are interrupting or reacting to the main speaker's utterances) remains a challenging task. Although some prior methods have partially addressed this task, there still remain some limitations. Firstly, a direct association of Audio and Visual features may limit the correlations to be extracted due to different modalities. Secondly, the relationship across temporal segments hel** to maintain the consistency of localization, separation, and conversation contexts is not effectively exploited. Finally, the interactions between speakers that usually contain the tracking and anticipatory decisions about the transition to a new speaker are usually ignored. Therefore, this work introduces a new Audio-Visual Transformer approach to the problem of localization and highlighting the main speaker in both audio and visual channels of a multi-speaker conversation video in the wild. The proposed method exploits different types of correlations presented in both visual and audio signals. The temporal audio-visual relationships across spatial-temporal space are anticipated and optimized via the self-attention mechanism in a Transformerstructure. Moreover, a newly collected dataset is introduced for the main speaker detection. To the best of our knowledge, it is one of the first studies that is able to automatically localize and highlight the main speaker in both visual and audio channels in multi-speaker conversation videos. △ Less

Submitted 6 August, 2021; originally announced August 2021.

Comments: Accepted to ICCV 2021

arXiv:2103.12350 [pdf, other]

Roughness Index and Roughness Distance for Benchmarking Medical Segmentation

Authors: Vidhiwar Singh Rathour, Kashu Yamakazi, T. Hoang Ngan Le

Abstract: Medical image segmentation is one of the most challenging tasks in medical image analysis and has been widely developed for many clinical applications. Most of the existing metrics have been first designed for natural images and then extended to medical images. While object surface plays an important role in medical segmentation and quantitative analysis i.e. analyze brain tumor surface, measure g… ▽ More Medical image segmentation is one of the most challenging tasks in medical image analysis and has been widely developed for many clinical applications. Most of the existing metrics have been first designed for natural images and then extended to medical images. While object surface plays an important role in medical segmentation and quantitative analysis i.e. analyze brain tumor surface, measure gray matter volume, most of the existing metrics are limited when it comes to analyzing the object surface, especially to tell about surface smoothness or roughness of a given volumetric object or to analyze the topological errors. In this paper, we first analysis both pros and cons of all existing medical image segmentation metrics, specially on volumetric data. We then propose an appropriate roughness index and roughness distance for medical image segmentation analysis and evaluation. Our proposed method addresses two kinds of segmentation errors, i.e. (i)topological errors on boundary/surface and (ii)irregularities on the boundary/surface. The contribution of this work is four-fold: (i) detect irregular spikes/holes on a surface, (ii) propose roughness index to measure surface roughness of a given object, (iii) propose a roughness distance to measure the distance of two boundaries/surfaces by utilizing the proposed roughness index and (iv) suggest an algorithm which helps to remove the irregular spikes/holes to smooth the surface. Our proposed roughness index and roughness distance are built upon the solid surface roughness parameter which has been successfully developed in the civil engineering. △ Less

Submitted 23 March, 2021; originally announced March 2021.

Comments: Paper has been accepted at BIOIMAGING2021

arXiv:2103.09042 [pdf, ps, other]

Invertible Residual Network with Regularization for Effective Medical Image Segmentation

Authors: Kashu Yamazaki, Vidhiwar Singh Rathour, T. Hoang Ngan Le

Abstract: Deep Convolutional Neural Networks (CNNs) i.e. Residual Networks (ResNets) have been used successfully for many computer vision tasks, but are difficult to scale to 3D volumetric medical data. Memory is increasingly often the bottleneck when training 3D Convolutional Neural Networks (CNNs). Recently, invertible neural networks have been applied to significantly reduce activation memory footprint w… ▽ More Deep Convolutional Neural Networks (CNNs) i.e. Residual Networks (ResNets) have been used successfully for many computer vision tasks, but are difficult to scale to 3D volumetric medical data. Memory is increasingly often the bottleneck when training 3D Convolutional Neural Networks (CNNs). Recently, invertible neural networks have been applied to significantly reduce activation memory footprint when training neural networks with backpropagation thanks to the invertible functions that allow retrieving its input from its output without storing intermediate activations in memory to perform the backpropagation. Among many successful network architectures, 3D Unet has been established as a standard architecture for volumetric medical segmentation. Thus, we choose 3D Unet as a baseline for a non-invertible network and we then extend it with the invertible residual network. In this paper, we proposed two versions of the invertible Residual Network, namely Partially Invertible Residual Network (Partially-InvRes) and Fully Invertible Residual Network (Fully-InvRes). In Partially-InvRes, the invertible residual layer is defined by a technique called additive coupling whereas in Fully-InvRes, both invertible upsampling and downsampling operations are learned based on squeezing (known as pixel shuffle). Furthermore, to avoid the overfitting problem because of less training data, a variational auto-encoder (VAE) branch is added to reconstruct the input volumetric data itself. Our results indicate that by using partially/fully invertible networks as the central workhorse in volumetric segmentation, we not only reduce memory overhead but also achieve compatible segmentation performance compared against the non-invertible 3D Unet. We have demonstrated the proposed networks on various volumetric datasets such as iSeg 2019 and BraTS 2020. △ Less

Submitted 16 March, 2021; originally announced March 2021.

arXiv:2103.05115 [pdf, other]

Deep reinforcement learning in medical imaging: A literature review

Authors: S. Kevin Zhou, Hoang Ngan Le, Khoa Luu, Hien V. Nguyen, Nicholas Ayache

Abstract: Deep reinforcement learning (DRL) augments the reinforcement learning framework, which learns a sequence of actions that maximizes the expected reward, with the representative power of deep neural networks. Recent works have demonstrated the great potential of DRL in medicine and healthcare. This paper presents a literature review of DRL in medical imaging. We start with a comprehensive tutorial o… ▽ More Deep reinforcement learning (DRL) augments the reinforcement learning framework, which learns a sequence of actions that maximizes the expected reward, with the representative power of deep neural networks. Recent works have demonstrated the great potential of DRL in medicine and healthcare. This paper presents a literature review of DRL in medical imaging. We start with a comprehensive tutorial of DRL, including the latest model-free and model-based algorithms. We then cover existing DRL applications for medical imaging, which are roughly divided into three main categories: (I) parametric medical image analysis tasks including landmark detection, object/lesion detection, registration, and view plane localization; (ii) solving optimization tasks including hyperparameter tuning, selecting augmentation strategies, and neural architecture search; and (iii) miscellaneous applications including surgical gesture segmentation, personalized mobile health intervention, and computational model personalization. The paper concludes with discussions of future perspectives. △ Less

Submitted 5 March, 2021; originally announced March 2021.

Comments: 39 pages, 20 figures

arXiv:2012.11736 [pdf, ps, other]

Energy Efficiency Maximization in RIS-Aided Cell-Free Network with Limited Backhaul

Authors: Quang Nhat Le, Van-Dinh Nguyen, Octavia A. Dobre, Ruiqin Zhao

Abstract: Integrating the reconfigurable intelligent surface in a cell-free (RIS-CF) network is an effective solution to improve the capacity and coverage of future wireless systems with low cost and power consumption. The reflecting coefficients of RISs can be programmed to enhance signals received at users. This letter addresses a joint design of transmit beamformers at access points and reflecting coeffi… ▽ More Integrating the reconfigurable intelligent surface in a cell-free (RIS-CF) network is an effective solution to improve the capacity and coverage of future wireless systems with low cost and power consumption. The reflecting coefficients of RISs can be programmed to enhance signals received at users. This letter addresses a joint design of transmit beamformers at access points and reflecting coefficients at RISs to maximize the energy efficiency (EE) of RIS-CF networks, taking into account the limited backhaul capacity constraints. Due to a very computationally challenging nonconvex problem, we develop a simple yet efficient alternating descent algorithm for its solution. Numerical results verify that the EE of RIS-CF networks is greatly improved, showing the benefit of using RISs. △ Less

Submitted 8 March, 2021; v1 submitted 21 December, 2020; originally announced December 2020.

Comments: submitted for possible publication

arXiv:2012.02463 [pdf, other]

Offset Curves Loss for Imbalanced Problem in Medical Segmentation

Authors: Ngan Le, Trung Le, Kashu Yamazaki, Toan Duc Bui, Khoa Luu, Marios Savides

Abstract: Medical image segmentation has played an important role in medical analysis and widely developed for many clinical applications. Deep learning-based approaches have achieved high performance in semantic segmentation but they are limited to pixel-wise setting and imbalanced classes data problem. In this paper, we tackle those limitations by develo** a new deep learning-based model which takes int… ▽ More Medical image segmentation has played an important role in medical analysis and widely developed for many clinical applications. Deep learning-based approaches have achieved high performance in semantic segmentation but they are limited to pixel-wise setting and imbalanced classes data problem. In this paper, we tackle those limitations by develo** a new deep learning-based model which takes into account both higher feature level i.e. region inside contour, intermediate feature level i.e. offset curves around the contour and lower feature level i.e. contour. Our proposed Offset Curves (OsC) loss consists of three main fitting terms. The first fitting term focuses on pixel-wise level segmentation whereas the second fitting term acts as attention model which pays attention to the area around the boundaries (offset curves). The third terms plays a role as regularization term which takes the length of boundaries into account. We evaluate our proposed OsC loss on both 2D network and 3D network. Two common medical datasets, i.e. retina DRIVE and brain tumor BRATS 2018 datasets are used to benchmark our proposed loss performance. The experiments have shown that our proposed OsC loss function outperforms other mainstream loss functions such as Cross-Entropy, Dice, Focal on the most common segmentation networks Unet, FCN. △ Less

Submitted 4 December, 2020; originally announced December 2020.

Comments: ICPR 2020

arXiv:2012.01777 [pdf, other]

Flow-based Deformation Guidance for Unpaired Multi-Contrast MRI Image-to-Image Translation

Authors: Toan Duc Bui, Manh Nguyen, Ngan Le, Khoa Luu

Abstract: Image synthesis from corrupted contrasts increases the diversity of diagnostic information available for many neurological diseases. Recently the image-to-image translation has experienced significant levels of interest within medical research, beginning with the successful use of the Generative Adversarial Network (GAN) to the introduction of cyclic constraint extended to multiple domains. Howeve… ▽ More Image synthesis from corrupted contrasts increases the diversity of diagnostic information available for many neurological diseases. Recently the image-to-image translation has experienced significant levels of interest within medical research, beginning with the successful use of the Generative Adversarial Network (GAN) to the introduction of cyclic constraint extended to multiple domains. However, in current approaches, there is no guarantee that the map** between the two image domains would be unique or one-to-one. In this paper, we introduce a novel approach to unpaired image-to-image translation based on the invertible architecture. The invertible property of the flow-based architecture assures a cycle-consistency of image-to-image translation without additional loss functions. We utilize the temporal information between consecutive slices to provide more constraints to the optimization for transforming one domain to another in unpaired volumetric medical images. To capture temporal structures in the medical images, we explore the displacement between the consecutive slices using a deformation field. In our approach, the deformation field is used as a guidance to keep the translated slides realistic and consistent across the translation. The experimental results have shown that the synthesized images using our proposed approach are able to archive a competitive performance in terms of mean squared error, peak signal-to-noise ratio, and structural similarity index when compared with the existing deep learning-based methods on three standard datasets, i.e. HCP, MRBrainS13, and Brats2019. △ Less

Submitted 3 December, 2020; originally announced December 2020.

Comments: Medical Image Computing and Computer Assisted Interventions

arXiv:2011.10133 [pdf, ps, other]

doi 10.1109/TGCN.2020.3036026

Full-Duplex Non-Orthogonal Multiple Access Cooperative Overlay Spectrum-Sharing Networks with SWIPT

Authors: Quang Nhat Le, Animesh Yadav, Nam-Phong Nguyen, Octavia A. Dobre, Ruiqin Zhao

Abstract: This paper proposes a novel non-orthogonal multiple access (NOMA) assisted cooperative spectrum sharing network, in which one of the full-duplex (FD) secondary transmitters (STs) is chosen among many for forwarding the primary transmitter's and its own information to primary receiver and secondary receivers, respectively, using NOMA technique. To stimulate the ST to conduct cooperative transmissio… ▽ More This paper proposes a novel non-orthogonal multiple access (NOMA) assisted cooperative spectrum sharing network, in which one of the full-duplex (FD) secondary transmitters (STs) is chosen among many for forwarding the primary transmitter's and its own information to primary receiver and secondary receivers, respectively, using NOMA technique. To stimulate the ST to conduct cooperative transmission and sustain its operations, the simultaneous wireless information and power transfer (SWIPT) technique is utilized by the ST to harvest the primary signal's energy. In order to evaluate the proposed system's performance, the outage probability and system throughput for the primary and secondary networks are derived in tight closed-form approximations. Further, the sum rate optimization problem is formulated for the proposed cooperative network and a rapid convergent iterative algorithm is proposed to obtain the optimized power allocation coefficients. Numerical results show that FD, SWIPT, and NOMA techniques greatly boost the performance of cooperative spectrum-sharing network in terms of outage probability, system throughput, and sum rate compared to that of half-duplex NOMA and the conventional orthogonal multiple access-time division multiple access networks. △ Less

Submitted 19 November, 2020; originally announced November 2020.

Comments: accepted for publication in the IEEE Transactions on Green Communications and Networking

arXiv:2011.07549 [pdf, ps, other]

Learning-Assisted User Clustering in Cell-Free Massive MIMO-NOMA Networks

Authors: Quang Nhat Le, Van-Dinh Nguyen, Nam-Phong Nguyen, Symeon Chatzinotas, Octavia A. Dobre, Ruiqin Zhao

Abstract: The superior spectral efficiency (SE) and user fairness feature of non-orthogonal multiple access (NOMA) systems are achieved by exploiting user clustering (UC) more efficiently. However, a random UC certainly results in a suboptimal solution while an exhaustive search method comes at the cost of high complexity, especially for systems of medium-to-large size. To address this problem, we develop t… ▽ More The superior spectral efficiency (SE) and user fairness feature of non-orthogonal multiple access (NOMA) systems are achieved by exploiting user clustering (UC) more efficiently. However, a random UC certainly results in a suboptimal solution while an exhaustive search method comes at the cost of high complexity, especially for systems of medium-to-large size. To address this problem, we develop two efficient unsupervised machine learning (ML) based UC algorithms, namely k-means++ and improved k-means++, to effectively cluster users into disjoint clusters in cell-free massive multiple-input multiple-output (CFmMIMO) system. Using full-pilot zero-forcing at access points, we derive the sum SE in closed-form expression taking into account the impact of intra-cluster pilot contamination, inter-cluster interference, and imperfect successive interference cancellation. To comprehensively assess the system performance, we formulate the sum SE optimization problem, and then develop a simple yet efficient iterative algorithm for its solution. In addition, the performance of collocated massive MIMO-NOMA (COmMIMO-NOMA) system is also characterized. Numerical results are provided to show the superior performance of the proposed UC algorithms compared to other baseline schemes. The effectiveness of applying NOMA in CFmMIMO and COmMIMO systems is also validated. △ Less

Submitted 15 November, 2020; originally announced November 2020.

Comments: submitted for possible publication

arXiv:2008.06828 [pdf, other]

A novel approach to remove foreign objects from chest X-ray images

Authors: Hieu X. Le, Phuong D. Nguyen, Thang H. Nguyen, Khanh N. Q. Le, Thanh T. Nguyen

Abstract: We initially proposed a deep learning approach for foreign objects inpainting in smartphone-camera captured chest radiographs utilizing the cheXphoto dataset. Foreign objects which can significantly affect the quality of a computer-aided diagnostic prediction are captured under various settings. In this paper, we used multi-method to tackle both removal and inpainting chest radiographs. Firstly, a… ▽ More We initially proposed a deep learning approach for foreign objects inpainting in smartphone-camera captured chest radiographs utilizing the cheXphoto dataset. Foreign objects which can significantly affect the quality of a computer-aided diagnostic prediction are captured under various settings. In this paper, we used multi-method to tackle both removal and inpainting chest radiographs. Firstly, an object detection model is trained to separate the foreign objects from the given image. Subsequently, the binary mask of each object is extracted utilizing a segmentation model. Each pair of the binary mask and the extracted object are then used for inpainting purposes. Finally, the in-painted regions are now merged back to the original image, resulting in a clean and non-foreign-object-existing output. To conclude, we achieved state-of-the-art accuracy. The experimental results showed a new approach to the possible applications of this method for chest X-ray images detection. △ Less

Submitted 15 August, 2020; originally announced August 2020.

Comments: 9 pages, 7 figures, 7 tables

arXiv:1705.00615 [pdf, other]

Guided-Processing Outperforms Duty-Cycling for Energy-Efficient Systems

Authors: Long N. Le, Douglas L. Jones

Abstract: Energy-efficiency is highly desirable for sensing systems in the Internet of Things (IoT). A common approach to achieve low-power systems is duty-cycling, where components in a system are turned off periodically to meet an energy budget. However, this work shows that such an approach is not necessarily optimal in energy-efficiency, and proposes \textit{guided-processing} as a fundamentally better… ▽ More Energy-efficiency is highly desirable for sensing systems in the Internet of Things (IoT). A common approach to achieve low-power systems is duty-cycling, where components in a system are turned off periodically to meet an energy budget. However, this work shows that such an approach is not necessarily optimal in energy-efficiency, and proposes \textit{guided-processing} as a fundamentally better alternative. The proposed approach offers 1) explicit modeling of performance uncertainties in system internals, 2) a realistic resource consumption model, and 3) a key insight into the superiority of guided-processing over duty-cycling. Generalization from the cascade structure to the more general graph-based one is also presented. Once applied to optimize a large-scale audio sensing system with a practical detection application, empirical results show that the proposed approach significantly improves the detection performance (up to $1.7\times$ and $4\times$ reduction in false-alarm and miss rate, respectively) for the same energy consumption, when compared to the duty-cycling approach. △ Less

Submitted 1 May, 2017; originally announced May 2017.

Comments: preprint, the published version is in IEEE Transactions on Circuits and Systems I, Special Issue on Circuits and Systems for the Internet of Things - From Sensing to Sensemaking, 2017. arXiv admin note: substantial text overlap with arXiv:1705.00596

arXiv:1705.00596 [pdf, other]

doi 10.1109/JSTSP.2017.2679539

Feature-Sharing in Cascade Detection Systems with Multiple Applications

Authors: Long N. Le, Douglas L. Jones

Abstract: Traditional distributed detection systems are often designed for a single target application. However, with the emergence of the Internet of Things (IoT) paradigm, next-generation systems are expected to be a shared infrastructure for multiple applications. To this end, we propose a modular, cascade design for resource-efficient, multi-task detection systems. Two (classes of) applications are cons… ▽ More Traditional distributed detection systems are often designed for a single target application. However, with the emergence of the Internet of Things (IoT) paradigm, next-generation systems are expected to be a shared infrastructure for multiple applications. To this end, we propose a modular, cascade design for resource-efficient, multi-task detection systems. Two (classes of) applications are considered in the system, a primary and a secondary one. The primary application has universal features that can be shared with other applications, to reduce the overall feature extraction cost, while the secondary application does not. In this setting, the two applications can collaborate via feature sharing. We provide a method to optimize the operation of the multi-application cascade system based on an accurate resource consumption model. In addition, the inherent uncertainties in feature models are articulated and taken into account. For evaluation, the twin-comparison argument is invoked, and it is shown that, with the optimal feature sharing strategy, a system can achieve 9$\times$ resource saving and 1.43$\times$ improvement in detection performance. △ Less

Submitted 1 May, 2017; originally announced May 2017.

Comments: preprint, the published version is in IEEE Journal of Selected Topics in Signal Processing, Special Issue on Cooperative Signal Processing for Heterogeneous and Multi-Task Wireless Sensor Networks, 2017

Showing 1–34 of 34 results for author: Le, N