Search | arXiv e-print repository

Neural Graphics Texture Compression Supporting Random Access

Authors: Farzad Farhadzadeh, Qiqi Hou, Hoang Le, Amir Said, Randall Rauwendaal, Alex Bourd, Fatih Porikli

Abstract: Advances in rendering have led to tremendous growth in texture assets, including resolution, complexity, and novel texture components, but this growth in data volume has not been matched by advances in its compression. Meanwhile, Neural Image Compression (NIC) has advanced significantly and shown promising results, but the proposed methods cannot be directly adapted to neural texture compression. First, texture compression requires on-demand and real-time decoding with random access during parallel rendering (e.g. block texture decompression on GPUs). Additionally, NIC does not support multi-resolution reconstruction (mip-levels), nor does it have the ability to efficiently jointly compress different sets of texture channels. In this work, we introduce a novel approach to texture set compression that integrates traditional GPU texture representation and NIC techniques, designed to enable random access and support many-channel texture sets. To achieve this goal, we propose an asymmetric auto-encoder framework that employs a convolutional encoder to capture detailed information in a bottleneck-latent space, and on the decoder side we utilize a fully connected network, whose inputs are sampled latent features plus positional information, for a given texture coordinate and mip level. This latent data is defined to enable simplified access to multi-resolution data by simply changing the scanning strides. Experimental results demonstrate that this approach provides much better results than conventional texture compression, and significant improvement over the latest method using neural networks.

Submitted 6 May, 2024; originally announced July 2024.

Comments: ECCV submission
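
The abstract above says that multi-resolution (mip-level) data is reached by changing the scanning strides over the latent data, and that the decoder receives sampled latent features plus positional information. The following is only a minimal sketch of that idea under stated assumptions: the grid layout, the stride rule `2 ** mip_level`, nearest-neighbor sampling, and the names `sample_latent` and `decoder_input` are all hypothetical and are not taken from the paper.

```python
def sample_latent(latent, u, v, mip_level):
    """Fetch one latent feature vector for texture coordinate (u, v) in [0, 1).

    `latent` is a 2D grid (list of rows) of feature vectors. Coarser mip
    levels reuse the same grid but read it with a larger stride
    (assumption: the stride doubles with every mip level).
    """
    stride = 2 ** mip_level
    height, width = len(latent), len(latent[0])
    # Size of the coarser grid that is visible at this stride.
    rows = (height + stride - 1) // stride
    cols = (width + stride - 1) // stride
    # Nearest-neighbor lookup in the strided grid, mapped back to full-grid indices.
    row = min(int(v * rows), rows - 1) * stride
    col = min(int(u * cols), cols - 1) * stride
    return latent[row][col]


def decoder_input(latent, u, v, mip_level):
    """Concatenate sampled latent features with positional information.

    A fully connected decoder (not shown) would consume this vector.
    """
    return list(sample_latent(latent, u, v, mip_level)) + [u, v, float(mip_level)]


# Toy 4x4 latent grid whose features simply record their own grid position.
grid = [[[r, c] for c in range(4)] for r in range(4)]
fine = sample_latent(grid, 0.3, 0.3, mip_level=0)    # reads every cell
coarse = sample_latent(grid, 0.3, 0.3, mip_level=1)  # reads every second cell
```

Because each lookup depends only on the coordinate and the mip level, any texel can be decoded independently, which is the random-access property the abstract asks for.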

arXiv:2405.18435 [pdf, other]

QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge

Authors: Hongwei Bran Li, Fernando Navarro, Ivan Ezhov, Amirhossein Bayat, Dhritiman Das, Florian Kofler, Suprosanna Shit, Diana Waldmannstetter, Johannes C. Paetzold, Xiaobin Hu, Benedikt Wiestler, Lucas Zimmer, Tamaz Amiranashvili, Chinmay Prabhakar, Christoph Berger, Jonas Weidner, Michelle Alonso-Basant, Arif Rashid, Ujjwal Baid, Wesam Adel, Deniz Ali, Bhakti Baheti, Yingbin Bai, Ishaan Bhatt, Sabri Can Cetindag , et al. (55 additional authors not shown)

Abstract: Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the development and evaluation of automated segmentation algorithms. Accurately modeling and quantifying this variability is essential for enhancing the robustness and clinical applicability of these algorithms. We report the set-up and summarize the benchmark results of the Quantification of Uncertainties in Biomedical Image Quantification Challenge (QUBIQ), which was organized in conjunction with International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2020 and 2021. The challenge focuses on the uncertainty quantification of medical image segmentation which considers the omnipresence of inter-rater variability in imaging datasets. The large collection of images with multi-rater annotations features various modalities such as MRI and CT; various organs such as the brain, prostate, kidney, and pancreas; and different image dimensions (2D vs. 3D). A total of 24 teams submitted different solutions to the problem, combining various baseline models, Bayesian neural networks, and ensemble model techniques. The obtained results indicate the importance of the ensemble models, as well as the need for further research to develop efficient uncertainty quantification methods for 3D segmentation tasks.

Submitted 24 June, 2024; v1 submitted 19 March, 2024; originally announced May 2024.

Comments: initial technical report

arXiv:2405.01979 [pdf, other]

Graph Neural Network based Active and Passive Beamforming for Distributed STAR-RIS-Assisted Multi-User MISO Systems

Authors: Ha An Le, Trinh Van Chien, Wan Choi

Abstract: This paper investigates a joint active and passive beamforming design for distributed simultaneous transmitting and reflecting (STAR) reconfigurable intelligent surface (RIS) assisted multi-user (MU) multiple-input single-output (MISO) systems, where the energy splitting (ES) mode is considered for the STAR-RIS. We aim to design the active beamforming vectors at the base station (BS) and the passive beamforming at the STAR-RIS to maximize the user sum rate under transmitting power constraints. The formulated problem is non-convex, and obtaining the global optimum is nontrivial due to the coupling between active beamforming vectors and STAR-RIS phase shifts. To efficiently solve the problem, we propose a novel graph neural network (GNN)-based framework. Specifically, we first model the interactions among users and network entities using a heterogeneous graph representation. A heterogeneous graph neural network (HGNN) implementation is then introduced to directly optimize beamforming vectors and STAR-RIS coefficients with the system objective. Numerical results show that the proposed approach yields efficient performance compared to the previous benchmarks. Furthermore, the proposed GNN is scalable with various system configurations.

Submitted 3 May, 2024; originally announced May 2024.

Comments: 13 pages, 7 figures
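
The objective named in the abstract above is the user sum rate. As a rough illustration only, the sketch below evaluates the textbook multi-user MISO sum rate, sum over k of log2(1 + SINR_k), for given effective channels and beamforming vectors. It assumes the STAR-RIS contribution is already folded into each effective channel, and the function name `sum_rate` and its arguments are hypothetical, not the paper's formulation.

```python
import math


def sum_rate(channels, beams, noise_power):
    """Sum rate in bit/s/Hz for a multi-user MISO downlink.

    channels[k] is the effective channel vector of user k (complex entries),
    beams[k] is the beamforming vector intended for user k.
    """
    def gain(h, w):
        # |h^H w|^2: received power of beam w at the user with channel h.
        return abs(sum(hc.conjugate() * wc for hc, wc in zip(h, w))) ** 2

    total = 0.0
    for k, h in enumerate(channels):
        signal = gain(h, beams[k])
        interference = sum(gain(h, w) for j, w in enumerate(beams) if j != k)
        sinr = signal / (interference + noise_power)
        total += math.log2(1.0 + sinr)
    return total


# Two users with orthogonal channels and matched beams: no interference,
# so each user has SINR = 1 and contributes log2(2) = 1 bit/s/Hz.
h = [[1 + 0j, 0j], [0j, 1 + 0j]]
w = [[1 + 0j, 0j], [0j, 1 + 0j]]
rate = sum_rate(h, w, noise_power=1.0)
```

The interference term in each user's SINR depends on every other user's beam, which is one source of the non-convexity the abstract mentions.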

arXiv:2405.00681 [pdf, other]

Delay and Overhead Efficient Transmission Scheduling for Federated Learning in UAV Swarms

Authors: Duc N. M. Hoang, Vu Tuan Truong, Hung Duy Le, Long Bao Le

Abstract: This paper studies the wireless scheduling design to coordinate the transmissions of (local) model parameters of federated learning (FL) for a swarm of unmanned aerial vehicles (UAVs). The overall goal of the proposed design is to realize the FL training and aggregation processes with a central aggregator exploiting the sensory data collected by the UAVs, while accounting for the multi-hop wireless network formed by the UAVs. Such transmissions of model parameters over the UAV-based wireless network potentially cause large transmission delays and overhead. Our proposed framework smartly aggregates local model parameters trained by the UAVs while efficiently transmitting the underlying parameters to the central aggregator in each FL global round. We theoretically show that the proposed scheme achieves minimal delay and communication overhead. Extensive numerical experiments demonstrate the superiority of the proposed scheme compared to other baselines.

Submitted 22 February, 2024; originally announced May 2024.

Comments: accepted to WCNC'24

arXiv:2403.17879 [pdf, other]

Low-Latency Neural Stereo Streaming

Authors: Qiqi Hou, Farzad Farhadzadeh, Amir Said, Guillaume Sautiere, Hoang Le

Abstract: The rise of new video modalities like virtual reality or autonomous driving has increased the demand for efficient multi-view video compression methods, both in terms of rate-distortion (R-D) performance and in terms of delay and runtime. While most recent stereo video compression approaches have shown promising performance, they compress left and right views sequentially, leading to poor parallelization and runtime performance. This work presents Low-Latency neural codec for Stereo video Streaming (LLSS), a novel parallel stereo video coding method designed for fast and efficient low-latency stereo video streaming. Instead of using a sequential cross-view motion compensation like existing methods, LLSS introduces a bidirectional feature shifting module to directly exploit mutual information among views and encode them effectively with a joint cross-view prior model for entropy coding. Thanks to this design, LLSS processes left and right views in parallel, minimizing latency; all while substantially improving R-D performance compared to both existing neural and conventional codecs.

Submitted 26 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR2024

arXiv:2401.05915 [pdf, other]

Neural Implicit Surface Reconstruction of Freehand 3D Ultrasound Volume with Geometric Constraints

Authors: Hongbo Chen, Logiraj Kumaralingam, Shuhang Zhang, Sheng Song, Fayi Zhang, Haibin Zhang, Thanh-Tu Pham, Edmond H. M. Lou, Kumaradevan Punithakumar, Lawrence H. Le, Rui Zheng

Abstract: Three-dimensional (3D) freehand ultrasound (US) is a widely used imaging modality that allows non-invasive imaging of medical anatomy without radiation exposure. The surface reconstruction of US volume is vital to acquire the accurate anatomical structures needed for modeling, registration, and visualization. However, traditional methods cannot produce a high-quality surface due to image noise. Despite improvements in smoothness, continuity, and resolution from deep learning approaches, research on surface reconstruction in freehand 3D US is still limited. This study introduces FUNSR, a self-supervised neural implicit surface reconstruction method to learn signed distance functions (SDFs) from US volumes. In particular, FUNSR iteratively learns the SDFs by moving the 3D queries sampled around the volumetric point clouds to approximate the surface, guided by two novel geometric constraints: sign consistency constraint and on-surface constraint with adversarial learning. Our approach has been thoroughly evaluated across four datasets to demonstrate its adaptability to various anatomical structures, including a hip phantom dataset, two vascular datasets and one publicly available prostate dataset. We also show that smooth and continuous representations greatly enhance the visual appearance of US data. Furthermore, we highlight the robustness of our method to noise distribution and its potential to improve segmentation performance.

Submitted 1 May, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

Comments: Preprint
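
The abstract above describes learning an SDF by moving 3D queries toward the surface. A common way to express such a move is to step a query along the negative SDF gradient by its signed distance, q' = q - f(q) * grad f(q) / ||grad f(q)||. The sketch below applies that generic step to an analytic sphere SDF standing in for a learned network. The update rule, the finite-difference gradient, and the names `sphere_sdf`, `numerical_gradient`, and `pull_to_surface` are illustrative assumptions, not FUNSR's implementation, and the two geometric constraints are not modeled.

```python
import math


def sphere_sdf(p, radius=1.0):
    """Signed distance to a sphere centered at the origin (stand-in for a learned SDF)."""
    return math.sqrt(sum(c * c for c in p)) - radius


def numerical_gradient(sdf, p, eps=1e-5):
    """Central finite-difference gradient of `sdf` at point `p`."""
    grad = []
    for i in range(len(p)):
        hi, lo = list(p), list(p)
        hi[i] += eps
        lo[i] -= eps
        grad.append((sdf(hi) - sdf(lo)) / (2.0 * eps))
    return grad


def pull_to_surface(sdf, q):
    """Move query q by its signed distance along the normalized SDF gradient."""
    distance = sdf(q)
    grad = numerical_gradient(sdf, q)
    norm = math.sqrt(sum(g * g for g in grad)) or 1.0  # guard against a zero gradient
    return [qc - distance * gc / norm for qc, gc in zip(q, grad)]


# A query outside the unit sphere lands on the surface after one step.
moved = pull_to_surface(sphere_sdf, [2.0, 0.0, 0.0])
```

With an exact SDF a single step reaches the surface. During training, the distance between moved queries and nearby observed points would typically serve as the loss that shapes the network.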

arXiv:2401.03754 [pdf, other]

Joint Power Allocation and User Scheduling in Integrated Satellite-Terrestrial Cell-Free Massive MIMO IoT Systems

Authors: Trinh Van Chien, Ha An Le, Ta Hai Tung, Hien Quoc Ngo, Symeon Chatzinotas

Abstract: Both space and ground communications have been proven effective solutions under different perspectives in Internet of Things (IoT) networks. This paper investigates multiple-access scenarios, where plenty of IoT users are cooperatively served by a satellite in space and access points (APs) on the ground. Available users in each coherence interval are split into scheduled and unscheduled subsets to optimize limited radio resources. We compute the uplink ergodic throughput of each scheduled user under imperfect channel state information (CSI) and non-orthogonal pilot signals. As maximum-ratio combining is deployed locally at the ground gateway and the APs, the uplink ergodic throughput is obtained in a closed-form expression. The analytical results explicitly unveil the effects of channel conditions and pilot contamination on each scheduled user. By maximizing the sum throughput, the system can simultaneously determine scheduled users and perform power allocation based on either a model-based approach with alternating optimization or a learning-based approach with the graph neural network. Numerical results manifest that integrated satellite-terrestrial cell-free massive multiple-input multiple-output systems can significantly improve the sum ergodic throughput over coherence intervals. The integrated systems can schedule the vast majority of users; some might be out of service due to the limited power budget.

Submitted 8 January, 2024; originally announced January 2024.

Comments: 15 pages, 10 figures, 1 table. Submitted for publication

arXiv:2312.00921 [pdf, ps, other]

Bitstream Organization for Parallel Entropy Coding on Neural Network-based Video Codecs

Authors: Amir Said, Hoang Le, Farzad Farhadzadeh

Abstract: Video compression systems must support increasing bandwidth and data throughput at low cost and power, and can be limited by entropy coding bottlenecks. Efficiency can be greatly improved by parallelizing coding, which can be done at much larger scales with new neural-based codecs, but with some compression loss related to data organization. We analyze the bit rate overhead needed to support multiple bitstreams for concurrent decoding, and for its minimization propose a method for compressing parallel-decoding entry points, using bidirectional bitstream packing, and a new form of jointly optimizing arithmetic coding termination. It is shown that those techniques significantly lower the overhead, making it easier to reduce it to a small fraction of the average bitstream size, for example, less than 1% and 0.1% when the average number of bitstream bytes is respectively larger than 95 and 1,200 bytes.

Submitted 1 December, 2023; originally announced December 2023.

Journal ref: Proc. IEEE International Conference on Multimedia, Dec. 2023
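
To put the closing figures of the abstract above in absolute terms, the small calculation below converts a relative overhead bound into a per-bitstream bit budget. It uses only the numbers quoted in the abstract (1% at 95 bytes, 0.1% at 1,200 bytes); the helper name `overhead_budget_bits` is hypothetical.

```python
def overhead_budget_bits(avg_bitstream_bytes, overhead_fraction):
    """Largest overhead, in bits per bitstream, that stays within the given fraction."""
    return avg_bitstream_bytes * 8 * overhead_fraction


# 1% of a 95-byte bitstream and 0.1% of a 1,200-byte bitstream.
small = overhead_budget_bits(95, 0.01)     # about 7.6 bits
large = overhead_budget_bits(1200, 0.001)  # about 9.6 bits
```

Under this reading, both operating points correspond to an overhead on the order of one byte per parallel bitstream.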

arXiv:2311.11096 [pdf, other]

On the Out of Distribution Robustness of Foundation Models in Medical Image Segmentation

Authors: Duy Minh Ho Nguyen, Tan Ngoc Pham, Nghiem Tuong Diep, Nghi Quoc Phan, Quang Pham, Vinh Tong, Binh T. Nguyen, Ngan Hoang Le, Nhat Ho, Pengtao Xie, Daniel Sonntag, Mathias Niepert

Abstract: Constructing a robust model that can effectively generalize to test samples under distribution shifts remains a significant challenge in the field of medical imaging. The foundational models for vision and language, pre-trained on extensive sets of natural image and text data, have emerged as a promising approach. They showcase impressive learning abilities across different tasks with the need for only a limited amount of annotated samples. While numerous techniques have focused on developing better fine-tuning strategies to adapt these models for specific domains, we instead examine their robustness to domain shifts in the medical image segmentation task. To this end, we compare the generalization performance to unseen domains of various pre-trained models after being fine-tuned on the same in-distribution dataset and show that foundation-based models enjoy better robustness than other architectures. From here, we further developed a new Bayesian uncertainty estimation for frozen models and used them as an indicator to characterize the model's performance on out-of-distribution (OOD) data, proving particularly beneficial for real-world applications. Our experiments not only reveal the limitations of current indicators like accuracy on the line or agreement on the line commonly used in natural image applications but also emphasize the promise of the introduced Bayesian uncertainty. Specifically, lower uncertainty predictions usually tend to correspond to higher out-of-distribution (OOD) performance.

Submitted 18 November, 2023; originally announced November 2023.

Comments: Advances in Neural Information Processing Systems (NeurIPS) 2023, Workshop on robustness of zero/few-shot learning in foundation models

arXiv:2310.01258 [pdf, other]

MobileNVC: Real-time 1080p Neural Video Compression on a Mobile Device

Authors: Ties van Rozendaal, Tushar Singhal, Hoang Le, Guillaume Sautiere, Amir Said, Krishna Buska, Anjuman Raha, Dimitris Kalatzis, Hitarth Mehta, Frank Mayer, Liang Zhang, Markus Nagel, Auke Wiggers

Abstract: Neural video codecs have recently become competitive with standard codecs such as HEVC in the low-delay setting. However, most neural codecs are large floating-point networks that use pixel-dense warping operations for temporal modeling, making them too computationally expensive for deployment on mobile devices. Recent work has demonstrated that running a neural decoder in real time on mobile is feasible, but shows this only for 720p RGB video. This work presents the first neural video codec that decodes 1080p YUV420 video in real time on a mobile device. Our codec relies on two major contributions. First, we design an efficient codec that uses a block-based motion compensation algorithm available on the warping core of the mobile accelerator, and we show how to quantize this model to integer precision. Second, we implement a fast decoder pipeline that concurrently runs neural network components on the neural signal processor, parallel entropy coding on the mobile GPU, and warping on the warping core. Our codec outperforms the previous on-device codec by a large margin with up to 48% BD-rate savings, while reducing the MAC count on the receiver side by $10 \times$. We perform a careful ablation to demonstrate the effect of the introduced motion compensation scheme, and ablate the effect of model quantization.

Submitted 15 November, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: Matches version published at WACV 2024

arXiv:2309.06956 [pdf, other]

Implicit Neural Multiple Description for DNA-based data storage

Authors: Trung Hieu Le, Xavier Pic, Jeremy Mateos, Marc Antonini

Abstract: DNA exhibits remarkable potential as a data storage solution due to its impressive storage density and long-term stability, stemming from its inherent biomolecular structure. However, developing this novel medium comes with its own set of challenges, particularly in addressing errors arising from storage and biological manipulations. These challenges are further conditioned by the structural constraints of DNA sequences and cost considerations. In response to these limitations, we have pioneered a novel compression scheme and a cutting-edge Multiple Description Coding (MDC) technique utilizing neural networks for DNA data storage. Our MDC method introduces an innovative approach to encoding data into DNA, specifically designed to withstand errors effectively. Notably, our new compression scheme outperforms classic image compression methods for DNA-data storage. Furthermore, our approach exhibits superiority over conventional MDC methods reliant on auto-encoders. Its distinctive strengths lie in its ability to bypass the need for extensive model training and its enhanced adaptability for fine-tuning redundancy levels. Experimental results demonstrate that our solution competes favorably with the latest DNA data storage methods in the field, offering superior compression rates and robust noise resilience.

Submitted 13 September, 2023; originally announced September 2023.

Comments: Xavier Pic and Trung Hieu Le are both equal contributors and primary authors

arXiv:2309.05472 [pdf, other]

LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech

Authors: Titouan Parcollet, Ha Nguyen, Solene Evain, Marcely Zanon Boito, Adrien Pupier, Salima Mdhaffar, Hang Le, Sina Alisamir, Natalia Tomashenko, Marco Dinarelli, Shucong Zhang, Alexandre Allauzen, Maximin Coavoux, Yannick Esteve, Mickael Rouvier, Jerome Goulian, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier

Abstract: Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains including computer vision and natural language processing. Speech processing drastically benefitted from SSL as most of the current domain-related tasks are now being approached with pre-trained models. This work introduces LeBenchmark 2.0, an open-source framework for assessing and building SSL-equipped French speech technologies. It includes documented, large-scale and heterogeneous corpora with up to 14,000 hours of heterogeneous speech, ten pre-trained SSL wav2vec 2.0 models containing from 26 million to one billion learnable parameters shared with the community, and an evaluation protocol made of six downstream tasks to complement existing benchmarks. LeBenchmark 2.0 also presents unique perspectives on pre-trained SSL models for speech with the investigation of frozen versus fine-tuned downstream models, task-agnostic versus task-specific pre-trained models as well as a discussion on the carbon footprint of large-scale model training. Overall, the newly introduced models trained on 14,000 hours of French speech outperform multilingual and previous LeBenchmark SSL models across the benchmark but also required up to four times more energy for pre-training.

Submitted 18 March, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

Comments: Published in Computer Science and Language. Preprint allowed

arXiv:2309.04178 [pdf, other]

doi 10.1109/TCOMM.2023.3280209

Double RIS-Assisted MIMO Systems Over Spatially Correlated Rician Fading Channels and Finite Scatterers

Authors: Ha An Le, Trinh Van Chien, Van Duc Nguyen, Wan Choi

Abstract: This paper investigates double RIS-assisted MIMO communication systems over Rician fading channels with finite scatterers, spatial correlation, and the existence of a double-scattering link between the transceivers. First, the statistical information is derived in closed form for the aggregated channels, unveiling various influences of the system and environment on the average channel power gains. Next, we study two active and passive beamforming designs corresponding to two objectives. The first problem maximizes channel capacity by jointly optimizing the active precoding and combining matrices at the transceivers and passive beamforming at the double RISs subject to the transmitting power constraint. In order to tackle the inherently non-convex issue, we propose an efficient alternating optimization (AO) algorithm based on the alternating direction method of multipliers (ADMM). The second problem enhances communication reliability by jointly training the encoder and decoder at the transceivers and the phase shifters at the RISs. Each neural network representing a system entity in an end-to-end learning framework is proposed to minimize the symbol error rate of the detected symbols by controlling the transceiver and the RISs phase shifts. Numerical results verify our analysis and demonstrate the superior improvements of phase shift designs to boost system performance.

Submitted 8 September, 2023; originally announced September 2023.

Comments: 15 pages, 9 figures, accepted by IEEE Transactions on Communications

arXiv:2307.04216 [pdf, other]

Hierarchical Autoencoder-based Lossy Compression for Large-scale High-resolution Scientific Data

Authors: Hieu Le, Jian Tao

Abstract: Lossy compression has become an important technique to reduce data size in many domains. This type of compression is especially valuable for large-scale scientific data, whose size ranges up to several petabytes. Although Autoencoder-based models have been successfully leveraged to compress images and videos, such neural networks have not widely gained attention in the scientific data domain. Our work presents a neural network that not only significantly compresses large-scale scientific data, but also maintains high reconstruction quality. The proposed model is tested with scientific benchmark data available publicly and applied to a large-scale high-resolution climate modeling data set. Our model achieves a compression ratio of 140 on several benchmark data sets without compromising the reconstruction quality. 2D simulation data from the High-Resolution Community Earth System Model (CESM) Version 1.3 over 500 years are also being compressed with a compression ratio of 200 while the reconstruction error is negligible for scientific analysis.

Submitted 6 May, 2024; v1 submitted 9 July, 2023; originally announced July 2023.

Comments: 14 pages

arXiv:2306.13919 [pdf, other]

INR-MDSQC: Implicit Neural Representation Multiple Description Scalar Quantization for robust image Coding

Authors: Trung Hieu Le, Xavier Pic, Marc Antonini

Abstract: Multiple Description Coding (MDC) is an error-resilient source coding method designed for transmission over noisy channels. We present a novel MDC scheme employing a neural network based on implicit neural representation. This involves overfitting the neural representation for images. Each description is transmitted along with model parameters and its respective latent spaces. Our method has advantages over traditional MDC that utilizes auto-encoders, such as eliminating the need for model training and offering high flexibility in redundancy adjustment. Experiments demonstrate that our solution is competitive with autoencoder-based MDC and classic MDC based on HEVC, delivering superior visual quality.

Submitted 7 August, 2023; v1 submitted 24 June, 2023; originally announced June 2023.

Comments: Accepted at IEEE MMSP 2023

arXiv:2303.05843 [pdf, other]

doi 10.1109/ICIP49359.2023.10223049

Multiple description video coding for real-time applications using HEVC

Authors: Trung Hieu Le, Marc Antonini, Marc Lambert, Karima Alioua

Abstract: Remote control vehicles require the transmission of large amounts of data, and video is one of the most important sources for the driver. To ensure reliable video transmission, the encoded video stream is transmitted simultaneously over multiple channels. However, this solution incurs a high transmission cost due to the wireless channel's unreliable and random bit loss characteristics. To address this issue, it is necessary to use more efficient video encoding methods that can make the video stream robust to noise. In this paper, we propose a low-complexity, low-latency 2-channel Multiple Description Coding (MDC) solution with an adaptive Instantaneous Decoder Refresh (IDR) frame period, which is compatible with the HEVC standard. This method shows better resistance to high packet loss rates with lower complexity.

Submitted 7 August, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

Comments: Accepted at IEEE ICIP 2023

arXiv:2301.08752 [pdf, ps, other]

doi 10.1109/ICIP46576.2022.9897505

Optimized learned entropy coding parameters for practical neural-based image and video compression

Authors: Amir Said, Reza Pourreza, Hoang Le

Abstract: Neural-based image and video codecs are significantly more power-efficient when weights and activations are quantized to low-precision integers. While there are general-purpose techniques for reducing quantization effects, large losses can occur when specific entropy coding properties are not considered. This work analyzes how entropy coding is affected by parameter quantizations, and provides a method to minimize losses. It is shown that, by using a certain type of coding parameters to be learned, uniform quantization becomes practically optimal, also simplifying the minimization of code memory requirements. The mathematical properties of the new representation are presented, and its effectiveness is demonstrated by coding experiments, showing that good results can be obtained with precision as low as 4 bits per network output, and practically no loss with 8 bits.
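As a rough illustration of why parameter precision matters for entropy coding, the sketch below measures the extra bits per symbol paid when a binary source is coded with a uniformly quantized probability estimate. It is a generic textbook calculation, not the parameterization proposed in the paper: the functions `quantize_probability` and `redundancy_bits`, the midpoint quantizer, and the value `p = 0.013` are all assumptions chosen for illustration.

```python
import math


def quantize_probability(p, bits):
    """Uniformly quantize p in (0, 1) to the midpoint of one of 2**bits equal bins.

    Hypothetical helper: midpoints keep the result strictly inside (0, 1).
    """
    levels = 1 << bits
    index = min(int(p * levels), levels - 1)
    return (index + 0.5) / levels


def redundancy_bits(p, q):
    """Extra bits per binary symbol when a Bernoulli(p) source is coded
    with the estimate q (the Kullback-Leibler divergence in bits)."""
    return p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))


if __name__ == "__main__":
    p = 0.013  # assumed true probability of the less likely symbol
    for bits in (4, 8):
        q = quantize_probability(p, bits)
        print(f"{bits} bits: q = {q:.5f}, redundancy = {redundancy_bits(p, q):.6f} bits/symbol")
```

With directly quantized probabilities, skewed distributions suffer most at low precision, which is one reason the choice of which parameter to quantize can matter.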

Submitted 20 January, 2023; originally announced January 2023.

Comments: 2022 IEEE International Conference on Image Processing (ICIP)

Journal ref: IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 2022, pp. 661-665

arXiv:2208.05433 [pdf, other]

doi 10.1371/journal.pone.0277081

Detecting COVID-19 from digitized ECG printouts using 1D convolutional neural networks

Authors: Thao Nguyen, Hieu H. Pham, Huy Khiem Le, Anh Tu Nguyen, Ngoc Tien Thanh, Cuong Do

Abstract: The COVID-19 pandemic has exposed the vulnerability of healthcare services worldwide, raising the need to develop novel tools to provide rapid and cost-effective screening and diagnosis. Clinical reports indicated that COVID-19 infection may cause cardiac injury, and electrocardiograms (ECG) may serve as a diagnostic biomarker for COVID-19. This study aims to utilize ECG signals to detect COVID-19 automatically. We propose a novel method to extract ECG signals from ECG paper records, which are then fed into a one-dimensional convolution neural network (1D-CNN) to learn and diagnose the disease. To evaluate the quality of digitized signals, R peaks in the paper-based ECG images are labeled. Afterward, RR intervals calculated from each image are compared to RR intervals of the corresponding digitized signal. Experiments on the COVID-19 ECG images dataset demonstrate that the proposed digitization method is able to capture correctly the original signals, with a mean absolute error of 28.11 ms. Our proposed 1D-CNN model, which is trained on the digitized ECG signals, allows identifying individuals with COVID-19 and other subjects accurately, with classification accuracies of 98.42%, 95.63%, and 98.50% for classifying COVID-19 vs. Normal, COVID-19 vs. Abnormal Heartbeats, and COVID-19 vs. other classes, respectively. Furthermore, the proposed method also achieves a high-level of performance for the multi-classification task.
Our findings indicate that a deep learning system trained on digitized ECG signals can serve as a potential tool for diagnosing COVID-19.
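The RR-interval comparison described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' evaluation code: it assumes R-peak times are available in seconds for both the labeled image and the digitized signal, that the peaks are already matched one to one, and the function names and sample values are hypothetical.

```python
def rr_intervals_ms(r_peak_times_s):
    """Return successive RR intervals in milliseconds from R-peak times in seconds."""
    return [(b - a) * 1000.0 for a, b in zip(r_peak_times_s, r_peak_times_s[1:])]


def rr_mean_absolute_error_ms(reference_peaks_s, digitized_peaks_s):
    """Mean absolute error (ms) between RR intervals of two matched R-peak sequences."""
    ref = rr_intervals_ms(reference_peaks_s)
    dig = rr_intervals_ms(digitized_peaks_s)
    if not ref or len(ref) != len(dig):
        raise ValueError("need at least two peaks and equally long, matched sequences")
    return sum(abs(r - d) for r, d in zip(ref, dig)) / len(ref)


if __name__ == "__main__":
    labeled = [0.00, 0.80, 1.60, 2.40]    # assumed R-peak times from the paper ECG image
    digitized = [0.00, 0.81, 1.60, 2.42]  # assumed R-peak times from the digitized signal
    print(f"RR MAE: {rr_mean_absolute_error_ms(labeled, digitized):.2f} ms")
```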

Submitted 5 October, 2022; v1 submitted 10 August, 2022; originally announced August 2022.

Comments: Accepted with minor revision by Plos One

arXiv:2208.04303 [pdf, other]

Boosting neural video codecs by exploiting hierarchical redundancy

Authors: Reza Pourreza, Hoang Le, Amir Said, Guillaume Sautiere, Auke Wiggers

Abstract: In video compression, coding efficiency is improved by reusing pixels from previously decoded frames via motion and residual compensation. We define two levels of hierarchical redundancy in video frames: 1) first-order: redundancy in pixel space, i.e., similarities in pixel values across neighboring frames, which is effectively captured using motion and residual compensation, 2) second-order: redundancy in motion and residual maps due to smooth motion in natural videos. While most of the existing neural video coding literature addresses first-order redundancy, we tackle the problem of capturing second-order redundancy in neural video codecs via predictors. We introduce generic motion and residual predictors that learn to extrapolate from previously decoded data. These predictors are lightweight, and can be employed with most neural video codecs in order to improve their rate-distortion performance. Moreover, while RGB is the dominant colorspace in neural video coding literature, we introduce general modifications for neural video codecs to embrace the YUV420 colorspace and report YUV420 results. Our experiments show that using our predictors with a well-known neural video codec leads to 38% and 34% bitrate savings in RGB and YUV420 colorspaces measured on the UVG dataset.

Submitted 16 September, 2022; v1 submitted 8 August, 2022; originally announced August 2022.

Comments: WACV 2023

arXiv:2207.14459 [pdf, other]

Generalized BER of MCIK-OFDM with Imperfect CSI: Selection combining GD versus ML receivers

Authors: Vu-Duc Ngo, Thien Van Luong, Nguyen Cong Luong, Minh-Tuan Le, Thi Thanh Huyen Le, Xuan-Nam Tran

Abstract: This paper analyzes the bit error rate (BER) of multicarrier index keying - orthogonal frequency division multiplexing (MCIK-OFDM) with selection combining (SC) diversity reception. Particularly, we propose a generalized framework to derive the BER for both the low-complexity greedy detector (GD) and maximum likelihood (ML) detector. Based on this, closed-form expressions for the BERs of MCIK-OFDM with the SC using either the ML or the GD are derived in the presence of channel state information (CSI) imperfection. The asymptotic analysis is presented to gain helpful insights into the effects of different CSI conditions on the BERs of these two detectors. More importantly, we theoretically provide opportunities for using the GD instead of the ML under each specific CSI uncertainty, which depend on the number of receiver antennas and the M-ary modulation size. Finally, extensive simulation results are provided in order to validate our theoretical expressions and analysis.

Submitted 28 July, 2022; originally announced July 2022.

arXiv:2207.14454 [pdf, other]

Enhancing Diversity of OFDM with Joint Spread Spectrum and Subcarrier Index Modulations

Authors: Vu-Duc Ngo, Thien Van Luong, Nguyen Cong Luong, Mai Xuan Trang, Minh-Tuan Le, Thi Thanh Huyen Le, Xuan-Nam Tran

Abstract: This paper proposes a novel spread spectrum and sub-carrier index modulation (SS-SIM) scheme, which is integrated into the orthogonal frequency division multiplexing (OFDM) framework to enhance the diversity over the conventional IM schemes. Particularly, the resulting scheme, called SS-SIM-OFDM, jointly employs both spread spectrum and sub-carrier index modulations to form a precoding vector which is then used to spread an M-ary complex symbol across all active sub-carriers. As a result, the proposed scheme enables a novel transmission of three signal domains: SS and sub-carrier indices, and a single M-ary symbol. For practical implementations, two reduced-complexity near-optimal detectors are proposed, whose complexities depend less on the M-ary modulation size. Then, the bit error probability and its upper bound are analyzed to gain an insight into the diversity gain, which is shown to be strongly affected by the order of sub-carrier indices. Based on this observation, we propose two novel sub-carrier index mapping methods, which significantly increase the diversity gain of SS-SIM-OFDM. Finally, simulation results show that our scheme achieves better error performance than the benchmarks at the cost of lower spectral efficiency compared to classical OFDM and OFDM-IM, which can carry multiple M-ary symbols.

Submitted 28 July, 2022; originally announced July 2022.

arXiv:2207.08338 [pdf, other]

MobileCodec: Neural Inter-frame Video Compression on Mobile Devices

Authors: Hoang Le, Liang Zhang, Amir Said, Guillaume Sautiere, Yang Yang, Pranav Shrestha, Fei Yin, Reza Pourreza, Auke Wiggers

Abstract: Realizing the potential of neural video codecs on mobile devices is a big technological challenge due to the computational complexity of deep networks and the power-constrained mobile hardware. We demonstrate practical feasibility by leveraging Qualcomm's technology and innovation, bridging the gap from neural network-based codec simulations running on wall-powered workstations, to real-time operation on a mobile device powered by Snapdragon technology. We show the first-ever inter-frame neural video decoder running on a commercial mobile phone, decoding high-definition videos in real-time while maintaining a low bitrate and high visual quality.

Submitted 17 July, 2022; originally announced July 2022.

Comments: ACM MMSys 2022

arXiv:2207.08077 [pdf, other]

RIS-Assisted MIMO Communication Systems: Model-based versus Autoencoder Approaches

Authors: Ha An Le, Trinh Van Chien, Van Duc Nguyen, Wan Choi

Abstract: This paper considers reconfigurable intelligent surface (RIS)-assisted point-to-point multiple-input multiple-output (MIMO) communication systems, where a transmitter communicates with a receiver through an RIS. Based on the main target of reducing the bit error rate (BER) and therefore enhancing the communication reliability, we study different model-based and data-driven (autoencoder) approaches. In particular, we consider a model-based approach that optimizes both active and passive optimization variables. We further propose a novel end-to-end data-driven framework, which leverages the recent advances in machine learning. The neural networks presented for conventional signal processing modules are jointly trained with the channel effects to minimize the bit error detection. Numerical results demonstrate that the proposed data-driven approach can learn to encode the transmitted signal via different channel realizations dynamically. In addition, the data-driven approach not only offers a significant gain in the BER performance compared to the other state-of-the-art benchmarks but also guarantees the performance when perfect channel information is unavailable.

Submitted 17 July, 2022; originally announced July 2022.

Comments: 6 pages, 3 figures, and 2 tables. Accepted to present at IEEE PIMRC 2022

arXiv:2206.03778 [pdf, other]

Learning Digital Terrain Models from Point Clouds: ALS2DTM Dataset and Rasterization-based GAN

Authors: Hoàng-Ân Lê, Florent Guiotte, Minh-Tan Pham, Sébastien Lefèvre, Thomas Corpetti

Abstract: Despite the popularity of deep neural networks in various domains, the extraction of digital terrain models (DTMs) from airborne laser scanning (ALS) point clouds is still challenging. This might be due to the lack of a dedicated large-scale annotated dataset and the data-structure discrepancy between point clouds and DTMs. To promote data-driven DTM extraction, this paper collects from open sources a large-scale dataset of ALS point clouds and corresponding DTMs with various urban, forested, and mountainous scenes. A baseline method is proposed as the first attempt to train a Deep neural network to extract digital Terrain models directly from ALS point clouds via Rasterization techniques, coined DeepTerRa. Extensive studies with well-established methods are performed to benchmark the dataset and analyze the challenges in learning to extract DTM from point clouds. The experimental results show the interest of the agnostic data-driven approach, with sub-metric error level compared to methods designed for DTM extraction. The data and source code are provided at https://lhoangan.github.io/deepterra/ for reproducibility and further similar research.

Submitted 8 June, 2022; originally announced June 2022.

arXiv:2204.01346 [pdf, ps, other]

Concurrent learning in high-order tuners for parameter identification

Authors: Justin H. Le, Andrew R. Teel

Abstract: High-order tuners are algorithms that show promise in achieving greater efficiency than classic gradient-based algorithms in identifying the parameters of parametric models and/or in facilitating the progress of a control or optimization algorithm whose adaptive behavior relies on such models. For high-order tuners, robust stability properties, namely uniform global asymptotic (and exponential) stability, currently rely on a persistent excitation (PE) condition. In this work, we establish such stability properties with a novel analysis based on a Matrosov theorem and then show that the PE requirement can be relaxed via a concurrent learning technique driven by sampled data points that are sufficiently rich. We show numerically that concurrent learning may greatly improve efficiency. We incorporate reset methods that preserve the stability guarantees while providing additional improvements that may be relevant in applications that demand highly accurate parameter estimates at relatively low additional cost in computation.

Submitted 4 April, 2022; originally announced April 2022.

arXiv:2112.13559 [pdf, other]

doi 10.1145/3477314.3507112

DAM-AL: Dilated Attention Mechanism with Attention Loss for 3D Infant Brain Image Segmentation

Authors: Dinh-Hieu Hoang, Gia-Han Diep, Minh-Triet Tran, Ngan T. H Le

Abstract: While Magnetic Resonance Imaging (MRI) has played an essential role in infant brain analysis, segmenting MRI into a number of tissues such as gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF) is crucial and complex due to the extremely low intensity contrast between tissues at around 6-9 months of age as well as amplified noise, myelination, and incomplete volume. In this paper, we tackle those limitations by developing a new deep learning model, named DAM-AL, which contains two main contributions, i.e., dilated attention mechanism and hard-case attention loss. Our DAM-AL network is designed with skip block layers and atrous block convolution. It contains both channel-wise attention at high-level context features and spatial attention at low-level spatial structural features. Our attention loss consists of two terms corresponding to region information and hard samples attention. Our proposed DAM-AL has been evaluated on the infant brain iSeg 2017 dataset and the experiments have been conducted on both validation and testing sets. We have benchmarked DAM-AL on Dice coefficient and ASD metrics and compared it with state-of-the-art methods.

Submitted 27 December, 2021; originally announced December 2021.

arXiv:2107.00845 [pdf, other]

A Business Model for Resource Sharing in Cell-Free UAVs-Assisted Wireless Networks

Authors: Yan Kyaw Tun, Yu Min Park, Tra Huong Thi Le, Zhu Han, Choong Seon Hong

Abstract: Unmanned aerial vehicles (UAVs) are widely deployed to enhance the wireless network capacity and to provide communication services to mobile users beyond the infrastructure coverage. Recently, with the help of a promising technology called network virtualization, multiple service providers (SPs) can share the infrastructures and wireless resources owned by the mobile network operators (MNOs). Then, they provide specific services to their mobile users using the resources obtained from MNOs. However, wireless resource sharing among SPs is challenging as each SP wants to maximize its utility/profit selfishly while satisfying the QoS requirement of its mobile users. Therefore, in this paper, we propose a joint user association and wireless resource sharing problem in the cell-free UAVs-assisted wireless networks with the objective of maximizing the total network utility of the SPs while ensuring QoS constraints of their mobile users and the resource constraints of the UAVs deployed by MNOs. To solve the proposed mixed-integer non-convex problem, we decompose it into two subproblems: user association and resource sharing. Then, a two-sided matching algorithm is deployed in order to solve the user association problem. We further deploy the whale optimization and Lagrangian relaxation algorithms to solve the resource sharing problem. Finally, extensive numerical results are provided in order to show the effectiveness of our proposed algorithm.

Submitted 2 July, 2021; originally announced July 2021.

Comments: This paper has been submitted to IEEE Transactions on Vehicular Technology

arXiv:2104.11462 [pdf, ps, other]

doi 10.21437/Interspeech.2021-556

LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech

Authors: Solene Evain, Ha Nguyen, Hang Le, Marcely Zanon Boito, Salima Mdhaffar, Sina Alisamir, Ziyi Tong, Natalia Tomashenko, Marco Dinarelli, Titouan Parcollet, Alexandre Allauzen, Yannick Esteve, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier

Abstract: Self-Supervised Learning (SSL) using huge unlabeled data has been successfully explored for image and natural language processing. Recent works also investigated SSL from speech. They were notably successful in improving performance on downstream tasks such as automatic speech recognition (ASR). While these works suggest it is possible to reduce dependence on labeled data for building efficient speech systems, their evaluation was mostly made on ASR and using multiple and heterogeneous experimental settings (most of them for English). This calls into question the objective comparison of SSL approaches and the evaluation of their impact on building speech systems. In this paper, we propose LeBenchmark: a reproducible framework for assessing SSL from speech. It not only includes ASR (high and low resource) tasks but also spoken language understanding, speech translation and emotion recognition. We also focus on speech technologies in a language different than English: French. SSL models of different sizes are trained from carefully sourced and documented datasets. Experiments show that SSL is beneficial for most but not all tasks, which confirms the need for exhaustive and reliable benchmarks to evaluate its real impact. LeBenchmark is shared with the scientific community for reproducible research in SSL from speech.

Submitted 10 June, 2021; v1 submitted 23 April, 2021; originally announced April 2021.

Comments: Will be presented at Interspeech 2021

Journal ref: Proc. Interspeech 2021

arXiv:2104.11414 [pdf, other]

Passive soft-reset controllers for nonlinear systems

Authors: Justin H. Le, Andrew R. Teel

Abstract: Soft-reset controllers are introduced as a way to approximate hard-reset controllers. The focus is on implementing reset controllers that are (strictly) passive and on analyzing their interconnection with passive plants. A passive hard-reset controller that has a strongly convex energy function can be approximated as a soft-reset controller. A hard-reset controller is a hybrid system whereas a soft-reset controller corresponds to a differential inclusion, living entirely in the continuous-time domain. This feature may make soft-reset controllers easier to understand and implement. A soft-reset controller contains a parameter that can be adjusted to better approximate the action of the hard-reset controller. Closed-loop asymptotic stability is established for the interconnection of a passive soft-reset controller with a passive plant, under appropriate detectability assumptions. Several examples are used to illustrate the efficacy of soft-reset controllers.

Submitted 22 September, 2021; v1 submitted 23 April, 2021; originally announced April 2021.

arXiv:2104.10307 [pdf, ps, other]

Analyzing the Effect of Persistent Asset Switches on a Class of Hybrid-Inspired Optimization Algorithms

Authors: Matina Baradaran, Justin H. Le, Andrew R. Teel

Abstract: Convex optimization challenges are currently pervasive in many science and engineering domains. In many applications of convex optimization, such as those involving multi-agent systems and resource allocation, the objective function can persistently switch during the execution of an optimization algorithm. Motivated by such applications, we analyze the effect of persistently switching objectives in continuous-time optimization algorithms. In particular, we take advantage of existing robust stability results for switched systems with distinct equilibria and extend these results to systems described by differential inclusions, making the results applicable to recent optimization algorithms that employ differential inclusions for improving efficiency and/or robustness. Within the framework of hybrid systems theory, we provide an accurate characterization, in terms of Omega-limit sets, of the set to which the optimization dynamics converge. Finally, by considering the switching signal to be constrained in its average dwell time, we establish semi-global practical asymptotic stability of these sets with respect to the dwell-time parameter.

Submitted 20 April, 2021; originally announced April 2021.

arXiv:2103.12350 [pdf, other]

Roughness Index and Roughness Distance for Benchmarking Medical Segmentation

Authors: Vidhiwar Singh Rathour, Kashu Yamakazi, T. Hoang Ngan Le

Abstract: Medical image segmentation is one of the most challenging tasks in medical image analysis and has been widely developed for many clinical applications. Most of the existing metrics have been first designed for natural images and then extended to medical images. While object surface plays an important role in medical segmentation and quantitative analysis (e.g., analyzing brain tumor surface or measuring gray matter volume), most of the existing metrics are limited when it comes to analyzing the object surface, especially to tell about surface smoothness or roughness of a given volumetric object or to analyze the topological errors. In this paper, we first analyze the pros and cons of all existing medical image segmentation metrics, especially on volumetric data. We then propose an appropriate roughness index and roughness distance for medical image segmentation analysis and evaluation. Our proposed method addresses two kinds of segmentation errors, i.e. (i) topological errors on boundary/surface and (ii) irregularities on the boundary/surface. The contribution of this work is four-fold: (i) detect irregular spikes/holes on a surface, (ii) propose roughness index to measure surface roughness of a given object, (iii) propose a roughness distance to measure the distance of two boundaries/surfaces by utilizing the proposed roughness index and (iv) suggest an algorithm which helps to remove the irregular spikes/holes to smooth the surface.
Our proposed roughness index and roughness distance are built upon the solid surface roughness parameter which has been successfully developed in civil engineering.

Submitted 23 March, 2021; originally announced March 2021.

Comments: Paper has been accepted at BIOIMAGING2021

arXiv:2103.11893 [pdf, ps, other]

Thresholding Greedy Pursuit for Sparse Recovery Problems

Authors: Hai Le, Alexei Novikov

Abstract: We study here sparse recovery problems in the presence of additive noise. We analyze a thresholding version of the CoSaMP algorithm, named Thresholding Greedy Pursuit (TGP). We demonstrate that an appropriate choice of thresholding parameter, even without the knowledge of sparsity level of the signal and strength of the noise, can result in exact recovery with no false discoveries as the dimension of the data increases to infinity.

Submitted 17 March, 2021; originally announced March 2021.

Comments: First version

arXiv:2103.11055 [pdf, other]

Online Robust Control of Nonlinear Systems with Large Uncertainty

Authors: Dimitar Ho, Hoang M. Le, John C. Doyle, Yisong Yue

Abstract: Robust control is a core approach for controlling systems with performance guarantees that are robust to modeling error, and is widely used in real-world systems. However, current robust control approaches can only handle small system uncertainty, and thus require significant effort in system identification prior to controller design. We present an online approach that robustly controls a nonlinear system under large model uncertainty. Our approach is based on decomposing the problem into two sub-problems, "robust control design" (which assumes small model uncertainty) and "chasing consistent models", which can be solved using existing tools from control theory and online learning, respectively. We provide a learning convergence analysis that yields a finite mistake bound on the number of times performance requirements are not met and can provide strong safety guarantees, by bounding the worst-case state deviation. To the best of our knowledge, this is the first approach for online robust control of nonlinear systems with such learning theoretic and safety guarantees. We also show how to instantiate this framework for general robotic systems, demonstrating the practicality of our approach.

Submitted 4 June, 2021; v1 submitted 19 March, 2021; originally announced March 2021.

Comments: 58 pages, 5 figures

Journal ref: Proceedings of The 24th International Conference on Artificial Intelligence and Statistics 2021, PMLR 130:3475-3483

arXiv:2103.09042 [pdf, ps, other]

Invertible Residual Network with Regularization for Effective Medical Image Segmentation

Authors: Kashu Yamazaki, Vidhiwar Singh Rathour, T. Hoang Ngan Le

Abstract: Deep Convolutional Neural Networks (CNNs) i.e. Residual Networks (ResNets) have been used successfully for many computer vision tasks, but are difficult to scale to 3D volumetric medical data. Memory is increasingly often the bottleneck when training 3D Convolutional Neural Networks (CNNs). Recently, invertible neural networks have been applied to significantly reduce activation memory footprint when training neural networks with backpropagation thanks to the invertible functions that allow retrieving its input from its output without storing intermediate activations in memory to perform the backpropagation. Among many successful network architectures, 3D Unet has been established as a standard architecture for volumetric medical segmentation. Thus, we choose 3D Unet as a baseline for a non-invertible network and we then extend it with the invertible residual network. In this paper, we proposed two versions of the invertible Residual Network, namely Partially Invertible Residual Network (Partially-InvRes) and Fully Invertible Residual Network (Fully-InvRes). In Partially-InvRes, the invertible residual layer is defined by a technique called additive coupling whereas in Fully-InvRes, both invertible upsampling and downsampling operations are learned based on squeezing (known as pixel shuffle). Furthermore, to avoid the overfitting problem because of less training data, a variational auto-encoder (VAE) branch is added to reconstruct the input volumetric data itself.
Our results indicate that by using partially/fully invertible networks as the central workhorse in volumetric segmentation, we not only reduce memory overhead but also achieve compatible segmentation performance compared against the non-invertible 3D Unet. We have demonstrated the proposed networks on various volumetric datasets such as iSeg 2019 and BraTS 2020.

Submitted 16 March, 2021; originally announced March 2021.

arXiv:2103.05115 [pdf, other]

Deep reinforcement learning in medical imaging: A literature review

Authors: S. Kevin Zhou, Hoang Ngan Le, Khoa Luu, Hien V. Nguyen, Nicholas Ayache

Abstract: Deep reinforcement learning (DRL) augments the reinforcement learning framework, which learns a sequence of actions that maximizes the expected reward, with the representative power of deep neural networks. Recent works have demonstrated the great potential of DRL in medicine and healthcare. This paper presents a literature review of DRL in medical imaging. We start with a comprehensive tutorial of DRL, including the latest model-free and model-based algorithms. We then cover existing DRL applications for medical imaging, which are roughly divided into three main categories: (i) parametric medical image analysis tasks including landmark detection, object/lesion detection, registration, and view plane localization; (ii) solving optimization tasks including hyperparameter tuning, selecting augmentation strategies, and neural architecture search; and (iii) miscellaneous applications including surgical gesture segmentation, personalized mobile health intervention, and computational model personalization. The paper concludes with discussions of future perspectives.

Submitted 5 March, 2021; originally announced March 2021.

Comments: 39 pages, 20 figures

arXiv:2012.07627 [pdf, other]

Water Level Estimation Using Sentinel-1 Synthetic Aperture Radar Imagery And Digital Elevation Models

Authors: Thai-Bao Duong-Nguyen, Thien-Nu Hoang, Phong Vo, Hoai-Bac Le

Abstract: Hydropower dams and reservoirs have been identified as the main factors redefining natural hydrological cycles. Therefore, monitoring water status in reservoirs plays a crucial role in planning and managing water resources, as well as forecasting drought and flood. This task has been traditionally done by installing sensor stations on the ground nearby water bodies, which has multiple disadvantages in maintenance cost, accessibility, and global coverage. And to cope with these problems, Remote Sensing, which is known as the science of obtaining information about objects or areas without making contact with them, has been actively studied for many applications. In this paper, we propose a novel water level extracting approach, which employs Sentinel-1 Synthetic Aperture Radar imagery and Digital Elevation Model data sets. Experiments show that the algorithm achieved a low average error of 0.93 meters over three reservoirs globally, proving its potential to be widely applied and furthermore studied.

Submitted 28 December, 2020; v1 submitted 11 December, 2020; originally announced December 2020.

arXiv:2011.00747 [pdf, other]

Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation

Authors: Hang Le, Juan Pino, Changhan Wang, Jiatao Gu, Didier Schwab, Laurent Besacier

Abstract: We introduce dual-decoder Transformer, a new model architecture that jointly performs automatic speech recognition (ASR) and multilingual speech translation (ST). Our models are based on the original Transformer architecture (Vaswani et al., 2017) but consist of two decoders, each responsible for one task (ASR or ST). Our major contribution lies in how these decoders interact with each other: one decoder can attend to different information sources from the other via a dual-attention mechanism. We propose two variants of these architectures corresponding to two different levels of dependencies between the decoders, called the parallel and cross dual-decoder Transformers, respectively. Extensive experiments on the MuST-C dataset show that our models outperform the previously-reported highest translation performance in the multilingual settings, and outperform as well bilingual one-to-one results. Furthermore, our parallel models demonstrate no trade-off between ASR and ST compared to the vanilla multi-task architecture. Our code and pre-trained models are available at https://github.com/formiel/speech-translation.

Submitted 1 November, 2020; originally announced November 2020.

Comments: Accepted at COLING 2020 (Oral)

Journal ref: The 28th International Conference on Computational Linguistics (COLING 2020)

arXiv:2009.13770 [pdf, other]

Hybrid Heavy-Ball Systems: Reset Methods for Optimization with Uncertainty

Authors: Justin H. Le, Andrew R. Teel

Abstract: Momentum methods for convex optimization often rely on precise choices of algorithmic parameters, based on knowledge of problem parameters, in order to achieve fast convergence, as well as to prevent oscillations that could severely restrict applications of these algorithms to cyber-physical systems. To address these issues, we propose two dynamical systems, named the Hybrid Heavy-Ball System and Hybrid-inspired Heavy-Ball System, which employ a feedback mechanism for driving the momentum state toward zero whenever it points in undesired directions. We describe the relationship between the proposed systems and their discrete-time counterparts, deriving conditions based on linear matrix inequalities for ensuring exponential rates in both continuous time and discrete time. We provide numerical LMI results to illustrate the effects of our reset mechanisms on convergence rates in a setting that simulates uncertainty of problem parameters. Finally, we numerically demonstrate the efficiency and avoidance of oscillations of the proposed systems when solving both strongly convex and non-strongly convex problems.

Submitted 22 March, 2021; v1 submitted 29 September, 2020; originally announced September 2020.

arXiv:2008.06828 [pdf, other]

A novel approach to remove foreign objects from chest X-ray images

Authors: Hieu X. Le, Phuong D. Nguyen, Thang H. Nguyen, Khanh N. Q. Le, Thanh T. Nguyen

Abstract: We initially proposed a deep learning approach for foreign objects inpainting in smartphone-camera captured chest radiographs utilizing the cheXphoto dataset. Foreign objects which can significantly affect the quality of a computer-aided diagnostic prediction are captured under various settings. In this paper, we used multi-method to tackle both removal and inpainting chest radiographs. Firstly, an object detection model is trained to separate the foreign objects from the given image. Subsequently, the binary mask of each object is extracted utilizing a segmentation model. Each pair of the binary mask and the extracted object are then used for inpainting purposes. Finally, the in-painted regions are now merged back to the original image, resulting in a clean and non-foreign-object-existing output. To conclude, we achieved state-of-the-art accuracy. The experimental results showed a new approach to the possible applications of this method for chest X-ray images detection.

Submitted 15 August, 2020; originally announced August 2020.

Comments: 9 pages, 7 figures, 7 tables

arXiv:1905.10841 [pdf]

Utilizing Automated Breast Cancer Detection to Identify Spatial Distributions of Tumor Infiltrating Lymphocytes in Invasive Breast Cancer

Authors: Han Le, Rajarsi Gupta, Le Hou, Shahira Abousamra, Danielle Fassler, Tahsin Kurc, Dimitris Samaras, Rebecca Batiste, Tianhao Zhao, Arvind Rao, Alison L. Van Dyke, Ashish Sharma, Erich Bremer, Jonas S. Almeida, Joel Saltz

Abstract: Quantitative assessment of Tumor-TIL spatial relationships is increasingly important in both basic science and clinical aspects of breast cancer research. We have developed and evaluated convolutional neural network (CNN) analysis pipelines to generate combined maps of cancer regions and tumor infiltrating lymphocytes (TILs) in routine diagnostic breast cancer whole slide tissue images (WSIs). We produce interactive whole slide maps that provide 1) insight about the structural patterns and spatial distribution of lymphocytic infiltrates and 2) facilitate improved quantification of TILs. We evaluated both tumor and TIL analyses using three CNN networks - Resnet-34, VGG16 and Inception v4, and demonstrated that the results compared favorably to those obtained by what we believe are the best published methods. We have produced open-source tools and generated a public dataset consisting of tumor/TIL maps for 1,015 TCGA breast cancer images. We also present a customized web-based interface that enables easy visualization and interactive exploration of high-resolution combined Tumor-TIL maps for 1,015 TCGA invasive breast cancer cases that can be downloaded for further downstream analyses.

Submitted 13 January, 2020; v1 submitted 26 May, 2019; originally announced May 2019.

Comments: The American Journal of Pathology

arXiv:1905.02914 [pdf, other]

Adaptive neural network based dynamic surface control for uncertain dual arm robots

Authors: Dung Tien Pham, Thai Van Nguyen, Hai Xuan Le, Linh Nguyen, Nguyen Huu Thai, Tuan Anh Phan, Hai Tuan Pham, Anh Hoai Duong

Abstract: The paper discusses an adaptive strategy to effectively control nonlinear manipulation motions of a dual arm robot (DAR) under system uncertainties including parameter variations, actuator nonlinearities and external disturbances. It is proposed that the control scheme is first derived from the dynamic surface control (DSC) method, which allows the robot's end-effectors to robustly track the desired trajectories. Moreover, since exactly determining the DAR system's dynamics is impractical due to the system uncertainties, the uncertain system parameters are then proposed to be adaptively estimated by the use of the radial basis function network (RBFN). The adaptation mechanism is derived from the Lyapunov theory, which theoretically guarantees stability of the closed-loop control system. The effectiveness of the proposed RBFN-DSC approach is demonstrated by implementing the algorithm in a synthetic environment with realistic parameters, where the obtained results are highly promising.

Submitted 8 May, 2019; originally announced May 2019.

arXiv:1903.07214 [pdf, other]

doi 10.1109/CDC40024.2019.9029226

A Control Lyapunov Perspective on Episodic Learning via Projection to State Stability

Authors: Andrew J. Taylor, Victor D. Dorobantu, Meera Krishnamoorthy, Hoang M. Le, Yisong Yue, Aaron D. Ames

Abstract: The goal of this paper is to understand the impact of learning on control synthesis from a Lyapunov function perspective. In particular, rather than consider uncertainties in the full system dynamics, we employ Control Lyapunov Functions (CLFs) as low-dimensional projections. To understand and characterize the uncertainty that these projected dynamics introduce in the system, we introduce a new notion: Projection to State Stability (PSS). PSS can be viewed as a variant of Input to State Stability defined on projected dynamics, and enables characterizing robustness of a CLF with respect to the data used to learn system uncertainties. We use PSS to bound uncertainty in affine control, and demonstrate that a practical episodic learning approach can use PSS to characterize uncertainty in the CLF for robust control synthesis.

Submitted 17 March, 2019; originally announced March 2019.

arXiv:1903.01577 [pdf, other]

doi 10.1109/IROS40897.2019.8967820

Episodic Learning with Control Lyapunov Functions for Uncertain Robotic Systems

Authors: Andrew J. Taylor, Victor D. Dorobantu, Hoang M. Le, Yisong Yue, Aaron D. Ames

Abstract: Many modern nonlinear control methods aim to endow systems with guaranteed properties, such as stability or safety, and have been successfully applied to the domain of robotics. However, model uncertainty remains a persistent challenge, weakening theoretical guarantees and causing implementation failures on physical systems. This paper develops a machine learning framework centered around Control Lyapunov Functions (CLFs) to adapt to parametric uncertainty and unmodeled dynamics in general robotic systems. Our proposed method proceeds by iteratively updating estimates of Lyapunov function derivatives and improving controllers, ultimately yielding a stabilizing quadratic program model-based controller. We validate our approach on a planar Segway simulation, demonstrating substantial performance improvements by iteratively refining on a base model-free controller.

Submitted 4 March, 2019; originally announced March 2019.

Showing 1–43 of 43 results for author: Le, H