-
Multimodal Learning and Cognitive Processes in Radiology: MedGaze for Chest X-ray Scanpath Prediction
Authors:
Akash Awasthi,
Ngan Le,
Zhigang Deng,
Rishi Agrawal,
Carol C. Wu,
Hien Van Nguyen
Abstract:
Predicting human gaze behavior is integral to developing interactive systems that can anticipate user attention; it also addresses fundamental questions in cognitive science and has implications for fields such as human-computer interaction (HCI) and augmented/virtual reality (AR/VR) systems. Despite the methodologies introduced for modeling human eye gaze behavior, applying these models to medical imaging for scanpath prediction remains unexplored. Our proposed system aims to predict eye gaze sequences from radiology reports and chest X-ray (CXR) images, potentially streamlining data collection and enhancing AI systems using larger datasets. However, predicting human scanpaths on medical images presents unique challenges due to the diverse nature of abnormal regions. Our model predicts the fixation coordinates and durations that are critical for medical scanpath prediction, outperforming existing models from the computer vision community. Using a two-stage training process and large publicly available datasets, our approach generates static heatmaps and eye gaze videos aligned with radiology reports, facilitating comprehensive analysis. We validate our approach by comparing its performance with state-of-the-art methods and assessing its generalizability across different radiologists, introducing novel strategies to model radiologists' search patterns during CXR image diagnosis. Based on the radiologist's evaluation, MedGaze can generate human-like gaze sequences with a high focus on relevant regions of the CXR images. In some cases it also outperforms humans in terms of redundancy and randomness in the scanpaths.
Submitted 28 June, 2024;
originally announced July 2024.
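The MedGaze abstract mentions turning predicted fixations (coordinates plus durations) into static heatmaps. One common way to do this, assumed here and not necessarily the authors' procedure, is to place a Gaussian at each fixation, weight it by fixation duration, and normalize the map. The helper name `fixation_heatmap`, the `sigma` value, and the toy scanpath are all hypothetical.

```python
import math


def fixation_heatmap(fixations, width, height, sigma=2.0):
    """Duration-weighted Gaussian heatmap from (x, y, duration) fixations.

    Hypothetical helper: each fixation adds a Gaussian blob centred at (x, y)
    with amplitude equal to its duration; the map is normalised to sum to 1.
    """
    heat = [[0.0] * width for _ in range(height)]
    for fx, fy, dur in fixations:
        for row in range(height):
            for col in range(width):
                d2 = (col - fx) ** 2 + (row - fy) ** 2
                heat[row][col] += dur * math.exp(-d2 / (2.0 * sigma ** 2))
    total = sum(sum(r) for r in heat)
    if total > 0:
        heat = [[v / total for v in r] for r in heat]
    return heat


# Toy scanpath: (pixel x, pixel y, duration in seconds); values are made up.
scanpath = [(3, 4, 0.25), (10, 4, 0.60), (10, 12, 0.15)]
hm = fixation_heatmap(scanpath, width=16, height=16)
value, px, py = max((v, c, r) for r, row in enumerate(hm) for c, v in enumerate(row))
print("peak at", (px, py))  # the longest fixation dominates: peak at (10, 4)
```

Weighting by duration lets long dwells dominate the map, which is one reason predicting durations as well as coordinates matters for heatmap quality.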
-
Enhancing Radiological Diagnosis: A Collaborative Approach Integrating AI and Human Expertise for Visual Miss Correction
Authors:
Akash Awasthi,
Ngan Le,
Zhigang Deng,
Carol C. Wu,
Hien Van Nguyen
Abstract:
Human-AI collaboration to identify and correct perceptual errors in chest radiographs has not been previously explored. This study aimed to develop a collaborative AI system, CoRaX, which integrates eye gaze data and radiology reports to enhance diagnostic accuracy in chest radiology by pinpointing perceptual errors and refining the decision-making process. Using public datasets REFLACX and EGD-CXR, the study retrospectively developed CoRaX, employing a large multimodal model to analyze image embeddings, eye gaze data, and radiology reports. The system's effectiveness was evaluated based on its referral-making process, the quality of referrals, and performance in collaborative diagnostic settings. CoRaX was tested on a simulated error dataset of 271 samples with 28% (93 of 332) missed abnormalities. The system corrected 21% (71 of 332) of these errors, leaving 7% (22 of 332) unresolved. The Referral-Usefulness score, indicating the accuracy of predicted regions for all true referrals, was 0.63 (95% CI 0.59, 0.68). The Total-Usefulness score, reflecting the diagnostic accuracy of CoRaX's interactions with radiologists, showed that 84% (237 of 280) of these interactions had a score above 0.40. In conclusion, CoRaX efficiently collaborates with radiologists to address perceptual errors across various abnormalities, with potential applications in the education and training of novice radiologists.
Submitted 28 June, 2024;
originally announced June 2024.
-
Waveform Learning under Phase Noise Impairment for Sub-THz Communications
Authors:
Dileepa Marasinghe,
Le Hang Nguyen,
Jafar Mohammadi,
Yejian Chen,
Thorsten Wild,
Nandana Rajatheva
Abstract:
The large untapped spectrum in sub-THz bands allows ultra-high-throughput communication that could realize many seemingly impossible applications in 6G. Phase noise (PN) is one key hardware impairment, and it is accentuated as the frequency and bandwidth increase. Furthermore, the modest output power of the power amplifier requires limiting the peak-to-average power ratio (PAPR) in signal design. In this work, we design a PN-robust, low-PAPR single-carrier (SC) waveform by geometrically shaping the constellation and adapting the pulse shaping filter pair under practical PN modeling and adjacent channel leakage ratio (ACLR) constraints for a given excess bandwidth. We optimize the waveforms under conventional and state-of-the-art PN-aware demappers. Moreover, we introduce a neural-network (NN) demapper that enhances transceiver adaptability. We formulate the waveform optimization problem in its augmented Lagrangian form and use a back-propagation-inspired technique to obtain a design that is numerically robust to PN while adhering to PAPR and ACLR constraints. The results substantiate the efficacy of the method, yielding gains of up to 2.5 dB in the required Eb/N0 under stronger PN along with a PAPR reduction of 0.5 dB. Moreover, PAPR reductions of up to 1.2 dB are possible with competitive block error rate (BLER) and spectral efficiency (SE) performance in both low and high PN conditions.
Submitted 5 June, 2024;
originally announced June 2024.
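PAPR, one of the constraints in the waveform-learning abstract, is the ratio of peak to mean instantaneous power. The sketch below computes it at the symbol level only, which is a simplifying assumption: pulse shaping and oversampling, which the paper adapts jointly with the constellation, typically raise the PAPR of a single-carrier waveform above these values. The function name `papr_db` is hypothetical.

```python
import cmath
import math


def papr_db(samples):
    """Peak-to-average power ratio (PAPR) of a complex baseband sequence, in dB."""
    powers = [abs(s) ** 2 for s in samples]
    mean_power = sum(powers) / len(powers)
    return 10.0 * math.log10(max(powers) / mean_power)


# QPSK symbols share one amplitude ring, so their symbol-level PAPR is about 0 dB.
qpsk = [cmath.exp(1j * (math.pi / 4 + k * math.pi / 2)) for k in range(4)]
# Square 16-QAM has three amplitude rings: peak power 18 vs. mean power 10.
levels = (-3, -1, 1, 3)
qam16 = [complex(i, q) for i in levels for q in levels]

print(f"QPSK PAPR: {abs(papr_db(qpsk)):.2f} dB")
print(f"16-QAM PAPR: {papr_db(qam16):.2f} dB")  # 16-QAM PAPR: 2.55 dB
```

Geometric constellation shaping trades off exactly this quantity: pulling outer points inward lowers the peak, but it can also reduce the minimum distance between symbols.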
-
Complex-Valued Kernel-based Phase and Amplitude Distortion Compensation in Parametrically Amplified Optical Links
Authors:
Long Hoang Nguyen,
Sonia Boscolo,
Stylianos Sygletos
Abstract:
We develop a complex-valued kernel-adaptive-filtering based method for phase and amplitude distortion compensation in cascaded fibre-optical parametric amplifier (FOPA) links. Our algorithm predicts and cancels both distortions induced by pump-phase modulation across all amplification stages, achieving more than an order of magnitude improvement in BER.
Submitted 23 June, 2024;
originally announced June 2024.
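The FOPA abstract relies on complex-valued kernel adaptive filtering. Below is a minimal kernel least-mean-squares (KLMS) sketch under stated assumptions. It uses a real Gaussian kernel on complex inputs with complex coefficients, one of several possible complex extensions and not necessarily the authors' algorithm, and a toy amplitude-dependent phase rotation (`distort`, hypothetical) in place of the real pump-induced distortion. The filter is trained to map received samples back to the sent ones, i.e. to compensate the distortion.

```python
import cmath
import math
import random


class ComplexKLMS:
    """Kernel least-mean-squares (KLMS) filter with complex coefficients.

    Simplifying assumptions: a real Gaussian kernel evaluated on complex inputs,
    complex expansion coefficients, and no dictionary sparsification.
    """

    def __init__(self, step=0.5, width=0.5):
        self.step = step      # learning rate (eta)
        self.width = width    # Gaussian kernel width (sigma)
        self.centers = []     # stored inputs
        self.coeffs = []      # complex coefficients, one per center

    def _kernel(self, a, b):
        return math.exp(-abs(a - b) ** 2 / (2.0 * self.width ** 2))

    def predict(self, x):
        return sum(c * self._kernel(x, xc) for xc, c in zip(self.centers, self.coeffs))

    def update(self, x, desired):
        """One KLMS step: compute the a-priori error, then add x as a new center."""
        err = desired - self.predict(x)
        self.centers.append(x)
        self.coeffs.append(self.step * err)
        return err


def distort(x, gamma=0.8):
    """Toy amplitude-dependent phase rotation (hypothetical stand-in for the link)."""
    return x * cmath.exp(1j * gamma * abs(x) ** 2)


rng = random.Random(0)
comp = ComplexKLMS()
sq_errors = []
for _ in range(400):
    sent = complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
    received = distort(sent)
    # Train the compensator to map the distorted sample back to the sent one.
    sq_errors.append(abs(comp.update(received, sent)) ** 2)

early_mse = sum(sq_errors[:50]) / 50
late_mse = sum(sq_errors[-50:]) / 50
print(f"MSE over first 50 samples: {early_mse:.4f}, last 50: {late_mse:.4f}")
```

The dictionary grows by one center per sample here; practical kernel adaptive filters usually add a sparsification rule to bound memory and computation.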
-
On Detecting Low-pass Graph Signals under Partial Observations
Authors:
Hoang-Son Nguyen,
Hoi-To Wai
Abstract:
The application of graph signal processing (GSP) to partially observed graph signals with missing nodes has recently gained attention. This is because processing data from large graphs is difficult, if not impossible, when full observations are unavailable. Many prior works assume that the generated graph signals are smooth or low-pass filtered. This paper treats a blind graph filter detection problem in this context. We propose a detector that certifies whether partially observed graph signals are low-pass filtered, without requiring knowledge of the graph topology. As an example application, our detector yields a pre-screening method that filters out non-low-pass signals and thus robustifies prior GSP algorithms. We also bound the sample complexity of our detector in terms of the class of filters, the number of observed nodes, and other factors. Numerical experiments verify the efficacy of our method.
Submitted 16 May, 2024;
originally announced May 2024.
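To make "low-pass filtered graph signal" concrete, the sketch below filters white noise on a path graph with the polynomial filter H(L) = (I - βL)^K, whose frequency response (1 - βλ)^K decays with the graph frequency λ. It then compares the Rayleigh quotient xᵀLx / xᵀx, which is small when energy sits at low graph frequencies. This assumes full observation and a known topology, which is exactly what the paper's blind detector avoids, so it illustrates the signal model only and not the proposed detector. The graph, β, K, and helper names are all hypothetical choices.

```python
import random


def path_laplacian(n):
    """Combinatorial Laplacian L = D - A of an n-node path graph (dense lists)."""
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        lap[i][i] += 1.0
        lap[i + 1][i + 1] += 1.0
        lap[i][i + 1] -= 1.0
        lap[i + 1][i] -= 1.0
    return lap


def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def low_pass(lap, x, beta=0.2, order=5):
    """Apply H(L) = (I - beta*L)^order; response (1 - beta*lam)^order decays with lam.

    beta = 0.2 keeps 1 - beta*lam positive because a path graph has lam_max < 4.
    """
    y = list(x)
    for _ in range(order):
        ly = matvec(lap, y)
        y = [yi - beta * lyi for yi, lyi in zip(y, ly)]
    return y


def smoothness(lap, x):
    """Rayleigh quotient x^T L x / x^T x (lower means smoother on the graph)."""
    return sum(xi * lxi for xi, lxi in zip(x, matvec(lap, x))) / sum(xi * xi for xi in x)


rng = random.Random(1)
n = 20
lap = path_laplacian(n)
white = [rng.gauss(0.0, 1.0) for _ in range(n)]
filtered = low_pass(lap, white)
print(f"white: {smoothness(lap, white):.3f}, low-pass: {smoothness(lap, filtered):.3f}")
```

Because the filter shrinks high-frequency components more than low-frequency ones, the filtered signal's Rayleigh quotient is always lower than that of its input.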
-
Remote Sensing Data Assimilation with a Chained Hydrologic-hydraulic Model for Flood Forecasting
Authors:
Thanh Huy Nguyen,
Andrea Piacentini,
Sophie Ricci,
Ludovic Cassan,
Simon Munier,
Quentin Bonassies,
Raquel Rodriguez-Suquet
Abstract:
A chained hydrologic-hydraulic model is implemented, using predicted runoff from a large-scale hydrologic model (namely ISBA-CTRIP) as input to local hydrodynamic models (TELEMAC-2D), to issue forecasts of water level and flood extent. The uncertainties in the hydrological forcing and in friction parameters are reduced by an Ensemble Kalman Filter that jointly assimilates in-situ water levels and flood extent maps derived from remote sensing observations. The data assimilation framework is cycled in a real-time forecasting configuration. Each cycle consists of a reanalysis phase and a forecast phase. During the reanalysis, observations up to the present are assimilated. An ensemble is then initialized from the last analyzed states and issues forecasts for the next 36 hr. Three strategies for the forcing data of this forecast are investigated: (i) using CTRIP runoff for both reanalysis and forecast, (ii) using observed discharge for the reanalysis, then CTRIP runoff for the forecast, and (iii) using observed discharge for the reanalysis and keeping a persistent discharge value for the forecast. The data assimilation strategy was shown to provide a reliable reanalysis in hindcast mode. The combination of observed discharge and CTRIP runoff provides the most accurate results. For all strategies, forecast quality decreases as lead time increases. When the errors in CTRIP forcing are non-stationary, the forecast capability may be reduced. This work demonstrates that the forcing provided by a hydrologic model, while imperfect, can be used efficiently as input to a hydraulic model to issue reanalyses and forecasts, thanks to the assimilation of in-situ and remote sensing observations.
Submitted 1 May, 2024;
originally announced May 2024.
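The flood-forecasting abstract uses an Ensemble Kalman Filter to jointly correct the hydraulic state and friction parameters. The sketch below shows a stochastic (perturbed-observation) EnKF analysis step on a two-component augmented vector [water level, Manning friction], with only the water level observed. The friction is corrected through its sample cross-covariance with the water level. The scalar setup, the prior relation between friction and water level, and all numbers are hypothetical. The paper's system assimilates full TELEMAC-2D states and flood extent maps as well.

```python
import random


def enkf_analysis(ensemble, obs, obs_var, rng):
    """Stochastic (perturbed-observation) EnKF update of [water_level, friction] members.

    Only the water level (index 0) is observed, so friction (index 1) is corrected
    through its sample cross-covariance with the water level.
    """
    m = len(ensemble)
    mean_h = sum(e[0] for e in ensemble) / m
    mean_n = sum(e[1] for e in ensemble) / m
    p_hh = sum((e[0] - mean_h) ** 2 for e in ensemble) / (m - 1)
    p_nh = sum((e[1] - mean_n) * (e[0] - mean_h) for e in ensemble) / (m - 1)
    k_h, k_n = p_hh / (p_hh + obs_var), p_nh / (p_hh + obs_var)  # Kalman gain
    analysis = []
    for h, n in ensemble:
        innovation = obs + rng.gauss(0.0, obs_var ** 0.5) - h
        analysis.append((h + k_h * innovation, n + k_n * innovation))
    return analysis


def mean_var(values):
    m = len(values)
    mu = sum(values) / m
    return mu, sum((v - mu) ** 2 for v in values) / (m - 1)


rng = random.Random(42)
# Hypothetical prior: rougher channels (larger Manning n) give higher water levels.
forecast = []
for _ in range(100):
    n = rng.gauss(0.035, 0.005)
    forecast.append((2.0 + 100.0 * (n - 0.035) + rng.gauss(0.0, 0.3), n))

analysis = enkf_analysis(forecast, obs=3.0, obs_var=0.01, rng=rng)
h_f, var_f = mean_var([h for h, _ in forecast])
h_a, var_a = mean_var([h for h, _ in analysis])
n_f, _ = mean_var([n for _, n in forecast])
n_a, _ = mean_var([n for _, n in analysis])
print(f"water level mean {h_f:.2f} -> {h_a:.2f} m, friction mean {n_f:.4f} -> {n_a:.4f}")
```

Because the observed water level is higher than the forecast and correlates positively with roughness, the analysis raises the friction estimate as well. This parameter correction is what lets the subsequent forecast phase start from a better-calibrated model.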
-
Decoding Radiologists' Intentions: A Novel System for Accurate Region Identification in Chest X-ray Image Analysis
Authors:
Akash Awasthi,
Safwan Ahmad,
Bryant Le,
Hien Van Nguyen
Abstract:
In the realm of chest X-ray (CXR) image analysis, radiologists meticulously examine various regions, documenting their observations in reports. The prevalence of errors in CXR diagnoses, particularly among inexperienced radiologists and hospital residents, underscores the importance of understanding radiologists' intentions and the corresponding regions of interest. This understanding is crucial for correcting mistakes by guiding radiologists to the accurate regions of interest, especially in the diagnosis of chest radiograph abnormalities. In response to this imperative, we propose a novel system designed to identify the primary intentions articulated by radiologists in their reports and the corresponding regions of interest in CXR images. This system seeks to elucidate the visual context underlying radiologists' textual findings, with the potential to rectify errors made by less experienced practitioners and direct them to precise regions of interest. Importantly, the proposed system can be instrumental in providing constructive feedback to inexperienced radiologists or junior residents in the hospital, bridging the gap in face-to-face communication. The system represents a valuable tool for enhancing diagnostic accuracy and fostering continuous learning within the medical community.
Submitted 29 April, 2024;
originally announced April 2024.
-
Voice EHR: Introducing Multimodal Audio Data for Health
Authors:
James Anibal,
Hannah Huth,
Ming Li,
Lindsey Hazen,
Yen Minh Lam,
Hang Nguyen,
Phuc Hong,
Michael Kleinman,
Shelley Ost,
Christopher Jackson,
Laura Sprabery,
Cheran Elangovan,
Balaji Krishnaiah,
Lee Akst,
Ioan Lina,
Iqbal Elyazar,
Lenny Ekwati,
Stefan Jansen,
Richard Nduwayezu,
Charisse Garcia,
Jeffrey Plum,
Jacqueline Brenner,
Miranda Song,
Emily Ricotta,
David Clifton
, et al. (3 additional authors not shown)
Abstract:
Large AI models trained on audio data may have the potential to rapidly classify patients, enhancing medical decision-making and potentially improving outcomes through early detection. Existing technologies depend on limited datasets using expensive recording equipment in high-income, English-speaking countries. This challenges deployment in resource-constrained, high-volume settings where audio data may have a profound impact. This report introduces a novel data type and a corresponding collection system that captures health data through guided questions using only a mobile/web application. This application ultimately results in an audio electronic health record (voice EHR) which may contain complex biomarkers of health from conventional voice/respiratory features, speech patterns, and language with semantic meaning - compensating for the typical limitations of unimodal clinical datasets. This report introduces a consortium of partners for global work, presents the application used for data collection, and showcases the potential of informative voice EHR to advance the scalability and diversity of audio AI.
Submitted 1 June, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Natural-artificial hybrid swarm: Cyborg-insect group navigation in unknown obstructed soft terrain
Authors:
Yang Bai,
Phuoc Thanh Tran Ngoc,
Huu Duoc Nguyen,
Duc Long Le,
Quang Huy Ha,
Kazuki Kai,
Yu Xiang See To,
Yaosheng Deng,
Jie Song,
Naoki Wakamiya,
Hirotaka Sato,
Masaki Ogura
Abstract:
Navigating multi-robot systems in complex terrain has always been a challenging task, owing to the inherent limitations of traditional robots in collision avoidance, adaptation to unknown environments, and sustained energy efficiency. To overcome these limitations, this research integrates living insects with miniature electronic controllers to enable robot-like programmable control, and proposes a novel control algorithm for swarming. Although these creatures, called cyborg insects, can instinctively avoid collisions with neighbors and obstacles while adapting to complex terrain, there is a lack of literature on the control of multi-cyborg systems. This research gap stems from the difficulty of coordinating the movements of a cyborg system given the insects' inherent individual variability in their reactions to control inputs. In response, we propose a novel swarm navigation algorithm that addresses these challenges. Its effectiveness is demonstrated through experimental validation in which a cyborg swarm was successfully navigated through an unknown sandy field with obstacles and hills. This research contributes to the domain of swarm robotics and showcases the potential of integrating biological organisms with robotics and control theory to create more intelligent autonomous systems with real-world applications.
Submitted 27 March, 2024; v1 submitted 26 March, 2024;
originally announced March 2024.
-
Improved Soft-k-Means Clustering Algorithm for Balancing Energy Consumption in Wireless Sensor Networks
Authors:
Botao Zhu,
Ebrahim Bedeer,
Ha H. Nguyen,
Robert Barton,
Jerome Henry
Abstract:
Energy load balancing is an essential issue in designing wireless sensor networks (WSNs). Clustering techniques are used as energy-efficient methods to balance the network energy and prolong its lifetime. In this paper, we propose an improved soft-k-means (IS-k-means) clustering algorithm to balance the energy consumption of nodes in WSNs. First, we use the idea of "clustering by fast search and find of density peaks" (CFSFDP) and kernel density estimation (KDE) to improve the selection of the initial cluster centers of the soft-k-means clustering algorithm. Then, we exploit the flexibility of soft k-means and reassign member nodes at the cluster boundaries according to their membership probabilities, balancing the number of nodes per cluster. Furthermore, the concept of multiple cluster heads is employed to balance the energy consumption within clusters. Extensive simulation results under different network scenarios demonstrate that, for small-scale WSNs with single-hop transmission, the proposed algorithm on average postpones the death of the first node, of half of the nodes, and of the last node compared with various clustering algorithms from the literature.
Submitted 22 March, 2024;
originally announced March 2024.
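The IS-k-means abstract builds on soft k-means, where each node has a membership probability for every cluster rather than a hard label. The minimal sketch below covers only that base algorithm. The CFSFDP/KDE center initialization, boundary reassignment, and multi-cluster-head steps of the paper are not reproduced, and the `stiffness` value, initial centers, and node positions are hypothetical.

```python
import math
import random


def soft_kmeans(points, centers, stiffness=2.0, iters=20):
    """Soft k-means on 2-D points.

    Membership r_ik is proportional to exp(-stiffness * ||x_i - c_k||^2); centers
    are membership-weighted means. Returns (centers, memberships).
    """
    memberships = []
    for _ in range(iters):
        memberships = []
        for x, y in points:
            logits = [-stiffness * ((x - cx) ** 2 + (y - cy) ** 2) for cx, cy in centers]
            top = max(logits)
            weights = [math.exp(z - top) for z in logits]  # shift by max for stability
            total = sum(weights)
            memberships.append([w / total for w in weights])
        centers = []
        for k in range(len(memberships[0])):
            mass = sum(r[k] for r in memberships)
            cx = sum(r[k] * p[0] for r, p in zip(memberships, points)) / mass
            cy = sum(r[k] * p[1] for r, p in zip(memberships, points)) / mass
            centers.append((cx, cy))
    return centers, memberships


rng = random.Random(7)
# Hypothetical sensor-node positions: two groups of 20 nodes.
nodes = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
nodes += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(20)]
heads, probs = soft_kmeans(nodes, centers=[(2.0, 2.0), (8.0, 8.0)])
print([(round(cx, 2), round(cy, 2)) for cx, cy in heads])
```

The membership rows in `probs` are what a balancing step can exploit: a node whose largest membership is close to 0.5 sits near a boundary and can be moved to a smaller cluster at little cost.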
-
Early Flood Warning Using Satellite-Derived Convective System and Precipitation Data -- A Retrospective Case Study of Central Vietnam
Authors:
Tran-Vu La,
Thanh Huy Nguyen,
Patrick Matgen,
Marco Chini
Abstract:
This paper addresses the challenges of early warning of floods caused by complex convective systems (CSs), using Low-Earth Orbit and Geostationary satellite data. We focus on a sequence of extreme events that took place in central Vietnam during October 2020, with specific emphasis on the events leading up to the floods, i.e., those occurring before October 10th, 2020. In this critical phase, several hydrometeorological indicators could be identified thanks to an increasingly advanced and dense observation network of Earth Observation satellites, in particular those enabling the characterization and monitoring of a CS in terms of low-temperature clouds and heavy rainfall. Himawari-8 images, both individually and as time series, allow convective clouds to be identified and tracked. This is complemented by the observation of heavy/violent rainfall through GPM IMERG data, as well as the detection of strong winds using radiometers and scatterometers. Collectively, these datasets, along with the estimated intensity and duration of the event from each source, form a comprehensive dataset detailing the intricate behaviors of CSs. All of these factors contribute significantly to the magnitude of flooding and the short-term dynamics anticipated in the studied region.
Submitted 21 March, 2024;
originally announced March 2024.
-
Assimilation of SWOT Altimetry and Sentinel-1 Flood Extent Observations for Flood Reanalysis -- A Proof-of-Concept
Authors:
Thanh Huy Nguyen,
Sophie Ricci,
Andrea Piacentini,
Charlotte Emery,
Raquel Rodriguez Suquet,
Santiago Peña Luque
Abstract:
In spite of astonishing advances in remote sensing technologies, meeting the spatio-temporal requirements of flood hydrodynamic modeling remains a great challenge for Earth Observation. The assimilation of multi-source remote sensing data into 2D hydrodynamic models helps overcome this challenge. The recently launched Surface Water and Ocean Topography (SWOT) wide-swath altimetry satellite provides global coverage of water surface elevation at high resolution. SWOT provides observations complementary to radar and optical images, increasing the opportunities to observe and monitor flood events. This research focuses on the assimilation of 2D flood extent maps derived from Sentinel-1 C-SAR imagery and of water surface elevation from SWOT, as well as in-situ water level measurements. An Ensemble Kalman Filter (EnKF) with a joint state-parameter analysis is implemented on top of a 2D TELEMAC-2D hydrodynamic model to account for errors in roughness, input forcing, and water depth in floodplain subdomains. The proposed strategy is carried out in an Observing System Simulation Experiment based on the 2021 flood event over the Garonne Marmandaise catchment. This work makes the most of the large volume of heterogeneous data from space for flood prediction in hindcast mode and paves the way for nowcasting.
Submitted 21 March, 2024;
originally announced March 2024.
-
User-Centric Beam Selection and Precoding Design for Coordinated Multiple-Satellite Systems
Authors:
Vu Nguyen Ha,
Duy H. N. Nguyen,
Juan C. -M. Duncan,
Jorge L. Gonzalez-Rios,
Juan A. Vasquez,
Geoffrey Eappen,
Luis M. Garces-Socarras,
Rakesh Palisetty,
Symeon Chatzinotas,
Bjorn Ottersten
Abstract:
This paper introduces a joint optimization framework for user-centric beam selection and linear precoding (LP) design in a coordinated multiple-satellite (CoMSat) system employing a Discrete-Fourier-Transform-based (DFT) beamforming (BF) technique. With the goal of serving users at their target SINRs while minimizing the total transmit power, the scheme aims to efficiently determine which satellites users associate with and to activate the best cluster of beams, together with optimizing the LP for every satellite-to-user transmission. These technical objectives are first framed as a complex mixed-integer programming (MIP) problem. To tackle it, we reformulate it into a joint cluster association and LP design problem. Then, by theoretically analyzing the duality relationship between downlink and uplink transmissions, we develop an efficient iterative method to identify the optimal solution. Additionally, a simpler duality approach for rapid beam selection and LP design is presented for comparison. Simulation results underscore the effectiveness of the proposed schemes across various settings.
Submitted 13 March, 2024;
originally announced March 2024.
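DFT-based beamforming, as referenced in the CoMSat abstract, uses a fixed codebook of N orthogonal beams for an N-element array. The sketch below assumes a half-wavelength uniform linear array (ULA), which is an assumption since the satellite antenna geometry is not given here. It builds the codebook and picks the beam with the largest gain |wᴴa|² toward a user. The paper's user-centric selection jointly optimizes beam clusters and precoders across satellites, which this single-array example does not attempt.

```python
import cmath
import math


def dft_codebook(n_ant):
    """Unit-norm DFT beam weights w_k[n] = exp(j*2*pi*n*k/N) / sqrt(N) for a ULA."""
    return [[cmath.exp(2j * math.pi * n * k / n_ant) / math.sqrt(n_ant) for n in range(n_ant)]
            for k in range(n_ant)]


def steering(n_ant, u):
    """Half-wavelength ULA response a[n] = exp(j*pi*n*u), where u = sin(angle)."""
    return [cmath.exp(1j * math.pi * n * u) for n in range(n_ant)]


def best_beam(codebook, a):
    """Pick the beam maximising the gain |w^H a|^2; returns (index, gain)."""
    gains = [abs(sum(w.conjugate() * x for w, x in zip(beam, a))) ** 2 for beam in codebook]
    k = max(range(len(gains)), key=gains.__getitem__)
    return k, gains[k]


N = 8
book = dft_codebook(N)
k, g = best_beam(book, steering(N, u=2 * 3 / N))  # user exactly on beam 3's axis
print(k, round(g, 6))  # 3 8.0
```

A user exactly on a beam axis receives the full array gain N while all other DFT beams give zero gain; users between beam axes split their power across neighboring beams, which is one reason to select clusters of beams rather than a single beam.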
-
Tensor Power Flow Formulations for Multidimensional Analyses in Distribution Systems
Authors:
Edgar Mauricio Salazar Duque,
Juan S. Giraldo,
Pedro P. Vergara,
Phuong H. Nguyen,
Han Slootweg
Abstract:
In this paper, we present two multidimensional power flow formulations based on a fixed-point iteration (FPI) algorithm to efficiently solve hundreds of thousands of power flows in distribution systems. The presented algorithms form the basis of a new TensorPowerFlow (TPF) tool and stand out for their simplicity, benefiting from multicore CPU and GPU parallelization. We also focus on the mathematical convergence properties of the algorithm, showing that its unique solution is at the practical operating point, namely the high-voltage, low-current solution. The proof is validated using numerical simulations that show the robustness of the FPI algorithm compared with the classical Newton-Raphson (NR) approach. In the case study, a benchmark of different power flow (PF) solution methods is performed, showing that for applications requiring a yearly simulation at 1-minute resolution, the computation time is reduced by a factor of 164 compared with the NR method in its sparse formulation.
Submitted 7 March, 2024;
originally announced March 2024.
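The TensorPowerFlow abstract is built on a fixed-point iteration. A common textbook form for distribution feeders (assumed here, not taken from the paper) is V ← w + Z·conj(S / V), where Z is the bus impedance matrix with the slack bus removed, S are the net complex power injections, and w is the no-load voltage profile. The sketch below covers a single-phase radial feeder, for which Z can be built from shared upstream path impedances and w equals the slack voltage at every bus. The TPF tool instead batches many such solves as tensor operations. Helper names, the feeder, and the per-unit values are hypothetical.

```python
def radial_zbus(branch_z, parent):
    """Bus impedance matrix (slack removed) of a radial feeder.

    Z[i][j] is the total impedance of the path shared by buses i and j back to
    the slack bus. parent[i] is the upstream bus of bus i (-1 means the slack);
    branch_z[i] is the impedance of the branch feeding bus i.
    """
    n = len(parent)

    def path(i):
        nodes = set()
        while i != -1:
            nodes.add(i)
            i = parent[i]
        return nodes

    paths = [path(i) for i in range(n)]
    return [[sum(branch_z[b] for b in paths[i] & paths[j]) for j in range(n)] for i in range(n)]


def fpi_power_flow(z_bus, s_load, v_slack=1.0 + 0j, tol=1e-10, max_iter=100):
    """Fixed-point iteration V <- w + Z conj(S / V), with injections S = -s_load.

    For a radial feeder fed from the slack bus, w = v_slack at every bus.
    """
    n = len(s_load)
    v = [v_slack] * n  # flat start
    for it in range(1, max_iter + 1):
        currents = [(-s / vi).conjugate() for s, vi in zip(s_load, v)]
        v_new = [v_slack + sum(z_bus[i][j] * currents[j] for j in range(n)) for i in range(n)]
        if max(abs(a - b) for a, b in zip(v_new, v)) < tol:
            return v_new, it
        v = v_new
    raise RuntimeError("fixed-point iteration did not converge")


# Hypothetical 3-bus feeder in per-unit: slack -> bus 0 -> bus 1.
parent = [-1, 0]
branch_z = [0.01 + 0.02j, 0.01 + 0.02j]
s_load = [0.5 + 0.2j, 0.3 + 0.1j]
z_bus = radial_zbus(branch_z, parent)
volts, iterations = fpi_power_flow(z_bus, s_load)
print([round(abs(v), 4) for v in volts], "after", iterations, "iterations")
```

Each iteration needs only a matrix-vector product and element-wise operations, with no Jacobian as in Newton-Raphson. This is what makes the scheme easy to vectorize over many load scenarios at once.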
-
Predictive Models based on Deep Learning Algorithms for Tensile Deformation of AlCoCuCrFeNi High-entropy alloy
Authors:
Hoang-Giang Nguyen,
Thanh-Dung Le
Abstract:
High-entropy alloys (HEAs) stand out among multi-component alloys due to their attractive microstructures and mechanical properties. In this investigation, molecular dynamics (MD) simulation and machine learning (ML) were used to ascertain the deformation mechanism of AlCoCuCrFeNi HEAs under the influence of temperature, strain rate, and grain size. First, the MD simulation shows that the yield stress decreases significantly as strain and temperature increase. By comparison, changes in strain rate and grain size affect the mechanical properties less than changes in strain and temperature. The alloys exhibited superplastic behavior under all test conditions. The deformation mechanism reveals that strain and temperature are the main sources of initial strain, and that the shear bands move along the uniaxial tensile axis inside the workpiece. Furthermore, the fast phase shift of inclusions under mild strain indicates the relative instability of the HCP inclusion phase. Finally, the dislocation evolution mechanism shows that dislocations nucleating around the grain boundary are transported to free surfaces under increased strain. Surprisingly, the ML predictions also confirm the same characteristics as those identified from the MD simulation. Hence, combining MD and ML reinforces confidence in the findings on the mechanical characteristics of the HEA. Consequently, this combination fills the gap between MD and ML and can significantly save the time, manpower, and cost of conducting real experiments to test HEA deformation in practice.
Submitted 2 February, 2024;
originally announced February 2024.
-
The Smooth Trajectory Estimator for LMB Filters
Authors:
Hoa Van Nguyen,
Tran Thien Dat Nguyen,
Changbeom Shim,
Marzhar Anuar
Abstract:
This paper proposes a smooth-trajectory estimator for the labelled multi-Bernoulli (LMB) filter by exploiting the special structure of the generalised labelled multi-Bernoulli (GLMB) filter. We devise a simple and intuitive approach to store the best association map when approximating the GLMB random finite set (RFS) by an LMB RFS. In particular, we construct a smooth-trajectory estimator (i.e., an estimator over the entire trajectories of labelled estimates) for the LMB filter based on the history of the best association map and all measurements up to the current time. Experimental results under two challenging scenarios demonstrate significant improvements in tracking accuracy with negligible additional computation time compared with the conventional LMB filter. The source code is publicly available at https://tinyurl.com/ste-lmb to promote advances in multi-object tracking (MOT) algorithms.
Submitted 1 January, 2024;
originally announced January 2024.
-
Distributed Multi-Object Tracking Under Limited Field of View Heterogeneous Sensors with Density Clustering
Authors:
Fei Chen,
Hoa Van Nguyen,
Alex S. Leong,
Sabita Panicker,
Robin Baker,
Damith C. Ranasinghe
Abstract:
We consider the problem of tracking multiple objects, whose number is unknown and time-varying, using a distributed network of heterogeneous sensors. In an effort to derive a formulation for practical settings, we consider limited and unknown sensor fields of view (FoVs), and sensors with limited local computational resources and communication channel capacity. The resulting distributed multi-object tracking algorithm involves solving an NP-hard multidimensional assignment problem, either optimally for small problems or sub-optimally for general practical problems. For general problems, we propose an efficient distributed multi-object tracking algorithm that performs track-to-track fusion using a clustering-based analysis of the state space transformed into a density space, mitigating the complexity of the assignment problem. The proposed algorithm groups local track estimates for fusion more efficiently than existing approaches. To achieve globally consistent track identities across a network of nodes as objects move between FoVs, we develop a graph-based algorithm that achieves label consensus and minimises track segmentation. Numerical experiments with a synthetic and a real-world trajectory dataset demonstrate that our proposed method is significantly more computationally efficient than state-of-the-art solutions, achieving similar tracking accuracy and bandwidth requirements but with improved label consistency.
Submitted 31 December, 2023;
originally announced January 2024.
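The abstract does not spell out the clustering step. As a hedged illustration of grouping local track estimates by density before fusion, here is a plain DBSCAN over 2-D position estimates reported by different sensor nodes, followed by naive averaging within each cluster. The paper's density-space transform, its fusion rule, and the parameters `eps` and `min_pts` are not reproduced; they are assumptions for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Estimate = Tuple[float, float]  # hypothetical 2-D position estimate from one sensor node


def dbscan(points: Sequence[Estimate], eps: float, min_pts: int) -> List[int]:
    """Return a cluster id (0, 1, ...) per point, or -1 for noise."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n  # None means not yet visited
    cluster = -1

    def neighbours(i: int) -> List[int]:
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # noise for now; may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster but does not expand it
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:  # core point: keep expanding
                queue.extend(j_nbrs)
    return [lab if lab is not None else -1 for lab in labels]


def fuse(estimates: Sequence[Estimate], eps: float = 1.0, min_pts: int = 2) -> List[Estimate]:
    """Average the estimates in each density cluster; noise points are dropped."""
    groups: Dict[int, List[Estimate]] = {}
    for est, lab in zip(estimates, dbscan(estimates, eps, min_pts)):
        if lab >= 0:
            groups.setdefault(lab, []).append(est)
    return [(sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
            for g in groups.values()]


# Three nodes see object A, two see object B, and one reports an outlier.
EST = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.1), (10.0, 10.0), (10.1, 9.9), (50.0, 50.0)]
```

Note that with `min_pts=2` a track seen by only one node is treated as noise and dropped, which a real tracker would usually not want; `min_pts=1` keeps such tracks as singleton clusters.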
-
Physics-informed Graphical Neural Network for Power System State Estimation
Authors:
Quang-Ha Ngo,
Bang L. H. Nguyen,
Tuyen V. Vu,
Jianhua Zhang,
Tuan Ngo
Abstract:
State estimation is critical for accurately observing the dynamic behavior of power grids and minimizing risks from cyber threats. However, existing state estimation methods struggle to capture power system dynamics accurately, primarily because of limitations in encoding the grid topology and sparse measurements. This paper proposes a physics-informed graphical learning state estimation method to address these limitations by leveraging both domain physical knowledge and a graph neural network (GNN). We employ a GNN architecture that can handle the graph-structured data of power systems more effectively than traditional data-driven methods. The physics-based knowledge is constructed from the branch current formulation, making the approach adaptable to both transmission and distribution systems. Validation results on three IEEE test systems show that the proposed method achieves a mean square error more than 20% lower than that of conventional methods.
Submitted 29 December, 2023;
originally announced December 2023.
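One way to read "physics-based knowledge from the branch current formulation" is as a constraint: with branch currents as the estimated quantities, Kirchhoff's current law must hold at every bus. A minimal sketch of a physics-informed loss under that reading, assuming a known bus-branch topology, complex current phasors, and a hypothetical weight `lam`. The GNN that would produce `pred_branch` is omitted, and this is not the authors' formulation.

```python
from typing import Dict, List, Sequence, Tuple

Branch = Tuple[int, int]  # (from_bus, to_bus); positive current flows from -> to


def kcl_residual(branches: Sequence[Branch], i_branch: Sequence[complex],
                 i_inj: Sequence[complex]) -> List[complex]:
    """Per-bus KCL mismatch: current leaving through branches minus current injected."""
    res: List[complex] = [complex(-inj) for inj in i_inj]
    for (f, t), i_b in zip(branches, i_branch):
        res[f] += i_b  # the branch current leaves bus f
        res[t] -= i_b  # and enters bus t
    return res


def physics_informed_loss(pred_branch: Sequence[complex], measured: Dict[int, complex],
                          branches: Sequence[Branch], i_inj: Sequence[complex],
                          lam: float = 1.0) -> float:
    # Data term: fit only the (sparse) measured branch currents.
    data = sum(abs(pred_branch[k] - v) ** 2 for k, v in measured.items()) / max(len(measured), 1)
    # Physics term: penalize KCL violations at every bus, measured or not.
    res = kcl_residual(branches, pred_branch, i_inj)
    phys = sum(abs(r) ** 2 for r in res) / len(res)
    return data + lam * phys


# Toy 3-bus radial feeder: bus 0 injects 2, buses 1 and 2 each draw 1.
BRANCHES = [(0, 1), (1, 2)]
INJ = [2.0, -1.0, -1.0]
```

Only branch 0 is "measured" in the usage below, yet the physics term still penalizes an inconsistent estimate on branch 1, which is the point of adding it when measurements are sparse.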
-
MossFormer2: Combining Transformer and RNN-Free Recurrent Network for Enhanced Time-Domain Monaural Speech Separation
Authors:
Shengkui Zhao,
Yukun Ma,
Chongjia Ni,
Chong Zhang,
Hao Wang,
Trung Hieu Nguyen,
Kun Zhou,
Jiaqi Yip,
Dianwen Ng,
Bin Ma
Abstract:
Our previously proposed MossFormer has achieved promising performance in monaural speech separation. However, it predominantly adopts a self-attention-based MossFormer module, which tends to emphasize longer-range, coarser-scale dependencies and is less effective at modelling finer-scale recurrent patterns. In this paper, we introduce a novel hybrid model that can model both long-range, coarse-scale dependencies and fine-scale recurrent patterns by integrating a recurrent module into the MossFormer framework. Instead of applying recurrent neural networks (RNNs) that use traditional recurrent connections, we present a recurrent module based on a feedforward sequential memory network (FSMN), which is considered an "RNN-free" recurrent network because it captures recurrent patterns without using recurrent connections. Our recurrent module mainly comprises a dilated FSMN block enhanced with gated convolutional units (GCU) and dense connections. In addition, a bottleneck layer and an output layer are added to control information flow. The recurrent module relies on linear projections and convolutions, allowing seamless, parallel processing of the entire sequence. The integrated MossFormer2 hybrid model demonstrates remarkable improvements over MossFormer and surpasses other state-of-the-art methods on the WSJ0-2/3mix, Libri2Mix, and WHAM!/WHAMR! benchmarks.
Submitted 18 December, 2023;
originally announced December 2023.
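An FSMN captures recurrent patterns with a learnable tapped-delay (memory) filter rather than a recurrent connection, so every output frame depends only on inputs and can be computed in parallel. A minimal sketch of a dilated FSMN memory with a skip connection, assuming per-channel taps and a causal window for simplicity. The gated convolutional units, dense connections, and bottleneck and output layers of MossFormer2 are omitted.

```python
from typing import List, Sequence


def dilated_fsmn_memory(h: Sequence[Sequence[float]],
                        taps: Sequence[Sequence[float]],
                        dilation: int = 1) -> List[List[float]]:
    """h: T x C input frames; taps: (N + 1) x C per-channel memory coefficients a_i.

    Returns m_t = h_t + sum_{i=0..N} a_i * h_{t - i * dilation}, zero-padded at the start.
    """
    n_frames, n_ch = len(h), len(h[0])
    out: List[List[float]] = []
    for t in range(n_frames):  # no dependence on earlier outputs, so frames can run in parallel
        row = list(h[t])  # skip (residual) connection
        for i, a in enumerate(taps):
            src = t - i * dilation
            if src < 0:
                break  # remaining taps would read before the start of the sequence
            for c in range(n_ch):
                row[c] += a[c] * h[src][c]
        out.append(row)
    return out


# Impulse response of a single-channel memory with 3 taps and dilation 2.
IMPULSE = [[1.0], [0.0], [0.0], [0.0], [0.0]]
TAPS = [[0.5], [0.25], [0.125]]
```

Feeding `IMPULSE` through with `dilation=2` yields `[1.5, 0.0, 0.25, 0.0, 0.125]` in the single channel: the taps appear every second frame, which is how dilation widens the receptive field without adding coefficients.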
-
IncepSE: Leveraging InceptionTime's performance with Squeeze and Excitation mechanism in ECG analysis
Authors:
Tue Minh Cao,
Nhat Hong Tran,
Le Phi Nguyen,
Hieu Huy Pham,
Hung Thanh Nguyen
Abstract:
Our study explores modifications of Inception-like architectures within the electrocardiogram (ECG) domain. To this end, we introduce IncepSE, a novel network that strategically incorporates architectural elements to leverage the strengths of both InceptionTime and channel attention mechanisms. Furthermore, we propose a training setup that employs stabilization techniques aimed at tackling the formidable challenges of the severely imbalanced PTB-XL dataset and gradient corruption. In this way, we set a new state of the art for deep learning models trained in a supervised manner across the majority of tasks. Our model consistently surpasses InceptionTime by larger margins than other state-of-the-art methods in this domain, notably with a 0.013 AUROC improvement in the "all" task, while also mitigating the inherent dataset fluctuations during training.
Submitted 16 November, 2023;
originally announced December 2023.
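The abstract does not give IncepSE's block layout. As an illustration of the channel attention it names, here is a generic squeeze-and-excitation step for a 1-D (channels × time) ECG feature map. Where it sits in the InceptionTime backbone, the reduction ratio, and the omission of biases are assumptions.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def squeeze_excite(x: Matrix, w1: Matrix, w2: Matrix) -> List[List[float]]:
    """x: C x T feature map; w1: (C // r) x C reduction weights; w2: C x (C // r) expansion weights."""
    # Squeeze: global average pooling over time gives one descriptor per channel.
    z = [sum(ch) / len(ch) for ch in x]
    # Excitation: bottleneck layer with ReLU, then expansion with a sigmoid gate in (0, 1).
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, hidden)))) for row in w2]
    # Scale: reweight every time step of each channel by that channel's gate.
    return [[g * v for v in ch] for g, ch in zip(gates, x)]


# Two channels, three time steps, reduction ratio r = 2 (hypothetical weights).
X = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
W1 = [[1.0, 1.0]]
```

With expansion weights `[[1.0], [-1.0]]` the first channel is kept almost intact (gate ≈ 0.95) while the second is suppressed (gate ≈ 0.05); learning such gates lets the network emphasize informative leads or filter outputs.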
-
Robust MRI Reconstruction by Smoothed Unrolling (SMUG)
Authors:
Shijun Liang,
Van Hoang Minh Nguyen,
Jinghan Jia,
Ismail Alkhouri,
Sijia Liu,
Saiprasad Ravishankar
Abstract:
As the popularity of deep learning (DL) in the field of magnetic resonance imaging (MRI) continues to rise, recent research has indicated that DL-based MRI reconstruction models might be excessively sensitive to minor input disturbances, including worst-case additive perturbations. This sensitivity often leads to unstable, aliased images. This raises the question of how to devise DL techniques for MRI reconstruction that are robust to train-test variations. To address this problem, we propose a novel image reconstruction framework, termed Smoothed Unrolling (SMUG), which advances a deep unrolling-based MRI reconstruction model using a randomized smoothing (RS)-based robust learning approach. RS, which improves a model's tolerance to input noise, has been widely used in the design of adversarial defense approaches for image classification tasks. Yet, we find that the conventional design that applies RS to the entire DL-based MRI model is ineffective. In this paper, we show that SMUG and its variants address the above issue by customizing the RS process based on the unrolling architecture of a DL-based MRI reconstruction model. Compared to the vanilla RS approach, we show that SMUG improves the robustness of MRI reconstruction with respect to a diverse set of instability sources, including worst-case and random noise perturbations to input measurements, varying measurement sampling rates, and different numbers of unrolling steps. Furthermore, we theoretically analyze the robustness of our method in the presence of perturbations.
Submitted 12 December, 2023;
originally announced December 2023.
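Randomized smoothing replaces a map f by its Gaussian average E[f(x + δ)] with δ ~ N(0, σ²I). A minimal Monte Carlo sketch, plus one hypothetical way to apply smoothing per unrolling step (to the denoiser inside each iteration) instead of to the whole model. `forward`, `adjoint` and `denoiser` are placeholders, signals are real-valued rather than complex k-space data, and SMUG's actual design and training are described in the paper.

```python
import random
from typing import Callable, List, Sequence

Vec = List[float]


def smoothed(f: Callable[[Vec], Vec], x: Sequence[float], sigma: float,
             n_samples: int = 500, seed: int = 0) -> Vec:
    """Monte Carlo estimate of E[f(x + delta)], delta ~ N(0, sigma^2 I)."""
    rng = random.Random(seed)
    total: Vec = []
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        y = f(noisy)
        total = list(y) if not total else [t + yi for t, yi in zip(total, y)]
    return [t / n_samples for t in total]


def unrolled_reconstruction(y: Vec, forward: Callable[[Vec], Vec], adjoint: Callable[[Vec], Vec],
                            denoiser: Callable[[Vec], Vec], n_steps: int = 5,
                            step_size: float = 1.0, sigma: float = 0.01,
                            n_samples: int = 200) -> Vec:
    x = adjoint(y)
    for k in range(n_steps):
        residual = [a - b for a, b in zip(forward(x), y)]
        grad = adjoint(residual)
        z = [xi - step_size * g for xi, g in zip(x, grad)]  # data-consistency gradient step
        x = smoothed(denoiser, z, sigma, n_samples, seed=k)  # smoothing applied at every unrolled step
    return x
```

With identity operators and an identity denoiser the reconstruction returns the measurements up to Monte Carlo noise of order σ/√n_samples; in a real model the denoiser is the learned network whose sensitivity the smoothing is meant to tame.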
-
Explainable Severity ranking via pairwise n-hidden comparison: a case study of glaucoma
Authors:
Hong Nguyen,
Cuong V. Nguyen,
Shrikanth Narayanan,
Benjamin Y. Xu,
Michael Pazzani
Abstract:
Primary open-angle glaucoma (POAG) is a chronic and progressive optic nerve condition that results in an acquired loss of optic nerve fibers and potential blindness. Because glaucoma has a gradual onset, patients progressively lose their vision without being consciously aware of the changes. To diagnose POAG and determine its severity, patients must undergo a comprehensive dilated eye examination. In this work, we build a framework to rank, compare, and interpret the severity of glaucoma using fundus images. We introduce a siamese-based severity ranking using pairwise n-hidden comparisons. We additionally propose a novel approach for explaining why a specific image is deemed more severe than others. Our findings indicate that the proposed severity ranking model surpasses traditional models in diagnostic accuracy and delivers improved saliency explanations.
Submitted 5 December, 2023;
originally announced December 2023.
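A siamese ranker scores two images with shared weights and is trained so that the more severe image receives the higher score. A minimal sketch using a standard pairwise margin (hinge) ranking loss, which is an assumption since the abstract does not define the n-hidden comparison. `score` stands in for the shared network, and the numeric "images" are hypothetical placeholders.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def pairwise_ranking_loss(score: Callable[[T], float],
                          pairs: Sequence[Tuple[T, T]], margin: float = 1.0) -> float:
    """pairs: (more_severe, less_severe). The same `score` is applied to both (siamese twins)."""
    return sum(max(0.0, margin - (score(a) - score(b))) for a, b in pairs) / len(pairs)


def rank_by_severity(score: Callable[[T], float], images: Sequence[T]) -> List[T]:
    """Most severe first, according to the learned scalar score."""
    return sorted(images, key=score, reverse=True)
```

The loss is zero once the more severe image out-scores the other by at least `margin`, so training only pushes on pairs that are mis-ordered or too close; at inference a single score per image is enough to sort a whole set.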
-
Doubly 1-Bit Quantized Massive MIMO
Authors:
Italo Atzeni,
Antti Tölli,
Duy H. N. Nguyen,
A. Lee Swindlehurst
Abstract:
Enabling communications in the (sub-)THz band will call for massive multiple-input multiple-output (MIMO) arrays at the transmit side, the receive side, or both. To scale down complexity and power consumption when operating across massive frequency and antenna dimensions, sacrificing the resolution of the digital-to-analog/analog-to-digital converters (DACs/ADCs) will be inevitable. In this paper, we analyze the extreme scenario where both the transmit and receive sides are equipped with fully digital massive MIMO arrays and 1-bit DACs/ADCs, which leads to a system with minimum radio-frequency complexity, cost, and power consumption. Building upon the Bussgang decomposition, we derive a tractable approximation of the mean squared error (MSE) between the transmitted data symbols and their soft estimates. Numerical results show that, despite its simplicity, a doubly 1-bit quantized massive MIMO system with very large antenna arrays can deliver impressive performance in terms of MSE and symbol error rate.
Submitted 4 December, 2023;
originally announced December 2023.
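The Bussgang decomposition writes a quantizer output as q(x) = Bx + d, where the distortion d is uncorrelated with x. For a real zero-mean Gaussian input with variance σ² and the 1-bit quantizer q(x) = sign(x), the gain is B = E[x q(x)]/E[x²] = E|x|/σ² = sqrt(2/π)/σ. A Monte Carlo check of that scalar, real-valued case (the paper works with complex multi-antenna signals, so this only illustrates the building block):

```python
import math
import random


def bussgang_gain_1bit(sigma: float, n: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of B = E[x q(x)] / E[x^2] for q(x) = sign(x), x ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, sigma)
        q = 1.0 if x >= 0.0 else -1.0  # 1-bit quantizer
        num += x * q
        den += x * x
    return num / den


def bussgang_gain_closed_form(sigma: float) -> float:
    """sqrt(2/pi) / sigma, from E|x| = sigma * sqrt(2/pi) and E[x^2] = sigma^2."""
    return math.sqrt(2.0 / math.pi) / sigma
```

For σ = 1 both functions give about 0.80: roughly 80% of the input survives as a scaled linear term, and the rest of the 1-bit output is distortion that an MSE analysis has to account for.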
-
MPCNN: A Novel Matrix Profile Approach for CNN-based Sleep Apnea Classification
Authors:
Hieu X. Nguyen,
Duong V. Nguyen,
Hieu H. Pham,
Cuong D. Do
Abstract:
Sleep apnea (SA) is a significant respiratory condition that poses a major global health challenge. Previous studies have investigated several machine and deep learning models for electrocardiogram (ECG)-based SA diagnosis. Despite these advancements, conventional features extracted from ECG signals, such as R-peaks and RR intervals, may fail to capture crucial information contained in the complete PQRST segments. In this study, we propose an innovative approach to address this diagnostic gap by delving deeper into the complete segments of the ECG signal. The proposed methodology draws inspiration from Matrix Profile algorithms, which generate a Euclidean distance profile from fixed-length signal subsequences. From this, we derive the Min Distance Profile (MinDP), Max Distance Profile (MaxDP), and Mean Distance Profile (MeanDP) based on the minimum, maximum, and mean of the profile distances, respectively. To validate the effectiveness of our approach, we use a modified LeNet-5 architecture as the primary CNN model, along with two existing lightweight models, BAFNet and SE-MSCNN, for ECG classification tasks. Our extensive experimental results on the PhysioNet Apnea-ECG dataset show that, with the new feature extraction method, we achieve a per-segment accuracy of up to 92.11% and a per-recording accuracy of 100%. Moreover, it yields the highest correlation compared to state-of-the-art methods, with a correlation coefficient of 0.989. By introducing a new feature extraction method based on distance relationships, we enhanced the performance of certain lightweight models, showing potential for home sleep apnea testing (HSAT) and SA detection in IoT devices. The source code for this work is publicly available on GitHub: https://github.com/vinuni-vishc/MPCNN-Sleep-Apnea.
Submitted 25 November, 2023;
originally announced November 2023.
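A minimal sketch of the three profiles as the abstract describes them: for each fixed-length subsequence, compute the Euclidean distance to every other subsequence, then take the minimum, maximum and mean. The window length `m`, the trivial-match `exclusion` zone, and the use of non-normalized distances are assumptions. This brute-force version costs O(n²m), whereas practical matrix profile algorithms are much faster.

```python
import math
from typing import Dict, List, Sequence


def distance_profiles(signal: Sequence[float], m: int, exclusion: int = 0) -> Dict[str, List[float]]:
    """MinDP / MaxDP / MeanDP over all length-m subsequences of `signal`.

    Subsequence j is compared with subsequence i only if |i - j| > exclusion,
    which at least removes the trivial self-match.
    """
    n_sub = len(signal) - m + 1
    subs = [signal[i:i + m] for i in range(n_sub)]
    min_dp: List[float] = []
    max_dp: List[float] = []
    mean_dp: List[float] = []
    for i in range(n_sub):
        d = [math.dist(subs[i], subs[j]) for j in range(n_sub) if abs(i - j) > exclusion]
        min_dp.append(min(d))
        max_dp.append(max(d))
        mean_dp.append(sum(d) / len(d))
    return {"MinDP": min_dp, "MaxDP": max_dp, "MeanDP": mean_dp}


# A perfectly periodic toy signal: every window has an exact repeat elsewhere.
TOY = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

For `TOY` with `m=2`, every MinDP entry is 0 because each window recurs exactly, while the MaxDP entries are √2. In an ECG recording, a beat with an unusually large MinDP has no close match elsewhere, which is the kind of whole-waveform information that R-peak or RR-interval features do not carry.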
-
Constellation Shaping under Phase Noise Impairment for Sub-THz Communications
Authors:
Dileepa Marasinghe,
Le Hang Nguyen,
Jafar Mohammadi,
Yejian Chen,
Thorsten Wild,
Nandana Rajatheva
Abstract:
The large untapped spectrum in the sub-THz band allows ultra-high-throughput communication to realize many seemingly impossible applications in 6G. One of the challenges of sub-THz radio communications is hardware impairments. Specifically, phase noise is a key hardware impairment, and it becomes more severe as frequency and bandwidth increase. Furthermore, the moderate output power of sub-THz power amplifiers imposes limits on the peak-to-average power ratio (PAPR) in signal design. Single carrier frequency domain equalization (SC-FDE) has been identified as a suitable candidate for sub-THz, although some challenges such as phase noise and PAPR remain to be tackled. In this work, we design a phase-noise-robust, modest-PAPR SC waveform by geometrically shaping the constellation under practical conditions. We formulate the waveform optimization problem in its augmented Lagrangian form and use a back-propagation-inspired technique to obtain a constellation design that is numerically robust to phase noise while maintaining a relatively low PAPR compared to conventional waveforms.
Submitted 21 March, 2024; v1 submitted 21 November, 2023;
originally announced November 2023.
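Geometric shaping trades off the two quantities named above: the PAPR of the transmitted samples and robustness to phase rotations. A minimal sketch of both measurements, assuming a sample-level PAPR of the symbol sequence (pulse shaping and oversampling, which raise the PAPR of real SC waveforms, are ignored) and a random-walk (Wiener) phase noise model. The paper's augmented-Lagrangian optimization is not reproduced.

```python
import cmath
import math
import random
from typing import List, Sequence


def papr_db(samples: Sequence[complex]) -> float:
    """Sample-level peak-to-average power ratio in dB."""
    powers = [abs(s) ** 2 for s in samples]
    return 10.0 * math.log10(max(powers) / (sum(powers) / len(powers)))


def apply_wiener_phase_noise(samples: Sequence[complex], step_std: float,
                             seed: int = 0) -> List[complex]:
    """Rotate each sample by a random-walk phase, a common oscillator phase-noise model."""
    rng = random.Random(seed)
    phase, out = 0.0, []
    for s in samples:
        phase += rng.gauss(0.0, step_std)  # phase increments are i.i.d. Gaussian
        out.append(s * cmath.exp(1j * phase))
    return out


QPSK = [complex(i, q) for i in (-1, 1) for q in (-1, 1)]
QAM16 = [complex(i, q) for i in (-3, -1, 1, 3) for q in (-3, -1, 1, 3)]
```

QPSK has a constant envelope (0 dB), while square 16-QAM has peak power 18 against mean power 10, i.e. 10·log10(1.8) ≈ 2.55 dB. Phase noise leaves these magnitudes unchanged but rotates the points, so dense outer rings of a constellation are where it causes decision errors.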
-
Exploding AI Power Use: an Opportunity to Rethink Grid Planning and Management
Authors:
Liuzixuan Lin,
Rajini Wijayawardana,
Varsha Rao,
Hai Nguyen,
Wedan Emmanuel Gnibga,
Andrew A. Chien
Abstract:
The unprecedented rapid growth of computing demand for AI is projected to increase global annual datacenter (DC) growth from 7.2% to 11.3%. We project the 5-year AI DC demand for several power grids and assess whether they will allow the desired AI growth (resource adequacy). If not, we consider several "desperate measures": grid policies that enable more load growth and maintain grid reliability by sacrificing the reliability of new DCs.
We find that two DC hotspots, EirGrid (Ireland) and Dominion (US), will have difficulty accommodating the new DCs needed for AI growth. In EirGrid, relaxing new-DC reliability guarantees increases the available power to 1.6x to 4.1x while maintaining 99.6% actual power availability for the new DCs, sufficient for the 5-year AI demand. In Dominion, relaxing reliability guarantees increases available DC capacity similarly (1.5x to 4.6x), but not enough for the 5-year AI demand: new DCs receive only 89% power availability. A study of other US power grids (SPP, CAISO, ERCOT) shows that sufficient capacity exists for the projected AI load growth.
Our results suggest the need to rethink adequacy assessment and also grid planning and management. New research opportunities include coordinated planning, reliability models that incorporate load flexibility, and adaptive load abstractions.
Submitted 30 April, 2024; v1 submitted 20 November, 2023;
originally announced November 2023.
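As a back-of-the-envelope check, and assuming (for illustration only) that each quoted annual rate compounds over a 5-year horizon, the two growth rates imply quite different cumulative multipliers:

```latex
\[
(1 + 0.072)^{5} \approx 1.42, \qquad (1 + 0.113)^{5} \approx 1.71
\]
```

That is, roughly 42% versus 71% cumulative growth in DC demand over five years; the paper's own per-grid projections need not follow this simple compounding.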
-
Shape-Sensitive Loss for Catheter and Guidewire Segmentation
Authors:
Chayun Kongtongvattana,
Baoru Huang,
Jingxuan Kang,
Hoan Nguyen,
Olajide Olufemi,
Anh Nguyen
Abstract:
We introduce a shape-sensitive loss function for catheter and guidewire segmentation and use it in a vision transformer network to establish a new state-of-the-art result on a large-scale X-ray image dataset. We transform network-derived predictions and their corresponding ground truths into signed distance maps (SDMs), thereby enabling any network to concentrate on the essential boundaries rather than merely the overall contours. These SDMs are fed into the vision transformer, efficiently producing high-dimensional feature vectors that encapsulate critical image attributes. By computing the cosine similarity between these feature vectors, we gain a nuanced understanding of image similarity that goes beyond the limitations of traditional overlap-based measures. The advantages of our approach are manifold, ranging from scale and translation invariance to superior detection of subtle differences, thus ensuring precise localization and delineation of the medical instruments within the images. Comprehensive quantitative and qualitative analyses substantiate the significant improvement in performance over existing baselines, demonstrating the promise of our new shape-sensitive loss function for improving catheter and guidewire segmentation.
Submitted 19 January, 2024; v1 submitted 18 November, 2023;
originally announced November 2023.
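A minimal sketch of the loss idea on binary masks, assuming a brute-force SDM (negative inside the object, positive outside, magnitude equal to the distance to the nearest pixel of the other class) and, as a simplification, comparing the SDMs directly with cosine similarity instead of the vision-transformer features the paper uses. A trainable version would also need SDMs computed differentiably from soft predictions; `vertical_line_mask` is a hypothetical toy helper.

```python
import math
from typing import List, Sequence

Mask = Sequence[Sequence[int]]
Grid = List[List[float]]


def signed_distance_map(mask: Mask) -> Grid:
    """Brute-force SDM: -d inside the object, +d outside (d = distance to the other class)."""
    h, w = len(mask), len(mask[0])
    fg = [(r, c) for r in range(h) for c in range(w) if mask[r][c]]
    bg = [(r, c) for r in range(h) for c in range(w) if not mask[r][c]]
    sdm = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            others = bg if mask[r][c] else fg
            if others:  # an all-foreground or all-background mask leaves zeros
                d = min(math.dist((r, c), p) for p in others)
                sdm[r][c] = -d if mask[r][c] else d
    return sdm


def cosine_similarity(a: Grid, b: Grid) -> float:
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    dot = sum(x * y for x, y in zip(fa, fb))
    na, nb = math.sqrt(sum(x * x for x in fa)), math.sqrt(sum(y * y for y in fb))
    return dot / (na * nb) if na and nb else 0.0


def shape_sensitive_loss(pred: Mask, gt: Mask) -> float:
    return 1.0 - cosine_similarity(signed_distance_map(pred), signed_distance_map(gt))


def vertical_line_mask(col: int, size: int = 5) -> List[List[int]]:
    """Toy stand-in for a thin catheter: a one-pixel-wide vertical line."""
    return [[1 if c == col else 0 for c in range(size)] for _ in range(size)]
```

Two thin lines that never overlap give a Dice score of zero whether they are one or two pixels apart, whereas the SDM-based loss grows with the displacement (about 0.40 for a one-pixel shift and 0.57 for a two-pixel shift in the 5×5 toy case), which is why distance-based comparisons suit thin structures.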
-
On the Out of Distribution Robustness of Foundation Models in Medical Image Segmentation
Authors:
Duy Minh Ho Nguyen,
Tan Ngoc Pham,
Nghiem Tuong Diep,
Nghi Quoc Phan,
Quang Pham,
Vinh Tong,
Binh T. Nguyen,
Ngan Hoang Le,
Nhat Ho,
Pengtao Xie,
Daniel Sonntag,
Mathias Niepert
Abstract:
Constructing a robust model that can effectively generalize to test samples under distribution shifts remains a significant challenge in medical imaging. Foundation models for vision and language, pre-trained on extensive sets of natural image and text data, have emerged as a promising approach. They showcase impressive learning abilities across different tasks while needing only a limited number of annotated samples. While numerous techniques have focused on developing better fine-tuning strategies to adapt these models to specific domains, we instead examine their robustness to domain shifts in the medical image segmentation task. To this end, we compare the generalization performance of various pre-trained models on unseen domains after fine-tuning them on the same in-distribution dataset, and show that foundation-based models enjoy better robustness than other architectures. From here, we further develop a new Bayesian uncertainty estimation for frozen models and use it as an indicator to characterize the model's performance on out-of-distribution (OOD) data, which proves particularly beneficial for real-world applications. Our experiments not only reveal the limitations of current indicators such as accuracy-on-the-line or agreement-on-the-line, commonly used in natural image applications, but also emphasize the promise of the introduced Bayesian uncertainty. Specifically, predictions with lower uncertainty usually correspond to higher OOD performance.
Submitted 18 November, 2023;
originally announced November 2023.
-
Modeling Power Systems Dynamics with Symbolic Physics-Informed Neural Networks
Authors:
Huynh T. T. Tran,
Hieu T. Nguyen
Abstract:
In recent years, scientific machine learning, particularly physics-informed neural networks (PINNs), has introduced innovative methods for understanding the differential equations that describe power system dynamics, providing a more efficient alternative to traditional methods. However, using a single neural network to capture the patterns of all variables requires a sufficiently large network, leading to long training times and high computational costs. In this paper, we interface PINNs with symbolic techniques to construct multiple single-output neural networks by taking the loss function apart and integrating it over the relevant domain. We also reweight the components of the loss function to improve the performance of the network for unstable systems. Our results show that the symbolic PINNs provide higher accuracy with significantly fewer parameters and faster training. By using the adaptive weight method, the symbolic PINNs can avoid the vanishing gradient problem and numerical instability.
Submitted 11 November, 2023;
originally announced November 2023.
-
Label Space Partition Selection for Multi-Object Tracking Using Two-Layer Partitioning
Authors:
Ji Youn Lee,
Changbeom Shim,
Hoa Van Nguyen,
Tran Thien Dat Nguyen,
Hyunjin Choi,
Youngho Kim
Abstract:
Estimating the trajectories of multiple objects poses a significant challenge due to data association ambiguity, which substantially increases computational requirements. To address this problem, a divide-and-conquer strategy with parallel computation has been employed. In this strategy, distinct objects with unique labels are grouped based on their statistical dependencies, namely the intersection of their predicted measurements. Several geometric approaches have been used for label grouping, since finding all intersecting label pairs exhaustively is clearly infeasible for large-scale tracking problems. This paper proposes an efficient implementation of label grouping for the label-partitioned generalized labeled multi-Bernoulli filter framework using a secondary partitioning technique. This allows parallel computation in the label graph indexing step, avoiding the generation and elimination of duplicate comparisons. Additionally, we compare the performance of the proposed technique with several efficient spatial searching algorithms. The results demonstrate the superior performance of the proposed approach on large-scale data sets, enabling scalable trajectory estimation.
Submitted 22 October, 2023;
originally announced October 2023.
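For context, a hedged sketch of a generic spatial-search approach to label grouping (not the paper's two-layer partitioning): a sort-and-sweep over axis-aligned gates around each label's predicted measurement, with union-find merging every intersecting pair into the same group. Representing gates as boxes and the label names are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) gate around a predicted measurement


class UnionFind:
    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, i: int) -> int:
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i: int, j: int) -> None:
        ri, rj = self.find(i), self.find(j)
        if ri != rj:
            self.parent[rj] = ri


def overlaps(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def group_labels(gates: Dict[str, Box]) -> List[List[str]]:
    labels = list(gates)
    order = sorted(range(len(labels)), key=lambda i: gates[labels[i]][0])  # sweep along x
    uf = UnionFind(len(labels))
    active: List[int] = []
    for i in order:
        xmin = gates[labels[i]][0]
        # Gates ending before this one starts on x can never intersect it or any later gate.
        active = [j for j in active if gates[labels[j]][2] >= xmin]
        for j in active:
            if overlaps(gates[labels[i]], gates[labels[j]]):
                uf.union(i, j)
        active.append(i)
    groups: Dict[int, List[str]] = {}
    for i, lab in enumerate(labels):
        groups.setdefault(uf.find(i), []).append(lab)
    return list(groups.values())


GATES = {"a": (0, 0, 2, 2), "b": (1, 1, 3, 3), "c": (10, 10, 11, 11), "d": (2.5, 2.5, 4, 4)}
```

Here `a` and `d` do not intersect directly, but both intersect `b`, so all three land in one group; each group can then be handed to an independent, parallel filter update. The sweep avoids testing every label pair when gates are spatially spread out.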
-
Vision and Language Navigation in the Real World via Online Visual Language Mapping
Authors:
Chengguang Xu,
Hieu T. Nguyen,
Christopher Amato,
Lawson L. S. Wong
Abstract:
Navigating unseen environments is crucial for mobile robots. Enabling them to follow natural-language instructions will further improve navigation efficiency in unseen cases. However, state-of-the-art (SOTA) vision-and-language navigation (VLN) methods are mainly evaluated in simulation, neglecting the complex and noisy real world. Directly transferring SOTA navigation policies trained in simulation to the real world is challenging due to the visual domain gap and the absence of prior knowledge about unseen environments. In this work, we propose a novel navigation framework to address the VLN task in the real world. Utilizing powerful foundation models, the proposed framework includes four key components: (1) an LLM-based instruction parser that converts the language instruction into a sequence of pre-defined macro-action descriptions, (2) an online visual-language mapper that builds a real-time visual-language map to maintain a spatial and semantic understanding of the unseen environment, (3) a language indexing-based localizer that grounds each macro-action description into a waypoint location on the map, and (4) a DD-PPO-based local controller that predicts the action. We evaluate the proposed pipeline on an Interbotix LoCoBot WX250 in an unseen lab environment. Without any fine-tuning, our pipeline significantly outperforms the SOTA VLN baseline in the real world.
Submitted 16 October, 2023;
originally announced October 2023.
-
A Tutorial on Chirp Spread Spectrum for LoRaWAN: Basics and Key Advances
Authors:
Alireza Maleki,
Ha H. Nguyen,
Ebrahim Bedeer,
Robert Barton
Abstract:
Chirp spread spectrum (CSS) modulation is at the heart of the long-range (LoRa) modulation used in long-range wide area networks (LoRaWAN) in Internet of Things (IoT) scenarios. Despite being a proprietary technology owned by Semtech Corp., LoRa modulation has drawn much attention from the research and industry communities in recent years. However, to the best of our knowledge, a comprehensive tutorial investigating CSS modulation in the LoRaWAN application is missing from the literature. Therefore, in the first part of this paper, we provide a thorough analysis and tutorial of CSS modulation as modified by the LoRa specifications, discussing various aspects such as signal generation, detection, error performance, and spectral characteristics. Moreover, the second part of this paper summarizes key recent advances in CSS modulation applications in IoT networks under four main categories: transceiver configuration and design, data rate improvement, interference modeling, and synchronization algorithms.
Submitted 16 October, 2023;
originally announced October 2023.
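A simplified discrete-time sketch of CSS signal generation and detection, assuming sampling at the chip rate, N = 2^SF chips per symbol, and no carrier, timing, or frequency offsets. Symbol s is an upchirp whose starting frequency is cyclically shifted by s; the receiver multiplies by the conjugate base chirp (dechirping), which leaves a pure tone at bin s, and picks the strongest DFT bin. Band-centering terms and other details of the LoRa specification are omitted.

```python
import cmath
import math
from typing import List


def lora_symbol(s: int, sf: int) -> List[complex]:
    """Upchirp for symbol s: instantaneous frequency (n + s) / N cycles per chip, wrapping mod 1."""
    n_chips = 2 ** sf
    return [cmath.exp(2j * math.pi * (n * n / (2 * n_chips) + s * n / n_chips))
            for n in range(n_chips)]


def lora_demod(rx: List[complex], sf: int) -> int:
    n_chips = 2 ** sf
    # Dechirp: remove the quadratic phase of the base (s = 0) chirp, leaving exp(j*2*pi*s*n/N).
    tone = [r * cmath.exp(-2j * math.pi * n * n / (2 * n_chips)) for n, r in enumerate(rx)]
    # Naive O(N^2) DFT magnitudes; an FFT would be used in practice.
    mags = [abs(sum(t * cmath.exp(-2j * math.pi * k * n / n_chips) for n, t in enumerate(tone)))
            for k in range(n_chips)]
    return max(range(n_chips), key=mags.__getitem__)


SF = 7  # 128 chips per symbol, so each symbol carries 7 bits
```

Each symbol carries SF bits by choosing one of 2^SF cyclic shifts, and all N chips of energy collapse into a single DFT bin after dechirping; that processing gain is what lets CSS links be decoded at very low SNR.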
-
Circular-Line Trajectory Tracking Controller for Mobile Robot using Multi-Pixy2 Sensors
Authors:
Xuan Quang Ngo,
Tri Duc Tran,
Huy Hung Nguyen,
Van Dong Nguyen,
Van Tu Duong,
Tan Tien Nguyen
Abstract:
This study proposes a novel tracking method that employs three Pixy2 sensors to identify the desired line trajectories instead of traditional sensing means. First, the kinematic model of the mobile robot is derived from the information gathered by the three Pixy2 sensors. Second, a sliding mode controller is implemented to regulate the tracking error. Finally, simulation results are analyzed to show the effectiveness of the proposed method.
Submitted 12 August, 2023;
originally announced September 2023.
-
SPGM: Prioritizing Local Features for enhanced speech separation performance
Authors:
Jia Qi Yip,
Shengkui Zhao,
Yukun Ma,
Chongjia Ni,
Chong Zhang,
Hao Wang,
Trung Hieu Nguyen,
Kun Zhou,
Dianwen Ng,
Eng Siong Chng,
Bin Ma
Abstract:
Dual-path is a popular architecture for speech separation models (e.g. Sepformer) that splits long sequences into overlapping chunks for its intra- and inter-blocks, which separately model intra-chunk local features and inter-chunk global relationships. However, it has been found that inter-blocks, which comprise half of a dual-path model's parameters, contribute minimally to performance. Thus, we propose the Single-Path Global Modulation (SPGM) block to replace inter-blocks. SPGM is named after its structure, which consists of a parameter-free global pooling module followed by a modulation module comprising only 2% of the model's total parameters. The SPGM block allows all transformer layers in the model to be dedicated to local feature modelling, making the overall model single-path. SPGM achieves 22.1 dB SI-SDRi on WSJ0-2Mix and 20.4 dB SI-SDRi on Libri2Mix, exceeding the performance of Sepformer by 0.5 dB and 0.3 dB respectively, and matches the performance of recent SOTA models with up to 8 times fewer parameters. Model and weights are available at huggingface.co/yipjiaqi/spgm
Submitted 10 March, 2024; v1 submitted 21 September, 2023;
originally announced September 2023.
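The abstract names SPGM's structure (parameter-free global pooling, then a small modulation module) but not its equations. A generic pool-then-gate sketch of that idea, assuming mean pooling over all frames of all chunks and a single sigmoid-gated linear layer with hypothetical weights `w`, `b`; this is not the published SPGM block.

```python
import math
from typing import List, Sequence

Chunks = Sequence[Sequence[Sequence[float]]]  # S chunks x K frames x C channels


def spgm_like_block(chunks: Chunks, w: Sequence[Sequence[float]],
                    b: Sequence[float]) -> List[List[List[float]]]:
    n_frames = sum(len(ch) for ch in chunks)
    n_ch = len(chunks[0][0])
    # Parameter-free global pooling: average every frame of every chunk into one C-vector.
    g = [sum(frame[c] for ch in chunks for frame in ch) / n_frames for c in range(n_ch)]
    # Small modulation module: one linear layer followed by a per-channel sigmoid gate.
    gate = [1.0 / (1.0 + math.exp(-(sum(wi * gi for wi, gi in zip(row, g)) + bi)))
            for row, bi in zip(w, b)]
    # Broadcast the global context back onto every local frame (single path, no inter-block).
    return [[[v * gc for v, gc in zip(frame, gate)] for frame in ch] for ch in chunks]


# Two chunks of one frame each, two channels (toy values).
CHUNKS = [[[2.0, 4.0]], [[6.0, 8.0]]]
```

The only learned parameters are the C×C gate weights and C biases, independent of sequence length, which illustrates how global context can be injected far more cheaply than with a full inter-chunk transformer.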
-
Are Soft Prompts Good Zero-shot Learners for Speech Recognition?
Authors:
Dianwen Ng,
Chong Zhang,
Ruixi Zhang,
Yukun Ma,
Fabian Ritter-Gutierrez,
Trung Hieu Nguyen,
Chongjia Ni,
Shengkui Zhao,
Eng Siong Chng,
Bin Ma
Abstract:
Large self-supervised pre-trained speech models require computationally expensive fine-tuning for downstream tasks. Soft prompt tuning offers a simple parameter-efficient alternative by utilizing minimal soft prompt guidance, enhancing portability while also maintaining competitive performance. However, how and why this works is not well understood. In this study, we aim to deepen our understanding of this emerging method by investigating the role of soft prompts in automatic speech recognition (ASR). Our findings highlight their role as zero-shot learners that improve ASR performance, but also show that they are vulnerable to malicious modifications. Soft prompts aid generalization but are not obligatory for inference. We also identify two primary roles of soft prompts: content refinement and noise information enhancement, the latter of which improves robustness against background noise. Additionally, we propose an effective modification of noise prompts and show that they are capable of zero-shot adaptation to out-of-distribution noise environments.
Submitted 17 September, 2023;
originally announced September 2023.
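Soft prompt tuning, as described above, trains only a few vectors while the pre-trained speech model stays frozen. The sketch below assumes the common variant in which prompts are prepended to the input frame embeddings; the function names are hypothetical, and the paper may insert prompts differently.

```python
from typing import List

Vector = List[float]


def prepend_soft_prompts(frames: List[Vector], prompts: List[Vector]) -> List[Vector]:
    """Build the input sequence for the frozen encoder: [p_1, ..., p_k, x_1, ..., x_T]."""
    dim = len(frames[0])
    if any(len(p) != dim for p in prompts):
        raise ValueError("soft prompts must match the frame embedding dimension")
    return [list(p) for p in prompts] + [list(x) for x in frames]


def trainable_parameters(num_prompts: int, dim: int) -> int:
    """Only the prompt vectors are updated; the pre-trained backbone stays frozen."""
    return num_prompts * dim


frames = [[0.1, 0.2, 0.3], [0.0, 0.5, 0.1], [0.4, 0.4, 0.0], [0.2, 0.1, 0.9]]
prompts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # would be learned by gradient descent
sequence = prepend_soft_prompts(frames, prompts)
print(len(sequence))  # 6
print(trainable_parameters(100, 1024))  # 102400
```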
-
LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech
Authors:
Titouan Parcollet,
Ha Nguyen,
Solene Evain,
Marcely Zanon Boito,
Adrien Pupier,
Salima Mdhaffar,
Hang Le,
Sina Alisamir,
Natalia Tomashenko,
Marco Dinarelli,
Shucong Zhang,
Alexandre Allauzen,
Maximin Coavoux,
Yannick Esteve,
Mickael Rouvier,
Jerome Goulian,
Benjamin Lecouteux,
Francois Portet,
Solange Rossato,
Fabien Ringeval,
Didier Schwab,
Laurent Besacier
Abstract:
Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains, including computer vision and natural language processing. Speech processing has benefited drastically from SSL, as most current domain-related tasks are now approached with pre-trained models. This work introduces LeBenchmark 2.0, an open-source framework for assessing and building SSL-equipped French speech technologies. It includes documented, large-scale, and heterogeneous corpora with up to 14,000 hours of speech; ten pre-trained SSL wav2vec 2.0 models, containing from 26 million to one billion learnable parameters, shared with the community; and an evaluation protocol made of six downstream tasks to complement existing benchmarks. LeBenchmark 2.0 also presents unique perspectives on pre-trained SSL models for speech, investigating frozen versus fine-tuned downstream models and task-agnostic versus task-specific pre-trained models, and discussing the carbon footprint of large-scale model training. Overall, the newly introduced models trained on 14,000 hours of French speech outperform multilingual and previous LeBenchmark SSL models across the benchmark, but they also required up to four times more energy for pre-training.
Submitted 18 March, 2024; v1 submitted 11 September, 2023;
originally announced September 2023.
-
After-Fatigue Condition: A Novel Analysis Based on Surface EMG Signals
Authors:
Van Hieu Nguyen,
Gia Thien Luu,
Thien Van Luong,
Mai Xuan Trang,
Philippe Ravier,
Olivier Buttelli
Abstract:
This study introduces a novel muscle activation analysis based on surface electromyography (sEMG) signals to assess the muscle's after-fatigue condition. Previous studies have mainly focused on the before-fatigue and fatigue conditions, while a comprehensive analysis of the after-fatigue condition has been overlooked. The proposed method analyzes muscle fatigue indicators at various maximal voluntary contraction (MVC) levels to compare the before-fatigue, fatigue, and after-fatigue conditions using amplitude-based, spectral-based, and muscle fiber conduction velocity (CV) parameters. In addition, the contraction time at each MVC level is analyzed with the same indicators. The results show that in the after-fatigue condition, muscle activation changes significantly compared with the before-fatigue and fatigue conditions: CV is higher, the power spectral density shifts to the right (toward higher frequencies), and the contraction time until exhaustion is longer. The results can provide a comprehensive and objective evaluation of muscle fatigue and recovery, which can be helpful in clinical diagnosis, rehabilitation, and sports performance.
Submitted 9 September, 2023;
originally announced September 2023.
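Spectral fatigue indicators are commonly summarized by the mean frequency (MNF) and median frequency (MDF) of the sEMG power spectrum, so a rightward PSD shift shows up as larger values. This is a minimal pure-Python sketch with a naive DFT; the paper's exact spectral estimator, windowing, and sampling rate are not given, and the test tones are hypothetical.

```python
import cmath
import math


def power_spectrum(signal, fs):
    """One-sided periodogram from a naive DFT (O(N^2); adequate for short windows)."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def mean_frequency(freqs, power):
    """Power-weighted average frequency (MNF)."""
    return sum(f * p for f, p in zip(freqs, power)) / sum(power)


def median_frequency(freqs, power):
    """Frequency below which half of the total spectral power lies (MDF)."""
    half = sum(power) / 2
    cumulative = 0.0
    for f, p in zip(freqs, power):
        cumulative += p
        if cumulative >= half:
            return f
    return freqs[-1]


fs = 1000  # Hz, hypothetical sampling rate
for tone in (50, 80):  # a spectrum shifted "to the right" raises both MNF and MDF
    window = [math.sin(2 * math.pi * tone * t / fs) for t in range(200)]
    f, p = power_spectrum(window, fs)
    print(tone, round(median_frequency(f, p), 1), round(mean_frequency(f, p), 1))
```

In practice an FFT-based Welch estimate would replace the naive DFT; it is used here only to keep the example dependency-free.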
-
3D Transformer based on deformable patch location for differential diagnosis between Alzheimer's disease and Frontotemporal dementia
Authors:
Huy-Dung Nguyen,
Michaël Clément,
Boris Mansencal,
Pierrick Coupé
Abstract:
Alzheimer's disease and Frontotemporal dementia are common types of neurodegenerative disorders that present overlapping clinical symptoms, making their differential diagnosis very challenging. Numerous efforts have been made toward the diagnosis of each disease, but the problem of multi-class differential diagnosis has not been actively explored. In recent years, transformer-based models have demonstrated remarkable success in various computer vision tasks. However, their use in disease diagnosis is uncommon, because the amount of available 3D medical data is limited relative to the large size of such models. In this paper, we present a novel 3D transformer-based architecture using a deformable patch location module to improve the differential diagnosis of Alzheimer's disease and Frontotemporal dementia. Moreover, to overcome the problem of data scarcity, we propose an efficient combination of various data augmentation techniques, adapted for training transformer-based models on 3D structural magnetic resonance imaging data. Finally, we propose to combine our transformer-based model with a traditional machine learning model using brain structure volumes to better exploit the available data. Our experiments demonstrate the effectiveness of the proposed approach, showing competitive results compared to state-of-the-art methods. Moreover, the deformable patch locations can be visualized, revealing the most relevant brain regions used to establish the diagnosis of each disease.
Submitted 6 September, 2023;
originally announced September 2023.
-
A Convergence Predictor Model for Consensus-based Decentralised Energy Markets
Authors:
Parikshit Pareek,
L. P. Mohasha Isuru Sampath,
Hung D. Nguyen,
Eddy Y. S. Foo
Abstract:
This letter introduces a convergence prediction model (CPM) for decentralized market clearing mechanisms. The CPM serves as a tool to detect potential cyber-attacks that affect the convergence of the consensus mechanism during ongoing market clearing operations. In this study, we propose a successively elongating Bayesian logistic regression approach to model the probability of convergence of real-time market mechanisms. The CPM uses the net-power balance among all prosumers/market participants as the feature for convergence prediction, enabling a low-dimensional model to operate efficiently for all prosumers concurrently. The results show that the proposed CPM achieves a net false rate below 0.01% on a stressed dataset.
Submitted 18 August, 2023;
originally announced August 2023.
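A minimal sketch of the modelling idea, assuming a single feature (the magnitude of the net-power imbalance) and a plug-in maximum a posteriori (MAP) estimate under a zero-mean Gaussian prior rather than full posterior averaging. The "successively elongating" scheme is not reproduced, and the training data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_map_logistic(xs, ys, prior_var=10.0, lr=0.05, steps=5000):
    """Gradient descent on the negative log posterior: logistic loss + Gaussian prior."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = w / prior_var  # gradient of the Gaussian log-prior penalty
        grad_b = b / prior_var
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical data: |net power imbalance| (p.u.), label 1 if consensus converged.
xs = [0.1, 0.2, 0.3, 0.4, 1.5, 1.6, 1.8, 2.0]
ys = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = fit_map_logistic(xs, ys)
print(sigmoid(w * 0.2 + b) > 0.5, sigmoid(w * 1.9 + b) < 0.5)  # True True
```

The prior keeps the weights finite even though the toy data are linearly separable.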
-
Deployment and Analysis of Instance Segmentation Algorithm for In-field Grade Estimation of Sweetpotatoes
Authors:
Hoang M. Nguyen,
Sydney Gyurek,
Russell Mierop,
Kenneth V. Pecota,
Kylie LaGamba,
Michael Boyette,
G. Craig Yencho,
Cranos M. Williams,
Michael W. Kudenov
Abstract:
Shape estimation of sweetpotato (SP) storage roots is inherently challenging due to their varied size and shape characteristics. Even measuring "simple" metrics, such as length and width, requires significant time investments, either directly in-field or afterward using automated graders. In this paper, we present the results of a model that can perform grading and provide yield estimates directly in the field more quickly than manual measurements. Detectron2, a library of deep-learning object detection algorithms, was used to implement Mask R-CNN, an instance segmentation model. This model was deployed for in-field grade estimation of SPs and evaluated against an optical sorter. Storage roots from various clones, imaged with a cellphone during trials between 2019 and 2020, were used for training and validation to fine-tune the model to detect SPs. Our results showed that the model could distinguish individual SPs in various environmental conditions, including variations in lighting and soil characteristics. The RMSEs of the model's length, width, and weight estimates relative to a commercial optical sorter were 0.66 cm, 1.22 cm, and 74.73 g, respectively, while the RMSE of root counts per plot was 5.27 roots, with r^2 = 0.8. This phenotyping strategy has the potential to enable rapid in-field yield estimates without the need for sophisticated and costly optical sorters, and it may be more readily deployed in environments with limited access to such resources or facilities.
Submitted 16 August, 2023;
originally announced August 2023.
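The RMSE and r^2 values above compare paired model and sorter measurements. A minimal sketch of both metrics follows; the abstract does not say whether r^2 is the coefficient of determination or a squared Pearson correlation, so the former is assumed, and the numbers are hypothetical.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between paired measurements."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference))


def r_squared(predicted, reference):
    """Coefficient of determination of the predictions with respect to the reference."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for p, r in zip(predicted, reference))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


model_length_cm = [10.0, 12.5, 15.0]   # hypothetical model estimates
sorter_length_cm = [10.5, 12.0, 15.5]  # hypothetical optical-sorter readings
print(round(rmse(model_length_cm, sorter_length_cm), 3))  # 0.5
print(round(r_squared(model_length_cm, sorter_length_cm), 3))
```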
-
Mental Workload Estimation with Electroencephalogram Signals by Combining Multi-Space Deep Models
Authors:
Hong-Hai Nguyen,
Ngumimi Karen Iyortsuun,
Seungwon Kim,
Hyung-Jeong Yang,
Soo-Hyung Kim
Abstract:
The human brain remains continuously active, whether an individual is working or at rest. Mental activity is a daily process, and if the brain becomes excessively active, a state known as overload, it can adversely affect human health. Recently, advances in the early prediction of mental health conditions have emerged, aiming to prevent serious consequences and enhance overall quality of life. Consequently, the estimation of mental status has garnered significant attention from diverse researchers due to its potential benefits. While various signals are employed to assess mental state, the electroencephalogram, which contains extensive information about the brain, is widely used by researchers. In this paper, we categorize mental workload into three states (low, middle, and high) and also estimate a continuum of mental workload levels. Our method leverages information from multiple spatial dimensions to improve mental workload estimation. For the time domain, we employ Temporal Convolutional Networks. For the frequency domain, we introduce a novel architecture based on combining residual blocks, termed the Multi-Dimensional Residual Block. Integrating these two domains yields significantly better results than estimating in either domain alone. Our approach achieved 74.98% accuracy in the three-class classification, surpassing the 69.00% result reported with the provided data. Notably, our method is also effective at estimating continuous levels, achieving a Concordance Correlation Coefficient (CCC) of 0.629. The combination of time- and frequency-domain analysis in our approach highlights its potential to improve healthcare applications in the future.
Submitted 12 March, 2024; v1 submitted 22 July, 2023;
originally announced August 2023.
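The Concordance Correlation Coefficient rewards agreement, not just linear association, which is why it suits continuous workload levels. A minimal sketch of Lin's CCC with population moments (the paper's exact variant is not stated):

```python
def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient, using population (1/n) moments."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * cov / (vx + vy + (mx - my) ** 2)


# A constant offset keeps Pearson correlation at 1 but lowers CCC.
print(round(concordance_correlation([1, 2, 3], [1, 2, 3]), 3))  # 1.0
print(round(concordance_correlation([1, 2, 3], [2, 3, 4]), 3))  # 0.571
```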
-
Distributionally Robust Safety Filter for Learning-Based Control in Active Distribution Systems
Authors:
Hoang Tien Nguyen,
Dae-Hyun Choi
Abstract:
Operational constraint violations may occur when deep reinforcement learning (DRL) agents interact with real-world active distribution systems to learn their optimal policies during training. This letter presents a universal distributionally robust safety filter (DRSF) with which any DRL agent can significantly reduce constraint violations in distribution systems during training while maintaining near-optimal solutions. The DRSF is formulated as a distributionally robust optimization problem with chance constraints on operational limits. This problem computes near-optimal actions that minimally modify the optimal actions of DRL-based Volt/VAr control by leveraging the distribution system model, thereby guaranteeing constraint satisfaction at a given probability level under model uncertainty. The performance of the proposed DRSF is verified on the IEEE 33-bus and 123-bus systems.
Submitted 30 July, 2023;
originally announced July 2023.
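The following toy sketch illustrates how a chance-constrained safety filter can minimally modify a DRL action. It is not the DRSF formulation: it assumes a single bus with a linear voltage sensitivity to reactive power, zero-mean uncertainty with known standard deviation, and a Cantelli-type back-off, under which each voltage limit holds with probability at least 1 - epsilon for every such distribution. All names and values are hypothetical.

```python
import math


def dr_back_off(epsilon, sigma):
    """Margin t with P(xi <= t) >= 1 - epsilon for every zero-mean xi of std sigma
    (one-sided Cantelli bound)."""
    return math.sqrt((1 - epsilon) / epsilon) * sigma


def safety_filter(q_rl, v_nom, sensitivity, sigma, v_min, v_max, epsilon=0.05):
    """Return the action closest to q_rl such that each voltage limit on
    v = v_nom + sensitivity * q + xi holds with probability >= 1 - epsilon."""
    if sensitivity <= 0:
        raise ValueError("this sketch assumes a positive voltage sensitivity")
    margin = dr_back_off(epsilon, sigma)
    q_low = (v_min + margin - v_nom) / sensitivity
    q_high = (v_max - margin - v_nom) / sensitivity
    if q_low > q_high:
        raise ValueError("no action satisfies the tightened limits")
    # argmin (q - q_rl)^2 over an interval is the projection (clipping) onto it.
    return min(max(q_rl, q_low), q_high)


# Hypothetical per-unit values for one bus.
params = dict(v_nom=1.0, sensitivity=0.05, sigma=0.002, v_min=0.95, v_max=1.05)
print(safety_filter(0.3, **params))            # 0.3 (already safe, unchanged)
print(round(safety_filter(2.0, **params), 4))  # 0.8256 (clipped to the safe set)
```

Each limit is treated as a separate chance constraint; a joint guarantee over both limits would need a different bound.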
-
Semantic enrichment towards efficient speech representations
Authors:
Gaëlle Laperrière,
Ha Nguyen,
Sahar Ghannay,
Bassam Jabaian,
Yannick Estève
Abstract:
Over the past few years, self-supervised learned speech representations have emerged as fruitful replacements for conventional surface representations when solving Spoken Language Understanding (SLU) tasks. Simultaneously, multilingual models trained on massive textual data were introduced to encode language-agnostic semantics. Recently, the SAMU-XLSR approach introduced a way to take advantage of such textual models to enrich multilingual speech representations with language-agnostic semantics. Aiming for better semantic extraction on a challenging SLU task while taking computational cost into account, this study investigates a specific in-domain semantic enrichment of the SAMU-XLSR model by specializing it on a small amount of transcribed data from the downstream task. In addition, we show the benefits of using same-domain French and Italian benchmarks for low-resource language portability and explore the cross-domain capacities of the enriched SAMU-XLSR.
Submitted 3 July, 2023;
originally announced July 2023.
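SAMU-XLSR-style enrichment pulls an utterance-level speech embedding toward the embedding of its transcription produced by a multilingual text encoder. A minimal sketch of such an alignment objective, assuming mean pooling and a cosine-distance loss; the real model's pooling, loss details, and encoders are not shown, and the vectors are hypothetical.

```python
import math


def mean_pool(frames):
    """Average frame-level speech features into one utterance-level vector."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def cosine_alignment_loss(speech_frames, sentence_embedding):
    """1 - cosine similarity between pooled speech features and a sentence embedding."""
    u = mean_pool(speech_frames)
    dot = sum(a * b for a, b in zip(u, sentence_embedding))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in sentence_embedding))
    return 1.0 - dot / (norm_u * norm_v)


print(cosine_alignment_loss([[1.0, 0.0], [1.0, 0.0]], [2.0, 0.0]))  # 0.0 (aligned)
print(cosine_alignment_loss([[0.0, 1.0]], [1.0, 0.0]))              # 1.0 (orthogonal)
```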
-
Fairness Enhancement of UAV Systems with Hybrid Active-Passive RIS
Authors:
Nhan Thanh Nguyen,
Van-Dinh Nguyen,
Hieu Van Nguyen,
Qingqing Wu,
Antti Tolli,
Symeon Chatzinotas,
Markku Juntti
Abstract:
We consider unmanned aerial vehicle (UAV)-enabled wireless systems in which downlink communications between a multi-antenna UAV and multiple users are assisted by a hybrid active-passive reconfigurable intelligent surface (RIS). We aim at a fairness-oriented design of two typical UAV-enabled networks, namely the static-UAV network, where the UAV is deployed at a fixed location to serve all users at the same time, and the mobile-UAV network, which employs the time division multiple access protocol. In both networks, our goal is to maximize the minimum rate among users by jointly optimizing the UAV's location/trajectory, transmit beamformer, and RIS coefficients. The resulting problems are highly nonconvex due to a strong coupling between the involved variables. We develop efficient algorithms based on block coordinate ascent and successive convex approximation to solve these problems iteratively. In particular, in the optimization of the mobile-UAV network, closed-form solutions for the transmit beamformer and RIS passive coefficients are derived. Numerical results show that a hybrid RIS equipped with only 4 active elements and a power budget of 0 dBm improves the minimum rate by 38%-63%, whereas a passive RIS with the same total number of elements achieves only about 15%.
Submitted 20 September, 2023; v1 submitted 24 June, 2023;
originally announced June 2023.
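The max-min objective can be illustrated on a drastically simplified static-UAV case. This sketch assumes users on a line, a single-antenna link with free-space path loss, and no RIS or beamforming, and it replaces the paper's block coordinate ascent and successive convex approximation with brute-force search; all values are hypothetical.

```python
import math


def user_rate(uav_x, user_x, altitude, snr_ref):
    """Rate in bit/s/Hz under free-space path loss: SNR = snr_ref / distance^2."""
    distance_sq = (uav_x - user_x) ** 2 + altitude ** 2
    return math.log2(1.0 + snr_ref / distance_sq)


def max_min_position(user_positions, candidates, altitude=50.0, snr_ref=1e4):
    """Brute-force search for the UAV position that maximizes the worst user's rate."""
    def worst_rate(x):
        return min(user_rate(x, u, altitude, snr_ref) for u in user_positions)
    return max(candidates, key=worst_rate)


candidates = [float(x) for x in range(0, 101)]  # 1 m grid along a line (hypothetical)
print(max_min_position([0.0, 100.0], candidates))  # 50.0
```

With two symmetric users the fair position is the midpoint, whereas maximizing the sum rate would instead favour hovering over one user.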
-
Reducing Uncertainties of a Chained Hydrologic-hydraulic Model to Improve Flood Forecasting Using Multi-source Earth Observation Data
Authors:
Thanh Huy Nguyen,
Sophie Ricci,
Andrea Piacentini,
Quentin Bonassies,
Raquel Rodriguez Suquet,
Santiago Peña Luque,
Kevin Marlis,
Cédric David
Abstract:
The challenges in operational flood forecasting lie in producing reliable forecasts with constrained computational resources and within processing times compatible with near-real-time forecasting. Flood hydrodynamic models exploit observed data from gauge networks, e.g. water surface elevation (WSE) and/or discharge, that describe the forcing time series at the upstream and lateral boundary conditions of the model. A chained hydrologic-hydraulic model is therefore attractive, as it allows extended forecast lead times and overcomes the limits of forecasting with observed gauge measurements alone. This research work focuses on comprehensively reducing the uncertainties in the model parameters, the hydraulic state, and especially the forcing data in order to improve the overall flood reanalysis and forecast performance. It aims to assimilate two main complementary Earth observation (EO) data sources, namely in-situ WSE and SAR-derived flood extent observations.
Submitted 14 June, 2023;
originally announced June 2023.
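A common choice for this kind of assimilation is the Ensemble Kalman Filter (EnKF); this abstract does not state the exact variant used. As a minimal sketch, assuming a scalar state that is observed directly (for example, WSE at one gauge) rather than the paper's parameter, forcing, and hydraulic-state control vector, one stochastic EnKF analysis step looks like this:

```python
import random
import statistics


def enkf_analysis(ensemble, observation, obs_std, rng):
    """Stochastic EnKF update for a scalar state that is observed directly (H = 1)."""
    prior_var = statistics.variance(ensemble)
    gain = prior_var / (prior_var + obs_std ** 2)
    # Each member assimilates its own perturbed copy of the observation.
    return [x + gain * (observation + rng.gauss(0.0, obs_std) - x) for x in ensemble]


rng = random.Random(42)
prior = [rng.gauss(10.0, 1.0) for _ in range(500)]  # hypothetical simulated WSE (m)
posterior = enkf_analysis(prior, observation=12.0, obs_std=0.5, rng=rng)
print(round(statistics.mean(prior), 2), round(statistics.mean(posterior), 2))
print(round(statistics.stdev(prior), 2), round(statistics.stdev(posterior), 2))
```

The analysis mean moves most of the way toward the observation because the prior spread is larger than the observation error, and the ensemble spread shrinks accordingly.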
-
Waveforms for sub-THz 6G: Design Guidelines
Authors:
Muris Sarajlić,
Nuutti Tervo,
Aarno Pärssinen,
Le Hang Nguyen,
Hardy Halbauer,
Kilian Roth,
Vaidyanathan Kumar,
Tommy Svensson,
Ahmad Nimr,
Stephan Zeitz,
Meik Dörpinghaus,
Gerhard Fettweis
Abstract:
The projected sub-THz (100-300 GHz) part of the upcoming 6G standard will require a careful design of the waveform and choice of slot structure. Not only will the design of the 6G physical layer be driven by ambitious system performance requirements, but hardware limitations specific to sub-THz frequencies also pose a fundamental design constraint for the waveform. In this contribution, general guidelines for waveform design are given, together with a non-exhaustive list of exemplary waveforms that can be used to meet the design requirements.
Submitted 17 July, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
-
M^2UNet: MetaFormer Multi-scale Upsampling Network for Polyp Segmentation
Authors:
Quoc-Huy Trinh,
Nhat-Tan Bui,
Trong-Hieu Nguyen Mau,
Minh-Van Nguyen,
Hai-Minh Phan,
Minh-Triet Tran,
Hai-Dang Nguyen
Abstract:
Polyp segmentation has recently garnered significant attention, and multiple methods have been formulated to achieve commendable outcomes. However, these techniques often struggle with the complex polyp foreground and its surrounding regions because of the nature of the convolution operation. Besides, most existing methods fail to exploit the potential information from multiple decoder stages. To address these challenges, we propose combining MetaFormer, introduced as a baseline for integrating CNN and Transformer, with the UNet framework and incorporating our Multi-scale Upsampling block (MU). This simple module combines multi-level information by exploring multiple receptive-field paths of the shallow decoder stage and then adding the result to the higher stage to aggregate a better feature representation, which is essential in medical image segmentation. Taken together, we propose the MetaFormer Multi-scale Upsampling Network (M$^2$UNet) for the polyp segmentation task. Extensive experiments on five benchmark datasets demonstrate that our method achieves competitive performance compared with several previous methods.
Submitted 1 September, 2023; v1 submitted 14 June, 2023;
originally announced June 2023.
-
Dealing With Non-Gaussianity of SAR-derived Wet Surface Ratio for Flood Extent Representation Improvement
Authors:
Thanh Huy Nguyen,
Sophie Ricci,
Andrea Piacentini,
Ehouarn Simon,
Raquel Rodriguez Suquet,
Santiago Peña Luque
Abstract:
Owing to advances in data assimilation, notably the Ensemble Kalman Filter (EnKF), flood simulation and forecast capabilities have greatly improved in recent years. This research aims to comprehensively reduce the uncertainties in the model parameters, forcing, and hydraulic state, and consequently to improve the overall flood reanalysis and forecast capability, especially in the floodplain. It assimilates SAR-derived (typically from the Sentinel-1 mission) flood extent observations, expressed in terms of wet surface ratio. The non-Gaussianity of the observation errors associated with the SAR flood observations violates a major hypothesis of the EnKF and jeopardizes the optimality of the filter analysis. Therefore, a Gaussian anamorphosis process is proposed to treat this non-Gaussianity. This strategy was validated and applied over the Garonne Marmandaise catchment (Southwest of France), represented with the TELEMAC-2D hydrodynamic model, focusing on a major flood event that occurred in December 2019. The assimilation of the SAR-derived wet surface ratio observations, in complement to the in-situ water surface elevations, is shown to improve the flood representation.
Submitted 14 June, 2023;
originally announced June 2023.
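Gaussian anamorphosis maps a non-Gaussian variable to a Gaussian one through a monotone transform before the EnKF analysis. A minimal sketch, assuming a rank-based empirical CDF followed by the standard normal quantile function; the paper's transform and its inverse mapping after the analysis are not specified in the abstract, and the values are hypothetical.

```python
from statistics import NormalDist


def gaussian_anamorphosis(values):
    """Map samples to standard-normal scores via their empirical CDF (ties broken by order)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        # The plotting position (rank + 0.5) / n stays strictly inside (0, 1).
        scores[i] = NormalDist().inv_cdf((rank + 0.5) / n)
    return scores


wet_surface_ratio = [0.02, 0.95, 0.05, 0.40, 0.10]  # hypothetical, bounded and skewed
print([round(z, 3) for z in gaussian_anamorphosis(wet_surface_ratio)])
```

Because the transform is monotone, it preserves the ordering of the observations while giving them a symmetric, unbounded distribution that better matches the EnKF's Gaussian assumption.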
-
One shot learning based drivers head movement identification using a millimetre wave radar sensor
Authors:
Hong Nhung Nguyen,
Seongwook Lee,
Tien Tung Nguyen,
Yong Hwa Kim
Abstract:
Drivers' concentration on traffic is a vital safety issue; thus, monitoring drivers on the road has become an essential requirement. The key purpose of such supervision is to detect abnormal driver behaviour and promptly send warnings to the driver to avoid traffic accidents. To meet this requirement, the authors first use a small millimetre-wave radar sensor installed at the steering wheel of the vehicle to collect signals from different head movements of the driver. The received signals consist of reflection patterns that change in response to the driver's head movements. Then, to distinguish these movements, a classifier based on the measured radar signals is designed. Because the collected dataset is small, the authors propose one-shot learning to classify four cases of driver head movements. The experimental results indicate that the proposed method can classify the four types of head movements with an accuracy of up to 100%. In addition, the classification performance of the proposed method is significantly better than that of the convolutional neural network model.
Submitted 31 May, 2023;
originally announced June 2023.
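One-shot classification is commonly implemented as a nearest-neighbour rule in a learned embedding space, with one labelled example per class. The sketch assumes the embeddings already exist (for example, from a siamese network); the class names and vectors are hypothetical, and this is not necessarily the authors' network.

```python
import math


def one_shot_classify(query, support):
    """Nearest-neighbour rule over one labelled embedding per class."""
    return min(support, key=lambda label: math.dist(query, support[label]))


# Hypothetical 2-D embeddings of one radar recording per head-movement class.
support = {
    "turn_left": [1.0, 0.0],
    "turn_right": [-1.0, 0.0],
    "look_down": [0.0, -1.0],
    "look_up": [0.0, 1.0],
}
print(one_shot_classify([0.8, 0.1], support))  # turn_left
```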
-
ContriMix: Scalable stain color augmentation for domain generalization without domain labels in digital pathology
Authors:
Tan H. Nguyen,
Dinkar Juyal,
** Li,
Aaditya Prakash,
Shima Nofallah,
Chintan Shah,
Sai Chowdary Gullapally,
Limin Yu,
Michael Griffin,
Anand Sampat,
John Abel,
Justin Lee,
Amaro Taylor-Weiner
Abstract:
Differences in staining and imaging procedures can cause significant color variations in histopathology images, leading to poor generalization when deploying deep-learning models trained on a different data source. Various color augmentation methods have been proposed to generate synthetic images during training to make models more robust, eliminating the need for stain normalization at test time. Many color augmentation methods leverage domain labels to generate synthetic images. This approach poses three significant challenges to scaling such a model. Firstly, incorporating data from a new domain into deep-learning models trained on existing domain labels is not straightforward. Secondly, dependency on domain labels prevents the use of pathology images without domain labels to improve model performance. Finally, implementation of these methods becomes complicated when multiple domain labels (e.g., patient identification or medical center) are associated with a single image. We introduce ContriMix, a novel domain-label-free stain color augmentation method based on DRIT++, a style-transfer method. ContriMix leverages sample stain color variation within a training minibatch and random mixing to extract content and attribute information from pathology images. This information can be used by a trained ContriMix model to create synthetic images that improve the performance of existing classifiers. ContriMix outperforms competing methods on the Camelyon17-WILDS dataset. Its performance is consistent across different slides in the test set, while being robust to the color variation from rare substances in pathology images. We make our code and trained ContriMix models available for research use. The code for ContriMix can be found at https://gitlab.com/huutan86/contrimix
Submitted 8 March, 2024; v1 submitted 7 June, 2023;
originally announced June 2023.
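ContriMix's random mixing can be pictured as pairing each image's content with attribute (stain) information drawn from other images in the same minibatch, which requires no domain labels. The sketch below shows only that pairing; the DRIT++-based encoders and generator are replaced by a hypothetical toy tinting function, and uniform donor sampling is an assumption.

```python
import random


def mixing_pairs(batch_size, num_mixing, rng):
    """For each image i, draw num_mixing attribute donors j from the same minibatch."""
    return [(i, rng.randrange(batch_size)) for i in range(batch_size) for _ in range(num_mixing)]


def toy_generator(content, attribute):
    """Stand-in for a learned decoder: tint each grayscale pixel with an RGB attribute."""
    return [[pixel * channel for channel in attribute] for pixel in content]


rng = random.Random(0)
contents = [[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]]                   # hypothetical content codes
attributes = [[1.0, 0.6, 0.8], [0.7, 0.5, 0.9], [0.9, 0.9, 0.6]]  # hypothetical stain codes
for i, j in mixing_pairs(len(contents), num_mixing=2, rng=rng):
    synthetic = toy_generator(contents[i], attributes[j])
    print(i, j, synthetic[0])
```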