-
HouseCrafter: Lifting Floorplans to 3D Scenes with 2D Diffusion Model
Authors:
Hieu T. Nguyen,
Yiwen Chen,
Vikram Voleti,
Varun Jampani,
Huaizu Jiang
Abstract:
We introduce HouseCrafter, a novel approach that can lift a floorplan into a complete large 3D indoor scene (e.g., a house). Our key insight is to adapt a 2D diffusion model, which is trained on web-scale images, to generate consistent multi-view color (RGB) and depth (D) images across different locations of the scene. Specifically, the RGB-D images are generated autoregressively in a batch-wise m…
▽ More
We introduce HouseCrafter, a novel approach that can lift a floorplan into a complete large 3D indoor scene (e.g., a house). Our key insight is to adapt a 2D diffusion model, which is trained on web-scale images, to generate consistent multi-view color (RGB) and depth (D) images across different locations of the scene. Specifically, the RGB-D images are generated autoregressively in a batch-wise manner along sampled locations based on the floorplan, where previously generated images are used as condition to the diffusion model to produce images at nearby locations. The global floorplan and attention design in the diffusion model ensures the consistency of the generated images, from which a 3D scene can be reconstructed. Through extensive evaluation on the 3D-Front dataset, we demonstrate that HouseCraft can generate high-quality house-scale 3D scenes. Ablation studies also validate the effectiveness of different design choices. We will release our code and model weights. Project page: https://neu-vi.github.io/houseCrafter/
△ Less
Submitted 28 June, 2024;
originally announced June 2024.
-
Forget but Recall: Incremental Latent Rectification in Continual Learning
Authors:
Nghia D. Nguyen,
Hieu Trung Nguyen,
Ang Li,
Hoang Pham,
Viet Anh Nguyen,
Khoa D. Doan
Abstract:
Intrinsic capability to continuously learn a changing data stream is a desideratum of deep neural networks (DNNs). However, current DNNs suffer from catastrophic forgetting, which hinders remembering past knowledge. To mitigate this issue, existing Continual Learning (CL) approaches either retain exemplars for replay, regularize learning, or allocate dedicated capacity for new tasks. This paper in…
▽ More
Intrinsic capability to continuously learn a changing data stream is a desideratum of deep neural networks (DNNs). However, current DNNs suffer from catastrophic forgetting, which hinders remembering past knowledge. To mitigate this issue, existing Continual Learning (CL) approaches either retain exemplars for replay, regularize learning, or allocate dedicated capacity for new tasks. This paper investigates an unexplored CL direction for incremental learning called Incremental Latent Rectification or ILR. In a nutshell, ILR learns to propagate with correction (or rectify) the representation from the current trained DNN backward to the representation space of the old task, where performing predictive decisions is easier. This rectification process only employs a chain of small representation map** networks, called rectifier units. Empirical experiments on several continual learning benchmarks, including CIFAR10, CIFAR100, and Tiny ImageNet, demonstrate the effectiveness and potential of this novel CL direction compared to existing representative CL methods.
△ Less
Submitted 25 June, 2024;
originally announced June 2024.
-
Generative Conditional Distributions by Neural (Entropic) Optimal Transport
Authors:
Bao Nguyen,
Binh Nguyen,
Hieu Trung Nguyen,
Viet Anh Nguyen
Abstract:
Learning conditional distributions is challenging because the desired outcome is not a single distribution but multiple distributions that correspond to multiple instances of the covariates. We introduce a novel neural entropic optimal transport method designed to effectively learn generative models of conditional distributions, particularly in scenarios characterized by limited sample sizes. Our…
▽ More
Learning conditional distributions is challenging because the desired outcome is not a single distribution but multiple distributions that correspond to multiple instances of the covariates. We introduce a novel neural entropic optimal transport method designed to effectively learn generative models of conditional distributions, particularly in scenarios characterized by limited sample sizes. Our method relies on the minimax training of two neural networks: a generative network parametrizing the inverse cumulative distribution functions of the conditional distributions and another network parametrizing the conditional Kantorovich potential. To prevent overfitting, we regularize the objective function by penalizing the Lipschitz constant of the network output. Our experiments on real-world datasets show the effectiveness of our algorithm compared to state-of-the-art conditional distribution learning techniques. Our implementation can be found at https://github.com/nguyenngocbaocmt02/GENTLE.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
Cold-start Recommendation by Personalized Embedding Region Elicitation
Authors:
Hieu Trung Nguyen,
Duy Nguyen,
Khoa Doan,
Viet Anh Nguyen
Abstract:
Rating elicitation is a success element for recommender systems to perform well at cold-starting, in which the systems need to recommend items to a newly arrived user with no prior knowledge about the user's preference. Existing elicitation methods employ a fixed set of items to learn the user's preference and then infer the users' preferences on the remaining items. Using a fixed seed set can lim…
▽ More
Rating elicitation is a success element for recommender systems to perform well at cold-starting, in which the systems need to recommend items to a newly arrived user with no prior knowledge about the user's preference. Existing elicitation methods employ a fixed set of items to learn the user's preference and then infer the users' preferences on the remaining items. Using a fixed seed set can limit the performance of the recommendation system since the seed set is unlikely optimal for all new users with potentially diverse preferences. This paper addresses this challenge using a 2-phase, personalized elicitation scheme. First, the elicitation scheme asks users to rate a small set of popular items in a ``burn-in'' phase. Second, it sequentially asks the user to rate adaptive items to refine the preference and the user's representation. Throughout the process, the system represents the user's embedding value not by a point estimate but by a region estimate. The value of information obtained by asking the user's rating on an item is quantified by the distance from the region center embedding space that contains with high confidence the true embedding value of the user. Finally, the recommendations are successively generated by considering the preference region of the user. We show that each subproblem in the elicitation scheme can be efficiently implemented. Further, we empirically demonstrate the effectiveness of the proposed method against existing rating-elicitation methods on several prominent datasets.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
Explaining Graph Neural Networks via Structure-aware Interaction Index
Authors:
Ngoc Bui,
Hieu Trung Nguyen,
Viet Anh Nguyen,
Rex Ying
Abstract:
The Shapley value is a prominent tool for interpreting black-box machine learning models thanks to its strong theoretical foundation. However, for models with structured inputs, such as graph neural networks, existing Shapley-based explainability approaches either focus solely on node-wise importance or neglect the graph structure when perturbing the input instance. This paper introduces the Myers…
▽ More
The Shapley value is a prominent tool for interpreting black-box machine learning models thanks to its strong theoretical foundation. However, for models with structured inputs, such as graph neural networks, existing Shapley-based explainability approaches either focus solely on node-wise importance or neglect the graph structure when perturbing the input instance. This paper introduces the Myerson-Taylor interaction index that internalizes the graph structure into attributing the node values and the interaction values among nodes. Unlike the Shapley-based methods, the Myerson-Taylor index decomposes coalitions into components satisfying a pre-chosen connectivity criterion. We prove that the Myerson-Taylor index is the unique one that satisfies a system of five natural axioms accounting for graph structure and high-order interaction among nodes. Leveraging these properties, we propose Myerson-Taylor Structure-Aware Graph Explainer (MAGE), a novel explainer that uses the second-order Myerson-Taylor index to identify the most important motifs influencing the model prediction, both positively and negatively. Extensive experiments on various graph datasets and models demonstrate that our method consistently provides superior subgraph explanations compared to state-of-the-art methods.
△ Less
Submitted 23 May, 2024;
originally announced May 2024.
-
QSimPy: A Learning-centric Simulation Framework for Quantum Cloud Resource Management
Authors:
Hoa T. Nguyen,
Muhammad Usman,
Rajkumar Buyya
Abstract:
Quantum cloud computing is an emerging computing paradigm that allows seamless access to quantum hardware as cloud-based services. However, effective use of quantum resources is challenging and necessitates robust simulation frameworks for effective resource management design and evaluation. To address this need, we proposed QSimPy, a novel discrete-event simulation framework designed with the mai…
▽ More
Quantum cloud computing is an emerging computing paradigm that allows seamless access to quantum hardware as cloud-based services. However, effective use of quantum resources is challenging and necessitates robust simulation frameworks for effective resource management design and evaluation. To address this need, we proposed QSimPy, a novel discrete-event simulation framework designed with the main focus of facilitating learning-centric approaches for quantum resource management problems in cloud environments. Underpinned by extensibility, compatibility, and reusability principles, QSimPy provides a lightweight simulation environment based on SimPy, a well-known Python-based simulation engine for modeling dynamics of quantum cloud resources and task operations. We integrate the Gymnasium environment into our framework to support the creation of simulated environments for develo** and evaluating reinforcement learning-based techniques for optimizing quantum cloud resource management. The QSimPy framework encapsulates the operational intricacies of quantum cloud environments, supporting research in dynamic task allocation and optimization through DRL approaches. We also demonstrate the use of QSimPy in develo** reinforcement learning policies for quantum task placement problems, demonstrating its potential as a useful framework for future quantum cloud research.
△ Less
Submitted 2 May, 2024;
originally announced May 2024.
-
Quantum Cloud Computing: A Review, Open Problems, and Future Directions
Authors:
Hoa T. Nguyen,
Prabhakar Krishnan,
Dilip Krishnaswamy,
Muhammad Usman,
Rajkumar Buyya
Abstract:
Quantum cloud computing is an emerging paradigm of computing that empowers quantum applications and their deployment on quantum computing resources without the need for a specialized environment to host and operate physical quantum computers. This paper reviews recent advances, identifies open problems, and proposes future directions in quantum cloud computing. It discusses the state-of-the-art qu…
▽ More
Quantum cloud computing is an emerging paradigm of computing that empowers quantum applications and their deployment on quantum computing resources without the need for a specialized environment to host and operate physical quantum computers. This paper reviews recent advances, identifies open problems, and proposes future directions in quantum cloud computing. It discusses the state-of-the-art quantum cloud advances, including the various cloud-based models, platforms, and recently developed technologies and software use cases. Furthermore, it discusses different aspects of the quantum cloud, including resource management, quantum serverless, security, and privacy problems. Finally, the paper examines open problems and proposes the future directions of quantum cloud computing, including potential opportunities and ongoing research in this emerging field.
△ Less
Submitted 17 April, 2024;
originally announced April 2024.
-
MSTAR: Multi-Scale Backbone Architecture Search for Timeseries Classification
Authors:
Tue M. Cao,
Nhat H. Tran,
Hieu H. Pham,
Hung T. Nguyen,
Le P. Nguyen
Abstract:
Most of the previous approaches to Time Series Classification (TSC) highlight the significance of receptive fields and frequencies while overlooking the time resolution. Hence, unavoidably suffered from scalability issues as they integrated an extensive range of receptive fields into classification models. Other methods, while having a better adaptation for large datasets, require manual design an…
▽ More
Most of the previous approaches to Time Series Classification (TSC) highlight the significance of receptive fields and frequencies while overlooking the time resolution. Hence, unavoidably suffered from scalability issues as they integrated an extensive range of receptive fields into classification models. Other methods, while having a better adaptation for large datasets, require manual design and yet not being able to reach the optimal architecture due to the uniqueness of each dataset. We overcome these challenges by proposing a novel multi-scale search space and a framework for Neural architecture search (NAS), which addresses both the problem of frequency and time resolution, discovering the suitable scale for a specific dataset. We further show that our model can serve as a backbone to employ a powerful Transformer module with both untrained and pre-trained weights. Our search space reaches the state-of-the-art performance on four datasets on four different domains while introducing more than ten highly fine-tuned models for each data.
△ Less
Submitted 21 February, 2024;
originally announced February 2024.
-
Beyond explaining: XAI-based Adaptive Learning with SHAP Clustering for Energy Consumption Prediction
Authors:
Tobias Clement,
Hung Truong Thanh Nguyen,
Nils Kemmerzell,
Mohamed Abdelaal,
Davor Stjelja
Abstract:
This paper presents an approach integrating explainable artificial intelligence (XAI) techniques with adaptive learning to enhance energy consumption prediction models, with a focus on handling data distribution shifts. Leveraging SHAP clustering, our method provides interpretable explanations for model predictions and uses these insights to adaptively refine the model, balancing model complexity…
▽ More
This paper presents an approach integrating explainable artificial intelligence (XAI) techniques with adaptive learning to enhance energy consumption prediction models, with a focus on handling data distribution shifts. Leveraging SHAP clustering, our method provides interpretable explanations for model predictions and uses these insights to adaptively refine the model, balancing model complexity with predictive performance. We introduce a three-stage process: (1) obtaining SHAP values to explain model predictions, (2) clustering SHAP values to identify distinct patterns and outliers, and (3) refining the model based on the derived SHAP clustering characteristics. Our approach mitigates overfitting and ensures robustness in handling data distribution shifts. We evaluate our method on a comprehensive dataset comprising energy consumption records of buildings, as well as two additional datasets to assess the transferability of our approach to other domains, regression, and classification problems. Our experiments demonstrate the effectiveness of our approach in both task types, resulting in improved predictive performance and interpretable model explanations.
△ Less
Submitted 7 February, 2024;
originally announced February 2024.
-
Semi-Supervised Semantic Segmentation using Redesigned Self-Training for White Blood Cells
Authors:
Vinh Quoc Luu,
Duy Khanh Le,
Huy Thanh Nguyen,
Minh Thanh Nguyen,
Thinh Tien Nguyen,
Vinh Quang Dinh
Abstract:
Artificial Intelligence (AI) in healthcare, especially in white blood cell cancer diagnosis, is hindered by two primary challenges: the lack of large-scale labeled datasets for white blood cell (WBC) segmentation and outdated segmentation methods. These challenges inhibit the development of more accurate and modern techniques to diagnose cancer relating to white blood cells. To address the first c…
▽ More
Artificial Intelligence (AI) in healthcare, especially in white blood cell cancer diagnosis, is hindered by two primary challenges: the lack of large-scale labeled datasets for white blood cell (WBC) segmentation and outdated segmentation methods. These challenges inhibit the development of more accurate and modern techniques to diagnose cancer relating to white blood cells. To address the first challenge, a semi-supervised learning framework should be devised to efficiently capitalize on the scarcity of the dataset available. In this work, we address this issue by proposing a novel self-training pipeline with the incorporation of FixMatch. Self-training is a technique that utilizes the model trained on labeled data to generate pseudo-labels for the unlabeled data and then re-train on both of them. FixMatch is a consistency-regularization algorithm to enforce the model's robustness against variations in the input image. We discover that by incorporating FixMatch in the self-training pipeline, the performance improves in the majority of cases. Our performance achieved the best performance with the self-training scheme with consistency on DeepLab-V3 architecture and ResNet-50, reaching 90.69%, 87.37%, and 76.49% on Zheng 1, Zheng 2, and LISC datasets, respectively.
△ Less
Submitted 23 February, 2024; v1 submitted 14 January, 2024;
originally announced January 2024.
-
IncepSE: Leveraging InceptionTime's performance with Squeeze and Excitation mechanism in ECG analysis
Authors:
Tue Minh Cao,
Nhat Hong Tran,
Le Phi Nguyen,
Hieu Huy Pham,
Hung Thanh Nguyen
Abstract:
Our study focuses on the potential for modifications of Inception-like architecture within the electrocardiogram (ECG) domain. To this end, we introduce IncepSE, a novel network characterized by strategic architectural incorporation that leverages the strengths of both InceptionTime and channel attention mechanisms. Furthermore, we propose a training setup that employs stabilization techniques tha…
▽ More
Our study focuses on the potential for modifications of Inception-like architecture within the electrocardiogram (ECG) domain. To this end, we introduce IncepSE, a novel network characterized by strategic architectural incorporation that leverages the strengths of both InceptionTime and channel attention mechanisms. Furthermore, we propose a training setup that employs stabilization techniques that are aimed at tackling the formidable challenges of severe imbalance dataset PTB-XL and gradient corruption. By this means, we manage to set a new height for deep learning model in a supervised learning manner across the majority of tasks. Our model consistently surpasses InceptionTime by substantial margins compared to other state-of-the-arts in this domain, noticeably 0.013 AUROC score improvement in the "all" task, while also mitigating the inherent dataset fluctuations during training.
△ Less
Submitted 16 November, 2023;
originally announced December 2023.
-
A Tight Lower Bound for 3-Coloring Grids in the Online-LOCAL Model
Authors:
Yi-Jun Chang,
Gopinath Mishra,
Hung Thuan Nguyen,
Mingyang Yang,
Yu-Cheng Yeh
Abstract:
Recently, \citeauthor*{akbari2021locality}~(ICALP 2023) studied the locality of graph problems in distributed, sequential, dynamic, and online settings from a {unified} point of view. They designed a novel $O(\log n)$-locality deterministic algorithm for proper 3-coloring bipartite graphs in the $\mathsf{Online}$-$\mathsf{LOCAL}$ model. In this work, we establish the optimality of the algorithm by…
▽ More
Recently, \citeauthor*{akbari2021locality}~(ICALP 2023) studied the locality of graph problems in distributed, sequential, dynamic, and online settings from a {unified} point of view. They designed a novel $O(\log n)$-locality deterministic algorithm for proper 3-coloring bipartite graphs in the $\mathsf{Online}$-$\mathsf{LOCAL}$ model. In this work, we establish the optimality of the algorithm by showing a \textit{tight} deterministic $Ω(\log n)$ locality lower bound, which holds even on grids. To complement this result, we have the following additional results:
\begin{enumerate}
\item We show a higher and {tight} $Ω(\sqrt{n})$ lower bound for 3-coloring toroidal and cylindrical grids.
\item Considering the generalization of $3$-coloring bipartite graphs to $(k+1)$-coloring $k$-partite graphs, %where $k \geq 2$ is a constant,
we show that the problem also has $O(\log n)$ locality when the input is a $k$-partite graph that admits a \emph{locally inferable unique coloring}. This special class of $k$-partite graphs covers several fundamental graph classes such as $k$-trees and triangular grids. Moreover, for this special class of graphs, we show a {tight} $Ω(\log n)$ locality lower bound.
\item For general $k$-partite graphs with $k \geq 3$, we prove that the problem of $(2k-2)$-coloring $k$-partite graphs exhibits a locality of $Ω(n)$ in the $\onlineLOCAL$ model, matching the round complexity of the same problem in the $\LOCAL$ model recently shown by \citeauthor*{coiteux2023no}~(STOC 2024). Consequently, the problem of $(k+1)$-coloring $k$-partite graphs admits a locality lower bound of $Ω(n)$ when $k\geq 3$, contrasting sharply with the $Θ(\log n)$ locality for the case of $k=2$. \end{enumerate}
△ Less
Submitted 1 May, 2024; v1 submitted 3 December, 2023;
originally announced December 2023.
-
Utilizing Model Residuals to Identify Rental Properties of Interest: The Price Anomaly Score (PAS) and Its Application to Real-time Data in Manhattan
Authors:
Youssef Sultan,
Jackson C. Rafter,
Huyen T. Nguyen
Abstract:
Understanding whether a property is priced fairly hinders buyers and sellers since they usually do not have an objective viewpoint of the price distribution for the overall market of their interest. Drawing from data collected of all possible available properties for rent in Manhattan as of September 2023, this paper aims to strengthen our understanding of model residuals; specifically on machine…
▽ More
Understanding whether a property is priced fairly hinders buyers and sellers since they usually do not have an objective viewpoint of the price distribution for the overall market of their interest. Drawing from data collected of all possible available properties for rent in Manhattan as of September 2023, this paper aims to strengthen our understanding of model residuals; specifically on machine learning models which generalize for a majority of the distribution of a well-proportioned dataset. Most models generally perceive deviations from predicted values as mere inaccuracies, however this paper proposes a different vantage point: when generalizing to at least 75\% of the data-set, the remaining deviations reveal significant insights. To harness these insights, we introduce the Price Anomaly Score (PAS), a metric capable of capturing boundaries between irregularly predicted prices. By combining relative pricing discrepancies with statistical significance, the Price Anomaly Score (PAS) offers a multifaceted view of rental valuations. This metric allows experts to identify overpriced or underpriced properties within a dataset by aggregating PAS values, then fine-tuning upper and lower boundaries to any threshold to set indicators of choice.
△ Less
Submitted 28 November, 2023;
originally announced November 2023.
-
Toward Reinforcement Learning-based Rectilinear Macro Placement Under Human Constraints
Authors:
Tuyen P. Le,
Hieu T. Nguyen,
Seungyeol Baek,
Taeyoun Kim,
Jungwoo Lee,
Seongjung Kim,
Hyun** Kim,
Misu Jung,
Daehoon Kim,
Seokyong Lee,
Daewoo Choi
Abstract:
Macro placement is a critical phase in chip design, which becomes more intricate when involving general rectilinear macros and layout areas. Furthermore, macro placement that incorporates human-like constraints, such as design hierarchy and peripheral bias, has the potential to significantly reduce the amount of additional manual labor required from designers. This study proposes a methodology tha…
▽ More
Macro placement is a critical phase in chip design, which becomes more intricate when involving general rectilinear macros and layout areas. Furthermore, macro placement that incorporates human-like constraints, such as design hierarchy and peripheral bias, has the potential to significantly reduce the amount of additional manual labor required from designers. This study proposes a methodology that leverages an approach suggested by Google's Circuit Training (G-CT) to provide a learning-based macro placer that not only supports placing rectilinear cases, but also adheres to crucial human-like design principles. Our experimental results demonstrate the effectiveness of our framework in achieving power-performance-area (PPA) metrics and in obtaining placements of high quality, comparable to those produced with human intervention. Additionally, our methodology shows potential as a generalized model to address diverse macro shapes and layout areas.
△ Less
Submitted 2 November, 2023;
originally announced November 2023.
-
Vision and Language Navigation in the Real World via Online Visual Language Map**
Authors:
Chengguang Xu,
Hieu T. Nguyen,
Christopher Amato,
Lawson L. S. Wong
Abstract:
Navigating in unseen environments is crucial for mobile robots. Enhancing them with the ability to follow instructions in natural language will further improve navigation efficiency in unseen cases. However, state-of-the-art (SOTA) vision-and-language navigation (VLN) methods are mainly evaluated in simulation, neglecting the complex and noisy real world. Directly transferring SOTA navigation poli…
▽ More
Navigating in unseen environments is crucial for mobile robots. Enhancing them with the ability to follow instructions in natural language will further improve navigation efficiency in unseen cases. However, state-of-the-art (SOTA) vision-and-language navigation (VLN) methods are mainly evaluated in simulation, neglecting the complex and noisy real world. Directly transferring SOTA navigation policies trained in simulation to the real world is challenging due to the visual domain gap and the absence of prior knowledge about unseen environments. In this work, we propose a novel navigation framework to address the VLN task in the real world. Utilizing the powerful foundation models, the proposed framework includes four key components: (1) an LLMs-based instruction parser that converts the language instruction into a sequence of pre-defined macro-action descriptions, (2) an online visual-language mapper that builds a real-time visual-language map to maintain a spatial and semantic understanding of the unseen environment, (3) a language indexing-based localizer that grounds each macro-action description into a waypoint location on the map, and (4) a DD-PPO-based local controller that predicts the action. We evaluate the proposed pipeline on an Interbotix LoCoBot WX250 in an unseen lab environment. Without any fine-tuning, our pipeline significantly outperforms the SOTA VLN baseline in the real world.
△ Less
Submitted 16 October, 2023;
originally announced October 2023.
-
Distributionally Robust Safety Filter for Learning-Based Control in Active Distribution Systems
Authors:
Hoang Tien Nguyen,
Dae-Hyun Choi
Abstract:
Operational constraint violations may occur when deep reinforcement learning (DRL) agents interact with real-world active distribution systems to learn their optimal policies during training. This letter presents a universal distributionally robust safety filter (DRSF) using which any DRL agent can reduce the constraint violations of distribution systems significantly during training while maintai…
▽ More
Operational constraint violations may occur when deep reinforcement learning (DRL) agents interact with real-world active distribution systems to learn their optimal policies during training. This letter presents a universal distributionally robust safety filter (DRSF) using which any DRL agent can reduce the constraint violations of distribution systems significantly during training while maintaining near-optimal solutions. The DRSF is formulated as a distributionally robust optimization problem with chance constraints of operational limits. This problem aims to compute near-optimal actions that are minimally modified from the optimal actions of DRL-based Volt/VAr control by leveraging the distribution system model, thereby providing constraint satisfaction guarantee with a probability level under the model uncertainty. The performance of the proposed DRSF is verified using the IEEE 33-bus and 123-bus systems.
△ Less
Submitted 30 July, 2023;
originally announced July 2023.
-
A negation detection assessment of GPTs: analysis with the xNot360 dataset
Authors:
Ha Thanh Nguyen,
Randy Goebel,
Francesca Toni,
Kostas Stathis,
Ken Satoh
Abstract:
Negation is a fundamental aspect of natural language, playing a critical role in communication and comprehension. Our study assesses the negation detection performance of Generative Pre-trained Transformer (GPT) models, specifically GPT-2, GPT-3, GPT-3.5, and GPT-4. We focus on the identification of negation in natural language using a zero-shot prediction approach applied to our custom xNot360 da…
▽ More
Negation is a fundamental aspect of natural language, playing a critical role in communication and comprehension. Our study assesses the negation detection performance of Generative Pre-trained Transformer (GPT) models, specifically GPT-2, GPT-3, GPT-3.5, and GPT-4. We focus on the identification of negation in natural language using a zero-shot prediction approach applied to our custom xNot360 dataset. Our approach examines sentence pairs labeled to indicate whether the second sentence negates the first. Our findings expose a considerable performance disparity among the GPT models, with GPT-4 surpassing its counterparts and GPT-3.5 displaying a marked performance reduction. The overall proficiency of the GPT models in negation detection remains relatively modest, indicating that this task pushes the boundaries of their natural language understanding capabilities. We not only highlight the constraints of GPT models in handling negation but also emphasize the importance of logical reliability in high-stakes domains such as healthcare, science, and law.
△ Less
Submitted 28 June, 2023;
originally announced June 2023.
-
A9 Intersection Dataset: All You Need for Urban 3D Camera-LiDAR Roadside Perception
Authors:
Walter Zimmer,
Christian Creß,
Huu Tung Nguyen,
Alois C. Knoll
Abstract:
Intelligent Transportation Systems (ITS) allow a drastic expansion of the visibility range and decrease occlusions for autonomous driving. To obtain accurate detections, detailed labeled sensor data for training is required. Unfortunately, high-quality 3D labels of LiDAR point clouds from the infrastructure perspective of an intersection are still rare. Therefore, we provide the A9 Intersection Da…
▽ More
Intelligent Transportation Systems (ITS) allow a drastic expansion of the visibility range and decrease occlusions for autonomous driving. To obtain accurate detections, detailed labeled sensor data for training is required. Unfortunately, high-quality 3D labels of LiDAR point clouds from the infrastructure perspective of an intersection are still rare. Therefore, we provide the A9 Intersection Dataset, which consists of labeled LiDAR point clouds and synchronized camera images. Here, we recorded the sensor output from two roadside cameras and LiDARs mounted on intersection gantry bridges. The point clouds were labeled in 3D by experienced annotators. Furthermore, we provide calibration data between all sensors, which allow the projection of the 3D labels into the camera images and an accurate data fusion. Our dataset consists of 4.8k images and point clouds with more than 57.4k manually labeled 3D boxes. With ten object classes, it has a high diversity of road users in complex driving maneuvers, such as left and right turns, overtaking, and U-turns. In experiments, we provided multiple baselines for the perception tasks. Overall, our dataset is a valuable contribution to the scientific community to perform complex 3D camera-LiDAR roadside perception tasks. Find data, code, and more information at https://a9-dataset.com.
△ Less
Submitted 15 June, 2023;
originally announced June 2023.
-
In-context Cross-Density Adaptation on Noisy Mammogram Abnormalities Detection
Authors:
Huy T. Nguyen,
Thinh B. Lam,
Quan D. D. Tran,
Minh T. Nguyen,
Dat T. Chung,
Vinh Q. Dinh
Abstract:
This paper investigates the impact of breast density distribution on the generalization performance of deep-learning models on mammography images using the VinDr-Mammo dataset. We explore the use of domain adaptation techniques, specifically Domain Adaptive Object Detection (DAOD) with the Noise Latent Transferability Exploration (NLTE) framework, to improve model performance across breast densiti…
▽ More
This paper investigates the impact of breast density distribution on the generalization performance of deep-learning models on mammography images using the VinDr-Mammo dataset. We explore the use of domain adaptation techniques, specifically Domain Adaptive Object Detection (DAOD) with the Noise Latent Transferability Exploration (NLTE) framework, to improve model performance across breast densities under noisy labeling circumstances. We propose a robust augmentation framework to bridge the domain gap between the source and target inside a dataset. Our results show that DAOD-based methods, along with the proposed augmentation framework, can improve the generalization performance of deep-learning models (+5% overall mAP improvement approximately in our experimental results compared to commonly used detection models). This paper highlights the importance of domain adaptation techniques in medical imaging, particularly in the context of breast density distribution, which is critical in mammography.
△ Less
Submitted 12 June, 2023;
originally announced June 2023.
-
InfraDet3D: Multi-Modal 3D Object Detection based on Roadside Infrastructure Camera and LiDAR Sensors
Authors:
Walter Zimmer,
Joseph Birkner,
Marcel Brucker,
Huu Tung Nguyen,
Stefan Petrovski,
Bohan Wang,
Alois C. Knoll
Abstract:
Current multi-modal object detection approaches focus on the vehicle domain and are limited in the perception range and the processing capabilities. Roadside sensor units (RSUs) introduce a new domain for perception systems and leverage altitude to observe traffic. Cameras and LiDARs mounted on gantry bridges increase the perception range and produce a full digital twin of the traffic. In this wor…
▽ More
Current multi-modal object detection approaches focus on the vehicle domain and are limited in the perception range and the processing capabilities. Roadside sensor units (RSUs) introduce a new domain for perception systems and leverage altitude to observe traffic. Cameras and LiDARs mounted on gantry bridges increase the perception range and produce a full digital twin of the traffic. In this work, we introduce InfraDet3D, a multi-modal 3D object detector for roadside infrastructure sensors. We fuse two LiDARs using early fusion and further incorporate detections from monocular cameras to increase the robustness and to detect small objects. Our monocular 3D detection module uses HD maps to ground object yaw hypotheses, improving the final perception results. The perception framework is deployed on a real-world intersection that is part of the A9 Test Stretch in Munich, Germany. We perform several ablation studies and experiments and show that fusing two LiDARs with two cameras leads to an improvement of +1.90 mAP compared to a camera-only solution. We evaluate our results on the A9 infrastructure dataset and achieve 68.48 mAP on the test set. The dataset and code will be available at https://a9-dataset.com to allow the research community to further improve the perception results and make autonomous driving safer.
△ Less
Submitted 29 April, 2023;
originally announced May 2023.
-
Experimental Impact Analysis of Cyberattacks in Power Systems using Digital Real-Time Testbeds
Authors:
Kalinath Katuri,
Ioannis Zografopoulos,
Ha Thi Nguyen,
Charalambos Konstantinou
Abstract:
Smart grid advancements and the increased integration of digital devices have transformed the existing power grid into a cyber-physical energy system. This resha** of the current power system can make it vulnerable to cyberattacks, which could cause irreversible damage to the energy infrastructure resulting in the loss of power, equipment damage, etc. Constant threats emphasize the importance of…
▽ More
Smart grid advancements and the increased integration of digital devices have transformed the existing power grid into a cyber-physical energy system. This resha** of the current power system can make it vulnerable to cyberattacks, which could cause irreversible damage to the energy infrastructure resulting in the loss of power, equipment damage, etc. Constant threats emphasize the importance of cybersecurity investigations. At the same time, develo** cyber-physical system (CPS) simulation testbeds is crucial for vulnerability assessment and the implementation and validation of security solutions. In this paper, two separate real-time CPS testbeds are developed based on the availability of local research facilities for impact analysis of denial-of-service (DoS) attacks on microgrids. The two configurations are implemented using two different digital real-time simulator systems, one using the real-time digital simulator (RTDS) with a hardware-in-the-loop (HIL) setup and the other one using OPAL-RT with ExataCPS to emulate the cyber-layer infrastructure. Both testbeds demonstrate the impact of DoS attacks on microgrid control and protection operation.
△ Less
Submitted 15 April, 2023;
originally announced April 2023.
-
Evaluating the impact of an explainable machine learning system on the interobserver agreement in chest radiograph interpretation
Authors:
Hieu H. Pham,
Ha Q. Nguyen,
Hieu T. Nguyen,
Linh T. Le,
Khanh Lam
Abstract:
We conducted a prospective study to measure the clinical impact of an explainable machine learning system on interobserver agreement in chest radiograph interpretation. The AI system, which we call as it VinDr-CXR when used as a diagnosis-supporting tool, significantly improved the agreement between six radiologists with an increase of 1.5% in mean Fleiss' Kappa. In addition, we also observed that…
▽ More
We conducted a prospective study to measure the clinical impact of an explainable machine learning system on interobserver agreement in chest radiograph interpretation. The AI system, which we call as it VinDr-CXR when used as a diagnosis-supporting tool, significantly improved the agreement between six radiologists with an increase of 1.5% in mean Fleiss' Kappa. In addition, we also observed that, after the radiologists consulted AI's suggestions, the agreement between each radiologist and the system was remarkably increased by 3.3% in mean Cohen's Kappa. This work has been accepted for publication in IEEE Access and this paper is our short version submitted to the Midwest Machine Learning Symposium (MMLS 2023), Chicago, IL, USA.
△ Less
Submitted 1 April, 2023;
originally announced April 2023.
-
iQuantum: A Case for Modeling and Simulation of Quantum Computing Environments
Authors:
Hoa T. Nguyen,
Muhammad Usman,
Rajkumar Buyya
Abstract:
Today's quantum computers are primarily accessible through the cloud and potentially shifting to the edge network in the future. With the rapid advancement and proliferation of quantum computing research worldwide, there has been a considerable increase in demand for using cloud-based quantum computation resources. This demand has highlighted the need for designing efficient and adaptable resource…
▽ More
Today's quantum computers are primarily accessible through the cloud and potentially shifting to the edge network in the future. With the rapid advancement and proliferation of quantum computing research worldwide, there has been a considerable increase in demand for using cloud-based quantum computation resources. This demand has highlighted the need for designing efficient and adaptable resource management strategies and service models for quantum computing. However, the limited quantity, quality, and accessibility of quantum resources pose significant challenges to practical research in quantum software and systems. To address these challenges, we propose iQuantum, a first-of-its-kind simulation toolkit that can model hybrid quantum-classical computing environments for prototy** and evaluating system design and scheduling algorithms. This paper presents the quantum computing system model, architectural design, proof-of-concept implementation, potential use cases, and future development of iQuantum. Our proposed iQuantum simulator is anticipated to boost research in quantum software and systems, particularly in the creation and evaluation of policies and algorithms for resource management, job scheduling, and hybrid quantum-classical task orchestration in quantum computing environments integrating edge and cloud resources.
△ Less
Submitted 28 March, 2023;
originally announced March 2023.
-
A Quantum Neural Network Regression for Modeling Lithium-ion Battery Capacity Degradation
Authors:
Anh Phuong Ngo,
Nhat Le,
Hieu T. Nguyen,
Abdullah Eroglu,
Duong T. Nguyen
Abstract:
Given the high power density low discharge rate and decreasing cost rechargeable lithium-ion batteries LiBs have found a wide range of applications such as power grid level storage systems electric vehicles and mobile devices. Develo** a framework to accurately model the nonlinear degradation process of LiBs which is indeed a supervised learning problem becomes an important research topic. This…
▽ More
Given the high power density low discharge rate and decreasing cost rechargeable lithium-ion batteries LiBs have found a wide range of applications such as power grid level storage systems electric vehicles and mobile devices. Develo** a framework to accurately model the nonlinear degradation process of LiBs which is indeed a supervised learning problem becomes an important research topic. This paper presents a classical-quantum hybrid machine learning approach to capture the LiB degradation model that assesses battery cell life loss from operating profiles. Our work is motivated by recent advances in quantum computers as well as the similarity between neural networks and quantum circuits. Similar to adjusting weight parameters in conventional neural networks the parameters of the quantum circuit namely the qubits degree of freedom can be tuned to learn a nonlinear function in a supervised learning fashion. As a proof of concept paper our obtained numerical results with the battery dataset provided by NASA demonstrate the ability of the quantum neural networks in modeling the nonlinear relationship between the degraded capacity and the operating cycles. We also discuss the potential advantage of the quantum approach compared to conventional neural networks in classical computers in dealing with massive data especially in the context of future penetration of EVs and energy storage.
△ Less
Submitted 5 February, 2023;
originally announced February 2023.
-
Improving performance of real-time full-band blind packet-loss concealment with predictive network
Authors:
Viet-Anh Nguyen,
Anh H. T. Nguyen,
Andy W. H. Khong
Abstract:
Packet loss concealment (PLC) is a tool for enhancing speech degradation caused by poor network conditions or underflow/overflow in audio processing pipelines. We propose a real-time recurrent method that leverages previous outputs to mitigate artefact of lost packets without the prior knowledge of loss mask. The proposed full-band recurrent network (FRN) model operates at 48 kHz, which is suitabl…
▽ More
Packet loss concealment (PLC) is a tool for enhancing speech degradation caused by poor network conditions or underflow/overflow in audio processing pipelines. We propose a real-time recurrent method that leverages previous outputs to mitigate artefact of lost packets without the prior knowledge of loss mask. The proposed full-band recurrent network (FRN) model operates at 48 kHz, which is suitable for high-quality telecommunication applications. Experiment results highlight the superiority of FRN over an offline non-causal baseline and a top performer in a recent PLC challenge.
△ Less
Submitted 12 May, 2023; v1 submitted 8 November, 2022;
originally announced November 2022.
-
Interference Cancellation GAN Framework for Dynamic Channels
Authors:
Hung T. Nguyen,
Steven Bottone,
Kwang Taik Kim,
Mung Chiang,
H. Vincent Poor
Abstract:
Symbol detection is a fundamental and challenging problem in modern communication systems, e.g., multiuser multiple-input multiple-output (MIMO) setting. Iterative Soft Interference Cancellation (SIC) is a state-of-the-art method for this task and recently motivated data-driven neural network models, e.g. DeepSIC, that can deal with unknown non-linear channels. However, these neural network models…
▽ More
Symbol detection is a fundamental and challenging problem in modern communication systems, e.g., multiuser multiple-input multiple-output (MIMO) setting. Iterative Soft Interference Cancellation (SIC) is a state-of-the-art method for this task and recently motivated data-driven neural network models, e.g. DeepSIC, that can deal with unknown non-linear channels. However, these neural network models require thorough timeconsuming training of the networks before applying, and is thus not readily suitable for highly dynamic channels in practice. We introduce an online training framework that can swiftly adapt to any changes in the channel. Our proposed framework unifies the recent deep unfolding approaches with the emerging generative adversarial networks (GANs) to capture any changes in the channel and quickly adjust the networks to maintain the top performance of the model. We demonstrate that our framework significantly outperforms recent neural network models on highly dynamic channels and even surpasses those on the static channel in our experiments.
△ Less
Submitted 16 August, 2022;
originally announced August 2022.
-
An Accurate and Explainable Deep Learning System Improves Interobserver Agreement in the Interpretation of Chest Radiograph
Authors:
Hieu H. Pham,
Ha Q. Nguyen,
Hieu T. Nguyen,
Linh T. Le,
Lam Khanh
Abstract:
Recent artificial intelligence (AI) algorithms have achieved radiologist-level performance on various medical classification tasks. However, only a few studies addressed the localization of abnormal findings from CXR scans, which is essential in explaining the image-level classification to radiologists. We introduce in this paper an explainable deep learning system called VinDr-CXR that can classi…
▽ More
Recent artificial intelligence (AI) algorithms have achieved radiologist-level performance on various medical classification tasks. However, only a few studies addressed the localization of abnormal findings from CXR scans, which is essential in explaining the image-level classification to radiologists. We introduce in this paper an explainable deep learning system called VinDr-CXR that can classify a CXR scan into multiple thoracic diseases and, at the same time, localize most types of critical findings on the image. VinDr-CXR was trained on 51,485 CXR scans with radiologist-provided bounding box annotations. It demonstrated a comparable performance to experienced radiologists in classifying 6 common thoracic diseases on a retrospective validation set of 3,000 CXR scans, with a mean area under the receiver operating characteristic curve (AUROC) of 0.967 (95% confidence interval [CI]: 0.958-0.975). The VinDr-CXR was also externally validated in independent patient cohorts and showed its robustness. For the localization task with 14 types of lesions, our free-response receiver operating characteristic (FROC) analysis showed that the VinDr-CXR achieved a sensitivity of 80.2% at the rate of 1.0 false-positive lesion identified per scan. A prospective study was also conducted to measure the clinical impact of the VinDr-CXR in assisting six experienced radiologists. The results indicated that the proposed system, when used as a diagnosis supporting tool, significantly improved the agreement between radiologists themselves with an increase of 1.5% in mean Fleiss' Kappa. We also observed that, after the radiologists consulted VinDr-CXR's suggestions, the agreement between each of them and the system was remarkably increased by 3.3% in mean Cohen's Kappa.
△ Less
Submitted 6 August, 2022;
originally announced August 2022.
-
Slice-level Detection of Intracranial Hemorrhage on CT Using Deep Descriptors of Adjacent Slices
Authors:
Dat T. Ngo,
Thao T. B. Nguyen,
Hieu T. Nguyen,
Dung B. Nguyen,
Ha Q. Nguyen,
Hieu H. Pham
Abstract:
The rapid development in representation learning techniques such as deep neural networks and the availability of large-scale, well-annotated medical imaging datasets have to a rapid increase in the use of supervised machine learning in the 3D medical image analysis and diagnosis. In particular, deep convolutional neural networks (D-CNNs) have been key players and were adopted by the medical imagin…
▽ More
The rapid development in representation learning techniques such as deep neural networks and the availability of large-scale, well-annotated medical imaging datasets have to a rapid increase in the use of supervised machine learning in the 3D medical image analysis and diagnosis. In particular, deep convolutional neural networks (D-CNNs) have been key players and were adopted by the medical imaging community to assist clinicians and medical experts in disease diagnosis and treatment. However, training and inferencing deep neural networks such as D-CNN on high-resolution 3D volumes of Computed Tomography (CT) scans for diagnostic tasks pose formidable computational challenges. This challenge raises the need of develo** deep learning-based approaches that are robust in learning representations in 2D images, instead 3D scans. In this work, we propose for the first time a new strategy to train \emph{slice-level} classifiers on CT scans based on the descriptors of the adjacent slices along the axis. In particular, each of which is extracted through a convolutional neural network (CNN). This method is applicable to CT datasets with per-slice labels such as the RSNA Intracranial Hemorrhage (ICH) dataset, which aims to predict the presence of ICH and classify it into 5 different sub-types. We obtain a single model in the top 4% best-performing solutions of the RSNA ICH challenge, where model ensembles are allowed. Experiments also show that the proposed method significantly outperforms the baseline model on CQ500. The proposed method is general and can be applied to other 3D medical diagnosis tasks such as MRI imaging. To encourage new advances in the field, we will make our codes and pre-trained model available upon acceptance of the paper.
△ Less
Submitted 17 April, 2023; v1 submitted 5 August, 2022;
originally announced August 2022.
-
A Study on Self-Supervised Object Detection Pretraining
Authors:
Trung Dang,
Simon Kornblith,
Huy Thong Nguyen,
Peter Chin,
Maryam Khademi
Abstract:
In this work, we study different approaches to self-supervised pretraining of object detection models. We first design a general framework to learn a spatially consistent dense representation from an image, by randomly sampling and projecting boxes to each augmented view and maximizing the similarity between corresponding box features. We study existing design choices in the literature, such as bo…
▽ More
In this work, we study different approaches to self-supervised pretraining of object detection models. We first design a general framework to learn a spatially consistent dense representation from an image, by randomly sampling and projecting boxes to each augmented view and maximizing the similarity between corresponding box features. We study existing design choices in the literature, such as box generation, feature extraction strategies, and using multiple views inspired by its success on instance-level image representation learning techniques. Our results suggest that the method is robust to different choices of hyperparameters, and using multiple views is not as effective as shown for instance-level image representation learning. We also design two auxiliary tasks to predict boxes in one view from their features in the other view, by (1) predicting boxes from the sampled set by using a contrastive loss, and (2) predicting box coordinates using a transformer, which potentially benefits downstream object detection tasks. We found that these tasks do not lead to better object detection performance when finetuning the pretrained model on labeled data.
△ Less
Submitted 10 August, 2022; v1 submitted 8 July, 2022;
originally announced July 2022.
-
QFaaS: A Serverless Function-as-a-Service Framework for Quantum Computing
Authors:
Hoa T. Nguyen,
Muhammad Usman,
Rajkumar Buyya
Abstract:
Recent breakthroughs in quantum hardware are creating opportunities for its use in many applications. However, quantum software engineering is still in its infancy with many challenges, especially dealing with the diversity of quantum programming languages and hardware platforms. To alleviate these challenges, we propose QFaaS, a novel Quantum Function-as-a-Service framework, which leverages the a…
▽ More
Recent breakthroughs in quantum hardware are creating opportunities for its use in many applications. However, quantum software engineering is still in its infancy with many challenges, especially dealing with the diversity of quantum programming languages and hardware platforms. To alleviate these challenges, we propose QFaaS, a novel Quantum Function-as-a-Service framework, which leverages the advantages of the serverless model and the state-of-the-art software engineering approaches to advance practical quantum computing. Our framework provides essential components of a quantum serverless platform to simplify the software development and adapt to the quantum cloud computing paradigm, such as combining hybrid quantum-classical computation, containerizing functions, and integrating DevOps features. We design QFaaS as a unified quantum computing framework by supporting well-known quantum languages and software development kits (Qiskit, Q#, Cirq, and Braket), executing the quantum tasks on multiple simulators and quantum cloud providers (IBM Quantum and Amazon Braket). This paper proposes architectural design, principal components, the life cycle of hybrid quantum-classical function, operation workflow, and implementation of QFaaS. We present two practical use cases and perform the evaluations on quantum computers and simulators to demonstrate our framework's ability to ease the burden on traditional engineers to expedite the ongoing quantum software transition.
△ Less
Submitted 30 May, 2022;
originally announced May 2022.
-
A Summary of the ALQAC 2021 Competition
Authors:
Nguyen Ha Thanh,
Bui Minh Quan,
Chau Nguyen,
Tung Le,
Nguyen Minh Phuong,
Dang Tran Binh,
Vuong Thi Hai Yen,
Teeradaj Racharak,
Nguyen Le Minh,
Tran Duc Vu,
Phan Viet Anh,
Nguyen Truong Son,
Huy Tien Nguyen,
Bhumindr Butr-indr,
Peerapon Vateekul,
Prachya Boonkwan
Abstract:
We summarize the evaluation of the first Automated Legal Question Answering Competition (ALQAC 2021). The competition this year contains three tasks, which aims at processing the statute law document, which are Legal Text Information Retrieval (Task 1), Legal Text Entailment Prediction (Task 2), and Legal Text Question Answering (Task 3). The final goal of these tasks is to build a system that can…
▽ More
We summarize the evaluation of the first Automated Legal Question Answering Competition (ALQAC 2021). The competition this year contains three tasks, which aims at processing the statute law document, which are Legal Text Information Retrieval (Task 1), Legal Text Entailment Prediction (Task 2), and Legal Text Question Answering (Task 3). The final goal of these tasks is to build a system that can automatically determine whether a particular statement is lawful. There is no limit to the approaches of the participating teams. This year, there are 5 teams participating in Task 1, 6 teams participating in Task 2, and 5 teams participating in Task 3. There are in total 36 runs submitted to the organizer. In this paper, we summarize each team's approaches, official results, and some discussion about the competition. Only results of the teams who successfully submit their approach description paper are reported in this paper.
△ Less
Submitted 24 April, 2022; v1 submitted 22 April, 2022;
originally announced April 2022.
-
Contextual Model Aggregation for Fast and Robust Federated Learning in Edge Computing
Authors:
Hung T. Nguyen,
H. Vincent Poor,
Mung Chiang
Abstract:
Federated learning is a prime candidate for distributed machine learning at the network edge due to the low communication complexity and privacy protection among other attractive properties. However, existing algorithms face issues with slow convergence and/or robustness of performance due to the considerable heterogeneity of data distribution, computation and communication capability at the edge.…
▽ More
Federated learning is a prime candidate for distributed machine learning at the network edge due to the low communication complexity and privacy protection among other attractive properties. However, existing algorithms face issues with slow convergence and/or robustness of performance due to the considerable heterogeneity of data distribution, computation and communication capability at the edge. In this work, we tackle both of these issues by focusing on the key component of model aggregation in federated learning systems and studying optimal algorithms to perform this task. Particularly, we propose a contextual aggregation scheme that achieves the optimal context-dependent bound on loss reduction in each round of optimization. The aforementioned context-dependent bound is derived from the particular participating devices in that round and an assumption on smoothness of the overall loss function. We show that this aggregation leads to a definite reduction of loss function at every round. Furthermore, we can integrate our aggregation with many existing algorithms to obtain the contextual versions. Our experimental results demonstrate significant improvements in convergence speed and robustness of the contextual versions compared to the original algorithms. We also consider different variants of the contextual aggregation and show robust performance even in the most extreme settings.
△ Less
Submitted 23 March, 2022;
originally announced March 2022.
-
VinDr-Mammo: A large-scale benchmark dataset for computer-aided diagnosis in full-field digital mammography
Authors:
Hieu T. Nguyen,
Ha Q. Nguyen,
Hieu H. Pham,
Khanh Lam,
Linh T. Le,
Minh Dao,
Van Vu
Abstract:
Mammography, or breast X-ray, is the most widely used imaging modality to detect cancer and other breast diseases. Recent studies have shown that deep learning-based computer-assisted detection and diagnosis (CADe or CADx) tools have been developed to support physicians and improve the accuracy of interpreting mammography. However, most published datasets of mammography are either limited on sampl…
▽ More
Mammography, or breast X-ray, is the most widely used imaging modality to detect cancer and other breast diseases. Recent studies have shown that deep learning-based computer-assisted detection and diagnosis (CADe or CADx) tools have been developed to support physicians and improve the accuracy of interpreting mammography. However, most published datasets of mammography are either limited on sample size or digitalized from screen-film mammography (SFM), hindering the development of CADe and CADx tools which are developed based on full-field digital mammography (FFDM). To overcome this challenge, we introduce VinDr-Mammo - a new benchmark dataset of FFDM for detecting and diagnosing breast cancer and other diseases in mammography. The dataset consists of 5,000 mammography exams, each of which has four standard views and is double read with disagreement (if any) being resolved by arbitration. It is created for the assessment of Breast Imaging Reporting and Data System (BI-RADS) and density at the breast level. In addition, the dataset also provides the category, location, and BI-RADS assessment of non-benign findings. We make VinDr-Mammo publicly available on PhysioNet as a new imaging resource to promote advances in develo** CADe and CADx tools for breast cancer screening.
△ Less
Submitted 16 March, 2023; v1 submitted 20 March, 2022;
originally announced March 2022.
-
Learning from Multiple Expert Annotators for Enhancing Anomaly Detection in Medical Image Analysis
Authors:
Khiem H. Le,
Tuan V. Tran,
Hieu H. Pham,
Hieu T. Nguyen,
Tung T. Le,
Ha Q. Nguyen
Abstract:
Building an accurate computer-aided diagnosis system based on data-driven approaches requires a large amount of high-quality labeled data. In medical imaging analysis, multiple expert annotators often produce subjective estimates about "ground truth labels" during the annotation process, depending on their expertise and experience. As a result, the labeled data may contain a variety of human biase…
▽ More
Building an accurate computer-aided diagnosis system based on data-driven approaches requires a large amount of high-quality labeled data. In medical imaging analysis, multiple expert annotators often produce subjective estimates about "ground truth labels" during the annotation process, depending on their expertise and experience. As a result, the labeled data may contain a variety of human biases with a high rate of disagreement among annotators, which significantly affect the performance of supervised machine learning algorithms. To tackle this challenge, we propose a simple yet effective approach to combine annotations from multiple radiology experts for training a deep learning-based detector that aims to detect abnormalities on medical scans. The proposed method first estimates the ground truth annotations and confidence scores of training examples. The estimated annotations and their scores are then used to train a deep learning detector with a re-weighted loss function to localize abnormal findings. We conduct an extensive experimental evaluation of the proposed approach on both simulated and real-world medical imaging datasets. The experimental results show that our approach significantly outperforms baseline approaches that do not consider the disagreements among annotators, including methods in which all of the noisy annotations are treated equally as ground truth and the ensemble of different models trained on different label sets provided separately by annotators.
△ Less
Submitted 20 March, 2022;
originally announced March 2022.
-
A Novel Transparency Strategy-based Data Augmentation Approach for BI-RADS Classification of Mammograms
Authors:
Sam B. Tran,
Huyen T. X. Nguyen,
Chi Phan,
Hieu H. Pham,
Ha Q. Nguyen
Abstract:
Image augmentation techniques have been widely investigated to improve the performance of deep learning (DL) algorithms on mammography classification tasks. Recent methods have proved the efficiency of image augmentation on data deficiency or data imbalance issues. In this paper, we propose a novel transparency strategy to boost the Breast Imaging Reporting and Data System (BI-RADS) scores of mamm…
▽ More
Image augmentation techniques have been widely investigated to improve the performance of deep learning (DL) algorithms on mammography classification tasks. Recent methods have proved the efficiency of image augmentation on data deficiency or data imbalance issues. In this paper, we propose a novel transparency strategy to boost the Breast Imaging Reporting and Data System (BI-RADS) scores of mammogram classifiers. The proposed approach utilizes the Region of Interest (ROI) information to generate more high-risk training examples for breast cancer (BI-RADS 3, 4, 5) from original images. Our extensive experiments on three different datasets show that the proposed approach significantly improves the mammogram classification performance and surpasses a state-of-the-art data augmentation technique called CutMix. This study also highlights that our transparency method is more effective than other augmentation strategies for BI-RADS classification and can be widely applied to other computer vision tasks.
△ Less
Submitted 17 April, 2023; v1 submitted 20 March, 2022;
originally announced March 2022.
-
Handwriting recognition and automatic scoring for descriptive answers in Japanese language tests
Authors:
Hung Tuan Nguyen,
Cuong Tuan Nguyen,
Haruki Oka,
Tsunenori Ishioka,
Masaki Nakagawa
Abstract:
This paper presents an experiment of automatically scoring handwritten descriptive answers in the trial tests for the new Japanese university entrance examination, which were made for about 120,000 examinees in 2017 and 2018. There are about 400,000 answers with more than 20 million characters. Although all answers have been scored by human examiners, handwritten characters are not labeled. We pre…
▽ More
This paper presents an experiment of automatically scoring handwritten descriptive answers in the trial tests for the new Japanese university entrance examination, which were made for about 120,000 examinees in 2017 and 2018. There are about 400,000 answers with more than 20 million characters. Although all answers have been scored by human examiners, handwritten characters are not labeled. We present our attempt to adapt deep neural network-based handwriting recognizers trained on a labeled handwriting dataset into this unlabeled answer set. Our proposed method combines different training strategies, ensembles multiple recognizers, and uses a language model built from a large general corpus to avoid overfitting into specific data. In our experiment, the proposed method records character accuracy of over 97% using about 2,000 verified labeled answers that account for less than 0.5% of the dataset. Then, the recognized answers are fed into a pre-trained automatic scoring system based on the BERT model without correcting misrecognized characters and providing rubric annotations. The automatic scoring system achieves from 0.84 to 0.98 of Quadratic Weighted Kappa (QWK). As QWK is over 0.8, it represents an acceptable similarity of scoring between the automatic scoring system and the human examiners. These results are promising for further research on end-to-end automatic scoring of descriptive answers.
△ Less
Submitted 30 November, 2023; v1 submitted 10 January, 2022;
originally announced January 2022.
-
Adversarial Neural Networks for Error Correcting Codes
Authors:
Hung T. Nguyen,
Steven Bottone,
Kwang Taik Kim,
Mung Chiang,
H. Vincent Poor
Abstract:
Error correcting codes are a fundamental component in modern day communication systems, demanding extremely high throughput, ultra-reliability and low latency. Recent approaches using machine learning (ML) models as the decoders offer both improved performance and great adaptability to unknown environments, where traditional decoders struggle. We introduce a general framework to further boost the…
▽ More
Error correcting codes are a fundamental component in modern day communication systems, demanding extremely high throughput, ultra-reliability and low latency. Recent approaches using machine learning (ML) models as the decoders offer both improved performance and great adaptability to unknown environments, where traditional decoders struggle. We introduce a general framework to further boost the performance and applicability of ML models. We propose to combine ML decoders with a competing discriminator network that tries to distinguish between codewords and noisy words, and, hence, guides the decoding models to recover transmitted codewords. Our framework is game-theoretic, motivated by generative adversarial networks (GANs), with the decoder and discriminator competing in a zero-sum game. The decoder learns to simultaneously decode and generate codewords while the discriminator learns to tell the differences between decoded outputs and codewords. Thus, the decoder is able to decode noisy received signals into codewords, increasing the probability of successful decoding. We show a strong connection of our framework with the optimal maximum likelihood decoder by proving that this decoder defines a Nash equilibrium point of our game. Hence, training to equilibrium has a good possibility of achieving the optimal maximum likelihood performance. Moreover, our framework does not require training labels, which are typically unavailable during communications, and, thus, seemingly can be trained online and adapt to channel dynamics. To demonstrate the performance of our framework, we combine it with the very recent neural decoders and show improved performance compared to the original models and traditional decoding algorithms on various codes.
△ Less
Submitted 21 December, 2021;
originally announced December 2021.
-
On-the-fly Resource-Aware Model Aggregation for Federated Learning in Heterogeneous Edge
Authors:
Hung T. Nguyen,
Roberto Morabito,
Kwang Taik Kim,
Mung Chiang
Abstract:
Edge computing has revolutionized the world of mobile and wireless networks world thanks to its flexible, secure, and performing characteristics. Lately, we have witnessed the increasing use of it to make more performing the deployment of machine learning (ML) techniques such as federated learning (FL). FL was debuted to improve communication efficiency compared to conventional distributed machine…
▽ More
Edge computing has revolutionized the world of mobile and wireless networks world thanks to its flexible, secure, and performing characteristics. Lately, we have witnessed the increasing use of it to make more performing the deployment of machine learning (ML) techniques such as federated learning (FL). FL was debuted to improve communication efficiency compared to conventional distributed machine learning (ML). The original FL assumes a central aggregation server to aggregate locally optimized parameters and might bring reliability and latency issues. In this paper, we conduct an in-depth study of strategies to replace this central server by a flying master that is dynamically selected based on the current participants and/or available resources at every FL round of optimization. Specifically, we compare different metrics to select this flying master and assess consensus algorithms to perform the selection. Our results demonstrate a significant reduction of runtime using our flying master FL framework compared to the original FL from measurements results conducted in our EdgeAI testbed and over real 5G networks using an operational edge testbed.
△ Less
Submitted 21 December, 2021;
originally announced December 2021.
-
A novel multi-view deep learning approach for BI-RADS and density assessment of mammograms
Authors:
Huyen T. X. Nguyen,
Sam B. Tran,
Dung B. Nguyen,
Hieu H. Pham,
Ha Q. Nguyen
Abstract:
Advanced deep learning (DL) algorithms may predict the patient's risk of develo** breast cancer based on the Breast Imaging Reporting and Data System (BI-RADS) and density standards. Recent studies have suggested that the combination of multi-view analysis improved the overall breast exam classification. In this paper, we propose a novel multi-view DL approach for BI-RADS and density assessment…
▽ More
Advanced deep learning (DL) algorithms may predict the patient's risk of develo** breast cancer based on the Breast Imaging Reporting and Data System (BI-RADS) and density standards. Recent studies have suggested that the combination of multi-view analysis improved the overall breast exam classification. In this paper, we propose a novel multi-view DL approach for BI-RADS and density assessment of mammograms. The proposed approach first deploys deep convolutional networks for feature extraction on each view separately. The extracted features are then stacked and fed into a Light Gradient Boosting Machine (LightGBM) classifier to predict BI-RADS and density scores. We conduct extensive experiments on both the internal mammography dataset and the public dataset Digital Database for Screening Mammography (DDSM). The experimental results demonstrate that the proposed approach outperforms the single-view classification approach on two benchmark datasets by huge F1-score margins (+5% on the internal dataset and +10% on the DDSM dataset). These results highlight the vital role of combining multi-view information to improve the performance of breast cancer risk prediction.
△ Less
Submitted 17 April, 2022; v1 submitted 8 December, 2021;
originally announced December 2021.
-
Intelligence Reflecting Surface-Aided Integrated Data and Energy Networking Coexisting D2D Communications
Authors:
Nguyen Thi Thanh Van,
Huy T. Nguyen,
Nguyen Cong Luong,
Ngo Manh Tien,
Dusit Niyato,
Dong In Kim
Abstract:
In this paper, we consider an integrated data and energy network and D2D communication coexistence (DED2D) system. The DED2D system allows a base station (BS) to transfer data to information-demanded users (IUs) and energy to energy-demanded users (EUs), i.e., using a time-fraction-based information and energy transfer (TFIET) scheme. Furthermore, the DED2D system enables D2D communications to sha…
▽ More
In this paper, we consider an integrated data and energy network and D2D communication coexistence (DED2D) system. The DED2D system allows a base station (BS) to transfer data to information-demanded users (IUs) and energy to energy-demanded users (EUs), i.e., using a time-fraction-based information and energy transfer (TFIET) scheme. Furthermore, the DED2D system enables D2D communications to share spectrum with the BS. Therefore, the DED2D system addresses the growth of energy and spectrum demands of the next generation networks. However, the interference caused by the D2D communications and propagation loss of wireless links can significantly degrade the data throughput of IUs. To deal with the issues, we propose to deploy an intelligent reflecting surface (IRS) in the DED2D system. Then, we formulate an optimization problem that aims to optimize the information beamformer for the IUs, energy beamformer for EUs, time fractions of the TFIET, transmit power of D2D transmitters, and reflection coefficients of the IRS to maximize IUs' worse throughput while satisfying the harvested energy requirement of EUs and D2D rate threshold. The max-min throughput optimization problem is computationally intractable, and we develop an alternating descent algorithm to resolve it with low computational complexity. The simulation results demonstrate the effectiveness of the proposed algorithm.
△ Less
Submitted 17 November, 2021;
originally announced November 2021.
-
VSEC: Transformer-based Model for Vietnamese Spelling Correction
Authors:
Dinh-Truong Do,
Ha Thanh Nguyen,
Thang Ngoc Bui,
Dinh Hieu Vo
Abstract:
Spelling error correction is one of topics which have a long history in natural language processing. Although previous studies have achieved remarkable results, challenges still exist. In the Vietnamese language, a state-of-the-art method for the task infers a syllable's context from its adjacent syllables. The method's accuracy can be unsatisfactory, however, because the model may lose the contex…
▽ More
Spelling error correction is one of topics which have a long history in natural language processing. Although previous studies have achieved remarkable results, challenges still exist. In the Vietnamese language, a state-of-the-art method for the task infers a syllable's context from its adjacent syllables. The method's accuracy can be unsatisfactory, however, because the model may lose the context if two (or more) spelling mistakes stand near each other. In this paper, we propose a novel method to correct Vietnamese spelling errors. We tackle the problems of mistyped errors and misspelled errors by using a deep learning model. The embedding layer, in particular, is powered by the byte pair encoding technique. The sequence to sequence model based on the Transformer architecture makes our approach different from the previous works on the same problem. In the experiment, we train the model with a large synthetic dataset, which is randomly introduced spelling errors. We test the performance of the proposed method using a realistic dataset. This dataset contains 11,202 human-made misspellings in 9,341 different Vietnamese sentences. The experimental results show that our method achieves encouraging performance with 86.8% errors detected and 81.5% errors corrected, which improves the state-of-the-art approach 5.6% and 2.2%, respectively.
△ Less
Submitted 8 November, 2021; v1 submitted 31 October, 2021;
originally announced November 2021.
-
TUNet: A Block-online Bandwidth Extension Model based on Transformers and Self-supervised Pretraining
Authors:
Viet-Anh Nguyen,
Anh H. T. Nguyen,
Andy W. H. Khong
Abstract:
We introduce a block-online variant of the temporal feature-wise linear modulation (TFiLM) model to achieve bandwidth extension. The proposed architecture simplifies the UNet backbone of the TFiLM to reduce inference time and employs an efficient transformer at the bottleneck to alleviate performance degradation. We also utilize self-supervised pretraining and data augmentation to enhance the qual…
▽ More
We introduce a block-online variant of the temporal feature-wise linear modulation (TFiLM) model to achieve bandwidth extension. The proposed architecture simplifies the UNet backbone of the TFiLM to reduce inference time and employs an efficient transformer at the bottleneck to alleviate performance degradation. We also utilize self-supervised pretraining and data augmentation to enhance the quality of bandwidth extended signals and reduce the sensitivity with respect to downsampling methods. Experiment results on the VCTK dataset show that the proposed method outperforms several recent baselines in both intrusive and non-intrusive metrics. Pretraining and filter augmentation also help stabilize and enhance the overall performance.
△ Less
Submitted 7 June, 2022; v1 submitted 26 October, 2021;
originally announced October 2021.
-
Automated Generation of Accurate \& Fluent Medical X-ray Reports
Authors:
Hoang T. N. Nguyen,
Dong Nie,
Taivanbat Badamdorj,
Yujie Liu,
Yingying Zhu,
Jason Truong,
Li Cheng
Abstract:
Our paper focuses on automating the generation of medical reports from chest X-ray image inputs, a critical yet time-consuming task for radiologists. Unlike existing medical re-port generation efforts that tend to produce human-readable reports, we aim to generate medical reports that are both fluent and clinically accurate. This is achieved by our fully differentiable and end-to-end paradigm cont…
▽ More
Our paper focuses on automating the generation of medical reports from chest X-ray image inputs, a critical yet time-consuming task for radiologists. Unlike existing medical re-port generation efforts that tend to produce human-readable reports, we aim to generate medical reports that are both fluent and clinically accurate. This is achieved by our fully differentiable and end-to-end paradigm containing three complementary modules: taking the chest X-ray images and clinical his-tory document of patients as inputs, our classification module produces an internal check-list of disease-related topics, referred to as enriched disease embedding; the embedding representation is then passed to our transformer-based generator, giving rise to the medical reports; meanwhile, our generator also pro-duces the weighted embedding representation, which is fed to our interpreter to ensure consistency with respect to disease-related topics.Our approach achieved promising results on commonly-used metrics concerning language fluency and clinical accuracy. Moreover, noticeable performance gains are consistently ob-served when additional input information is available, such as the clinical document and extra scans of different views.
△ Less
Submitted 27 August, 2021;
originally announced August 2021.
-
Learning to Automatically Diagnose Multiple Diseases in Pediatric Chest Radiographs Using Deep Convolutional Neural Networks
Authors:
Thanh T. Tran,
Hieu H. Pham,
Thang V. Nguyen,
Tung T. Le,
Hieu T. Nguyen,
Ha Q. Nguyen
Abstract:
Chest radiograph (CXR) interpretation in pediatric patients is error-prone and requires a high level of understanding of radiologic expertise. Recently, deep convolutional neural networks (D-CNNs) have shown remarkable performance in interpreting CXR in adults. However, there is a lack of evidence indicating that D-CNNs can recognize accurately multiple lung pathologies from pediatric CXR scans. I…
▽ More
Chest radiograph (CXR) interpretation in pediatric patients is error-prone and requires a high level of understanding of radiologic expertise. Recently, deep convolutional neural networks (D-CNNs) have shown remarkable performance in interpreting CXR in adults. However, there is a lack of evidence indicating that D-CNNs can recognize accurately multiple lung pathologies from pediatric CXR scans. In particular, the development of diagnostic models for the detection of pediatric chest diseases faces significant challenges such as (i) lack of physician-annotated datasets and (ii) class imbalance problems. In this paper, we retrospectively collect a large dataset of 5,017 pediatric CXR scans, for which each is manually labeled by an experienced radiologist for the presence of 10 common pathologies. A D-CNN model is then trained on 3,550 annotated scans to classify multiple pediatric lung pathologies automatically. To address the high-class imbalance issue, we propose to modify and apply "Distribution-Balanced loss" for training D-CNNs which reshapes the standard Binary-Cross Entropy loss (BCE) to efficiently learn harder samples by down-weighting the loss assigned to the majority classes. On an independent test set of 777 studies, the proposed approach yields an area under the receiver operating characteristic (AUC) of 0.709 (95% CI, 0.690-0.729). The sensitivity, specificity, and F1-score at the cutoff value are 0.722 (0.694-0.750), 0.579 (0.563-0.595), and 0.389 (0.373-0.405), respectively. These results significantly outperform previous state-of-the-art methods on most of the target diseases. Moreover, our ablation studies validate the effectiveness of the proposed loss function compared to other standard losses, e.g., BCE and Focal Loss, for this learning task. Overall, we demonstrate the potential of D-CNNs in interpreting pediatric CXRs.
△ Less
Submitted 14 August, 2021;
originally announced August 2021.
-
A Transformer-based Math Language Model for Handwritten Math Expression Recognition
Authors:
Huy Quang Ung,
Cuong Tuan Nguyen,
Hung Tuan Nguyen,
Thanh-Nghia Truong,
Masaki Nakagawa
Abstract:
Handwritten mathematical expressions (HMEs) contain ambiguities in their interpretations, even for humans sometimes. Several math symbols are very similar in the writing style, such as dot and comma or 0, O, and o, which is a challenge for HME recognition systems to handle without using contextual information. To address this problem, this paper presents a Transformer-based Math Language Model (TM…
▽ More
Handwritten mathematical expressions (HMEs) contain ambiguities in their interpretations, even for humans sometimes. Several math symbols are very similar in the writing style, such as dot and comma or 0, O, and o, which is a challenge for HME recognition systems to handle without using contextual information. To address this problem, this paper presents a Transformer-based Math Language Model (TMLM). Based on the self-attention mechanism, the high-level representation of an input token in a sequence of tokens is computed by how it is related to the previous tokens. Thus, TMLM can capture long dependencies and correlations among symbols and relations in a mathematical expression (ME). We trained the proposed language model using a corpus of approximately 70,000 LaTeX sequences provided in CROHME 2016. TMLM achieved the perplexity of 4.42, which outperformed the previous math language models, i.e., the N-gram and recurrent neural network-based language models. In addition, we combine TMLM into a stochastic context-free grammar-based HME recognition system using a weighting parameter to re-rank the top-10 best candidates. The expression rates on the testing sets of CROHME 2016 and CROHME 2019 were improved by 2.97 and 0.83 percentage points, respectively.
△ Less
Submitted 10 August, 2021;
originally announced August 2021.
-
Recurrent neural network transducer for Japanese and Chinese offline handwritten text recognition
Authors:
Trung Tan Ngo,
Hung Tuan Nguyen,
Nam Tuan Ly,
Masaki Nakagawa
Abstract:
In this paper, we propose an RNN-Transducer model for recognizing Japanese and Chinese offline handwritten text line images. As far as we know, it is the first approach that adopts the RNN-Transducer model for offline handwritten text recognition. The proposed model consists of three main components: a visual feature encoder that extracts visual features from an input image by CNN and then encodes…
▽ More
In this paper, we propose an RNN-Transducer model for recognizing Japanese and Chinese offline handwritten text line images. As far as we know, it is the first approach that adopts the RNN-Transducer model for offline handwritten text recognition. The proposed model consists of three main components: a visual feature encoder that extracts visual features from an input image by CNN and then encodes the visual features by BLSTM; a linguistic context encoder that extracts and encodes linguistic features from the input image by embedded layers and LSTM; and a joint decoder that combines and then decodes the visual features and the linguistic features into the final label sequence by fully connected and softmax layers. The proposed model takes advantage of both visual and linguistic information from the input image. In the experiments, we evaluated the performance of the proposed model on the two datasets: Kuzushiji and SCUT-EPT. Experimental results show that the proposed model achieves state-of-the-art performance on all datasets.
△ Less
Submitted 28 June, 2021;
originally announced June 2021.
-
VinDr-SpineXR: A deep learning framework for spinal lesions detection and classification from radiographs
Authors:
Hieu T. Nguyen,
Hieu H. Pham,
Nghia T. Nguyen,
Ha Q. Nguyen,
Thang Q. Huynh,
Minh Dao,
Van Vu
Abstract:
Radiographs are used as the most important imaging tool for identifying spine anomalies in clinical practice. The evaluation of spinal bone lesions, however, is a challenging task for radiologists. This work aims at develo** and evaluating a deep learning-based framework, named VinDr-SpineXR, for the classification and localization of abnormalities from spine X-rays. First, we build a large data…
▽ More
Radiographs are used as the most important imaging tool for identifying spine anomalies in clinical practice. The evaluation of spinal bone lesions, however, is a challenging task for radiologists. This work aims at develo** and evaluating a deep learning-based framework, named VinDr-SpineXR, for the classification and localization of abnormalities from spine X-rays. First, we build a large dataset, comprising 10,468 spine X-ray images from 5,000 studies, each of which is manually annotated by an experienced radiologist with bounding boxes around abnormal findings in 13 categories. Using this dataset, we then train a deep learning classifier to determine whether a spine scan is abnormal and a detector to localize 7 crucial findings amongst the total 13. The VinDr-SpineXR is evaluated on a test set of 2,078 images from 1,000 studies, which is kept separate from the training set. It demonstrates an area under the receiver operating characteristic curve (AUROC) of 88.61% (95% CI 87.19%, 90.02%) for the image-level classification task and a mean average precision ([email protected]) of 33.56% for the lesion-level localization task. These results serve as a proof of concept and set a baseline for future research in this direction. To encourage advances, the dataset, codes, and trained deep learning models are made publicly available.
△ Less
Submitted 24 June, 2021;
originally announced June 2021.
-
GSSF: A Generative Sequence Similarity Function based on a Seq2Seq model for clustering online handwritten mathematical answers
Authors:
Huy Quang Ung,
Cuong Tuan Nguyen,
Hung Tuan Nguyen,
Masaki Nakagawa
Abstract:
Toward a computer-assisted marking for descriptive math questions,this paper presents clustering of online handwritten mathematical expressions (OnHMEs) to help human markers to mark them efficiently and reliably. We propose a generative sequence similarity function for computing a similarity score of two OnHMEs based on a sequence-to-sequence OnHME recognizer. Each OnHME is represented by a simil…
▽ More
Toward a computer-assisted marking for descriptive math questions,this paper presents clustering of online handwritten mathematical expressions (OnHMEs) to help human markers to mark them efficiently and reliably. We propose a generative sequence similarity function for computing a similarity score of two OnHMEs based on a sequence-to-sequence OnHME recognizer. Each OnHME is represented by a similarity-based representation (SbR) vector. The SbR matrix is inputted to the k-means algorithm for clustering OnHMEs. Experiments are conducted on an answer dataset (Dset_Mix) of 200 OnHMEs mixed of real patterns and synthesized patterns for each of 10 questions and a real online handwritten mathematical answer dataset of 122 student answers at most for each of 15 questions (NIER_CBT). The best clustering results achieved around 0.916 and 0.915 for purity, and around 0.556 and 0.702 for the marking cost on Dset_Mix and NIER_CBT, respectively. Our method currently outperforms the previous methods for clustering HMEs.
△ Less
Submitted 21 May, 2021;
originally announced May 2021.
-
Global Context for improving recognition of Online Handwritten Mathematical Expressions
Authors:
Cuong Tuan Nguyen,
Thanh-Nghia Truong,
Hung Tuan Nguyen,
Masaki Nakagawa
Abstract:
This paper presents a temporal classification method for all three subtasks of symbol segmentation, symbol recognition and relation classification in online handwritten mathematical expressions (HMEs). The classification model is trained by multiple paths of symbols and spatial relations derived from the Symbol Relation Tree (SRT) representation of HMEs. The method benefits from global context of…
▽ More
This paper presents a temporal classification method for all three subtasks of symbol segmentation, symbol recognition and relation classification in online handwritten mathematical expressions (HMEs). The classification model is trained by multiple paths of symbols and spatial relations derived from the Symbol Relation Tree (SRT) representation of HMEs. The method benefits from global context of a deep bidirectional Long Short-term Memory network, which learns the temporal classification directly from online handwriting by the Connectionist Temporal Classification loss. To recognize an online HME, a symbol-level parse tree with Context-Free Grammar is constructed, where symbols and spatial relations are obtained from the temporal classification results. We show the effectiveness of the proposed method on the two latest CROHME datasets.
△ Less
Submitted 21 May, 2021;
originally announced May 2021.
-
Learning symbol relation tree for online mathematical expression recognition
Authors:
Thanh-Nghia Truong,
Hung Tuan Nguyen,
Cuong Tuan Nguyen,
Masaki Nakagawa
Abstract:
This paper proposes a method for recognizing online handwritten mathematical expressions (OnHME) by building a symbol relation tree (SRT) directly from a sequence of strokes. A bidirectional recurrent neural network learns from multiple derived paths of SRT to predict both symbols and spatial relations between symbols using global context. The recognition system has two parts: a temporal classifie…
▽ More
This paper proposes a method for recognizing online handwritten mathematical expressions (OnHME) by building a symbol relation tree (SRT) directly from a sequence of strokes. A bidirectional recurrent neural network learns from multiple derived paths of SRT to predict both symbols and spatial relations between symbols using global context. The recognition system has two parts: a temporal classifier and a tree connector. The temporal classifier produces an SRT by recognizing an OnHME pattern. The tree connector splits the SRT into several sub-SRTs. The final SRT is formed by looking up the best combination among those sub-SRTs. Besides, we adopt a tree sorting method to deal with various stroke orders. Recognition experiments indicate that the proposed OnHME recognition system is competitive to other methods. The recognition system achieves 44.12% and 41.76% expression recognition rates on the Competition on Recognition of Online Handwritten Mathematical Expressions (CROHME) 2014 and 2016 testing sets.
△ Less
Submitted 13 May, 2021;
originally announced May 2021.