
doi 10.5281/zenodo.6190227

Building a temperature forecasting model for the city with the regression neural network (RNN)

Authors: Nguyen Phuc Tran, Duy Thanh Tran, Thi Thuy Nga Duong

Abstract: In recent years, studies by environmental organizations worldwide and in Vietnam have shown that weather change is quite complex. Global warming has become a serious problem in the modern world, which is a concern for scientists. Last century, it was difficult to forecast the weather due to missing weather monitoring stations and technological limitations. This made it hard to collect data for building predictive models to make accurate simulations. In Vietnam, research on weather forecast models is a recent development, having only begun around 2000. Along with advancements in computer science, mathematical models are being built and applied with machine learning techniques to create more accurate and reliable predictive models. This article will summarize the research and solutions for applying recurrent neural networks to forecast urban temperatures.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 6 pages

Journal ref: The 6th International Conference for Small & Medium Business in 2020 (ICSMB 2020)

arXiv:2405.15779 [pdf]

LiteNeXt: A Novel Lightweight ConvMixer-based Model with Self-embedding Representation Parallel for Medical Image Segmentation

Authors: Ngoc-Du Tran, Thi-Thao Tran, Quang-Huy Nguyen, Manh-Hung Vu, Van-Truong Pham

Abstract: The emergence of deep learning techniques has advanced the image segmentation task, especially for medical images. Many neural network models have been introduced in the last decade, bringing automated segmentation accuracy close to that of manual segmentation. However, cutting-edge models like Transformer-based architectures rely on large-scale annotated training data and are generally designed with densely consecutive layers in the encoder, decoder, and skip connections, resulting in a large number of parameters. Additionally, for better performance, they are often pretrained on larger datasets, thus requiring large memory and increasing resource expenses. In this study, we propose a new lightweight but efficient model, namely LiteNeXt, based on convolutions and mixing modules with a simplified decoder, for medical image segmentation. The model is trained from scratch with a small number of parameters (0.71M) and Giga Floating Point Operations Per Second (0.42). To handle fuzzy boundaries as well as occlusion or clutter in objects, especially in medical image regions, we propose the Marginal Weight Loss, which helps to effectively determine the marginal boundary between object and background. Furthermore, we propose the Self-embedding Representation Parallel technique, which helps augment the data in a self-learning manner. Experiments on public datasets including Data Science Bowls, GlaS, ISIC2018, PH2, and Sunnybrook data show promising results compared to other state-of-the-art CNN-based and Transformer-based architectures. Our code will be published at: https://github.com/tranngocduvnvp/LiteNeXt.

Submitted 3 April, 2024; originally announced May 2024.

Comments: 35 pages, 9 figures, 10 tables

arXiv:2405.01815 [pdf, other]

Toward end-to-end interpretable convolutional neural networks for waveform signals

Authors: Linh Vu, Thu Tran, Wern-Han Lim, Raphael Phan

Abstract: This paper introduces a novel convolutional neural network (CNN) framework tailored for end-to-end audio deep learning models, presenting advancements in efficiency and explainability. In benchmarking experiments on three standard speech emotion recognition datasets with five-fold cross-validation, our framework outperforms Mel spectrogram features by up to seven percent. It can potentially replace the Mel-Frequency Cepstral Coefficients (MFCC) while remaining lightweight. Furthermore, we demonstrate the efficiency and interpretability of the front-end layer using the PhysioNet Heart Sound Database, illustrating its ability to handle and capture intricate long waveform patterns. Our contributions offer a portable solution for building efficient and interpretable models for raw waveform data.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2402.01198 [pdf, other]

Physical Layer Location Privacy in SIMO Communication Using Fake Paths Injection

Authors: Trong Duy Tran, Maxime Ferreira Da Costa, Linh Trung Nguyen

Abstract: Fake path injection is an emerging paradigm for inducing privacy over wireless networks. In this paper, fake paths are injected by the transmitter into a SIMO multipath communication channel to preserve her physical location from an eavesdropper. A novel statistical privacy metric is defined as the ratio between the largest (resp. smallest) eigenvalues of Bob's (resp. Eve's) Cramér-Rao lower bound on the SIMO multipath channel parameters to assess the privacy enhancements. Leveraging the spectral properties of generalized Vandermonde matrices, bounds on the privacy margin of the proposed scheme are derived. Specifically, it is shown that the privacy margin increases quadratically in the inverse of the separation between the true and the fake paths under Eve's perspective. Numerical simulations further showcase the approach's benefit.

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2312.05187 [pdf, other]

Seamless: Multilingual Expressive and Streaming Speech Translation

Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Mark Duppenthaler, Paul-Ambroise Duquenne, Brian Ellis, Hady Elsahar, Justin Haaheim, John Hoffman, Min-Jae Hwang, Hirofumi Inaguma, Christopher Klaiber, Ilia Kulikov, Pengwei Li, Daniel Licht, Jean Maillard, Ruslan Mavlyutov, Alice Rakotoarison, Kaushik Ram Sadagopan, Abinesh Ramakrishnan, Tuan Tran, Guillaume Wenzek, et al. (40 additional authors not shown)

Abstract: Large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4T model: SeamlessM4T v2. This newer model, incorporating an updated UnitY2 framework, was trained on more low-resource language data. SeamlessM4T v2 provides the foundation on which our next two models are initiated. SeamlessExpressive enables translation that preserves vocal styles and prosody. Compared to previous efforts in expressive speech research, our work addresses certain underexplored aspects of prosody, such as speech rate and pauses, while also preserving the style of one's voice. As for SeamlessStreaming, our model leverages the Efficient Monotonic Multihead Attention mechanism to generate low-latency target translations without waiting for complete source utterances. As the first of its kind, SeamlessStreaming enables simultaneous speech-to-speech/text translation for multiple source and target languages. To ensure that our models can be used safely and responsibly, we implemented the first known red-teaming effort for multimodal machine translation, a system for the detection and mitigation of added toxicity, a systematic evaluation of gender bias, and an inaudible localized watermarking mechanism designed to dampen the impact of deepfakes. Consequently, we bring major components from SeamlessExpressive and SeamlessStreaming together to form Seamless, the first publicly available system that unlocks expressive cross-lingual communication in real-time. The contributions to this work are publicly released and accessible at https://github.com/facebookresearch/seamless_communication

Submitted 8 December, 2023; originally announced December 2023.

arXiv:2312.01970 [pdf, other]

CaRL: Cascade Reinforcement Learning with State Space Splitting for O-RAN based Traffic Steering

Authors: Chuanneng Sun, Yu Zhou, Gueyoung Jung, Tuyen Xuan Tran, Dario Pompili

Abstract: The Open Radio Access Network (O-RAN) architecture empowers intelligent and automated optimization of the RAN through applications deployed on the RAN Intelligent Controller (RIC) platform, enabling capabilities beyond what is achievable with traditional RAN solutions. Within this paradigm, Traffic Steering (TS) emerges as a pivotal RIC application that focuses on optimizing cell-level mobility settings in near-real-time, aiming to significantly improve network spectral efficiency. In this paper, we design a novel TS algorithm based on a Cascade Reinforcement Learning (CaRL) framework. We propose state space factorization and policy decomposition to reduce the need for large models and well-labeled datasets. For each sub-state space, an RL sub-policy will be trained to learn an optimized mapping onto the action space. To apply CaRL on new network regions, we propose a knowledge transfer approach to initialize a new sub-policy based on knowledge learned by the trained policies. To evaluate CaRL, we build a data-driven and scalable RIC digital twin (DT) that is modeled using important real-world data, including network configuration, user geo-distribution, and traffic demand, among others, from a tier-1 mobile operator in the US. We evaluate CaRL on two DT scenarios representing two network clusters in two different cities and compare its performance with the business-as-usual (BAU) policy and other competing optimization approaches using heuristic and Q-table algorithms. Benchmarking results show that CaRL performs the best and improves the average cluster-aggregated downlink throughput over the BAU policy by 24% and 18% in these two scenarios, respectively.

Submitted 4 December, 2023; originally announced December 2023.

Comments: 14 pages, 10 figures

ACM Class: C.2.3; I.2.8

arXiv:2311.15213 [pdf]

Leveraging Anatomical Constraints with Uncertainty for Pneumothorax Segmentation

Authors: Han Yuan, Chuan Hong, Nguyen Tuan Anh Tran, Xinxing Xu, Nan Liu

Abstract: Pneumothorax is a medical emergency caused by abnormal accumulation of air in the pleural space - the potential space between the lungs and chest wall. On 2D chest radiographs, pneumothorax occurs within the thoracic cavity and outside of the mediastinum and we refer to this area as "lung+ space". While deep learning (DL) has increasingly been utilized to segment pneumothorax lesions in chest radiographs, many existing DL models employ an end-to-end approach. These models directly map chest radiographs to clinician-annotated lesion areas, often neglecting the vital domain knowledge that pneumothorax is inherently location-sensitive. We propose a novel approach that incorporates the lung+ space as a constraint during DL model training for pneumothorax segmentation on 2D chest radiographs. To circumvent the need for additional annotations and to prevent potential label leakage on the target task, our method utilizes external datasets and an auxiliary task of lung segmentation. This approach generates a specific constraint of lung+ space for each chest radiograph. Furthermore, we have incorporated a discriminator to eliminate unreliable constraints caused by the domain shift between the auxiliary and target datasets. Our results demonstrated significant improvements, with average performance gains of 4.6%, 3.6%, and 3.3% regarding Intersection over Union (IoU), Dice Similarity Coefficient (DSC), and Hausdorff Distance (HD). Our research underscores the significance of incorporating medical domain knowledge about the location-specific nature of pneumothorax to enhance DL-based lesion segmentation.

Submitted 26 November, 2023; originally announced November 2023.

arXiv:2311.06580 [pdf, other]

doi 10.1109/ISGT59692.2024.10454197

Modeling Power Systems Dynamics with Symbolic Physics-Informed Neural Networks

Authors: Huynh T. T. Tran, Hieu T. Nguyen

Abstract: In recent years, scientific machine learning, particularly physics-informed neural networks (PINNs), has introduced innovative methods for understanding the differential equations that describe power system dynamics, providing a more efficient alternative to traditional methods. However, using a single neural network to capture the patterns of all variables requires a sufficiently large network, leading to long training times and high computational costs. In this paper, we interface PINNs with symbolic techniques to construct multiple single-output neural networks by taking the loss function apart and integrating it over the relevant domain. We also reweight the components of the loss function to improve the performance of the network for unstable systems. Our results show that the symbolic PINNs provide higher accuracy with significantly fewer parameters and faster training time. By using the adaptive weight method, the symbolic PINNs can avoid the vanishing gradient problem and numerical instability.

Submitted 11 November, 2023; originally announced November 2023.

Journal ref: The 2024 Conference on Innovative Smart Grid Technologies, North America (ISGT NA 2024)

arXiv:2309.16699 [pdf]

Circular-Line Trajectory Tracking Controller for Mobile Robot using Multi-Pixy2 Sensors

Authors: Xuan Quang Ngo, Tri Duc Tran, Huy Hung Nguyen, Van Dong Nguyen, Van Tu Duong, Tan Tien Nguyen

Abstract: This study suggests a novel tracking method that employs three Pixy2 sensors to identify the desired line trajectories instead of traditional perceiving means. Firstly, the kinematic model of the mobile robot is derived from the information gathered by three Pixy2 sensors. Secondly, the sliding mode controller is implemented to regulate the tracking error. Finally, simulation results are analyzed to show the effectiveness of the proposed method.

Submitted 12 August, 2023; originally announced September 2023.

Comments: 6 pages, 12 figures, the 2023 International Symposium on Electrical and Electronics Engineering, Ho Chi Minh, Viet Nam, 2023

arXiv:2307.16834 [pdf]

doi 10.1007/978-3-031-53963-3_25

Benchmarking Jetson Edge Devices with an End-to-end Video-based Anomaly Detection System

Authors: Hoang Viet Pham, Thinh Gia Tran, Chuong Dinh Le, An Dinh Le, Hien Bich Vo

Abstract: Innovative enhancements in embedded system platforms, specifically hardware acceleration, significantly influence the application of deep learning in real-world scenarios. These innovations translate human labor into automated intelligent systems employed in various areas such as autonomous driving, robotics, Internet-of-Things (IoT), and numerous other impactful applications. NVIDIA's Jetson platform is one of the pioneers in offering optimal performance regarding energy efficiency and throughput in the execution of deep learning algorithms. Previously, most benchmarking analysis was based on 2D images with a single deep learning model for each comparison result. In this paper, we implement an end-to-end video-based crime-scene anomaly detection system that takes surveillance videos as input and is deployed and operates entirely on multiple Jetson edge devices (Nano, AGX Xavier, Orin Nano). The comparison analysis includes the integration of Torch-TensorRT, a software development kit from NVIDIA, for model performance optimisation. The system is built on the PySlowfast open-source project from Facebook as the coding template. The end-to-end system process comprises the video input from the camera, a data preprocessing pipeline, a feature extractor, and the anomaly detector. We share our experience of deploying an AI-based system on various Jetson edge devices with Docker technology. Regarding anomaly detectors, a weakly supervised video-based deep learning model called Robust Temporal Feature Magnitude Learning (RTFM) is applied in the system. The proposed system reaches an inference speed of 47.56 frames per second (FPS) on a Jetson edge device with only 3.11 GB of total RAM usage. We also identify a promising Jetson device on which the AI system achieves 15% better performance than on the previous version of Jetson devices while consuming 50% less energy.

Submitted 12 September, 2023; v1 submitted 28 July, 2023; originally announced July 2023.

Comments: Accepted in Future of Information and Communication Conference (FICC) 2024

arXiv:2306.16103 [pdf]

1M parameters are enough? A lightweight CNN-based model for medical image segmentation

Authors: Binh-Duong Dinh, Thanh-Thu Nguyen, Thi-Thao Tran, Van-Truong Pham

Abstract: Convolutional neural networks (CNNs) and Transformer-based models are being widely applied in medical image segmentation thanks to their ability to extract high-level features and capture important aspects of the image. However, there is often a trade-off between the need for high accuracy and the desire for low computational cost. A model with more parameters can theoretically achieve better performance but also results in more computational complexity and higher memory usage, and thus is not practical to implement. In this paper, we look for a lightweight U-Net-based model, namely U-Lite, that can maintain the same performance or even achieve better performance. We design U-Lite based on the principle of Depthwise Separable Convolution so that the model can both leverage the strength of CNNs and considerably reduce the number of computing parameters. Specifically, we propose Axial Depthwise Convolutions with 7x7 kernels in both the encoder and decoder to enlarge the model's receptive field. To further improve the performance, we use several Axial Dilated Depthwise Convolutions with 3x3 filters for the bottleneck as one of our branches. Overall, U-Lite contains only 878K parameters, 35 times fewer than the traditional U-Net, and far fewer than other modern Transformer-based models. The proposed model cuts down a large amount of computational complexity while attaining impressive performance on medical segmentation tasks compared to other state-of-the-art architectures. The code will be available at: https://github.com/duong-db/U-Lite.

Submitted 3 July, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

Comments: I have fixed Figure 1

arXiv:2302.09597

doi 10.1049/enc2.12107

Solving Differential-Algebraic Equations in Power System Dynamic Analysis with Quantum Computing

Authors: Huynh Trung Thanh Tran, Hieu T. Nguyen, Long T. Vu, Samuel T. Ojetola

Abstract: Power system dynamics are generally modeled by high dimensional nonlinear differential-algebraic equations (DAEs) given a large number of components forming the network. These DAEs' complexity can grow exponentially due to the increasing penetration of distributed energy resources, whereas their computation time becomes sensitive due to the increasing interconnection of the power grid with other energy systems. This paper demonstrates the use of quantum computing algorithms to solve DAEs for power system dynamic analysis. We leverage a symbolic programming framework to equivalently convert the power system's DAEs into ordinary differential equations (ODEs) using index reduction methods and then encode their data into qubits using amplitude encoding. The system nonlinearity is captured by Hamiltonian simulation with truncated Taylor expansion so that state variables can be updated by a quantum linear equation solver. Our results show that quantum computing can solve the power system's DAEs accurately with a computational complexity polynomial in the logarithm of the system dimension. We also illustrate the use of recent advanced tools in scientific machine learning for implementing complex computing concepts, i.e. Taylor expansion, DAEs/ODEs transformation, and quantum computing solver with abstract representation for power engineering applications.

Submitted 1 March, 2024; v1 submitted 19 February, 2023; originally announced February 2023.

Comments: This version was uploaded as an incorrect replacement, and was intended as a replacement of arXiv:2306.01961. I need to withdraw this paper to upload it as a replacement of the correct paper

Journal ref: Energy Conversion and Economics, Volume 5, Issue 1, Feb 2024, pages 40-53

arXiv:2302.06294 [pdf, other]

doi 10.1016/j.media.2023.102888

CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection

Authors: Chinedu Innocent Nwoye, Tong Yu, Saurav Sharma, Aditya Murali, Deepak Alapatt, Armine Vardazaryan, Kun Yuan, Jonas Hajek, Wolfgang Reiter, Amine Yamlahi, Finn-Henri Smidt, Xiaoyang Zou, Guoyan Zheng, Bruno Oliveira, Helena R. Torres, Satoshi Kondo, Satoshi Kasai, Felix Holm, Ege Özsoy, Shuangchun Gui, Han Li, Sista Raviteja, Rachana Sathish, Pranav Poudel, Binod Bhattarai, et al. (24 additional authors not shown)

Abstract: Formalizing surgical activities as triplets of the used instruments, actions performed, and target anatomies is becoming a gold standard approach for surgical activity modeling. The benefit is that this formalization helps to obtain a more detailed understanding of tool-tissue interaction which can be used to develop better Artificial Intelligence assistance for image-guided surgery. Earlier efforts and the CholecTriplet challenge introduced in 2021 have put together techniques aimed at recognizing these triplets from surgical footage. Also estimating the spatial locations of the triplets would offer more precise intraoperative context-aware decision support for computer-assisted intervention. This paper presents the CholecTriplet2022 challenge, which extends surgical action triplet modeling from recognition to detection. It includes weakly-supervised bounding box localization of every visible surgical instrument (or tool), as the key actors, and the modeling of each tool-activity in the form of an <instrument, verb, target> triplet. The paper describes a baseline method and 10 new deep learning algorithms presented at the challenge to solve the task. It also provides thorough methodological comparisons of the methods, an in-depth analysis of the obtained results across multiple metrics and across visual and procedural challenges, their significance, and useful insights for future research directions and applications in surgery.

Submitted 14 July, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

Comments: MICCAI EndoVis CholecTriplet2022 challenge report. Published at Elsevier journal of Medical Image Analysis. 25 pages, 15 figures, 8 tables

Journal ref: Medical Image Analysis, Volume 89, 2023, 102888, ISSN 1361-8415

arXiv:2212.14353 [pdf, other]

Sheaf-theoretic self-filtering network of low-cost sensors for local air quality monitoring: A causal approach

Authors: Anh-Duy Pham, Chuong Dinh Le, Hoang Viet Pham, Thinh Gia Tran, Dat Thanh Vo, Chau Long Tran, An Dinh Le, Hien Bich Vo

Abstract: Sheaf theory, which is a complex but powerful tool supported by topological theory, offers more flexibility and precision than traditional graph theory when it comes to modeling relationships between multiple features. In the realm of air quality monitoring, this can be incredibly useful in detecting sudden changes in local dust particle density, which can be difficult to accurately measure using commercial instruments. Traditional methods for air quality measurement often rely on calibrating the measurement with public standard instruments or calculating the measurement's moving average over a constant period. However, this can lead to an incorrect index at the measurement location, as well as an oversmoothing effect on the signal. In this study, we propose a compact device that uses sheaf theory to detect and count vehicles as a local air quality change-causing factor. By inferring the number of vehicles into the PM2.5 index and propagating it into the recorded PM2.5 index from low-cost air monitoring sensors such as PMS7003 and BME280, we can achieve self-correction in real-time. Plus, the sheaf-theoretic method allows for easy scaling to multiple nodes for further filtering effects. By implementing sheaf theory in air quality monitoring, we can overcome the limitations of traditional methods and provide more accurate and reliable results.

Submitted 29 December, 2022; originally announced December 2022.

arXiv:2212.04313 [pdf]

Scalable, low-cost, and versatile system design for air pollution and traffic density monitoring and analysis

Authors: Thinh Gia Tran, Dat Thanh Vo, Long Chau Tran, Hoang Viet Pham, Chuong Dinh Le, An Dinh Le, Duy Anh Pham, Hien Bich Vo

Abstract: Vietnam requires sustainable urbanization, for which city sensing is used in planning and decision-making. Large cities need portable, scalable, and inexpensive digital technology for this purpose. End-to-end air quality monitoring companies such as AirVisual and Plume Air have shown their reliability with portable devices outfitted with superior air sensors. They are pricey, yet homeowners use them to get local air data without evaluating the causal effect. Our air quality inspection system is scalable, reasonably priced, and flexible. The system's minicomputer remotely monitors PMS7003 and BME280 sensor data through a microcontroller processor. The 5-megapixel camera module enables researchers to infer the causal relationship between traffic intensity and dust concentration. The design enables inexpensive, commercial-grade hardware, with Azure Blob storing air pollution data and surrounding-area imagery and preventing the system from physically expanding. In addition, by including an air channel that replenishes and distributes temperature, the design improves ventilation and safeguards electrical components. The gadget allows for the analysis of the correlation between traffic and air quality data, which might aid in the establishment of sustainable urban development plans and policies.

Submitted 8 December, 2022; originally announced December 2022.

arXiv:2211.08733 [pdf, other]

Comparing Subjective Perceptions of Robot-to-Human Handover Trajectories

Authors: Alexander Calvert, Wesley Chan, Tin Tran, Sara Sheikholeslami, Rhys Newbury, Akansel Cosgun, Elizabeth Croft

Abstract: Robots must move legibly around people for safety reasons, especially for tasks where physical contact is possible. One such task is handovers, which requires implicit communication on where and when physical contact (object transfer) occurs. In this work, we study whether the trajectory model used by a robot during the reaching phase affects the subjective perceptions of receivers for robot-to-human handovers. We conducted a user study where 32 participants were handed over three objects with four trajectory models: three were versions of a minimum jerk trajectory, and one was an ellipse-fitting-based trajectory. The start position of the handover was fixed for all trajectories, and the end position was allowed to vary randomly around a fixed position by $\pm$3 cm in all axes. The user study found no significant differences among the handover trajectories in survey questions relating to safety, predictability, naturalness, and other subjective metrics. While these results seemingly reject the hypothesis that the trajectory affects human perceptions of a handover, they prompt future research to investigate the effect of other variables, such as robot speed, object transfer position, object orientation at the transfer point, and explicit communication signals such as gaze and speech.

Submitted 16 November, 2022; originally announced November 2022.

Comments: Submitted to Australasian Conference on Robotics and Automation 2022. 9 pages, 4 figures

arXiv:2211.00466 [pdf, other]

doi 10.1109/INDIN51400.2023.10217993

Recognition of Defective Mineral Wool Using Pruned ResNet Models

Authors: Mehdi Rafiei, Dat Thanh Tran, Alexandros Iosifidis

Abstract: Mineral wool production is a non-linear process that makes it hard to control the final quality. Therefore, having a non-destructive method to analyze the product quality and recognize defective products is critical. For this purpose, we developed a visual quality control system for mineral wool. X-ray images of wool specimens were collected to create a training set of defective and non-defective samples. Afterward, we developed several recognition models based on the ResNet architecture to find the most efficient model. In order to have a light-weight and fast inference model for real-life applicability, two structural pruning methods are applied to the classifiers. Considering the low quantity of the dataset, cross-validation and augmentation methods are used during the training. As a result, we obtained a model with more than 98% accuracy, which, compared to the current procedure used at the company, can recognize 20% more defective products.

Submitted 1 November, 2022; originally announced November 2022.

Comments: 6 pages, 5 figures, 3 tables. Submitted to IEEE Transactions on Industrial Informatics

arXiv:2209.11527 [pdf, other]

An artificial neural network-based system for detecting machine failures using tiny sound data: A case study

Authors: Thanh Tran, Sebastian Bader, Jan Lundgren

Abstract: In an effort to advocate the research for a deep learning-based machine failure detection system, we present a case study of our proposed system based on a tiny sound dataset. Our case study investigates a variational autoencoder (VAE) for augmenting a small drill sound dataset from Valmet AB. The Valmet dataset contains 134 sounds that have been divided into two categories, "Anomaly" and "Normal", recorded from a drilling machine in Valmet AB, a company in Sundsvall, Sweden that supplies equipment and processes for the production of biofuels. Using deep learning models to detect drill failures on such a small sound dataset is typically unsuccessful. We employed a VAE to increase the number of sounds in the tiny dataset by synthesizing new sounds from original sounds. The augmented dataset was created by combining these synthesized sounds with the original sounds. We used a high-pass filter with a passband frequency of 1000 Hz and a low-pass filter with a passband frequency of 22,000 Hz to pre-process sounds in the augmented dataset before transforming them to Mel spectrograms. The pre-trained 2D-CNN Alexnet was then trained using these Mel spectrograms. When compared to using the original tiny sound dataset to train pre-trained Alexnet, using the augmented sound dataset enhanced the CNN model's classification results by 6.62% (94.12% when trained on the augmented dataset versus 87.5% when trained on the original dataset).

Submitted 23 September, 2022; originally announced September 2022.

Comments: 8 pages, 9 figures, conference

arXiv:2209.02611 [pdf, other]

Deep filter bank regression for super-resolution of anisotropic MR brain images

Authors: Samuel W. Remedios, Shuo Han, Yuan Xue, Aaron Carass, Trac D. Tran, Dzung L. Pham, Jerry L. Prince

Abstract: In 2D multi-slice magnetic resonance (MR) acquisition, the through-plane signals are typically of lower resolution than the in-plane signals. While contemporary super-resolution (SR) methods aim to recover the underlying high-resolution volume, the estimated high-frequency information is implicit via end-to-end data-driven training rather than being explicitly stated and sought. To address this, we reframe the SR problem statement in terms of perfect reconstruction filter banks, enabling us to identify and directly estimate the missing information. In this work, we propose a two-stage approach to approximate the completion of a perfect reconstruction filter bank corresponding to the anisotropic acquisition of a particular scan. In stage 1, we estimate the missing filters using gradient descent and in stage 2, we use deep networks to learn the mapping from coarse coefficients to detail coefficients. In addition, the proposed formulation does not rely on external training data, circumventing the need for domain shift correction. Under our approach, SR performance is improved particularly in "slice gap" scenarios, likely due to the constrained solution space imposed by the framework.

Submitted 6 September, 2022; originally announced September 2022.

arXiv:2208.04462 [pdf, other]

Denoising Induction Motor Sounds Using an Autoencoder

Authors: Thanh Tran, Sebastian Bader, Jan Lundgren

Abstract: Denoising is the process of removing noise from sound signals while improving the quality and adequacy of the sound signals. Denoising sound has many applications in speech processing, sound events classification, and machine failure detection systems. This paper describes a method for creating an autoencoder to map noisy machine sounds to clean sounds for denoising purposes. There are several types of noise in sounds, for example, environmental noise and generated frequency-dependent noise from signal processing methods. Noise generated by environmental activities is environmental noise. In the factory, environmental noise can be created by vehicles, drilling, people working or talking in the survey area, wind, and flowing water. Those noises appear as spikes in the sound record. In the scope of this paper, we demonstrate the removal of generated noise with Gaussian distribution and the environmental noise with a specific example of the water sink faucet noise from the induction motor sounds. The proposed method was trained and verified on 49 normal function sounds and 197 horizontal misalignment fault sounds from the Machinery Fault Database (MAFAULDA). The mean square error (MSE) was used as the assessment criterion to evaluate the similarity between denoised sounds using the proposed autoencoder and the original sounds in the test set. The MSE is below or equal to 0.14 when denoising both types of noise on 15 testing sounds of the normal function category. The MSE is below or equal to 0.15 when denoising 60 testing sounds of the horizontal misalignment fault category. The low MSE shows that both the generated Gaussian noise and the environmental noise were almost removed from the original sounds with the proposed trained autoencoder.

Submitted 8 August, 2022; originally announced August 2022.

Comments: 9 pages, 10 figures, conference
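
Note: the abstract above names the mean square error (MSE) as its assessment criterion. As a minimal sketch of that standard metric only (the function name and sample values below are illustrative assumptions and do not come from the paper), the computation could look like this:

```python
def mean_square_error(denoised, reference):
    """Average of the squared sample-wise differences between two signals."""
    if len(denoised) != len(reference):
        raise ValueError("signals must have the same length")
    if not reference:
        raise ValueError("signals must be non-empty")
    return sum((d - r) ** 2 for d, r in zip(denoised, reference)) / len(reference)


# Hypothetical samples, chosen only to illustrate the call.
clean = [0.0, 0.5, 1.0, 0.5]
denoised = [0.1, 0.4, 0.9, 0.6]
print(mean_square_error(denoised, clean))
```

A lower value means the denoised signal is closer to the reference; how the paper normalizes its signals before computing the MSE is not stated in the listing.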

arXiv:2206.05047 [pdf, other]

A GPU-Accelerated Light-field Super-resolution Framework Based on Mixed Noise Model and Weighted Regularization

Authors: Trung-Hieu Tran, Kaicong Sun, Sven Simon

Abstract: This paper presents a GPU-accelerated computational framework for reconstructing high resolution (HR) LF images under a mixed Gaussian-Impulse noise condition. The main focus is on developing a high-performance approach considering processing speed and reconstruction quality. From a statistical perspective, we derive a joint $\ell^1$-$\ell^2$ data fidelity term for penalizing the HR reconstruction error taking into account the mixed noise situation. For regularization, we employ the weighted non-local total variation approach, which allows us to effectively realize LF image prior through a proper weighting scheme. We show that the alternating direction method of multipliers algorithm (ADMM) can be used to simplify the computation complexity and results in a high-performance parallel computation on the GPU Platform. An extensive experiment is conducted on both synthetic 4D LF dataset and natural image dataset to validate the proposed SR model's robustness and evaluate the accelerated optimizer's performance. The experimental results show that our approach achieves better reconstruction quality under severe mixed-noise conditions as compared to the state-of-the-art approaches. In addition, the proposed approach overcomes the limitation of the previous work in handling large-scale SR tasks. While fitting within a single off-the-shelf GPU, the proposed accelerator provides an average speedup of 2.46$\times$ and 1.57$\times$ for $\times 2$ and $\times 3$ SR tasks, respectively. In addition, a speedup of $77\times$ is achieved as compared to CPU execution.

Submitted 9 June, 2022; originally announced June 2022.

arXiv:2205.10448 [pdf, other]

doi 10.1109/TSP.2022.3167516

Approximate Message Passing with Parameter Estimation for Heavily Quantized Measurements

Authors: Shuai Huang, Deqiang Qiu, Trac D. Tran

Abstract: Designing efficient sparse recovery algorithms that could handle noisy quantized measurements is important in a variety of applications -- from radar to source localization, spectrum sensing and wireless networking. We take advantage of the approximate message passing (AMP) framework to achieve this goal given its high computational efficiency and state-of-the-art performance. In AMP, the signal of interest is assumed to follow a certain prior distribution with unknown parameters. Previous works focused on finding the parameters that maximize the measurement likelihood via expectation maximization -- an increasingly difficult problem to solve in cases involving complicated probability models. In this paper, we treat the parameters as unknown variables and compute their posteriors via AMP. The parameters and signal of interest can then be jointly recovered. Compared to previous methods, the proposed approach leads to a simple and elegant parameter estimation scheme, allowing us to directly work with the 1-bit quantization noise model. We then further extend our approach to the general multi-bit quantization noise model. Experimental results show that the proposed framework provides significant improvement over state-of-the-art methods across a wide range of sparsity and noise levels.

Submitted 20 May, 2022; originally announced May 2022.

Comments: arXiv admin note: text overlap with arXiv:2007.07679

Journal ref: IEEE Transactions on Signal Processing, Vol. 70, pp. 2062-2077, Apr. 2022

arXiv:2205.05590 [pdf, other]

A neural prosody encoder for end-to-end dialogue act classification

Authors: Kai Wei, Dillon Knox, Martin Radfar, Thanh Tran, Markus Muller, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris, Maurizio Omologo

Abstract: Dialogue act classification (DAC) is a critical task for spoken language understanding in dialogue systems. Prosodic features such as energy and pitch have been shown to be useful for DAC. Despite their importance, little research has explored neural approaches to integrate prosodic features into end-to-end (E2E) DAC models which infer dialogue acts directly from audio signals. In this work, we propose an E2E neural architecture that takes into account the need for characterizing prosodic phenomena co-occurring at different levels inside an utterance. A novel part of this architecture is a learnable gating mechanism that assesses the importance of prosodic features and selectively retains core information necessary for E2E DAC. Our proposed model improves DAC accuracy by 1.07% absolute across three publicly available benchmark datasets.

Submitted 11 May, 2022; originally announced May 2022.

arXiv:2203.10612 [pdf, ps, other]

PediCXR: An open, large-scale chest radiograph dataset for interpretation of common thoracic diseases in children

Authors: Hieu H. Pham, Ngoc H. Nguyen, Thanh T. Tran, Tuan N. M. Nguyen, Ha Q. Nguyen

Abstract: The development of diagnostic models for detecting and diagnosing pediatric diseases in CXR scans is undertaken due to the lack of high-quality physician-annotated datasets. To overcome this challenge, we introduce and release PediCXR, a new pediatric CXR dataset of 9,125 studies retrospectively collected from a major pediatric hospital in Vietnam between 2020 and 2021. Each scan was manually annotated by a pediatric radiologist with more than ten years of experience. The dataset was labeled for the presence of 36 critical findings and 15 diseases. In particular, each abnormal finding was identified via a rectangle bounding box on the image. To the best of our knowledge, this is the first and largest pediatric CXR dataset containing lesion-level annotations and image-level labels for the detection of multiple findings and diseases. For algorithm development, the dataset was divided into a training set of 7,728 and a test set of 1,397. To encourage new advances in pediatric CXR interpretation using data-driven approaches, we provide a detailed description of the PediCXR data sample and make the dataset publicly available on https://physionet.org/content/pedicxr/1.0.0/

Submitted 20 March, 2023; v1 submitted 20 March, 2022; originally announced March 2022.

Comments: Accepted by Scientific Data (Nature). arXiv admin note: text overlap with arXiv:2012.15029

arXiv:2201.01294 [pdf, other]

doi 10.1016/j.sigpro.2021.108373

3DVSR: 3D EPI Volume-based Approach for Angular and Spatial Light field Image Super-resolution

Authors: Trung-Hieu Tran, Jan Berberich, Sven Simon

Abstract: Light field (LF) imaging, which captures both spatial and angular information of a scene, is undoubtedly beneficial to numerous applications. Although various techniques have been proposed for LF acquisition, achieving both angularly and spatially high-resolution LF remains a technological challenge. In this paper, a learning-based approach applied to 3D epipolar image (EPI) is proposed to reconstruct high-resolution LF. Through a 2-stage super-resolution framework, the proposed approach effectively addresses various LF super-resolution (SR) problems, i.e., spatial SR, angular SR, and angular-spatial SR. While the first stage provides flexible options to up-sample EPI volume to the desired resolution, the second stage, which consists of a novel EPI volume-based refinement network (EVRN), substantially enhances the quality of the high-resolution EPI volume. An extensive evaluation on 90 challenging synthetic and real-world light field scenes from 7 published datasets shows that the proposed approach outperforms state-of-the-art methods to a large extent for both spatial and angular super-resolution problems, i.e., an average peak signal to noise ratio improvement of more than 2.0 dB, 1.4 dB, and 3.14 dB in spatial SR $\times 2$, spatial SR $\times 4$, and angular SR respectively. The reconstructed 4D light field demonstrates a balanced performance distribution across all perspective images and presents superior visual quality compared to the previous works.

Submitted 4 January, 2022; originally announced January 2022.

arXiv:2110.02417 [pdf, other]

CADA: Multi-scale Collaborative Adversarial Domain Adaptation for Unsupervised Optic Disc and Cup Segmentation

Authors: Peng Liu, Charlie T. Tran, Bin Kong, Ruogu Fang

Abstract: The diversity of retinal imaging devices poses a significant challenge: domain shift, which leads to performance degradation when applying the deep learning models trained on one domain to new testing domains. In this paper, we propose a multi-scale input along with multiple domain adaptors applied hierarchically in both feature and output spaces. The proposed training strategy and novel unsupervised domain adaptation framework, called Collaborative Adversarial Domain Adaptation (CADA), can effectively overcome the challenge. Multi-scale inputs can reduce the information loss due to the pooling layers used in the network for feature extraction, while our proposed CADA is an interactive paradigm that presents an exquisite collaborative adaptation through both adversarial learning and ensembling weights at different network layers. In particular, to produce a better prediction for the unlabeled target domain data, we simultaneously achieve domain invariance and model generalizability via adversarial learning at multi-scale outputs from different levels of network layers and maintaining an exponential moving average (EMA) of the historical weights during training. Without annotating any sample from the target domain, multiple adversarial losses in encoder and decoder layers guide the extraction of domain-invariant features to confuse the domain classifier. Meanwhile, the ensembling of weights via EMA reduces the uncertainty of adapting multiple discriminator learning. Comprehensive experimental results demonstrate that our CADA model incorporating multi-scale input training can overcome performance degradation and outperform state-of-the-art domain adaptation methods in segmenting retinal optic disc and cup from fundus images stemming from the REFUGE, Drishti-GS, and Rim-One-r3 datasets.

Submitted 5 October, 2021; originally announced October 2021.

Comments: arXiv admin note: text overlap with arXiv:1910.07638
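
Note: the abstract above mentions maintaining an exponential moving average (EMA) of the historical weights during training. The following is a minimal sketch of the generic EMA update only. The function name, the flat-list representation of the weights, and the decay value are assumptions made for illustration and are not taken from the paper.

```python
def ema_update(ema_weights, current_weights, decay=0.99):
    """Blend the running average with the newest weights.

    Each entry becomes decay * old_average + (1 - decay) * current_value.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    if len(ema_weights) != len(current_weights):
        raise ValueError("weight lists must have the same length")
    return [decay * e + (1.0 - decay) * w
            for e, w in zip(ema_weights, current_weights)]


# Hypothetical training loop: the averaged weights trail the raw weights.
ema = [0.0, 0.0]
for step_weights in ([1.0, 2.0], [1.0, 2.0], [1.0, 2.0]):
    ema = ema_update(ema, step_weights, decay=0.5)
print(ema)
```

A decay close to 1 makes the average change slowly, which smooths out step-to-step fluctuations in the raw weights.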

arXiv:2109.01184 [pdf, other]

Remote Multilinear Compressive Learning with Adaptive Compression

Authors: Dat Thanh Tran, Moncef Gabbouj, Alexandros Iosifidis

Abstract: Multilinear Compressive Learning (MCL) is an efficient signal acquisition and learning paradigm for multidimensional signals. The level of signal compression affects the detection or classification performance of an MCL model, with higher compression rates often associated with lower inference accuracy. However, higher compression rates are more amenable to a wider range of applications, especially those that require low operating bandwidth and minimal energy consumption such as Internet-of-Things (IoT) applications. Many communication protocols provide support for adaptive data transmission to maximize the throughput and minimize energy consumption. By developing compressive sensing and learning models that can operate with an adaptive compression rate, we can maximize the informational content throughput of the whole application. In this paper, we propose a novel optimization scheme that enables such a feature for MCL models. Our proposal enables practical implementation of adaptive compressive signal acquisition and inference systems. Experimental results demonstrated that the proposed approach can not only significantly reduce the amount of computations required during the training phase of remote learning systems but also improve the informational content throughput via adaptive-rate sensing.

Submitted 2 September, 2021; originally announced September 2021.

Comments: 2 figures, 6 tables

arXiv:2108.11089 [pdf, other]

Detecting Drill Failure in the Small Short-sound Drill Dataset

Authors: Thanh Tran, Nhat Truong Pham, Jan Lundgren

Abstract: Monitoring the conditions of machines is vital in the manufacturing industry. Early detection of faulty components in machines for stopping and repairing the failed components can minimize the downtime of the machine. This article presents an approach to detect the failure occurring in drill machines based on drill sounds from Valmet AB. The drill dataset includes three classes: anomalous sounds, normal sounds, and irrelevant sounds, which are also labeled as "Broken", "Normal", and "Other", respectively. Detecting drill failure effectively remains a challenge due to the following reasons. The waveform of drill sound is complex and short for detection. Additionally, in realistic soundscapes, there are sounds and noise in the context at the same time. Moreover, the balanced dataset is too small to apply state-of-the-art deep learning techniques. To overcome these difficulties, we augmented sounds to increase the number of sounds in the dataset. We then proposed a convolutional neural network (CNN) combined with a long short-term memory (LSTM) to extract features from log-Mel spectrograms and learn global high-level feature representation for the classification of three classes. A leaky rectified linear unit (Leaky ReLU) was utilized as the activation function for our proposed CNN instead of the rectified linear unit (ReLU). Moreover, we deployed an attention mechanism at the frame level after the LSTM layer to learn long-term global feature representations. As a result, the proposed method reached an overall accuracy of 92.35% for the drill failure detection system.

Submitted 9 November, 2021; v1 submitted 25 August, 2021; originally announced August 2021.

Comments: 8 pages, 10 figures, journal

arXiv:2108.06486 [pdf, other]

Learning to Automatically Diagnose Multiple Diseases in Pediatric Chest Radiographs Using Deep Convolutional Neural Networks

Authors: Thanh T. Tran, Hieu H. Pham, Thang V. Nguyen, Tung T. Le, Hieu T. Nguyen, Ha Q. Nguyen

Abstract: Chest radiograph (CXR) interpretation in pediatric patients is error-prone and requires a high level of understanding of radiologic expertise. Recently, deep convolutional neural networks (D-CNNs) have shown remarkable performance in interpreting CXR in adults. However, there is a lack of evidence indicating that D-CNNs can accurately recognize multiple lung pathologies from pediatric CXR scans. In particular, the development of diagnostic models for the detection of pediatric chest diseases faces significant challenges such as (i) lack of physician-annotated datasets and (ii) class imbalance problems. In this paper, we retrospectively collect a large dataset of 5,017 pediatric CXR scans, each of which is manually labeled by an experienced radiologist for the presence of 10 common pathologies. A D-CNN model is then trained on 3,550 annotated scans to classify multiple pediatric lung pathologies automatically. To address the high class imbalance issue, we propose to modify and apply "Distribution-Balanced loss" for training D-CNNs which reshapes the standard Binary-Cross Entropy loss (BCE) to efficiently learn harder samples by down-weighting the loss assigned to the majority classes. On an independent test set of 777 studies, the proposed approach yields an area under the receiver operating characteristic (AUC) of 0.709 (95% CI, 0.690-0.729). The sensitivity, specificity, and F1-score at the cutoff value are 0.722 (0.694-0.750), 0.579 (0.563-0.595), and 0.389 (0.373-0.405), respectively. These results significantly outperform previous state-of-the-art methods on most of the target diseases. Moreover, our ablation studies validate the effectiveness of the proposed loss function compared to other standard losses, e.g., BCE and Focal Loss, for this learning task. Overall, we demonstrate the potential of D-CNNs in interpreting pediatric CXRs.

Submitted 14 August, 2021; originally announced August 2021.

Comments: This is a preprint of our paper which was accepted for publication to ICCV Workshop 2021

arXiv:2108.04315 [pdf, other]

FL-MISR: Fast Large-Scale Multi-Image Super-Resolution for Computed Tomography Based on Multi-GPU Acceleration

Authors: Kaicong Sun, Trung-Hieu Tran, Jajnabalkya Guhathakurta, Sven Simon

Abstract: Multi-image super-resolution (MISR) usually outperforms single-image super-resolution (SISR) under a proper inter-image alignment by explicitly exploiting the inter-image correlation. However, the large computational demand encumbers the deployment of MISR in practice. In this work, we propose a distributed optimization framework based on data parallelism for fast large-scale MISR using multi-GPU acceleration named FL-MISR. The scaled conjugate gradient (SCG) algorithm is applied to the distributed subfunctions and the local SCG variables are communicated to synchronize the convergence rate over multi-GPU systems towards a consistent convergence. Furthermore, an inner-outer border exchange scheme is performed to obviate the border effect between neighboring GPUs. The proposed FL-MISR is applied to the computed tomography (CT) system by super-resolving the projections acquired by subpixel detector shift. The SR reconstruction is performed on the fly during the CT acquisition such that no additional computation time is introduced. FL-MISR is extensively evaluated from different aspects and experimental results demonstrate that FL-MISR effectively improves the spatial resolution of CT systems in modulation transfer function (MTF) and visual perception. Compared to a multi-core CPU implementation, FL-MISR achieves a more than 50x speedup on an off-the-shelf 4-GPU system.

Submitted 5 October, 2021; v1 submitted 9 August, 2021; originally announced August 2021.

arXiv:2108.00536 [pdf]

Real-Time ECG Interval Monitoring Using a Fully Disposable Wireless Patch Sensor

Authors: Gabriel Nallathambi, Paurakh Rajbhandary, Thang Tran, Olivier Colliou

Abstract: ECG interval monitoring provides key insights into the diagnosis of cardiac diseases. The standard 12-lead ECG is generally used; however, because of the current COVID-19 pandemic, there is a strong need for a remote monitoring solution which will reduce exposure of health care providers to coronavirus. This article presents a disposable wireless patch biosensor (VitalPatch) and associated platform functionalities for real-time continuous measurement of clinically relevant ECG intervals including PR interval, QRS duration, QT interval, corrected QT interval by Bazett (QTb), and corrected QT interval by Fridericia (QTf). The performance of the VitalPatch is validated by comparing its automated algorithm interval measurements to the manually annotated global intervals of the 12-lead ECG device in 30 subjects. The accuracy of interval monitoring (in terms of mean timing error calculated by subtracting the VitalPatch measurements from the global intervals) is 2.7+/-15.94 ms, -1.97+/-12.29 ms, -14.6+/-12.97 ms, -15.33+/-14.11 ms, and -15.08+/-13.69 ms for PR interval, QRS duration, QT interval, QTb, and QTf, respectively. These results demonstrate that the VitalPatch is a viable solution for measuring ECG intervals while taking advantage of its remote monitoring feature during the pandemic.

Submitted 1 August, 2021; originally announced August 2021.

Comments: Presented at 2021 IEEE 17th International Conference on Wearable and Implantable Body Sensor Networks (BSN)

arXiv:2107.11703 [pdf]

One-Leg Stance of Humanoid Robot using Active Balance Control

Authors: Tri Duc Tran, Anh Khoa Lanh Luu, Van Tu Duong, Huy Hung Nguyen, Tan Tien Nguyen

Abstract: The task of self-balancing is one of the most important tasks when developing humanoid robots. This paper proposes a novel external balance mechanism for a humanoid robot to maintain sideway balance. First, a dynamic model of the humanoid robot with balance mechanism and its simplified model are introduced. Secondly, a backstepping-based control method is utilized to split the system into two sub-systems. Then, a minimum observer-based controller is used to control the first sub-system. Since the second sub-system has unknown parameters, a model reference adaptive controller (MRAC) is used to control it. The proposed design divides the walking and balancing into two separate tasks, allowing the walking control to be executed independently of the balancing control. Furthermore, the use of the balance mechanism ensures the humanoid robot's hip movement does not exceed the threshold of a human when walking, thus making the overall pose of the humanoid robot look more natural. An experiment is carried out on a commercial humanoid robot known as UXA-90 to evaluate the effectiveness of the proposed method.

Submitted 24 July, 2021; originally announced July 2021.

arXiv:2105.02706 [pdf]

Mobile Robot Localization Using Fuzzy Neural Network Based Extended Kalman Filter

Authors: Thi Thanh Van Nguyen, Manh Duong Phung, Thuan Hoang Tran, Quang Vinh Tran

Abstract: This paper proposes a novel approach to improve the performance of the extended Kalman filter (EKF) for the problem of mobile robot localization. A fuzzy logic system is employed to continuously adjust the noise covariance matrices of the filter. A neural network is implemented to regulate the membership functions of the antecedent and consequent parts of the fuzzy rules. The aim is to improve the accuracy and avoid the divergence of the EKF when the noise covariance matrices are fixed or wrongly determined. Simulations and experiments have been conducted. The results show that the proposed filter is better than the EKF in localizing the mobile robot.

Submitted 6 May, 2021; originally announced May 2021.

arXiv:2102.11677 [pdf, other]

Cell abundance aware deep learning for cell detection on highly imbalanced pathological data

Authors: Yeman Brhane Hagos, Catherine SY Lecat, Dominic Patel, Lydia Lee, Thien-An Tran, Manuel Rodriguez-Justo, Kwee Yong, Yinyin Yuan

Abstract: Automated analysis of tissue sections allows a better understanding of disease biology and may reveal biomarkers that could guide prognosis or treatment selection. In digital pathology, less abundant cell types can be of biological significance, but their scarcity can result in a biased and sub-optimal cell detection model. To minimize the effect of cell imbalance on cell detection, we proposed a deep learning pipeline that considers the abundance of cell types during model training. Cell weight images, which assign larger weights to less abundant cells, were generated and used to regularize the Dice overlap loss function. The model was trained and evaluated on myeloma bone marrow trephine samples. Our model obtained a cell detection F1-score of 0.78, a 2% increase compared to baseline models, and it outperformed baseline models at detecting rare cell types. We found that scaling the deep learning loss function by the abundance of cells improves cell detection performance. Our results demonstrate the importance of incorporating domain knowledge on deep learning methods for pathological data with class imbalance.

Submitted 23 February, 2021; originally announced February 2021.

Comments: Accepted at The IEEE International Symposium on Biomedical Imaging (ISBI) 2021, 5 pages, 5 figures

arXiv:2102.09666 [pdf, other]

Dynamic curriculum learning via data parameters for noise robust keyword spotting

Authors: Takuya Higuchi, Shreyas Saxena, Mehrez Souden, Tien Dung Tran, Masood Delfarah, Chandra Dhir

Abstract: We propose dynamic curriculum learning via data parameters for noise robust keyword spotting. Data parameter learning has recently been introduced for image processing, where weight parameters, so-called data parameters, for target classes and instances are introduced and optimized along with model parameters. The data parameters scale logits and control importance over classes and instances during training, which enables automatic curriculum learning without additional annotations for training data. Similarly, in this paper, we propose using this curriculum learning approach for acoustic modeling, and train an acoustic model on clean and noisy utterances with the data parameters. The proposed approach automatically learns the difficulty of the classes and instances, e.g. due to low speech to noise ratio (SNR), in the gradient descent optimization and performs curriculum learning. This curriculum learning leads to overall improvement of the accuracy of the acoustic model. We evaluate the effectiveness of the proposed approach on a keyword spotting task. Experimental results show 7.7% relative reduction in false reject ratio with the data parameters compared to a baseline model which is simply trained on the multiconditioned dataset.

Submitted 18 February, 2021; originally announced February 2021.

Comments: Accepted at ICASSP 2021

arXiv:2101.08681 [pdf, other]

Streaming from the Air: Enabling Drone-sourced Video Streaming Applications on 5G Open-RAN Architectures

Authors: Lorenzo Bertizzolo, Tuyen X. Tran, John Buczek, Bharath Balasubramanian, Rittwik Jana, Yu Zhou, Tommaso Melodia

Abstract: Enabling high data-rate uplink cellular connectivity for drones is a challenging problem, since a flying drone has a higher likelihood of having line-of-sight propagation to base stations that terrestrial UEs normally do not have line-of-sight to. This may result in uplink inter-cell interference and uplink performance degradation for the neighboring ground UEs when drones transmit at high data-rates (e.g., video streaming). We address this problem from a cellular operator's standpoint to support drone-sourced video streaming of a point of interest. We propose a low-complexity, closed-loop control system for Open-RAN architectures that jointly optimizes the drone's location in space and its transmission directionality to support video streaming and minimize its uplink interference impact on the network. We prototype and experimentally evaluate the proposed control system on a dedicated outdoor multi-cell RAN testbed, which is the first measurement campaign of its kind. Furthermore, we perform a large-scale simulation assessment of the proposed control system using the actual cell deployment topologies and cell load profiles of a major US cellular carrier. The proposed Open-RAN control scheme achieves an average 19% network capacity gain over traditional BS-constrained control solutions and satisfies the application data-rate requirements of the drone (e.g., to stream an HD video).

Submitted 16 November, 2021; v1 submitted 21 January, 2021; originally announced January 2021.

arXiv:2007.14564 [pdf, other]

Bayesian Massive MIMO Channel Estimation with Parameter Estimation Using Low-Resolution ADCs

Authors: Shuai Huang, Deqiang Qiu, Trac D. Tran

Abstract: In order to reduce hardware complexity and power consumption, massive multiple-input multiple-output (MIMO) systems employ low-resolution analog-to-digital converters (ADCs) to acquire quantized measurements $\boldsymbol y$. This poses new challenges to the channel estimation problem, and the sparse prior on the channel coefficient vector $\boldsymbol x$ in the angle domain is often used to compensate for the information lost during quantization. By interpreting the sparse prior from a probabilistic perspective, we can assume $\boldsymbol x$ follows a certain sparse prior distribution and recover it using approximate message passing (AMP). However, the distribution parameters are unknown in practice and need to be estimated. Due to the increased computational complexity in the quantization noise model, previous works either use an approximated noise model or manually tune the noise distribution parameters. In this paper, we treat both signals and parameters as random variables and recover them jointly within the AMP framework. The proposed approach leads to a much simpler parameter estimation method, allowing us to work with the quantization noise model directly. Experimental results show that the proposed approach achieves state-of-the-art performance under various noise levels and does not require parameter tuning, making it a practical and maintenance-free approach for channel estimation.

Submitted 11 February, 2021; v1 submitted 28 July, 2020; originally announced July 2020.

arXiv:2007.07679 [pdf, other]

1-Bit Compressive Sensing via Approximate Message Passing with Built-in Parameter Estimation

Authors: Shuai Huang, Trac D. Tran

Abstract: 1-bit compressive sensing aims to recover sparse signals from quantized 1-bit measurements. Designing efficient approaches that could handle noisy 1-bit measurements is important in a variety of applications. In this paper we use approximate message passing (AMP) to achieve this goal due to its high computational efficiency and state-of-the-art performance. In AMP the signal of interest is assumed to follow some prior distribution, and its posterior distribution can be computed and used to recover the signal. In practice, the parameters of the prior distributions are often unknown and need to be estimated. Previous works tried to find the parameters that maximize the measurement likelihood via expectation maximization, which becomes increasingly difficult to solve in cases of complicated probability models. Here we propose to treat the parameters as unknown variables and compute their posteriors via AMP as well, so that the parameters and the signal can be recovered jointly. Compared to previous methods, the proposed approach leads to a simple and elegant parameter estimation scheme, allowing us to directly work with the 1-bit quantization noise model. Experimental results show that the proposed approach generally performs much better than the other state-of-the-art methods in the zero-noise and moderate-noise regimes, and outperforms them in most of the cases in the high-noise regime.

Submitted 26 April, 2022; v1 submitted 15 July, 2020; originally announced July 2020.

arXiv:2005.07068 [pdf]

Recognition of 26 Degrees of Freedom of Hands Using Model-based approach and Depth-Color Images

Authors: Cong Hoang Quach, Minh Trien Pham, Anh Viet Dang, Dinh Tuan Pham, Thuan Hoang Tran, Manh Duong Phung

Abstract: In this study, we present a model-based approach to recognize the full 26 degrees of freedom of a human hand. Input data include RGB-D images acquired from a Kinect camera and a 3D model of the hand constructed from its anatomy and graphical matrices. A cost function is then defined so that its minimum value is achieved when the model and observation images are matched. To solve the optimization problem in 26-dimensional space, the particle swarm optimization algorithm with improvements is used. In addition, parallel computation in graphical processing units (GPU) is utilized to handle computationally expensive tasks. Simulation and experimental results show that the system can recognize 26 degrees of freedom of hands with a processing time of 0.8 seconds per frame. The algorithm is robust to noise and the hardware requirement is simple with a single camera.

Submitted 13 May, 2020; originally announced May 2020.

Comments: in Proceedings of the 2014 National Conference on Electronics, Communications and Information Technology (REV-ECIT). in Vietnamese language

arXiv:2005.06179 [pdf]

Using multiple sensors for autonomous mobile robot navigation

Authors: Thuan Hoang Tran, Manh Duong Phung, Anh Viet Dang, Quang Vinh Tran

Abstract: This paper presents the use of a multi-sensor measurement system to guide an autonomous mobile robot in a house. The system supports 3D image acquisition for global mapping. Algorithms that reduce the dimensionality of images to a 2D global map for navigation, a trajectory design approach using the Lyapunov function method, and obstacle avoidance based on potential energy are also presented. In addition, a sensor integration method based on the extended Kalman filter allows us to identify the exact location and orientation of the robot in the presence of interference from the environment.

Submitted 13 May, 2020; originally announced May 2020.

Comments: in Proceeding of The 6th Vietnam Conference on Mechatronics (VCM 2012). in Vietnamese language

arXiv:2005.06175 [pdf]

Stabilization control of networked mobile robot using past observation-based preditive filter

Authors: Manh Duong Phung, Thi Thanh Van Nguyen, Thuan Hoang Tran, Quang Vinh Tran

Abstract: This paper addresses the stabilization control problem for a networked mobile robot subject to communication delay. A new state estimation filter, namely the past observation-based predictive filter, is developed. This filter enables the prediction of the system state from delayed measurements. The state estimator combined with the developed control laws ensures the asymptotic stability of the networked system. Simulations with parameters extracted from a real robot system were conducted, and the results confirmed the correctness as well as the applicability of the proposed approach.

Submitted 13 May, 2020; originally announced May 2020.

Comments: in Proceeding of The 6th Vietnam Conference on Mechatronics (VCM 2012). in Vietnamese language

arXiv:2004.06366 [pdf, other]

Simple Multi-Resolution Representation Learning for Human Pose Estimation

Authors: Trung Q. Tran, Giang V. Nguyen, Daeyoung Kim

Abstract: Human pose estimation - the process of recognizing human keypoints in a given image - is one of the most important tasks in computer vision and has a wide range of applications including movement diagnostics, surveillance, or self-driving vehicles. The accuracy of human keypoint prediction is increasingly improved thanks to the burgeoning development of deep learning. Most existing methods solved human pose estimation by generating heatmaps in which the ith heatmap indicates the location confidence of the ith keypoint. In this paper, we introduce novel network structures referred to as multi-resolution representation learning for human keypoint prediction. At different resolutions in the learning process, our networks branch off and use extra layers to learn heatmap generation. We firstly consider the architectures for generating the multi-resolution heatmaps after obtaining the lowest-resolution feature maps. Our second approach allows learning during the process of feature extraction in which the heatmaps are generated at each resolution of the feature extractor. The first and second approaches are referred to as multi-resolution heatmap learning and multi-resolution feature map learning respectively. Our architectures are simple yet effective, achieving good performance. We conducted experiments on two common benchmarks for human pose estimation: the MSCOCO and MPII datasets. The code is made publicly available at https://github.com/tqtrunghnvn/SimMRPose.

Submitted 22 January, 2021; v1 submitted 14 April, 2020; originally announced April 2020.

arXiv:1908.06239 [pdf, other]

doi 10.1109/ACCESS.2019.2953983

Impacts of Retina-related Zones on Quality Perception of Omnidirectional Image

Authors: Huyen T. T. Tran, Duc V. Nguyen, Nam Pham Ngoc, Trang H. Hoang, Truong Thu Huong, Truong Cong Thang

Abstract: Virtual Reality (VR), which brings immersive experiences to viewers, has been gaining popularity in recent years. A key feature in VR systems is the use of omnidirectional content, which provides 360-degree views of scenes. In this work, we study the human quality perception of omnidirectional images, focusing on different zones surrounding the foveation point. For that purpose, an extensive subjective experiment is carried out to assess the perceptual quality of omnidirectional images with non-uniform quality. Through experimental results, the impacts of different zones are analyzed. Moreover, nineteen objective quality metrics, including foveal quality metrics, are evaluated using our database. It is quantitatively shown that the zones corresponding to the fovea and parafovea of human eyes are extremely important for quality perception, while the impacts of the other zones corresponding to the perifovea and periphery are small. Besides, the investigated metrics are found to be not effective enough to reflect the quality perceived by viewers.

Submitted 17 August, 2019; originally announced August 2019.

Comments: IEEE Access, 2019

arXiv:1907.10102 [pdf, other]

Next-generation Wireless Solutions for the Smart Factory, Smart Vehicles, the Smart Grid and Smart Cities

Authors: Tai Manh Ho, Thinh Duy Tran, Ti Ti Nguyen, S. M. Ahsan Kazmi, Long Bao Le, Choong Seon Hong, Lajos Hanzo

Abstract: 5G wireless systems will extend mobile communication services beyond mobile telephony, mobile broadband, and massive machine-type communication into new application domains, namely the so-called vertical domains including the smart factory, smart vehicles, smart grid, smart city, etc. Supporting these vertical domains comes with demanding requirements: high-availability, high-reliability, low-latency, and in some cases, high-accuracy positioning. In this survey, we first identify the potential key performance requirements of 5G communication in support of automation in the vertical domains and highlight the 5G enabling technologies conceived for meeting these requirements. We then discuss the key challenges faced both by industry and academia which have to be addressed in order to support automation in the vertical domains. We also provide a survey of the related research dedicated to automation in the vertical domains. Finally, our vision of 6G wireless systems is discussed briefly.

Submitted 23 July, 2019; originally announced July 2019.

arXiv:1906.03450 [pdf, other]

Adversarial Mahalanobis Distance-based Attentive Song Recommender for Automatic Playlist Continuation

Authors: Thanh Tran, Renee Sweeney, Kyumin Lee

Abstract: In this paper, we aim to solve the automatic playlist continuation (APC) problem by modeling complex interactions among users, playlists, and songs using only their interaction data. Prior methods mainly rely on dot product to account for similarities, which is not ideal as dot product is not metric learning, so it does not convey the important inequality property. Based on this observation, we propose three novel deep learning approaches that utilize Mahalanobis distance. Our first approach uses user-playlist-song interactions, and combines Mahalanobis distance scores between (i) a target user and a target song, and (ii) between a target playlist and the target song to account for both the user's preference and the playlist's theme. Our second approach measures song-song similarities by considering Mahalanobis distance scores between the target song and each member song (i.e., existing song) in the target playlist. The contribution of each distance score is measured by our proposed memory metric-based attention mechanism. In the third approach, we fuse the two previous models into a unified model to further enhance their performance. In addition, we adopt and customize Adversarial Personalized Ranking (APR) for our three approaches to further improve their robustness and predictive capabilities. Through extensive experiments, we show that our proposed models outperform eight state-of-the-art models in two large-scale real-world datasets.

Submitted 8 June, 2019; originally announced June 2019.

Journal ref: SIGIR 2019

arXiv:1905.06008 [pdf]

Integration of SCADA services in cross-infrastructure holistic tests of cyber-physical energy systems

Authors: Van Hoa Nguyen, Tung Lam Nguyen, Quoc Tuan Tran, Yvon Besanger, Raphael Caire

Abstract: A Cyber-Physical Energy System, due to its multi-domain nature, requires a holistic validation methodology, which may involve the integration of assets and expertise from various research infrastructures. In this paper, the integration of Supervisory Control and Data Acquisition services into cross-infrastructure experiments is proposed. The method requires a high degree of interoperability among the participating partners and can be applied to extend the capacity as well as the degree of realism of advanced validation methods such as co-simulation, remote hardware-in-the-loop or hybrid simulation. The proposed method is applied to a case study of multi-agent system based control for an islanded microgrid, where real devices from one platform are integrated with a real-time simulation and control platform in a distant infrastructure, in a holistic experimental implementation.

Submitted 15 May, 2019; originally announced May 2019.

Comments: Accepted for presentation in the IEEE EEEIC 2019 Conference

arXiv:1905.05002 [pdf, other]

A Compact Low-Latency Systematic Successive Cancellation Polar Decoder for Visible Light Communication Systems

Authors: Duc-Phuc Nguyen, Dinh-Dung Le, Thi-Hong Tran, Takashi Nakada, Yasuhiko Nakashima

Abstract: Channel polarization and Polar code are widely considered as major breakthroughs in coding theory because they have shown promising features for future wireless standards. The main drawbacks of Polar code are high latency in decoding hardware, and unimpressive error-correction performance when a limited code length is implemented. These two disadvantages limit implementation of Polar code in low-throughput wireless communication systems. In this paper, we propose a low-complexity low-latency hardware architecture for the soft-decision compact (16,11) Systematic Successive Cancellation Polar Decoder (S-SCD). Experimental results have shown that the latency of the proposed S-SCD improves by 3.75 times and 2.75 times compared with the conventional and 2b-SC architectures, respectively. Besides, it has also shown a better BER/FER performance compared with RS(15,11) code, which is applied widely in current VLC-based systems.

Submitted 6 May, 2019; originally announced May 2019.

Comments: IEICE Technical Report, Vol.117, Issue 44, pp.3-7

arXiv:1810.05541 [pdf]

On the applicability of distributed ledger architectures to peer-to-peer energy trading framework

Authors: Van Hoa Nguyen, Yvon Besanger, Quoc Tuan Tran, Minh Tri Le

Abstract: As more and more distributed renewable energy resources (DRES) are integrated into the grid, the traditional consumers have become prosumers who can sell back their surplus energy to others who are in energy shortage. This peer-to-peer (P2P) energy transaction framework benefits the end users, financially and in terms of energy security; and the network operators, in terms of flexibility in DRES management, peak load shifting and regulation of voltage/frequency. Environmentally, P2P energy transaction also helps to reduce the carbon footprint, reduces the DRES payback period and incentivizes the installation of DRES. The current centralized market model is no longer suitable and it is therefore necessary to develop an adapted decentralized architecture for the advanced P2P energy transaction framework intra/inter-microgrid. In this paper, we discuss several distributed ledger approaches for such a framework: Blockchain, Block Lattice and Directed Acyclic Graph (the Tangle). The technical advantages of these architectures as well as the persistent challenges are then considered.

Submitted 11 October, 2018; originally announced October 2018.

Comments: IEEE EEEIC 2018, Palermo, Italy, June 2018

arXiv:1810.03743 [pdf, other]

JOBS: Joint-Sparse Optimization from Bootstrap Samples

Authors: Luoluo Liu, Sang Peter Chin, Trac D. Tran

Abstract: Classical signal recovery based on $\ell_1$ minimization solves the least squares problem with all available measurements via sparsity-promoting regularization. In practice, it is often the case that not all measurements are available or required for recovery. Measurements might be corrupted or missing, or they might arrive sequentially in streaming fashion. In this paper, we propose a global sparse recovery strategy based on subsets of measurements, named JOBS, in which multiple measurement vectors are generated from the original pool of measurements via bootstrapping, and then a joint-sparse constraint is enforced to ensure support consistency among multiple predictors. The final estimate is obtained by averaging over the $K$ predictors. The performance limits associated with different choices of the number of bootstrap samples $L$ and the number of estimates $K$ are analyzed theoretically. Simulation results validate some of the theoretical analysis and show that the proposed method yields state-of-the-art recovery performance, outperforming $\ell_1$ minimization and a few other existing bootstrap-based techniques in the challenging case of low levels of measurements. It is also preferable to other bagging-based methods in the streaming setting, since it performs better with small $K$ and $L$ for large data sets.

Submitted 10 December, 2018; v1 submitted 8 October, 2018; originally announced October 2018.

arXiv:1805.03398 [pdf, other]

VLSI Architecture of Compact Non-RLL Beacon-based Visible Light Communication Transmitter and Receiver

Authors: Duc-Phuc Nguyen, Dinh-Dung Le, Thi-Hong Tran, Huu-Thuan Huynh, Yasuhiko Nakashima

Abstract: In this paper, we introduce a couple of hardware implementations of a compact VLC transmitter and receiver for the first time. Compared with related works, our VLC transmitter is a non-RLL one, which means that flicker mitigation can be guaranteed even without RLL codes. In particular, we have utilized a centralized bit probability distribution of a prescrambler and a Polar encoder to create a non-RLL flicker mitigation solution. Moreover, at the receiver, a 3-bit soft-decision filter is proposed to analyze signals received from a real VLC channel, extract log-likelihood ratio (LLR) values, and feed them to the FEC decoder. Therefore, soft decoding of the Polar decoder can be implemented to improve the bit-error-rate (BER) performance of the VLC system. Finally, we introduce a novel very large scale integration (VLSI) architecture for the compact VLC transmitter and receiver, and synthesize our design with FPGA/ASIC synthesis tools. Owing to its non-RLL basis, our system has an evidently good code rate and reduced complexity compared with other RLL-based receiver works. We also present FPGA and ASIC synthesis results of the proposed architecture, with evaluations of power consumption, area, energy per bit, and so on.

Submitted 9 May, 2018; originally announced May 2018.

Comments: Being reviewed by EURASIP Journal of Wireless Communication and Networking

Showing 1–50 of 58 results for author: Tran, T