Tensor Power Flow Formulations for Multidimensional Analyses in Distribution Systems

Authors: Edgar Mauricio Salazar Duque, Juan S. Giraldo, Pedro P. Vergara, Phuong H. Nguyen, Han Slootweg

Abstract: In this paper, we present two multidimensional power flow formulations based on a fixed-point iteration (FPI) algorithm to efficiently solve hundreds of thousands of power flows in distribution systems. The presented algorithms are the basis for a new TensorPowerFlow (TPF) tool and stand out for their simplicity, benefiting from multicore CPU and GPU parallelization. We also focus on the mathematical convergence properties of the algorithm, showing that its unique solution is at the practical operational point, which is the high-voltage, low-current solution. The proof is validated using numerical simulations showing the robustness of the FPI algorithm compared to the classical Newton-Raphson (NR) approach. In the case study, a benchmark with different power flow (PF) solution methods is performed, showing that for applications requiring a yearly simulation at 1-minute resolution the computation time is decreased by a factor of 164 compared to the NR in its sparse formulation.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2401.05425 [pdf, other]

An Unobtrusive and Lightweight Ear-worn System for Continuous Epileptic Seizure Detection

Authors: Abdul Aziz, Nhat Pham, Neel Vora, Cody Reynolds, Jaime Lehnen, Pooja Venkatesh, Zhuoran Yao, Jay Harvey, Tam Vu, Kan Ding, Phuc Nguyen

Abstract: Epilepsy is one of the most common neurological diseases, affecting around 50 million people worldwide. Fortunately, up to 70 percent of people with epilepsy could live seizure-free if properly diagnosed and treated, and a reliable technique to monitor the onset of seizures could improve the quality of life of patients who are constantly facing the fear of random seizure attacks. The scalp-based EEG test, despite being the gold standard for diagnosing epilepsy, is costly, necessitates hospitalization, demands skilled professionals for operation, and is uncomfortable for users. In this paper, we propose EarSD, a novel lightweight, unobtrusive, and socially acceptable ear-worn system to detect epileptic seizure onsets by measuring the physiological signals from behind the user's ears. EarSD includes an integrated custom-built sensing, computing, and communication PCB to collect and amplify the signals of interest, remove the noise caused by motion artifacts and environmental impacts, and stream the data wirelessly to a nearby computer or mobile phone, where data are uploaded to the host computer for further processing. We conducted both in-lab and in-hospital experiments with epileptic seizure patients who were hospitalized for seizure studies. The preliminary results confirm that EarSD can detect seizures with up to 95.3 percent accuracy using only classical machine learning algorithms.

Submitted 1 January, 2024; originally announced January 2024.

arXiv:2312.12587 [pdf, other]

Real-Time Diagnostic Integrity Meets Efficiency: A Novel Platform-Agnostic Architecture for Physiological Signal Compression

Authors: Neel R Vora, Amir Hajighasemi, Cody T. Reynolds, Amirmohammad Radmehr, Mohamed Mohamed, Jillur Rahman Saurav, Abdul Aziz, Jai Prakash Veerla, Mohammad S Nasr, Hayden Lotspeich, Partha Sai Guttikonda, Thuong Pham, Aarti Darji, Parisa Boodaghi Malidarreh, Helen H Shang, Jay Harvey, Kan Ding, Phuc Nguyen, Jacob M Luber

Abstract: Head-based signals such as EEG, EMG, EOG, and ECG collected by wearable systems will play a pivotal role in clinical diagnosis, monitoring, and treatment of important brain disorders. However, the real-time transmission of a significant corpus of physiological signals over extended periods consumes substantial power and time, limiting the viability of battery-dependent physiological monitoring wearables. This paper presents a novel deep-learning framework employing a variational autoencoder (VAE) for physiological signal compression to reduce wearables' computational complexity and energy consumption. Our approach achieves a compression ratio of 1:293 specifically for spectrogram data, surpassing state-of-the-art compression techniques such as JPEG2000, H.264, Discrete Cosine Transform (DCT), and Huffman Encoding, which do not excel in handling physiological signals. We validate the efficacy of the compression algorithm using physiological signals collected from real patients in the hospital and deploy the solution on commonly used embedded AI chips (i.e., ARM Cortex V8 and Jetson Nano). The proposed framework achieves a 91% seizure detection accuracy using XGBoost, confirming the approach's reliability, practicality, and scalability.

Submitted 4 January, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

arXiv:2312.09445 [pdf, other]

IncepSE: Leveraging InceptionTime's performance with Squeeze and Excitation mechanism in ECG analysis

Authors: Tue Minh Cao, Nhat Hong Tran, Le Phi Nguyen, Hieu Huy Pham, Hung Thanh Nguyen

Abstract: Our study focuses on the potential for modifications of Inception-like architectures within the electrocardiogram (ECG) domain. To this end, we introduce IncepSE, a novel network characterized by strategic architectural incorporation that leverages the strengths of both InceptionTime and channel attention mechanisms. Furthermore, we propose a training setup that employs stabilization techniques aimed at tackling the challenges of the severely imbalanced PTB-XL dataset and gradient corruption. By this means, we set a new performance mark for supervised deep learning models across the majority of tasks. Our model consistently surpasses InceptionTime by substantial margins relative to other state-of-the-art methods in this domain, notably a 0.013 AUROC improvement in the "all" task, while also mitigating the inherent dataset fluctuations during training.

Submitted 16 November, 2023; originally announced December 2023.

arXiv:2311.01715 [pdf, other]

Acousto-optic reconstruction of exterior sound field based on concentric circle sampling with circular harmonic expansion

Authors: Phuc Duc Nguyen, Kenji Ishikawa, Noboru Harada, Takehiro Moriya

Abstract: Acousto-optic sensing provides an alternative approach to traditional microphone arrays by exploiting the interaction of light with an acoustic field. Sound field reconstruction is an advanced technique used in acousto-optic sensing. Current challenges in sound-field reconstruction methods pertain to scenarios in which the sound source is located within the reconstruction area, known as the exterior problem. Existing reconstruction algorithms, primarily designed for interior scenarios, often exhibit suboptimal performance when applied to exterior cases. This paper introduces a novel technique for exterior sound-field reconstruction. The proposed method leverages concentric circle sampling and a two-dimensional exterior sound-field reconstruction approach based on circular harmonic expansion. To evaluate the efficacy of this approach, both numerical simulations and practical experiments are conducted. The results highlight the superior accuracy of the proposed method when compared to conventional reconstruction methods, while using a minimal amount of measured projection data.

Submitted 3 November, 2023; originally announced November 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2304.14455 [pdf, other]

doi 10.1109/ICCAIS59597.2023.10382296

Bearing-Based Network Localization Under Randomized Gossip Protocol

Authors: Nhat-Minh Le-Phan, Minh Hoang Trinh, Phuoc Doan Nguyen

Abstract: In this paper, we consider a randomized gossip algorithm for the bearing-based network localization problem. Let each sensor node be able to obtain the bearing vectors and communicate its position estimates with several neighboring agents. Each update involves two agents, and the update sequence follows a stochastic process. Under the assumption that the network is infinitesimally bearing rigid and contains at least two beacon nodes, we show that when the updating step-size is properly selected, the proposed algorithm can successfully estimate the actual sensor nodes' positions with probability one. The randomized update provides a simple, distributed, and cost-effective method for localizing the network. The theoretical result is supported with a simulation of a 1089-node sensor network.

Submitted 17 January, 2024; v1 submitted 27 April, 2023; originally announced April 2023.

Comments: preprint, 6 pages, 2 figures. Published in the Proceedings of the 12th International Conference on Control, Automation and Information Sciences (ICCAIS). arXiv admin note: text overlap with arXiv:2303.14733

arXiv:2304.11080 [pdf, other]

Multimodal contrastive learning for diagnosing cardiovascular diseases from electrocardiography (ECG) signals and patient metadata

Authors: Tue M. Cao, Nhat H. Tran, Phi Le Nguyen, Hieu Pham

Abstract: This work discusses the use of contrastive learning and deep learning for diagnosing cardiovascular diseases from electrocardiography (ECG) signals. While ECG signals usually contain 12 leads (channels), many healthcare facilities and devices lack access to all 12 leads. This raises the problem of how to use fewer ECG leads to produce meaningful diagnoses with high performance. We introduce a simple experiment to test whether contrastive learning can be applied to this task. More specifically, we added the similarity between the embedding vectors of the 12-lead signal and the reduced-lead ECG signal to the loss function to bring these representations closer together. Despite its simplicity, this has been shown to improve diagnostic performance with all lead combinations, demonstrating the potential of contrastive learning on this task.

Submitted 18 April, 2023; originally announced April 2023.

Comments: Accepted for presentation at the Midwest Machine Learning Symposium (MMLS 2023), Chicago, IL, USA

arXiv:2303.14733 [pdf, other]

doi 10.1109/TNSE.2024.3376643

Randomized Matrix Weighted Consensus

Authors: Nhat-Minh Le-Phan, Minh Hoang Trinh, Phuoc Doan Nguyen

Abstract: In this paper, randomized gossip-type matrix-weighted consensus algorithms are proposed for both leaderless and leader-follower topologies. First, we introduce the notion of expected matrix-weighted network, which captures the multi-dimensional interactions between any two agents in a probabilistic sense. Under some mild assumptions on the distribution of the expected matrix weights and the upper bound of the updating step size, the proposed asynchronous pairwise update algorithms drive the network to achieve a consensus in expectation. An upper bound of the $ε$-convergence time of the algorithm is then derived. Furthermore, the proposed algorithms are applied to the bearing-based network localization and formation control problems. The theoretical results are supported by several numerical examples.

Submitted 6 February, 2024; v1 submitted 26 March, 2023; originally announced March 2023.

Comments: 32 pages, 6 figures, preprint

arXiv:2212.03228 [pdf, other]

ISAACS: Iterative Soft Adversarial Actor-Critic for Safety

Authors: Kai-Chieh Hsu, Duy Phuong Nguyen, Jaime Fernández Fisac

Abstract: The deployment of robots in uncontrolled environments requires them to operate robustly under previously unseen scenarios, like irregular terrain and wind conditions. Unfortunately, while rigorous safety frameworks from robust optimal control theory scale poorly to high-dimensional nonlinear dynamics, control policies computed by more tractable "deep" methods lack guarantees and tend to exhibit little robustness to uncertain operating conditions. This work introduces a novel approach enabling scalable synthesis of robust safety-preserving controllers for robotic systems with general nonlinear dynamics subject to bounded modeling error by combining game-theoretic safety analysis with adversarial reinforcement learning in simulation. Following a soft actor-critic scheme, a safety-seeking fallback policy is co-trained with an adversarial "disturbance" agent that aims to invoke the worst-case realization of model error and training-to-deployment discrepancy allowed by the designer's uncertainty. While the learned control policy does not intrinsically guarantee safety, it is used to construct a real-time safety filter (or shield) with robust safety guarantees based on forward reachability rollouts. This shield can be used in conjunction with a safety-agnostic control policy, precluding any task-driven actions that could result in loss of safety.
We evaluate our learning-based safety approach in a 5D race car simulator, compare the learned safety policy to the numerically obtained optimal solution, and empirically validate the robust safety guarantee of our proposed safety shield against worst-case model discrepancy.

Submitted 7 June, 2024; v1 submitted 6 December, 2022; originally announced December 2022.

Comments: Accepted in 5th Annual Learning for Dynamics & Control Conference (L4DC), University of Pennsylvania

arXiv:2210.08610 [pdf, other]

Robust, General, and Low Complexity Acoustic Scene Classification Systems and An Effective Visualization for Presenting a Sound Scene Context

Authors: Lam Pham, Dusan Salovic, Anahid Jalali, Alexander Schindler, Khoa Tran, Canh Vu, Phu X. Nguyen

Abstract: In this paper, we present a comprehensive analysis of Acoustic Scene Classification (ASC), the task of identifying the scene of an audio recording from its acoustic signature. In particular, we first propose an inception-based and low-footprint ASC model, referred to as the ASC baseline. The proposed ASC baseline is then compared with benchmark and high-complexity network architectures of MobileNetV1, MobileNetV2, VGG16, VGG19, ResNet50V2, ResNet152V2, DenseNet121, DenseNet201, and Xception. Next, we improve the ASC baseline by proposing a novel deep neural network architecture which leverages residual-inception architectures and multiple kernels. Given the novel residual-inception (NRI) model, we further evaluate the trade-off between model complexity and model accuracy. Finally, we evaluate whether sound events occurring in a sound scene recording can help to improve ASC accuracy, then indicate how a sound scene context is well presented by combining both sound scene and sound event information. We conduct extensive experiments on various ASC datasets, including Crowded Scenes, IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 Task 1A and 1B, 2019 Task 1A and 1B, 2020 Task 1A, 2021 Task 1A, 2022 Task 1.
The experimental results on several different ASC challenges highlight two main achievements: the first is robust, general, and low-complexity ASC systems which are suitable for real-life applications on a wide range of edge and mobile devices; the second is an effective visualization method for comprehensively presenting a sound scene context.

Submitted 16 October, 2022; originally announced October 2022.

arXiv:2208.07824 [pdf, other]

doi 10.1109/MASS56207.2022.00097

A Deep Reinforcement Learning-based Adaptive Charging Policy for WRSNs

Authors: Ngoc Bui, Phi Le Nguyen, Viet Anh Nguyen, Phan Thuan Do

Abstract: Wireless sensor networks consist of randomly distributed sensor nodes for monitoring targets or areas of interest. Maintaining the network for continuous surveillance is a challenge due to the limited battery capacity in each sensor. Wireless power transfer technology is emerging as a reliable solution for energizing the sensors by deploying a mobile charger (MC) to recharge the sensors. However, designing an optimal charging path for the MC is challenging because of uncertainties arising in the networks. The energy consumption rate of the sensors may fluctuate significantly due to unpredictable changes in the network topology, such as node failures. These changes also lead to shifts in the importance of each sensor, which is often assumed to be the same for all sensors in existing works. We address these challenges in this paper by proposing a novel adaptive charging scheme using a deep reinforcement learning (DRL) approach. Specifically, we endow the MC with a charging policy that determines the next sensor to charge conditioned on the current state of the network. We then use a deep neural network to parametrize this charging policy, which is trained by reinforcement learning techniques. Our model can adapt to spontaneous changes in the network topology. The empirical results show that the proposed algorithm outperforms the existing on-demand algorithms by a significant margin.

Submitted 16 August, 2022; originally announced August 2022.

Comments: 9 pages

arXiv:2204.13155 [pdf, other]

doi 10.1089/soro.2022.0010

A Soft-Bodied Aerial Robot for Collision Resilience and Contact-Reactive Perching

Authors: Pham H. Nguyen, Karishma Patnaik, Shatadal Mishra, Panagiotis Polygerinos, Wenlong Zhang

Abstract: Current aerial robots demonstrate limited interaction capabilities in unstructured environments when compared with their biological counterparts. Some examples include their inability to tolerate collisions and to successfully land or perch on objects of unknown shapes, sizes, and texture. Efforts to include compliance have introduced designs that incorporate external mechanical impact protection at the cost of reduced agility and flight time due to the added weight. In this work, we propose and develop a light-weight, inflatable, soft-bodied aerial robot (SoBAR) that can pneumatically vary its body stiffness to achieve intrinsic collision resilience. Unlike the conventional rigid aerial robots, SoBAR successfully demonstrates its ability to repeatedly endure and recover from collisions in various directions, not only limited to in-plane ones. Furthermore, we exploit its capabilities to demonstrate perching where the 3D collision resilience helps in improving the perching success rates. We also augment SoBAR with a novel hybrid fabric-based, bistable (HFB) grasper that can utilize impact energies to perform contact-reactive grasping through rapid shape conforming abilities. We exhaustively study and offer insights into the collision resilience, impact absorption, and manipulation capabilities of SoBAR with the HFB grasper.
Finally, we compare the performance of conventional aerial robots with the SoBAR through collision characterizations, grasping identifications, and experimental validations of collision resilience and perching in various scenarios and on differently shaped objects.

Submitted 4 January, 2023; v1 submitted 27 April, 2022; originally announced April 2022.

Comments: Accepted for Publication, Soft Robotics Journal - Mary Ann Liebert Inc., Manuscript Details - 20 pages, 17 Figures, 2 Tables

arXiv:2203.10035 [pdf, other]

doi 10.2312/3dor.20211307

SHREC 2021: Classification in cryo-electron tomograms

Authors: Ilja Gubins, Marten L. Chaillet, Gijs van der Schot, M. Cristina Trueba, Remco C. Veltkamp, Friedrich Förster, Xiao Wang, Daisuke Kihara, Emmanuel Moebel, Nguyen P. Nguyen, Tommi White, Filiz Bunyak, Giorgos Papoulias, Stavros Gerolymatos, Evangelia I. Zacharaki, Konstantinos Moustakas, Xiangrui Zeng, Sinuo Liu, Min Xu, Yaoyu Wang, Cheng Chen, Xuefeng Cui, Fa Zhang

Abstract: Cryo-electron tomography (cryo-ET) is an imaging technique that allows three-dimensional visualization of macro-molecular assemblies under near-native conditions. Cryo-ET comes with a number of challenges, mainly low signal-to-noise and inability to obtain images from all angles. Computational methods are key to analyzing cryo-electron tomograms. To promote innovation in computational methods, we generate a novel simulated dataset to benchmark different methods of localization and classification of biological macromolecules in tomograms. Our publicly available dataset contains ten tomographic reconstructions of simulated cell-like volumes. Each volume contains twelve different types of complexes, varying in size, function and structure. In this paper, we have evaluated seven different methods of finding and classifying proteins. Seven research groups present results obtained with learning-based methods and trained on the simulated dataset, as well as a baseline template matching (TM), a traditional method widely used in cryo-ET research. We show that learning-based approaches can achieve notably better localization and classification performance than TM. We also experimentally confirm that there is a negative relationship between particle size and performance for all methods.

Submitted 18 March, 2022; originally announced March 2022.

Comments: Workshop version of the paper can be found here: https://diglib.eg.org/handle/10.2312/3dor20211307

arXiv:2201.04581 [pdf, other]

Sound-Dr: Reliable Sound Dataset and Baseline Artificial Intelligence System for Respiratory Illnesses

Authors: Truong V. Hoang, Quang H. Nguyen, Cuong Q. Nguyen, Phong X. Nguyen, Hoang D. Nguyen

Abstract: As the burden of respiratory diseases continues to fall on society worldwide, this paper proposes a high-quality and reliable dataset of human sounds for studying respiratory illnesses, including pneumonia and COVID-19. It consists of coughing, mouth breathing, and nose breathing sounds together with metadata on related clinical characteristics. We also develop a proof-of-concept system for establishing baselines and benchmarking against multiple datasets, such as Coswara and COUGHVID. Our comprehensive experiments show that the Sound-Dr dataset has richer features, better performance, and is more robust to dataset shifts in various machine learning tasks. It is promising for a wide range of real-time applications on mobile devices. The proposed dataset and system will serve as practical tools to support healthcare professionals in diagnosing respiratory disorders. The dataset and code are publicly available here: https://github.com/ReML-AI/Sound-Dr/.

Submitted 4 August, 2023; v1 submitted 12 January, 2022; originally announced January 2022.

Comments: 9 pages, PHMAP2023, PHM

MSC Class: 68-11; 92-XX ACM Class: E.0; I.2.1

Journal ref: IJPHM (2023)

arXiv:2112.09172 [pdf, ps, other]

An Audio-Visual Dataset and Deep Learning Frameworks for Crowded Scene Classification

Authors: Lam Pham, Dat Ngo, Phu X. Nguyen, Truong Hoang, Alexander Schindler

Abstract: This paper presents a task of audio-visual scene classification (SC) where input videos are classified into one of five real-life crowded scenes: 'Riot', 'Noise-Street', 'Firework-Event', 'Music-Event', and 'Sport-Atmosphere'. To this end, we first collect an audio-visual dataset (videos) of these five crowded contexts from YouTube (in-the-wild scenes). Then, a wide range of deep learning frameworks are proposed to exploit either audio or visual input data independently. Finally, results obtained from high-performing deep learning frameworks are fused to achieve the best accuracy score. Our experimental results indicate that audio and visual input factors independently contribute to the SC task's performance. Significantly, an ensemble of deep learning frameworks exploring either audio or visual input data can achieve the best accuracy of 95.7%.

Submitted 16 December, 2021; originally announced December 2021.

arXiv:2110.01605 [pdf, other]

CCS-GAN: COVID-19 CT-scan classification with very few positive training images

Authors: Sumeet Menon, Jayalakshmi Mangalagiri, Josh Galita, Michael Morris, Babak Saboury, Yaacov Yesha, Yelena Yesha, Phuong Nguyen, Aryya Gangopadhyay, David Chapman

Abstract: We present a novel algorithm that is able to classify COVID-19 pneumonia from CT Scan slices using a very small sample of training images exhibiting COVID-19 pneumonia in tandem with a larger number of normal images. This algorithm is able to achieve high classification accuracy using as few as 10 positive training slices (from 10 positive cases), which to the best of our knowledge is one order of magnitude fewer than the next closest published work at the time of writing. Deep learning with extremely small positive training volumes is a very difficult problem and has been an important topic during the COVID-19 pandemic, because for quite some time it was difficult to obtain large volumes of COVID-19 positive images for training. Algorithms that can learn to screen for diseases using few examples are an important area of research. We present the Cycle Consistent Segmentation Generative Adversarial Network (CCS-GAN). CCS-GAN combines style transfer with pulmonary segmentation and relevant transfer learning from negative images in order to create a larger volume of synthetic positive images for the purposes of improving diagnostic classification performance. A VGG-19 classifier plus CCS-GAN was trained using a small sample of positive image slices ranging from at most 50 down to as few as 10 COVID-19 positive CT-scan images. CCS-GAN achieves high accuracy with few positive images and thereby greatly reduces the barrier of acquiring large training volumes in order to train a diagnostic classifier for COVID-19.

Submitted 1 October, 2021; originally announced October 2021.

Comments: 10 pages, 9 figures, 1 table, submitted to IEEE Transactions on Medical Imaging

arXiv:2109.15036 [pdf]

doi 10.3390/electronics10202558

Automated Workers Ergonomic Risk Assessment in Manual Material Handling using sEMG Wearable Sensors and Machine Learning

Authors: Srimantha E. Mudiyanselage, Phuong H. D. Nguyen, Mohammad Sadra Rajabi, Reza Akhavian

Abstract: Manual material handling tasks have the potential to be highly unsafe from an ergonomic viewpoint. Safety inspections to monitor body postures can help mitigate ergonomic risks of material handling. However, the real effect of awkward muscle movements, strains, and excessive forces that may result in an injury may not be identified by external cues. This paper evaluates the ability of surface electromyogram (EMG)-based systems together with machine learning algorithms to automatically detect body movements that may harm muscles in material handling. The analysis utilized a lifting equation developed by the U.S. National Institute for Occupational Safety and Health (NIOSH). This equation determines a Recommended Weight Limit, which suggests the maximum acceptable weight that a healthy worker can lift and carry as well as a Lifting Index value to assess the risk extent. Four different machine learning models, namely Decision Tree, Support Vector Machine, K-Nearest Neighbor, and Random Forest are developed to classify the risk assessments calculated based on the NIOSH lifting equation. The sensitivity of the models to various parameters is also evaluated to find the best performance using each algorithm. Results indicate that Decision Tree models have the potential to predict the risk level with close to 99.35% accuracy.

Submitted 27 September, 2021; originally announced September 2021.

Journal ref: Electronics. 2021; 10(20):2558

arXiv:2109.07673 [pdf, other]

Back to the Future: Efficient, Time-Consistent Solutions in Reach-Avoid Games

Authors: Dennis R. Anthony, Duy P. Nguyen, David Fridovich-Keil, Jaime F. Fisac

Abstract: We study the class of reach-avoid dynamic games in which multiple agents interact noncooperatively, and each wishes to satisfy a distinct target criterion while avoiding a failure criterion. Reach-avoid games are commonly used to express safety-critical optimal control problems found in mobile robot motion planning. Here, we focus on finding time-consistent solutions, in which future motion plans remain optimal even when a robot diverges from the plan early on due to, e.g., intrinsic dynamic uncertainty or extrinsic environment disturbances. Our main contribution is a computationally-efficient algorithm for multi-agent reach-avoid games which renders time-consistent solutions for all players. We demonstrate our approach in two- and three-player simulated driving scenarios, in which our method provides safe control strategies for all agents.

Submitted 2 March, 2022; v1 submitted 15 September, 2021; originally announced September 2021.

Comments: accepted to ICRA 2022

arXiv:2107.06488 [pdf]

Behavior Analysis and Design of Concrete-Filled Steel Circular-Tube Short Columns Subjected to Axial Compression

Authors: Duc-Duy Pham, Phu-Cuong Nguyen

Abstract: In this paper, a new finite element (FE) model using ABAQUS software was developed to investigate the compressive behavior of Concrete-Filled Steel Circular-Tube (CFSCT) columns. Experimental studies indicated that the confinement offered by the circular steel tube in a CFSCT column increased both the strength and ductility of the filled concrete. Based on a database of 663 test results for CFSCT columns under axial compression, collected from the available literature, a formula to determine the lateral confining pressures on concrete is proposed. The Concrete-Damaged Plasticity Model (CDPM) and the parameters available in ABAQUS are used in the analysis. From the analysis results, a formula is proposed for predicting ultimate load by determining intensification and diminution for concrete and steel. The proposed formula is then compared with the FE model, the previous study, and the current design code in strength prediction of CFSCT columns under compression. The comparative results show that the FE model and the proposed formula are more stable and accurate than the previous study and current standards when using normal- or high-strength materials.

Submitted 14 July, 2021; originally announced July 2021.

Comments: 44 pages, 20 figures, an international paper, 7 tables, 2 authors, Phu-Cuong Nguyen is the corresponding author, Duc-Duy Pham is Master student

Report number: E2019.02.2

arXiv:2104.02060 [pdf]

Toward Generating Synthetic CT Volumes using a 3D-Conditional Generative Adversarial Network

Authors: Jayalakshmi Mangalagiri, David Chapman, Aryya Gangopadhyay, Yaacov Yesha, Joshua Galita, Sumeet Menon, Yelena Yesha, Babak Saboury, Michael Morris, Phuong Nguyen

Abstract: We present a novel conditional Generative Adversarial Network (cGAN) architecture that is capable of generating 3D Computed Tomography scans in voxels from noisy and/or pixelated approximations and with the potential to generate full synthetic 3D scan volumes. We believe cGAN to be a tractable approach to generate 3D CT volumes, even though the problem of generating full resolution deep fakes is presently impractical due to GPU memory limitations. We present results for autoencoder, denoising, and depixelating tasks which are trained and tested on two novel COVID19 CT datasets. Our evaluation metrics are the Peak Signal-to-Noise Ratio (PSNR), which ranges from 12.53 to 46.46 dB, and the Structural Similarity Index (SSIM), which ranges from 0.89 to 1.

Submitted 2 April, 2021; originally announced April 2021.

Comments: It is a short paper accepted at the CSCI 2020 conference and accepted for publication in the IEEE CPS proceedings

arXiv:2102.03893 [pdf, other]

Enhancement of Distribution System State Estimation Using Pruned Physics-Aware Neural Networks

Authors: Minh-Quan Tran, Ahmed S. Zamzam, Phuong H. Nguyen

Abstract: Realizing complete observability in the three-phase distribution system remains a challenge that hinders the implementation of classic state estimation algorithms. In this paper, a new method, called the pruned physics-aware neural network (P2N2), is developed to improve the voltage estimation accuracy in the distribution system. The method relies on the physical grid topology, which is used to design the connections between different hidden layers of a neural network model. To verify the proposed method, a numerical simulation based on one-year smart meter data of load consumptions for three-phase power flow is developed to generate the measurement and voltage state data. The IEEE 123-node system is selected as the test network to benchmark the proposed algorithm against the classic weighted least squares (WLS). Numerical results show that P2N2 outperforms WLS in terms of data redundancy and estimation accuracy.

Submitted 15 October, 2021; v1 submitted 7 February, 2021; originally announced February 2021.

arXiv:2010.11682 [pdf]

Lung Nodule Classification Using Biomarkers, Volumetric Radiomics and 3D CNNs

Authors: Kushal Mehta, Arshita Jain, Jayalakshmi Mangalagiri, Sumeet Menon, Phuong Nguyen, David R. Chapman

Abstract: We present a hybrid algorithm to estimate lung nodule malignancy that combines imaging biomarkers from Radiologist's annotation with image classification of CT scans. Our algorithm employs a 3D Convolutional Neural Network (CNN) as well as a Random Forest in order to combine CT imagery with biomarker annotation and volumetric radiomic features. We analyze and compare the performance of the algorithm using only imagery, only biomarkers, combined imagery + biomarkers, combined imagery + volumetric radiomic features and finally the combination of imagery + biomarkers + volumetric features in order to classify the suspicion level of nodule malignancy. The National Cancer Institute (NCI) Lung Image Database Consortium (LIDC) IDRI dataset is used to train and evaluate the classification task. We show that the incorporation of semi-supervised learning by means of K-Nearest-Neighbors (KNN) can increase the available training sample size of the LIDC-IDRI thereby further improving the accuracy of malignancy estimation of most of the models tested although there is no significant improvement with the use of KNN semi-supervised learning if image classification with CNNs and volumetric features are combined with descriptive biomarkers. Unexpectedly, we also show that a model using image biomarkers alone is more accurate than one that combines biomarkers with volumetric radiomics, 3D CNNs, and semi-supervised learning. We discuss the possibility that this result may be influenced by cognitive bias in LIDC-IDRI because malignancy estimates were recorded by the same radiologist panel as biomarkers, as well as future work to incorporate pathology information over a subset of study participants.

Submitted 19 October, 2020; originally announced October 2020.

Comments: This paper has been submitted to the Journal of Digital Imaging (JDI 2020). The poster of this paper has received the 2nd prize for the Research Poster Award. Link: https://siim.org/page/20m_p_lung_node_malignancy

arXiv:2009.12478 [pdf, other]

Generating Realistic COVID19 X-rays with a Mean Teacher + Transfer Learning GAN

Authors: Sumeet Menon, Joshua Galita, David Chapman, Aryya Gangopadhyay, Jayalakshmi Mangalagiri, Phuong Nguyen, Yaacov Yesha, Yelena Yesha, Babak Saboury, Michael Morris

Abstract: COVID-19 is a novel infectious disease responsible for over 800K deaths worldwide as of August 2020. The need for rapid testing is a high priority and alternative testing strategies including X-ray image classification are a promising area of research. However, at present, public datasets for COVID19 x-ray images have low data volumes, making it challenging to develop accurate image classifiers. Several recent papers have made use of Generative Adversarial Networks (GANs) in order to increase the training data volumes. But realistic synthetic COVID19 X-rays remain challenging to generate. We present a novel Mean Teacher + Transfer GAN (MTT-GAN) that generates COVID19 chest X-ray images of high quality. In order to create a more accurate GAN, we employ transfer learning from the Kaggle Pneumonia X-Ray dataset, a highly relevant data source orders of magnitude larger than public COVID19 datasets. Furthermore, we employ the Mean Teacher algorithm as a constraint to improve stability of training. Our qualitative analysis shows that the MTT-GAN generates X-ray images that are greatly superior to a baseline GAN and visually comparable to real X-rays, although board-certified radiologists can distinguish MTT-GAN fakes from real COVID19 X-rays. Quantitative analysis shows that MTT-GAN greatly improves the accuracy of both a binary COVID19 classifier as well as a multi-class Pneumonia classifier as compared to a baseline GAN. Our classification accuracy is favourable as compared to recently reported results in the literature for similar binary and multi-class COVID19 screening tasks.

Submitted 25 September, 2020; originally announced September 2020.

Comments: 10 pages, 11 figures, 2 tables; Submitted to IEEE BigData 2020 conference

arXiv:2009.07520 [pdf, other]

doi 10.3934/ipi.2021053

PCA Reduced Gaussian Mixture Models with Applications in Superresolution

Authors: Johannes Hertrich, Dang Phoung Lan Nguyen, Jean-Fancois Aujol, Dominique Bernard, Yannick Berthoumieu, Abdellatif Saadaldin, Gabriele Steidl

Abstract: Despite the rapid development of computational hardware, the treatment of large and high dimensional data sets is still a challenging problem. This paper provides a twofold contribution to the topic. First, we propose a Gaussian Mixture Model in conjunction with a reduction of the dimensionality of the data in each component of the model by principal component analysis, called PCA-GMM. To learn the (low dimensional) parameters of the mixture model we propose an EM algorithm whose M-step requires the solution of constrained optimization problems. Fortunately, these constrained problems do not depend on the usually large number of samples and can be solved efficiently by an (inertial) proximal alternating linearized minimization algorithm. Second, we apply our PCA-GMM for the superresolution of 2D and 3D material images based on the approach of Sandeep and Jacob. Numerical results confirm the moderate influence of the dimensionality reduction on the overall superresolution result.

Submitted 6 May, 2021; v1 submitted 16 September, 2020; originally announced September 2020.

Journal ref: Inverse Problems and Imaging, vol. 16, pp. 341-366, 2022

arXiv:2008.06828 [pdf, other]

A novel approach to remove foreign objects from chest X-ray images

Authors: Hieu X. Le, Phuong D. Nguyen, Thang H. Nguyen, Khanh N. Q. Le, Thanh T. Nguyen

Abstract: We initially proposed a deep learning approach for foreign objects inpainting in smartphone-camera captured chest radiographs utilizing the cheXphoto dataset. Foreign objects which can significantly affect the quality of a computer-aided diagnostic prediction are captured under various settings. In this paper, we use a multi-method approach to tackle both removal and inpainting in chest radiographs. Firstly, an object detection model is trained to separate the foreign objects from the given image. Subsequently, the binary mask of each object is extracted utilizing a segmentation model. Each pair of the binary mask and the extracted object is then used for inpainting purposes. Finally, the in-painted regions are merged back into the original image, resulting in a clean output free of foreign objects. To conclude, we achieved state-of-the-art accuracy. The experimental results showed a new approach to the possible applications of this method for chest X-ray images detection.

Submitted 15 August, 2020; originally announced August 2020.

Comments: 9 pages, 7 figures, 7 tables

arXiv:2005.03271 [pdf, other]

RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions

Authors: Chung-Cheng Chiu, Arun Narayanan, Wei Han, Rohit Prabhavalkar, Yu Zhang, Navdeep Jaitly, Ruoming Pang, Tara N. Sainath, Patrick Nguyen, Liangliang Cao, Yonghui Wu

Abstract: In recent years, all-neural end-to-end approaches have obtained state-of-the-art results on several challenging automatic speech recognition (ASR) tasks. However, most existing works focus on building ASR models where train and test data are drawn from the same domain. This results in poor generalization characteristics on mismatched-domains: e.g., end-to-end models trained on short segments perform poorly when evaluated on longer utterances. In this work, we analyze the generalization properties of streaming and non-streaming recurrent neural network transducer (RNN-T) based end-to-end models in order to identify model components that negatively affect generalization performance. We propose two solutions: combining multiple regularization techniques during training, and using dynamic overlapping inference. On a long-form YouTube test set, when the nonstreaming RNN-T model is trained with shorter segments of data, the proposed combination improves word error rate (WER) from 22.3% to 14.8%; when the streaming RNN-T model is trained on short Search queries, the proposed techniques improve WER on the YouTube set from 67.0% to 25.3%. Finally, when trained on Librispeech, we find that dynamic overlapping inference improves WER on YouTube from 99.8% to 33.0%.

Submitted 23 December, 2020; v1 submitted 7 May, 2020; originally announced May 2020.

Comments: SLT camera-ready version

arXiv:2003.10822 [pdf, other]

Pre-processing Image using Brightening, CLAHE and RETINEX

Authors: Thi Phuoc Hanh Nguyen, Zinan Cai, Khanh Nguyen, Sokuntheariddh Keth, Ningyuan Shen, Mira Park

Abstract: This paper focuses on finding the optimal pre-processing method among three common algorithms for image enhancement: Brightening, CLAHE and Retinex. For the purpose of image training in general, these methods are combined to find the optimal method for image enhancement. We have carried out research on the different permutations of the three methods. The evaluation is based on Canny edge detection applied to all processed images. The sharpness of objects is then judged by comparing the number of true positive pixels between images. After applying different combinations of pre-processing functions to images, CLAHE proves to be the most effective in improving edges, Brightening does not show much effect on edge enhancement, and Retinex even reduces the sharpness of images and contributes little to image enhancement.

Submitted 22 March, 2020; originally announced March 2020.

arXiv:2003.09677 [pdf, ps, other]

UAV-Assisted Secure Communications in Terrestrial Cognitive Radio Networks: Joint Power Control and 3D Trajectory Optimization

Authors: Phu X. Nguyen, Van-Dinh Nguyen, Hieu V. Nguyen, Oh-Soon Shin

Abstract: This paper considers secure communications for an underlay cognitive radio network (CRN) in the presence of an external eavesdropper (Eve). The secrecy performance of CRNs is usually limited by the primary receiver's interference power constraint. To overcome this issue, we propose to use an unmanned aerial vehicle (UAV) as a friendly jammer to interfere with Eve in decoding the confidential message from the secondary transmitter (ST). Our goal is to jointly optimize the transmit power and UAV's trajectory in the three-dimensional (3D) space to maximize the average achievable secrecy rate of the secondary system. The formulated optimization problem is nonconvex due to the nonconvexity of the objective and nonconvexity of constraints, which is very challenging to solve. To obtain a suboptimal but efficient solution to the problem, we first transform the original problem into a more tractable form and develop an iterative algorithm for its solution by leveraging the inner approximation framework. We further extend the proposed algorithm to the case of imperfect location information of Eve, where the average worst-case secrecy rate is considered as the objective function. Extensive numerical results are provided to demonstrate the merits of the proposed algorithms over existing approaches.

Submitted 25 March, 2020; v1 submitted 21 March, 2020; originally announced March 2020.

arXiv:1911.10229 [pdf]

Improved motion correction for functional MRI using an omnibus regression model

Authors: Vyom Raval, Kevin P. Nguyen, Albert Montillo

Abstract: Head motion during functional Magnetic Resonance Imaging acquisition can significantly contaminate the neural signal and introduce spurious, distance-dependent changes in signal correlations. This can heavily confound studies of development, aging, and disease. Previous approaches to suppress head motion artifacts have involved sequential regression of nuisance covariates, but this has been shown to reintroduce artifacts. We propose a new motion correction pipeline using an omnibus regression model that avoids this problem by simultaneously regressing out multiple artifacts using the best performing algorithms to estimate each artifact. We quantitatively evaluate its motion artifact suppression performance against sequential regression pipelines using a large heterogeneous dataset (n=151) which includes high-motion subjects and multiple disease phenotypes. The proposed concatenated regression pipeline significantly reduces the association between head motion and functional connectivity while significantly outperforming the traditional sequential regression pipelines in eliminating distance-dependent head motion artifacts.

Submitted 21 January, 2020; v1 submitted 22 November, 2019; originally announced November 2019.

Comments: 4 pages, 2 figures, accepted for IEEE ISBI 2020 conference. Updated following ISBI reviewer suggestions

arXiv:1911.10227 [pdf]

Prediction of individual progression rate in Parkinson's disease using clinical measures and biomechanical measures of gait and postural stability

Authors: Vyom Raval, Kevin P. Nguyen, Ashley Gerald, Richard B. Dewey Jr., Albert Montillo

Abstract: Parkinson's disease (PD) is a common neurological disorder characterized by gait impairment. PD has no cure, and an impediment to developing a treatment is the lack of any accepted method to predict disease progression rate. The primary aim of this study was to develop a model using clinical measures and biomechanical measures of gait and postural stability to predict an individual's PD progression over two years. Data from 160 PD subjects were utilized. Machine learning models, including XGBoost and Feed Forward Neural Networks, were developed using extensive model optimization and cross-validation. The highest performing model was a neural network that used a group of clinical measures, achieved a PPV of 71% in identifying fast progressors, and explained a large portion (37%) of the variance in an individual's progression rate on held-out test data. This demonstrates the potential to predict individual PD progression rate and enrich trials by analyzing clinical and biomechanical measures with machine learning.

Submitted 22 November, 2019; originally announced November 2019.

Comments: 5 pages, 4 figures, IEEE ICASSP conference submission

arXiv:1911.02242 [pdf, other]

A comparison of end-to-end models for long-form speech recognition

Authors: Chung-Cheng Chiu, Wei Han, Yu Zhang, Ruoming Pang, Sergey Kishchenko, Patrick Nguyen, Arun Narayanan, Hank Liao, Shuyuan Zhang, Anjuli Kannan, Rohit Prabhavalkar, Zhifeng Chen, Tara Sainath, Yonghui Wu

Abstract: End-to-end automatic speech recognition (ASR) models, including both attention-based models and the recurrent neural network transducer (RNN-T), have shown superior performance compared to conventional systems. However, previous studies have focused primarily on short utterances that typically last for just a few seconds or, at most, a few tens of seconds. Whether such architectures are practical on long utterances that last from minutes to hours remains an open question. In this paper, we both investigate and improve the performance of end-to-end models on long-form transcription. We first present an empirical comparison of different end-to-end models on a real world long-form task and demonstrate that the RNN-T model is much more robust than attention-based systems in this regime. We next explore two improvements to attention-based systems that significantly improve their performance: restricting the attention to be monotonic, and applying a novel decoding algorithm that breaks long utterances into shorter overlapping segments. Combining these two improvements, we show that attention-based end-to-end models can be very competitive to RNN-T on long-form speech recognition.

Submitted 6 November, 2019; originally announced November 2019.

Comments: ASRU camera-ready version

arXiv:1910.08112 [pdf, other]

Anatomically-Informed Data Augmentation for functional MRI with Applications to Deep Learning

Authors: Kevin P. Nguyen, Cherise Chin Fatt, Alex Treacher, Cooper Mellema, Madhukar H. Trivedi, Albert Montillo

Abstract: The application of deep learning to build accurate predictive models from functional neuroimaging data is often hindered by limited dataset sizes. Though data augmentation can help mitigate such training obstacles, most data augmentation methods have been developed for natural images as in computer vision tasks such as CIFAR, not for medical images. This work helps to fill this gap by proposing a method for generating new functional Magnetic Resonance Images (fMRI) with realistic brain morphology. This method is tested on a challenging task of predicting antidepressant treatment response from pre-treatment task-based fMRI and demonstrates a 26% improvement in performance in predicting response using augmented images. This improvement compares favorably to state-of-the-art augmentation methods for natural images. Through an ablative test, augmentation is also shown to substantively improve performance when applied before hyperparameter optimization. These results suggest the optimal order of operations and support the role of data augmentation methods for improving predictive performance in tasks using fMRI.

Submitted 17 October, 2019; originally announced October 2019.

Comments: SPIE Medical Imaging 2020

arXiv:1910.02785 [pdf, other]

BUZz: BUffer Zones for defending adversarial examples in image classification

Authors: Kaleel Mahmood, Phuong Ha Nguyen, Lam M. Nguyen, Thanh Nguyen, Marten van Dijk

Abstract: We propose a novel defense against all existing gradient based adversarial attacks on deep neural networks for image classification problems. Our defense is based on a combination of deep neural networks and simple image transformations. While straightforward in implementation, this defense yields a unique security property which we term buffer zones. We argue that our defense based on buffer zones offers significant improvements over state-of-the-art defenses. We are able to achieve this improvement even when the adversary has access to the entire original training data set and unlimited query access to the defense. We verify our claim through experimentation using Fashion-MNIST and CIFAR-10: we demonstrate a <11% attack success rate, significantly lower than what other well-known state-of-the-art defenses offer, at a price of only an 11-18% drop in clean accuracy. By using a new intuitive metric, we explain why this trade-off offers a significant improvement over prior work.

Submitted 16 June, 2020; v1 submitted 3 October, 2019; originally announced October 2019.

arXiv:1909.13055 [pdf, other]

DeepUSPS: Deep Robust Unsupervised Saliency Prediction With Self-Supervision

Authors: Duc Tam Nguyen, Maximilian Dax, Chaithanya Kumar Mummadi, Thi Phuong Nhung Ngo, Thi Hoai Phuong Nguyen, Zhongyu Lou, Thomas Brox

Abstract: Deep neural network (DNN) based salient object detection in images based on high-quality labels is expensive. Alternative unsupervised approaches rely on careful selection of multiple handcrafted saliency methods to generate noisy pseudo-ground-truth labels. In this work, we propose a two-stage mechanism for robust unsupervised object saliency prediction, where the first stage involves refinement of the noisy pseudo labels generated from different handcrafted methods. Each handcrafted method is substituted by a deep network that learns to generate the pseudo labels. These labels are refined incrementally in multiple iterations via our proposed self-supervision technique. In the second stage, the refined labels produced from multiple networks representing multiple saliency methods are used to train the actual saliency detection network. We show that this self-learning procedure outperforms all the existing unsupervised methods over different datasets. Results are even comparable to those of fully-supervised state-of-the-art approaches. The code is available at https://tinyurl.com/wtlhgo3 .

Submitted 15 March, 2021; v1 submitted 28 September, 2019; originally announced September 2019.

Comments: NeurIPS 2019 (Vancouver, Canada): camera ready version

arXiv:1906.09548 [pdf, ps, other]

doi 10.1109/VTCFall.2019.8891244

Computation Offloading and Resource Allocation for Backhaul Limited Cooperative MEC Systems

Authors: Phuong-Duy Nguyen, Vu Nguyen Ha, Long Bao Le

Abstract: In this paper, we jointly optimize computation offloading and resource allocation to minimize the weighted sum of energy consumption of all mobile users in a backhaul limited cooperative MEC system with multiple fog servers. Considering the partial offloading strategy and TDMA transmission at each base station, the underlying optimization problem with constraints on maximum task latency and limited computation resource at mobile users and fog servers is non-convex. We propose to convexify the problem exploiting the relationship among some optimization variables from which an optimal algorithm is proposed to solve the resulting problem. We then present numerical results to demonstrate the significant gains of our proposed design compared to conventional designs without exploiting cooperation among fog servers and a greedy algorithm.

Submitted 22 June, 2019; originally announced June 2019.

arXiv:1810.07217 [pdf, other]

Hierarchical Generative Modeling for Controllable Speech Synthesis

Authors: Wei-Ning Hsu, Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, Yuxuan Wang, Yuan Cao, Ye Jia, Zhifeng Chen, Jonathan Shen, Patrick Nguyen, Ruoming Pang

Abstract: This paper proposes a neural sequence-to-sequence text-to-speech (TTS) model which can control latent attributes in the generated speech that are rarely annotated in the training data, such as speaking style, accent, background noise, and recording conditions. The model is formulated as a conditional generative model based on the variational autoencoder (VAE) framework, with two levels of hierarchical latent variables. The first level is a categorical variable, which represents attribute groups (e.g. clean/noisy) and provides interpretability. The second level, conditioned on the first, is a multivariate Gaussian variable, which characterizes specific attribute configurations (e.g. noise level, speaking rate) and enables disentangled fine-grained control over these attributes. This amounts to using a Gaussian mixture model (GMM) for the latent distribution. Extensive evaluation demonstrates its ability to control the aforementioned attributes. In particular, we train a high-quality controllable TTS model on real found data, which is capable of inferring speaker and style attributes from a noisy utterance and using them to synthesize clean speech with controllable speaking style.

Submitted 27 December, 2018; v1 submitted 16 October, 2018; originally announced October 2018.

Comments: 27 pages, accepted to ICLR 2019

arXiv:1806.04558 [pdf, other]

Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis

Authors: Ye Jia, Yu Zhang, Ron J. Weiss, Quan Wang, Jonathan Shen, Fei Ren, Zhifeng Chen, Patrick Nguyen, Ruoming Pang, Ignacio Lopez Moreno, Yonghui Wu

Abstract: We describe a neural network-based system for text-to-speech (TTS) synthesis that is able to generate speech audio in the voice of many different speakers, including those unseen during training. Our system consists of three independently trained components: (1) a speaker encoder network, trained on a speaker verification task using an independent dataset of noisy speech from thousands of speakers without transcripts, to generate a fixed-dimensional embedding vector from seconds of reference speech from a target speaker; (2) a sequence-to-sequence synthesis network based on Tacotron 2, which generates a mel spectrogram from text, conditioned on the speaker embedding; (3) an auto-regressive WaveNet-based vocoder that converts the mel spectrogram into a sequence of time domain waveform samples. We demonstrate that the proposed model is able to transfer the knowledge of speaker variability learned by the discriminatively-trained speaker encoder to the new task, and is able to synthesize natural speech from speakers that were not seen during training. We quantify the importance of training the speaker encoder on a large and diverse speaker set in order to obtain the best generalization performance. Finally, we show that randomly sampled speaker embeddings can be used to synthesize speech in the voice of novel speakers dissimilar from those used in training, indicating that the model has learned a high quality speaker representation.

Submitted 2 January, 2019; v1 submitted 12 June, 2018; originally announced June 2018.

Comments: NeurIPS 2018

Journal ref: Advances in Neural Information Processing Systems 31 (2018), 4485-4495

arXiv:1712.08335 [pdf]

An Efficient Spectral Leakage Filtering for IEEE 802.11af in TV White Space

Authors: Phu Xuan Nguyen, Thinh Hung Pham, Trang Hoang, Oh-Soon Shin

Abstract: Orthogonal frequency division multiplexing (OFDM) has been widely adopted for modern wireless standards and become a key enabling technology for cognitive radios. However, one of its main drawbacks is significant spectral leakage due to the accumulation of multiple sinc-shaped subcarriers. In this paper, we present a novel pulse shaping scheme for efficient spectral leakage suppression in the OFDM-based physical layer of the IEEE 802.11af standard. With conventional pulse shaping filters such as a raised-cosine filter, vestigial symmetry can be used to reduce spectral leakage very effectively. However, these pulse shaping filters require a long guard interval, i.e., cyclic prefix in an OFDM system, to avoid inter-symbol interference (ISI), resulting in a loss of spectral efficiency. The proposed pulse shaping method based on asymmetric pulse shaping achieves better spectral leakage suppression and decreases ISI caused by filtering as compared to conventional pulse shaping filters.

Submitted 22 December, 2017; originally announced December 2017.

arXiv:1712.01996 [pdf, other]

An analysis of incorporating an external language model into a sequence-to-sequence model

Authors: Anjuli Kannan, Yonghui Wu, Patrick Nguyen, Tara N. Sainath, Zhifeng Chen, Rohit Prabhavalkar

Abstract: Attention-based sequence-to-sequence models for automatic speech recognition jointly train an acoustic model, language model, and alignment mechanism. Thus, the language model component is only trained on transcribed audio-text pairs. This leads to the use of shallow fusion with an external language model at inference time. Shallow fusion refers to log-linear interpolation with a separately trained language model at each step of the beam search. In this work, we investigate the behavior of shallow fusion across a range of conditions: different types of language models, different decoding units, and different tasks. On Google Voice Search, we demonstrate that the use of shallow fusion with a neural LM with wordpieces yields a 9.1% relative word error rate reduction (WERR) over our competitive attention-based sequence-to-sequence model, obviating the need for second-pass rescoring.

Submitted 5 December, 2017; originally announced December 2017.

arXiv:1712.01864 [pdf, other]

No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models

Authors: Tara N. Sainath, Rohit Prabhavalkar, Shankar Kumar, Seungji Lee, Anjuli Kannan, David Rybach, Vlad Schogol, Patrick Nguyen, Bo Li, Yonghui Wu, Zhifeng Chen, Chung-Cheng Chiu

Abstract: For decades, context-dependent phonemes have been the dominant sub-word unit for conventional acoustic modeling systems. This status quo has begun to be challenged recently by end-to-end models which seek to combine acoustic, pronunciation, and language model components into a single neural network. Such systems, which typically predict graphemes or words, simplify the recognition process since they remove the need for a separate expert-curated pronunciation lexicon to map from phoneme-based units to words. However, there has been little previous work comparing phoneme-based versus grapheme-based sub-word units in the end-to-end modeling framework, to determine whether the gains from such approaches are primarily due to the new probabilistic model, or from the joint learning of the various components with grapheme-based units. In this work, we conduct detailed experiments which are aimed at quantifying the value of phoneme-based pronunciation lexica in the context of end-to-end models. We examine phoneme-based end-to-end models, which are contrasted against grapheme-based ones on a large vocabulary English Voice-search task, where we find that graphemes do indeed outperform phonemes. We also compare grapheme and phoneme-based approaches on a multi-dialect English task, which once again confirms the superiority of graphemes, greatly simplifying the system for recognizing multiple dialects.

Submitted 5 December, 2017; originally announced December 2017.

arXiv:1712.01818 [pdf, other]

Minimum Word Error Rate Training for Attention-based Sequence-to-Sequence Models

Authors: Rohit Prabhavalkar, Tara N. Sainath, Yonghui Wu, Patrick Nguyen, Zhifeng Chen, Chung-Cheng Chiu, Anjuli Kannan

Abstract: Sequence-to-sequence models, such as attention-based models in automatic speech recognition (ASR), are typically trained to optimize the cross-entropy criterion which corresponds to improving the log-likelihood of the data. However, system performance is usually measured in terms of word error rate (WER), not log-likelihood. Traditional ASR systems benefit from discriminative sequence training which optimizes criteria such as the state-level minimum Bayes risk (sMBR) which are more closely related to WER. In the present work, we explore techniques to train attention-based models to directly minimize expected word error rate. We consider two loss functions which approximate the expected number of word errors: either by sampling from the model, or by using N-best lists of decoded hypotheses, which we find to be more effective than the sampling-based method. In experimental evaluations, we find that the proposed training procedure improves performance by up to 8.2% relative to the baseline system. This allows us to train grapheme-based, uni-directional attention-based models which match the performance of a traditional, state-of-the-art, discriminative sequence-trained system on a mobile voice-search task.

Submitted 5 December, 2017; originally announced December 2017.

arXiv:1712.01807 [pdf, other]

Improving the Performance of Online Neural Transducer Models

Authors: Tara N. Sainath, Chung-Cheng Chiu, Rohit Prabhavalkar, Anjuli Kannan, Yonghui Wu, Patrick Nguyen, Zhifeng Chen

Abstract: Having a sequence-to-sequence model which can operate in an online fashion is important for streaming applications such as Voice Search. Neural transducer is a streaming sequence-to-sequence model, but has shown a significant degradation in performance compared to non-streaming models such as Listen, Attend and Spell (LAS). In this paper, we present various improvements to NT. Specifically, we look at increasing the window over which NT computes attention, mainly by looking backwards in time so the model still remains online. In addition, we explore initializing a NT model from a LAS-trained model so that it is guided with a better alignment. Finally, we explore including stronger language models such as using wordpiece models, and applying an external LM during the beam search. On a Voice Search task, we find with these improvements we can get NT to match the performance of LAS.

Submitted 5 December, 2017; originally announced December 2017.

arXiv:1712.01769 [pdf, other]

State-of-the-art Speech Recognition With Sequence-to-Sequence Models

Authors: Chung-Cheng Chiu, Tara N. Sainath, Yonghui Wu, Rohit Prabhavalkar, Patrick Nguyen, Zhifeng Chen, Anjuli Kannan, Ron J. Weiss, Kanishka Rao, Ekaterina Gonina, Navdeep Jaitly, Bo Li, Jan Chorowski, Michiel Bacchiani

Abstract: Attention-based encoder-decoder architectures such as Listen, Attend, and Spell (LAS), subsume the acoustic, pronunciation and language model components of a traditional automatic speech recognition (ASR) system into a single neural network. In previous work, we have shown that such architectures are comparable to state-of-the-art ASR systems on dictation tasks, but it was not clear if such architectures would be practical for more challenging tasks such as voice search. In this work, we explore a variety of structural and optimization improvements to our LAS model which significantly improve performance. On the structural side, we show that word piece models can be used instead of graphemes. We also introduce a multi-head attention architecture, which offers improvements over the commonly-used single-head attention. On the optimization side, we explore synchronous training, scheduled sampling, label smoothing, and minimum word error rate optimization, which are all shown to improve accuracy. We present results with a unidirectional LSTM encoder for streaming recognition. On a 12,500 hour voice search task, we find that the proposed changes improve the WER from 9.2% to 5.6%, while the best conventional system achieves 6.7%; on a dictation task our model achieves a WER of 4.1% compared to 5% for the conventional system.

Submitted 23 February, 2018; v1 submitted 5 December, 2017; originally announced December 2017.

Comments: ICASSP camera-ready version

arXiv:1712.01541 [pdf, other]

Multi-Dialect Speech Recognition With A Single Sequence-To-Sequence Model

Authors: Bo Li, Tara N. Sainath, Khe Chai Sim, Michiel Bacchiani, Eugene Weinstein, Patrick Nguyen, Zhifeng Chen, Yonghui Wu, Kanishka Rao

Abstract: Sequence-to-sequence models provide a simple and elegant solution for building speech recognition systems by folding separate components of a typical system, namely acoustic (AM), pronunciation (PM) and language (LM) models into a single neural network. In this work, we look at one such sequence-to-sequence model, namely listen, attend and spell (LAS), and explore the possibility of training a single model to serve different English dialects, which simplifies the process of training multi-dialect systems without the need for separate AM, PM and LMs for each dialect. We show that simply pooling the data from all dialects into one LAS model falls behind the performance of a model fine-tuned on each dialect. We then look at incorporating dialect-specific information into the model, both by modifying the training targets by inserting the dialect symbol at the end of the original grapheme sequence and also feeding a 1-hot representation of the dialect information into all layers of the model. Experimental results on seven English dialects show that our proposed system is effective in modeling dialect variations within a single LAS model, outperforming a LAS model trained individually on each of the seven dialects by 3.1 ~ 16.5% relative.

Submitted 5 December, 2017; originally announced December 2017.

Comments: submitted to ICASSP 2018

arXiv:1711.07274 [pdf, ps, other]

Speech recognition for medical conversations

Authors: Chung-Cheng Chiu, Anshuman Tripathi, Katherine Chou, Chris Co, Navdeep Jaitly, Diana Jaunzeikare, Anjuli Kannan, Patrick Nguyen, Hasim Sak, Ananth Sankar, Justin Tansuwan, Nathan Wan, Yonghui Wu, Xuedong Zhang

Abstract: In this work we explored building automatic speech recognition models for transcribing doctor-patient conversations. We collected a large scale dataset of clinical conversations ($14,000$ hr), designed the task to represent the real-world scenario, and explored several alignment approaches to iteratively improve data quality. We explored both CTC and LAS systems for building speech recognition models. The LAS was more resilient to noisy data and CTC required more data clean up. A detailed analysis is provided for understanding the performance for clinical tasks. Our analysis showed the speech recognition models performed well on important medical utterances, while errors occurred in casual conversations. Overall we believe the resulting models can provide reasonable quality in practice.

Submitted 20 June, 2018; v1 submitted 20 November, 2017; originally announced November 2017.

Comments: Interspeech 2018 camera ready

arXiv:1710.02928 [pdf, ps, other]

Range-Spread Targets Detection in Unknown Doppler Shift via Semi-Definite Programming

Authors: Mai P. T. Nguyen, I. Song, S. Lee, S. Yoon

Abstract: Based on the technique of generalized likelihood ratio test, we address detection schemes for Doppler-shifted range-spread targets in Gaussian noise. First, a detection scheme is derived by solving the maximization associated with the estimation of unknown Doppler frequency with semi-definite programming. To lower the computational complexity of the detector, we then consider a simplification of the detector by adopting maximization over a relaxed space. Both of the proposed detectors are shown to have constant false alarm rate via numerical or theoretical analysis. The detection performance of the proposed detector based on the semi-definite programming is shown to be almost the same as that of the conventional detector designed for known Doppler frequency.

Submitted 8 October, 2017; originally announced October 2017.

Comments: First author is Mai P. T. Nguyen

arXiv:1710.02656 [pdf, ps, other]

Robust Radar Detection of a Mismatched Steering Vector Embedded in Compound Gaussian Clutter

Authors: Mai P. T. Nguyen, I. Song

Abstract: The problem of radar detection in compound Gaussian clutter when a radar signature is not completely known has not been considered yet and is addressed in this paper. We propose a robust technique to detect, based on the generalized likelihood ratio test, a point-like target embedded in compound Gaussian clutter. Employing an array of antennas, we assume that the actual steering vector departs from the nominal one, but lies in a known interval. The detection is then secured by employing semi-definite programming. It is confirmed via simulation that the proposed detector experiences a negligible detection loss compared to an adaptive normalized matched filter in a perfectly matched case, but outperforms it in cases of a mismatched signal. Remarkably, the proposed detector possesses constant false alarm rate with respect to the clutter covariance matrix.

Submitted 7 October, 2017; originally announced October 2017.

Comments: 7 pages, 5 figures

arXiv:1602.06667 [pdf, other]

doi 10.1109/TASE.2017.2762088

A Motion Planning Strategy for the Active Vision-Based Mapping of Ground-Level Structures

Authors: Manikandasriram Srinivasan Ramanagopal, André Phu-Van Nguyen, Jerome Le Ny

Abstract: This paper presents a strategy to guide a mobile ground robot equipped with a camera or depth sensor, in order to autonomously map the visible part of a bounded three-dimensional structure. We describe motion planning algorithms that determine appropriate successive viewpoints and attempt to fill holes automatically in a point cloud produced by the sensing and perception layer. The emphasis is on accurately reconstructing a 3D model of a structure of moderate size rather than mapping large open environments, with applications for example in architecture, construction and inspection. The proposed algorithms do not require any initialization in the form of a mesh model or a bounding box, and the paths generated are well adapted to situations where the vision sensor is used simultaneously for mapping and for localizing the robot, in the absence of an additional absolute positioning system. We analyze the coverage properties of our policy, and compare its performance to the classic frontier based exploration algorithm. We illustrate its efficacy for different structure sizes, levels of localization accuracy and range of the depth sensor, and validate our design on a real-world experiment.

Submitted 10 November, 2017; v1 submitted 22 February, 2016; originally announced February 2016.

Comments: Accepted for publication in IEEE Transactions on Automation Science and Engineering. Available in IEEE Xplore at http://ieeexplore.ieee.org/document/8093664

Showing 1–48 of 48 results for author: Nguyen, P