Search | arXiv e-print repository

HeR-DRL: Heterogeneous Relational Deep Reinforcement Learning for Decentralized Multi-Robot Crowd Navigation

Authors: Xinyu Zhou, Songhao Piao, Wenzheng Chi, Liguo Chen, Wei Li

Abstract: Crowd navigation has received significant research attention in recent years, especially DRL-based methods. While single-robot crowd scenarios have dominated research, they offer limited applicability to real-world complexities. The heterogeneity of interaction among multiple agent categories, as in decentralized multi-robot pedestrian scenarios, is frequently disregarded. This "interaction blind spot" hinders generalizability and restricts progress towards robust navigation algorithms. In this paper, we propose a heterogeneous relational deep reinforcement learning (HeR-DRL) method, based on a customised heterogeneous GNN, to improve navigation strategies in decentralized multi-robot crowd navigation. First, we devise a method for constructing a robot-crowd heterogeneous relation graph that effectively simulates the heterogeneous pair-wise interaction relationships. Second, we propose a new heterogeneous graph neural network for transferring and aggregating the heterogeneous state information. Finally, we incorporate the encoded information into deep reinforcement learning to explore the optimal policy. HeR-DRL is rigorously evaluated by comparing it to state-of-the-art algorithms in both single-robot and multi-robot circle crossing scenarios. The experimental results demonstrate that HeR-DRL surpasses the state-of-the-art approaches in overall performance, particularly excelling in safety and comfort metrics. This underscores the significance of interaction heterogeneity for crowd navigation.
The source code will be publicly released at https://github.com/Zhouxy-Debugging-Den/HeR-DRL.

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2308.06436 [pdf, other]

A Domain-adaptive Physics-informed Neural Network for Inverse Problems of Maxwell's Equations in Heterogeneous Media

Authors: Shiyuan Piao, Hong Gu, Aina Wang, Pan Qin

Abstract: Maxwell's equations are a collection of coupled partial differential equations (PDEs) that, together with the Lorentz force law, constitute the basis of classical electromagnetism and electric circuits. Effectively solving Maxwell's equations is crucial in various fields, like electromagnetic scattering and antenna design optimization. Physics-informed neural networks (PINNs) have shown powerful ability in solving PDEs. However, PINNs still struggle to solve Maxwell's equations in heterogeneous media. To this end, we propose a domain-adaptive PINN (da-PINN) to solve inverse problems of Maxwell's equations in heterogeneous media. First, we propose a location parameter of media interface to decompose the whole domain into several sub-domains. Furthermore, the electromagnetic interface conditions are incorporated into a loss function to improve the prediction performance near the interface. Then, we propose a domain-adaptive training strategy for da-PINN. Finally, the effectiveness of da-PINN is verified with two case studies.

Submitted 11 August, 2023; originally announced August 2023.

Comments: 5 pages, 4 figures

arXiv:2110.13432 [pdf]

Deep Learning-based Segmentation of Cerebral Aneurysms in 3D TOF-MRA using Coarse-to-Fine Framework

Authors: Meng Chen, Chen Geng, Dongdong Wang, Jiajun Zhang, Ruoyu Di, Fengmei Li, Zhiyong Zhou, Sirong Piao, Yuxin Li, Yaikang Dai

Abstract: BACKGROUND AND PURPOSE: Cerebral aneurysm is one of the most common cerebrovascular diseases, and SAH caused by its rupture has a very high mortality and disability rate. Existing automatic segmentation methods based on DLMs with TOF-MRA modality could not segment edge voxels very well, so our goal is to realize more accurate segmentation of cerebral aneurysms in 3D TOF-MRA with the help of DLMs. MATERIALS AND METHODS: In this research, we proposed an automatic segmentation framework of cerebral aneurysm in 3D TOF-MRA. The framework was composed of two segmentation networks ranging from coarse to fine. The coarse segmentation network, namely DeepMedic, completed the coarse segmentation of cerebral aneurysms, and the processed results were fed into the fine segmentation network, namely dual-channel SE_3D U-Net trained with weighted loss function, for fine segmentation. Images from ADAM2020 (n=113) were used for training and validation and images from another center (n=45) were used for testing. The segmentation metrics we used include DSC, HD, and VS. RESULTS: The trained cerebral aneurysm segmentation model achieved DSC of 0.75, HD of 1.52, and VS of 0.91 on the validation cohort. On the totally independent test cohort, our method achieved the highest DSC of 0.12, the lowest HD of 11.61, and the highest VS of 0.16 in comparison with state-of-the-art segmentation networks. CONCLUSIONS: The coarse-to-fine framework, which is composed of DeepMedic and dual-channel SE_3D U-Net, can segment cerebral aneurysms in 3D TOF-MRA with superior accuracy.

Submitted 26 October, 2021; originally announced October 2021.

arXiv:2106.08254 [pdf, other]

BEiT: BERT Pre-Training of Image Transformers

Authors: Hangbo Bao, Li Dong, Songhao Piao, Furu Wei

Abstract: We introduce a self-supervised vision representation model BEiT, which stands for Bidirectional Encoder representation from Image Transformers. Following BERT developed in the natural language processing area, we propose a masked image modeling task to pretrain vision Transformers. Specifically, each image has two views in our pre-training, i.e., image patches (such as 16x16 pixels), and visual tokens (i.e., discrete tokens). We first "tokenize" the original image into visual tokens. Then we randomly mask some image patches and feed them into the backbone Transformer. The pre-training objective is to recover the original visual tokens based on the corrupted image patches. After pre-training BEiT, we directly fine-tune the model parameters on downstream tasks by appending task layers upon the pretrained encoder. Experimental results on image classification and semantic segmentation show that our model achieves competitive results with previous pre-training methods. For example, base-size BEiT achieves 83.2% top-1 accuracy on ImageNet-1K, significantly outperforming from-scratch DeiT training (81.8%) with the same setup. Moreover, large-size BEiT obtains 86.3% only using ImageNet-1K, even outperforming ViT-L with supervised pre-training on ImageNet-22K (85.2%). The code and pretrained models are available at https://aka.ms/beit.

Submitted 3 September, 2022; v1 submitted 15 June, 2021; originally announced June 2021.

Comments: A Path to the BERT Moment of CV

arXiv:2006.01022 [pdf]

doi 10.1007/s11227-018-2591-3

A novel approach for multi-agent cooperative pursuit to capture grouped evaders

Authors: Muhammad Zuhair Qadir, Songhao Piao, Haiyang Jiang, Mohammed El Habib Souidi

Abstract: An approach to mobile multi-agent pursuit is proposed, based on a self-organizing feature map (SOFM) together with reinforcement learning based on the agent group role membership function (AGRMF) model. This method promotes dynamic organization of the pursuers' groups and also makes pursuers' group evader according to their desire based on SOFM and AGRMF techniques. This helps to overcome the shortcoming that the pursuers cannot fully reorganize when the goal is too independent in the process of the AGRMF model's operation. Besides, we also discuss a new reward function. After the formation of the group, reinforcement learning is applied to get the optimal solution for each agent. The results of each step in the capturing process will finally affect the AGR membership function to speed up the convergence of the competitive neural network. The experimental results show that this approach is more effective for the mobile agents to capture evaders.

Submitted 27 June, 2020; v1 submitted 1 June, 2020; originally announced June 2020.

Comments: published paper's draft version

Journal ref: Journal of Supercomputing, J Supercomput 76 (2020)

arXiv:2002.12804 [pdf, other]

UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training

Authors: Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Songhao Piao, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon

Abstract: We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks using a novel training procedure, referred to as a pseudo-masked language model (PMLM). Given an input text with masked tokens, we rely on conventional masks to learn inter-relations between corrupted tokens and context via autoencoding, and pseudo masks to learn intra-relations between masked spans via partially autoregressive modeling. With well-designed position embeddings and self-attention masks, the context encodings are reused to avoid redundant computation. Moreover, conventional masks used for autoencoding provide global masking information, so that all the position embeddings are accessible in partially autoregressive language modeling. In addition, the two tasks pre-train a unified language model as a bidirectional encoder and a sequence-to-sequence decoder, respectively. Our experiments show that the unified language models pre-trained using PMLM achieve new state-of-the-art results on a wide range of natural language understanding and generation tasks across several widely used benchmarks.

Submitted 28 February, 2020; originally announced February 2020.

Comments: 11 pages

arXiv:1809.04318 [pdf, other]

Neural Melody Composition from Lyrics

Authors: Hangbo Bao, Shaohan Huang, Furu Wei, Lei Cui, Yu Wu, Chuanqi Tan, Songhao Piao, Ming Zhou

Abstract: In this paper, we study a novel task that learns to compose music from natural language. Given the lyrics as input, we propose a melody composition model that generates lyrics-conditional melody as well as the exact alignment between the generated melody and the given lyrics simultaneously. More specifically, we develop the melody composition model based on the sequence-to-sequence framework. It consists of two neural encoders to encode the current lyrics and the context melody respectively, and a hierarchical decoder to jointly produce musical notes and the corresponding alignment. Experimental results on lyrics-melody pairs of 18,451 pop songs demonstrate the effectiveness of our proposed methods. In addition, we apply a singing voice synthesizer software to synthesize the "singing" of the lyrics and melodies for human evaluation. Results indicate that our generated melodies are more melodious and tuneful compared with the baseline method.

Submitted 12 September, 2018; originally announced September 2018.

arXiv:1804.04854 [pdf, other]

doi 10.1109/ACCESS.2019.2930201

Tightly-coupled Monocular Visual-odometric SLAM using Wheels and a MEMS Gyroscope

Authors: Meixiang Quan, Songhao Piao, Minglang Tan, Shi-Sheng Huang

Abstract: In this paper, we present a novel tightly-coupled probabilistic monocular visual-odometric Simultaneous Localization and Mapping algorithm using wheels and a MEMS gyroscope, which can provide accurate, robust and long-term localization for the ground robot moving on a plane. Firstly, we present an odometer preintegration theory that integrates the wheel encoder measurements and gyroscope measurements to a local frame. The preintegration theory properly addresses the manifold structure of the rotation group SO(3) and carefully deals with uncertainty propagation and bias correction. Then the novel odometer error term is formulated using the odometer preintegration model and it is tightly integrated into the visual optimization framework. Furthermore, we introduce a complete tracking framework to provide different strategies for motion tracking when (1) both measurements are available, (2) visual measurements are not available, and (3) the wheel encoder experiences slippage, which leads the system to be accurate and robust. Finally, the proposed algorithm is evaluated by performing extensive experiments; the experimental results demonstrate the superiority of the proposed system.

Submitted 13 April, 2018; originally announced April 2018.

Comments: 13 pages, 31 figures

Journal ref: IEEE Access (2019)

arXiv:1707.05001 [pdf, other]

Coalition formation for Multi-agent Pursuit based on Neural Network and AGRMF Model

Authors: Zhaoyi Pei, Songhao Piao, Mohammed Ei Souidi

Abstract: An approach for coalition formation of multi-agent pursuit based on neural network and AGRMF model is proposed. This paper constructs a novel neural network called AGRMF-ANN which consists of a feature extraction part and a group generation part. On one hand, the convolutional layers of the feature extraction part can abstract the features of the agent group role membership function (AGRMF) for all of the groups; on the other hand, those features will be fed to the group generation part based on a self-organizing map (SOM) layer which is used to group the pursuers with similar features in the same group. Besides, we also propose the group attractiveness function (GAF) to evaluate the quality of groups and the pursuers' contribution in order to adjust the main ability indicators of AGRMF and the other weights of the whole neural network. The simulation experiment showed that this proposal can improve the effectiveness of coalition formation for multi-agent pursuit and the ability to adapt to pursuit-evasion problems as the scale of the pursuer team grows.

Submitted 17 July, 2017; originally announced July 2017.

arXiv:1706.03648 [pdf, other]

doi 10.1109/ACCESS.2019.2904512

Accurate Monocular Visual-inertial SLAM using a Map-assisted EKF Approach

Authors: Meixiang Quan, Songhao Piao, Minglang Tan, Shi-Sheng Huang

Abstract: This paper presents a novel tightly-coupled monocular visual-inertial Simultaneous Localization and Mapping algorithm, which provides accurate and robust localization within the globally consistent map in real time on a standard CPU. This is achieved by firstly performing the visual-inertial extended Kalman filter (EKF) to provide motion estimate at a high rate. However, the filter becomes inconsistent due to the well-known linearization issues. So we perform a keyframe-based visual-inertial bundle adjustment to improve the consistency and accuracy of the system. In addition, a loop closure detection and correction module is also added to eliminate the accumulated drift when revisiting an area. Finally, the optimized motion estimates and map are fed back to the EKF-based visual-inertial odometry module, thus the inconsistency and estimation error of the EKF estimator are reduced. In this way, the system can continuously provide reliable motion estimates for the long-term operation. The performance of the algorithm is validated on public datasets and real-world experiments, which proves the superiority of the proposed algorithm.

Submitted 31 March, 2018; v1 submitted 12 June, 2017; originally announced June 2017.

Comments: 12 pages, 10 figures

Journal ref: IEEE Access (2019)

arXiv:1510.08485 [pdf, other]

Wireless Physical-Layer Identification: Modeling and Validation

Authors: Wenhao Wang, Zhi Sun, Kui Ren, Bocheng Zhu, Sixu Piao

Abstract: The wireless physical-layer identification (WPLI) techniques utilize the unique features of the physical waveforms of wireless signals to identify and classify authorized devices. As the inherent physical layer features are difficult to forge, WPLI is deemed as a promising technique for wireless security solutions. However, as of today it still remains unclear whether existing WPLI techniques can be applied under real-world requirements and constraints. In this paper, through both theoretical modeling and experiment validation, the reliability and differentiability of WPLI techniques are rigorously evaluated, especially under the constraints of state-of-the-art wireless devices, real operation environments, as well as wireless protocols and regulations. Specifically, a theoretical model is first established to systematically describe the complete procedure of WPLI. More importantly, the proposed model is then implemented to thoroughly characterize various WPLI techniques that utilize the spectrum features coming from the non-linear RF-front-end, under the influences from different transmitters, receivers, and wireless channels. Subsequently, the limitations of existing WPLI techniques are revealed and evaluated in detail using both the developed theoretical model and in-lab experiments.
The real-world requirements and constraints are characterized along each step in WPLI, including i) the signal processing at the transmitter (device to be identified), ii) the various physical layer features that originate from circuits, antenna, and environments, iii) the signal propagation in various wireless channels, iv) the signal reception and processing at the receiver (the identifier), and v) the fingerprint extraction and classification at the receiver.

Submitted 14 March, 2016; v1 submitted 28 October, 2015; originally announced October 2015.

Showing 1–11 of 11 results for author: Piao, S