Search | arXiv e-print repository

Virtual Elastic Tether: a New Approach for Multi-agent Navigation in Confined Aquatic Environments

Authors: Kanzhong Yao, Xueliang Cheng, Keir Groves, Barry Lennox, Ognjen Marjanovic, Simon Watson

Abstract: Underwater navigation is a challenging area in the field of mobile robotics due to inherent constraints in self-localisation and communication in underwater environments. Some of these challenges can be mitigated by using collaborative multi-agent teams. However, when applied underwater, the robustness of traditional multi-agent collaborative control approaches is highly limited due to the unavailability of reliable measurements. In this paper, the concept of a Virtual Elastic Tether (VET) is introduced in the context of incomplete state measurements, which represents an innovative approach to underwater navigation in confined spaces. The concept of VET is formulated and validated using the Cooperative Aquatic Vehicle Exploration System (CAVES), which is a sim-to-real multi-agent aquatic robotic platform. Within this framework, a vision-based Autonomous Underwater Vehicle-Autonomous Surface Vehicle leader-follower formulation is developed. Experiments were conducted in both simulation and on a physical platform, benchmarked against a traditional Image-Based Visual Servoing approach. Results indicate that the formation of the baseline approach fails under discrete disturbances, when induced distances between the robots exceed 0.6 m in simulation and 0.3 m in the real world. In contrast, the VET-enhanced system recovers to pre-perturbation distances within 5 seconds. Furthermore, results illustrate the successful navigation of VET-enhanced CAVES in a confined water pond where the baseline approach fails to perform adequately.

Submitted 15 March, 2024; originally announced March 2024.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2312.05829 [pdf, other]

EM Based p-norm-like Constraint RLS Algorithm for Sparse System Identification

Authors: Shuyang Jiang, Kung Yao

Abstract: In this paper, the recursive least squares (RLS) algorithm is considered in the sparse system identification setting. The cost function of the RLS algorithm is regularized by a $p$-norm-like ($0 \leq p \leq 1$) constraint of the estimated system parameters. In order to minimize the regularized cost function, we transform it into a penalized maximum likelihood (ML) problem, which is solved by the expectation-maximization (EM) algorithm. With the introduction of a thresholding operator, the update equation of the tap-weight vector is derived. We also exploit the underlying sparsity to implement the proposed algorithm in a low computational complexity fashion. Numerical simulations demonstrate the superiority of the new algorithm over conventional sparse RLS algorithms, as well as the regular RLS algorithm.

Submitted 10 December, 2023; originally announced December 2023.

Comments: 11 pages, 3 figures, journal manuscript

arXiv:2305.12793 [pdf, other]

Zero-Shot End-to-End Spoken Language Understanding via Cross-Modal Selective Self-Training

Authors: Jianfeng He, Julian Salazar, Kaisheng Yao, Haoqi Li, Jinglun Cai

Abstract: End-to-end (E2E) spoken language understanding (SLU) is constrained by the cost of collecting speech-semantics pairs, especially when label domains change. Hence, we explore \textit{zero-shot} E2E SLU, which learns E2E SLU without speech-semantics pairs, instead using only speech-text and text-semantics pairs. Previous work achieved zero-shot by pseudolabeling all speech-text transcripts with a natural language understanding (NLU) model learned on text-semantics corpora. However, this method requires the domains of speech-text and text-semantics to match, which often mismatch due to separate collections. Furthermore, using the entire collected speech-text corpus from any domains leads to \textit{imbalance} and \textit{noise} issues. To address these, we propose \textit{cross-modal selective self-training} (CMSST). CMSST tackles imbalance by clustering in a joint space of the three modalities (speech, text, and semantics) and handles label noise with a selection network. We also introduce two benchmarks for zero-shot E2E SLU, covering matched and found speech (mismatched) settings. Experiments show that CMSST improves performance in both settings, with significantly reduced sample sizes and training time. Our code and data are released in https://github.com/amazon-science/zero-shot-E2E-slu.

Submitted 2 February, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: 18 pages, 7 figures

arXiv:2303.01868 [pdf, other]

doi 10.1109/TRO.2023.3253249

Exploiting Kinematic Redundancy for Robotic Grasping of Multiple Objects

Authors: Kunpeng Yao, Aude Billard

Abstract: Humans coordinate the abundant degrees of freedom (DoFs) of hands to dexterously perform tasks in everyday life. We imitate human strategies to advance the dexterity of multi-DoF robotic hands. Specifically, we enable a robot hand to grasp multiple objects by exploiting its kinematic redundancy, referring to all its controllable DoFs. We propose a human-like grasp synthesis algorithm to generate grasps using pairwise contacts on arbitrary opposing hand surface regions, no longer limited to fingertips or hand inner surface. To model the available space of the hand for grasp, we construct a reachability map, consisting of reachable spaces of all finger phalanges and the palm. It guides the formulation of a constrained optimization problem, solving for feasible and stable grasps. We formulate an iterative process to empower robotic hands to grasp multiple objects in sequence. Moreover, we propose a kinematic efficiency metric and an associated strategy to facilitate exploiting kinematic redundancy. We validated our approaches by generating grasps of single and multiple objects using various hand surface regions. Such grasps can be successfully replicated on a real robotic hand.

Submitted 30 March, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

arXiv:2201.02831 [pdf, other]

doi 10.1016/j.media.2022.102628

CrossMoDA 2021 challenge: Benchmark of Cross-Modality Domain Adaptation techniques for Vestibular Schwannoma and Cochlea Segmentation

Authors: Reuben Dorent, Aaron Kujawa, Marina Ivory, Spyridon Bakas, Nicola Rieke, Samuel Joutard, Ben Glocker, Jorge Cardoso, Marc Modat, Kayhan Batmanghelich, Arseniy Belkov, Maria Baldeon Calisto, Jae Won Choi, Benoit M. Dawant, Hexin Dong, Sergio Escalera, Yubo Fan, Lasse Hansen, Mattias P. Heinrich, Smriti Joshi, Victoriya Kashtanova, Hyeon Gyu Kim, Satoshi Kondo, Christian N. Kruse, Susana K. Lai-Yuen, et al. (15 additional authors not shown)

Abstract: Domain Adaptation (DA) has recently raised strong interests in the medical imaging community. While a large variety of DA techniques has been proposed for image segmentation, most of these techniques have been validated either on private datasets or on small publicly available datasets. Moreover, these datasets mostly addressed single-class problems. To tackle these limitations, the Cross-Modality Domain Adaptation (crossMoDA) challenge was organised in conjunction with the 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021). CrossMoDA is the first large and multi-class benchmark for unsupervised cross-modality DA. The challenge's goal is to segment two key brain structures involved in the follow-up and treatment planning of vestibular schwannoma (VS): the VS and the cochleas. Currently, the diagnosis and surveillance in patients with VS are performed using contrast-enhanced T1 (ceT1) MRI. However, there is growing interest in using non-contrast sequences such as high-resolution T2 (hrT2) MRI. Therefore, we created an unsupervised cross-modality segmentation benchmark. The training set provides annotated ceT1 (N=105) and unpaired non-annotated hrT2 (N=105). The aim was to automatically perform unilateral VS and bilateral cochlea segmentation on hrT2 as provided in the testing set (N=137). A total of 16 teams submitted their algorithm for the evaluation phase.
The level of performance reached by the top-performing teams is strikingly high (best median Dice - VS:88.4%; Cochleas:85.7%) and close to full supervision (median Dice - VS:92.5%; Cochleas:87.7%). All top-performing methods made use of an image-to-image translation approach to transform the source-domain images into pseudo-target-domain images. A segmentation network was then trained using these generated images and the manual annotations provided for the source image.

Submitted 14 December, 2022; v1 submitted 8 January, 2022; originally announced January 2022.

Comments: In Medical Image Analysis

arXiv:2111.01557 [pdf, other]

PointNu-Net: Keypoint-assisted Convolutional Neural Network for Simultaneous Multi-tissue Histology Nuclei Segmentation and Classification

Authors: Kai Yao, Kaizhu Huang, Jie Sun, Amir Hussain

Abstract: Automatic nuclei segmentation and classification play a vital role in digital pathology. However, previous works are mostly built on data with limited diversity and small sizes, making the results questionable or misleading in actual downstream tasks. In this paper, we aim to build a reliable and robust method capable of dealing with data from 'the clinical wild'. Specifically, we study and design a new method to simultaneously detect, segment, and classify nuclei from Haematoxylin and Eosin (H&E) stained histopathology data, and evaluate our approach using the recent largest dataset: PanNuke. We address the detection and classification of each nucleus as a novel semantic keypoint estimation problem to determine the center point of each nucleus. Next, the corresponding class-agnostic masks for nuclei center points are obtained using dynamic instance segmentation. Meanwhile, we propose a novel Joint Pyramid Fusion Module (JPFM) to model the cross-scale dependencies, thus enhancing the local feature for better nuclei detection and classification. By decoupling two simultaneous challenging tasks and taking advantage of JPFM, our method can benefit from class-aware detection and class-agnostic segmentation, thus leading to a significant performance boost. We demonstrate the superior performance of our proposed approach for nuclei segmentation and classification across 19 different tissue types, delivering new benchmark results.

Submitted 30 May, 2023; v1 submitted 1 November, 2021; originally announced November 2021.

Comments: 12 pages, 7 figures, journal

arXiv:2107.11022 [pdf, other]

AD-GAN: End-to-end Unsupervised Nuclei Segmentation with Aligned Disentangling Training

Authors: Kai Yao, Kaizhu Huang, Jie Sun, Curran Jude

Abstract: We consider unsupervised cell nuclei segmentation in this paper. Exploiting the recently-proposed unpaired image-to-image translation between cell nuclei images and randomly synthetic masks, existing approaches, e.g., CycleGAN, have achieved encouraging results. However, these methods usually take a two-stage pipeline and fail to learn end-to-end in cell nuclei images. More seriously, they could lead to the lossy transformation problem, i.e., the content inconsistency between the original images and the corresponding segmentation output. To address these limitations, we propose a novel end-to-end unsupervised framework called Aligned Disentangling Generative Adversarial Network (AD-GAN). Distinctively, AD-GAN introduces representation disentanglement to separate content representation (the underlying spatial structure) from style representation (the rendering of the structure). With this framework, spatial structure can be preserved explicitly, enabling a significant reduction of macro-level lossy transformation. We also propose a novel training algorithm able to align the disentangled content in the latent space to reduce micro-level lossy transformation. Evaluations on real-world 2D and 3D datasets show that AD-GAN substantially outperforms the other comparison methods and the professional software both quantitatively and qualitatively. Specifically, the proposed AD-GAN leads to significant improvement over the current best unsupervised methods by an average 17.8% relatively (w.r.t. the metric DICE) on four cell nuclei datasets.
As an unsupervised method, AD-GAN even performs competitively with the best supervised models, taking a further leap towards end-to-end unsupervised nuclei segmentation.

Submitted 10 March, 2022; v1 submitted 23 July, 2021; originally announced July 2021.

arXiv:2107.04856 [pdf]

Graphene-based Distributed 3D Sensing Electrodes for Mapping Spatiotemporal Auricular Physiological Signals

Authors: Q. Huang, C. Wu, S. Hou, H. Sun, K. Yao, J. Law, M. Yang, A. L. R. Vellaisamy, X. Yu, H. Y. Chan, L. Lao, Y. Sun, W. J. Li

Abstract: Underneath the ear skin there are richly branching vascular and neural networks that ultimately connect to our heart and brain. Hence, the three-dimensional (3D) mapping of auricular electrophysiological signals could provide a new perspective for biomedical studies such as diagnosis of cardiovascular diseases and neurological disorders. However, it is still extremely challenging for current sensing techniques to cover the entire ultra-curved auricle. Here, we report a graphene-based ear-conformable sensing device with embedded and distributed 3D electrodes which enable full-auricle physiological monitoring. The sensing device, which incorporates a programmable 3D electrode thread array and a personalized auricular mold, has 3D-conformable sensing interfaces with curved auricular skin, and was developed using a one-step multi-material 3D-printing process. As a proof-of-concept, spatiotemporal auricular electrical skin resistance (AESR) mapping was demonstrated. For the first time, 3D AESR contours were generated and human subject-specific AESR distributions among a population were observed. From the data of 17 volunteers, the auricular region-specific AESR changes after cycling exercise were observed in 98% of the tests and were validated via machine learning techniques. Correlations of AESR with heart rate and blood pressure were also studied using statistical analysis. This 3D electronic platform and AESR-based new biometrical findings show promising biomedical applications.

Submitted 10 July, 2021; originally announced July 2021.

Comments: 23 pages, 6 figures

arXiv:1510.08983 [pdf, other]

Highway Long Short-Term Memory RNNs for Distant Speech Recognition

Authors: Yu Zhang, Guoguo Chen, Dong Yu, Kaisheng Yao, Sanjeev Khudanpur, James Glass

Abstract: In this paper, we extend the deep long short-term memory (DLSTM) recurrent neural networks by introducing gated direct connections between memory cells in adjacent layers. These direct links, called highway connections, enable unimpeded information flow across different layers and thus alleviate the gradient vanishing problem when building deeper LSTMs. We further introduce the latency-controlled bidirectional LSTMs (BLSTMs) which can exploit the whole history while keeping the latency under control. Efficient algorithms are proposed to train these novel networks using both frame and sequence discriminative criteria. Experiments on the AMI distant speech recognition (DSR) task indicate that we can train deeper LSTMs and achieve better improvement from sequence training with highway LSTMs (HLSTMs). Our novel model obtains $43.9/47.7\%$ WER on AMI (SDM) dev and eval sets, outperforming all previous works. It beats the strong DNN and DLSTM baselines with $15.7\%$ and $5.3\%$ relative improvement respectively.

Submitted 11 January, 2016; v1 submitted 30 October, 2015; originally announced October 2015.

Showing 1–9 of 9 results for author: Yao, K