CognitiveDog: Large Multimodal Model Based System to Translate Vision and Language into Action of Quadruped Robot
Authors:
Artem Lykov,
Mikhail Litvinov,
Mikhail Konenkov,
Rinat Prochii,
Nikita Burtsev,
Ali Alridha Abdulkarim,
Artem Bazhenov,
Vladimir Berman,
Dzmitry Tsetserukou
Abstract:
This paper introduces CognitiveDog, a pioneering quadruped robot with a Large Multi-modal Model (LMM) that is capable of not only communicating with humans verbally but also physically interacting with the environment through object manipulation. The system was realized on a Unitree Go1 robot dog equipped with a custom gripper and demonstrated autonomous decision-making capabilities, independently determining the most appropriate actions and interactions with various objects to fulfill user-defined tasks. These tasks do not necessarily include direct instructions, challenging the robot to comprehend and execute them based on natural language input and environmental cues. The paper describes the system, the dataset characteristics, and the software architecture. Key to this development is the robot's proficiency in navigating space using Visual-SLAM, effectively manipulating and transporting objects, and providing insightful natural language commentary during task execution. Experimental results highlight the robot's advanced task comprehension and adaptability, underscoring its potential in real-world applications. The dataset used to fine-tune the robot dog's behavior generation model is provided at the following link: huggingface.co/datasets/ArtemLykov/CognitiveDog_dataset
Submitted 17 January, 2024;
originally announced January 2024.
NuCLS: A scalable crowdsourcing, deep learning approach and dataset for nucleus classification, localization and segmentation
Authors:
Mohamed Amgad,
Lamees A. Atteya,
Hagar Hussein,
Kareem Hosny Mohammed,
Ehab Hafiz,
Maha A. T. Elsebaie,
Ahmed M. Alhusseiny,
Mohamed Atef AlMoslemany,
Abdelmagid M. Elmatboly,
Philip A. Pappalardo,
Rokia Adel Sakr,
Pooya Mobadersany,
Ahmad Rachid,
Anas M. Saad,
Ahmad M. Alkashash,
Inas A. Ruhban,
Anas Alrefai,
Nada M. Elgazar,
Ali Abdulkarim,
Abo-Alela Farag,
Amira Etman,
Ahmed G. Elsaeed,
Yahya Alagha,
Yomna A. Amer,
Ahmed M. Raslan
, et al. (12 additional authors not shown)
Abstract:
High-resolution mapping of cells and tissue structures provides a foundation for developing interpretable machine-learning models for computational pathology. Deep learning algorithms can provide accurate mappings given large numbers of labeled instances for training and validation. Generating adequate volume of quality labels has emerged as a critical barrier in computational pathology given the time and effort required from pathologists. In this paper we describe an approach for engaging crowds of medical students and pathologists that was used to produce a dataset of over 220,000 annotations of cell nuclei in breast cancers. We show how suggested annotations generated by a weak algorithm can improve the accuracy of annotations generated by non-experts and can yield useful data for training segmentation algorithms without laborious manual tracing. We systematically examine interrater agreement and describe modifications to the MaskRCNN model to improve cell mapping. We also describe a technique we call Decision Tree Approximation of Learned Embeddings (DTALE) that leverages nucleus segmentations and morphologic features to improve the transparency of nucleus classification models. The annotation data produced in this study are freely available for algorithm development and benchmarking at: https://sites.google.com/view/nucls.
Submitted 17 February, 2021;
originally announced February 2021.